Creating Audiovisual Experiences in Unity

By Jonah Hoekstra

Table of contents

  1. Introduction
    • Motivation
  2. Getting spectrum data
  3. Basic audio visualization
    • Incorporating audio into the scene
  4. Frequency bands
  5. Adding buffers
  6. Advanced audio visualizations
    • Attraction system
    • Attractors
    • Atoms
    • Atom behavior
    • Atom scale
  7. Finished product
  8. Conclusion
  9. Addendum
  10. Resources

1. Introduction

Motivation

Recently, I had the opportunity to visit Amaze in Amsterdam, an immersive audiovisual experience that takes you on a journey through various (interactive) audiovisual art installations. To give you an idea of what the experience is like, the video on the right is from my visit and shows one of the installations (slight spoiler warning).
This experience inspired me to pursue this topic. I’m interested in how audio can be used to shape visually stimulating experiences. What makes audio visualization satisfying to look at or interact with? What ways are there to translate soundscapes into visually striking imagery that can be experienced through sound as well as sight?

2. Getting spectrum data

To get started with the audio visualization, we must first retrieve the spectrum data from our audio source. The code below is what we’ll use to turn the audio into a stream of data for the visualization. The magic happens inside the GetSpectrumData method, which uses a Fast Fourier Transform (FFT) to convert the audio that is currently playing into an array of samples, each holding the amplitude of a small slice of the frequency spectrum. We will store this data in _samples, an array of float values. The second parameter is the channel, which we will set to 0 for the time being. The final parameter is the windowing function: this sets the algorithm used to clean up the spectrum data. The windows available are Rectangular, Triangle, Hamming, Hanning, Blackman, and BlackmanHarris. Rectangular means no correction window at all, which is fast but dirty. Triangle is a triangular window that cleans up reasonably well at very low cost. Hamming, Hanning, Blackman, and BlackmanHarris are progressively slower but cleaner alternatives. I chose BlackmanHarris because it results in the cleanest FFT window, which should improve the overall feel and responsiveness of the visualizations.


    void getSpectrumAudiosource()
    {
        _source.GetSpectrumData(_samples, 0, FFTWindow.BlackmanHarris);
    }
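
For completeness, here is a minimal sketch of how this method might be wired up. The class name AudioVisualizer and the 512-element array match the snippets used later in this article; note that Unity expects the sample array length to be a power of two (typically between 64 and 8192).


    using UnityEngine;

    // Minimal sketch of the surrounding script (assumption: everything lives in one
    // AudioVisualizer component on the GameObject that also holds the AudioSource).
    [RequireComponent(typeof(AudioSource))]
    public class AudioVisualizer : MonoBehaviour
    {
        AudioSource _source;
        public static float[] _samples = new float[512]; // power-of-two length

        void Start()
        {
            _source = GetComponent<AudioSource>();
        }

        void Update()
        {
            getSpectrumAudiosource(); // refresh the spectrum data every frame
        }

        void getSpectrumAudiosource()
        {
            _source.GetSpectrumData(_samples, 0, FFTWindow.BlackmanHarris);
        }
    }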

3. Basic audio visualization

The next step in creating our audio visualization is to use the samples from our spectrum data to animate some objects in our scene. But before we get ahead of ourselves, it is important to understand how the audio spectrum works. Human hearing ranges from 20 Hz to 20,000 Hz (20 kHz) and can be divided into 7 categories, as depicted in the image below (Longman, 2022). These categories represent the frequency ranges where different elements of audio (such as bass, vocals, and melodies) live. The music used for this project is sourced from uppbeat.io.

By running the song I would like to use for this project through a spectrum analyzer, I can estimate how far its frequency content reaches, which in my case is roughly 21,100 Hz (this may vary depending on the song you use). That is far too fine-grained to work with directly, so we will divide this range into 512 samples, which results in 21,100 / 512 ≈ 41.2 Hz per sample. We’ll store these values in our array. This way, the start of the array contains information about the low end (bass), and the last index sits around 20 kHz, the high end of the music. Notice how the distance between the frequencies gets larger as we progress through the spectrum: there is only about 180 Hz between the start of the low end and the upper bass range, whilst the high end spans some 15 kHz. We’ll have to keep this in mind later when we get into frequency bands.

Incorporating audio into the scene

With that out of the way, we can start working on the visualization. The goal here is to create a circle of cubes that scale up and down with the music. Each cube will represent one sample, so we’ll get 512 cubes in total. This way we can visualize the samples from our spectrum data, resulting in an animated audio spectrum similar to the .gif on the right, where the range of frequencies can clearly be seen.

After creating a simple cube prefab, I instantiate 512 cubes in a big circle all facing towards the center.


    void Start()
    {
        for (int i = 0; i < 512; i++)
        {
            GameObject _instanceCube = (GameObject)Instantiate(sampleCube);
            _instanceCube.transform.position = this.transform.position;
            _instanceCube.transform.parent = this.transform;
            _instanceCube.name = "SampleCube" + i;

            // Rotate the parent a little further each iteration (360 / 512 = 0.703125 degrees)
            // and push the new cube 100 units out; the cubes parented earlier rotate along
            // with it, so by the final iteration they are spread around a full circle.
            this.transform.eulerAngles = new Vector3(0, -0.703125f * i, 0);
            _instanceCube.transform.position = Vector3.forward * 100;

            sampleCubes[i] = _instanceCube;
        }
    }

Next, in the Update method, we loop through the array and check that each cube is not null. We then set the local scale of each cube to a new vector where the y component is the sample value multiplied by a scalar. The scalar is necessary because some of the sample values are very low and wouldn’t be visible otherwise.


    void Update()
    {
        for (int i = 0; i < 512; i++)
        {
            if (sampleCubes[i] != null)
            {
                // Scale the y axis with the sample value; the +1 keeps the cube visible during silence.
                sampleCubes[i].transform.localScale = new Vector3(1, (AudioVisualizer._samples[i] * _maxScale) + 1, 1);
            }
        }
    }

And hey presto! We got ourselves our first simple audio visualization.

However, you may notice that the bars don’t move very smoothly. To fix this issue we’ll implement audio buffers later on. But first, let’s look at frequency bands.

4. Frequency bands

As mentioned before in Chapter 3, the audio spectrum can effectively be broken down into 7 different frequency bands, starting with the sub-bass at 20 to 60 Hz and going up to the brilliance at 6 to 20 kHz. Each frequency range houses different sonic elements of the audio. The sub-bass, for instance, provides the first usable low frequencies in most music; the deep bass produced in this range is usually felt more than it is heard. Have a listen yourself below.

Sine wave example at 50 Hz

For the purposes of our audio visualization, I want to create 8 frequency bands. Each sample covers about 41 Hz. The sub-bass ranges from 20 to 60 Hz, so by allocating 2 samples we cover a total of 82 Hz for this frequency band. It will bleed into the bass range slightly, but this shouldn’t be too noticeable. The next band is the bass, from 60 to 250 Hz. Here we allocate 4 samples, adding another 164 Hz and giving us a range of roughly 83 to 247 Hz. The low-midrange (250 to 500 Hz) uses 8 samples, covering roughly 247 to 575 Hz.

If we repeat this five more times doubling the number of samples each step, we have our frequency bands! The code below is responsible for making the frequency bands. The sample count is calculated by 2 raised to the value of the iterator multiplied by two. For instance:

  • 2⁰ * 2 = 2
  • 2¹ * 2 = 4
  • 2² * 2 = 8, and so on.

The total sum of all samples only adds up to 510, so we add two more samples in the final iteration to fill out the array.
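
For reference, with roughly 41.2 Hz per sample, the allocation works out approximately as follows (boundaries are rounded):

  • Band 0: 2 samples, ≈ 0 – 82 Hz
  • Band 1: 4 samples, ≈ 82 – 247 Hz
  • Band 2: 8 samples, ≈ 247 – 577 Hz
  • Band 3: 16 samples, ≈ 577 – 1,236 Hz
  • Band 4: 32 samples, ≈ 1,236 – 2,555 Hz
  • Band 5: 64 samples, ≈ 2,555 – 5,193 Hz
  • Band 6: 128 samples, ≈ 5,193 – 10,468 Hz
  • Band 7: 256 + 2 samples, ≈ 10,468 – 21,100 Hz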

Inside the nested for loop, we calculate the average amplitude of the samples in each band. We loop over the samples, add the value of each sample within the frequency band together, and increment the count by one. We then calculate the average by dividing by the count. Finally, we set our frequency band to this average. Since the average value is only slightly above zero, we multiply it by 10 to get a more usable value. This process is repeated eight times.


    void makeFrequencyBands()
    {
        int count = 0;

        for (int i = 0; i < 8; i++)
        {
            // Reset the running total for each band.
            float average = 0;

            // 2, 4, 8, 16, 32, 64, 128, 256 samples per band...
            int sampleCount = (int)Mathf.Pow(2, i) * 2;

            // ...plus 2 extra in the last band so all 512 samples are used.
            if (i == 7)
            {
                sampleCount += 2;
            }

            for (int j = 0; j < sampleCount; j++)
            {
                average += _samples[count];
                count++;
            }

            average /= count;
            _freqBand[i] = average * 10;
        }
    }

Let’s test it! First, I added 8 cubes to the center of the scene. Next, I created a new script and added it to each cube. The _band variable corresponds to the frequency band the cube responds to; it is set from 0 to 7, starting at the leftmost cube. This means that the sub-bass and bass are represented by the cubes on the left, the mids by the center cubes, and so on.


    public class ParamCube : MonoBehaviour
    {
        public int _band;
        public float _startScale, _scaleMultiplier;

        // Update is called once per frame
        void Update()
        {
            transform.localScale = new Vector3(transform.localScale.x, (AudioVisualizer._freqBand[_band] * _scaleMultiplier) + _startScale, transform.localScale.z);
        }
    }

Let’s see what the result looks like!

As you can see, each bar represents its own frequency band and moves in sync with the music being played. Nice! However, the movement of the cubes is still pretty rough and abrupt sometimes. To address this we’ll next implement buffers to smooth it out and give a cool effect.

5. Adding buffers

The point of the buffer is to improve the way the audio is processed for visualization. To start I added two more arrays:


    public static float[] _bandBuffer = new float[8];
    float[] _bufferDecrease = new float[8];

The first array holds the buffered values of each frequency band. The second one holds the amount by which the buffer should decrease over time. We loop over each unbuffered frequency band: if the new value is greater than the current band buffer, the buffer is set equal to that frequency band. This means that the cube’s height goes up almost immediately.

If the new value of the frequency band is lower than the current buffer value, we decrease the band buffer. The cube’s height then eases down towards the current frequency band value, slowing down the closer it gets. As a result, the cubes move less erratically and scale more gently.


    void bandBuffer()
    {
        for (int g = 0; g < 8; g++)
        {
            // Jump up immediately when the band gets louder.
            if (_freqBand[g] > _bandBuffer[g])
            {
                _bandBuffer[g] = _freqBand[g];
            }

            // Ease down by 1/8 of the remaining difference when it gets quieter.
            if (_freqBand[g] < _bandBuffer[g])
            {
                _bufferDecrease[g] = (_bandBuffer[g] - _freqBand[g]) / 8;
                _bandBuffer[g] -= _bufferDecrease[g];
            }
        }
    }

This took quite some trial and error to get right; the buffer decrease in particular caused problems in the beginning. At first, I set the decrease to a fixed float value and increased that value by about 20 percent each frame. This had the opposite effect (starting slowly and speeding up near the end) and caused random ‘spikes’ where some cubes were stuck in a tall position. The code above fixes this by decreasing the buffer by a fraction (1/8) of the difference between the frequency band and the buffer. That amount gets smaller each frame, resulting in a descent that slows down as it approaches the target.

The next thing to do is to make the cubes use the buffer instead of the unbuffered values. This just requires swapping out the old array for the band buffer in the cube’s update logic. The surrounding ring of cubes also gets its own buffer. For this, we can reuse the code above by changing the length of the for loop to 512 and reading from all samples (AudioVisualizer._samples[g]) instead of just the frequency bands, as sketched below.
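
As a rough sketch, assuming the per-sample buffer lives in the same script as the band buffer (the array names _sampleBuffer and _sampleBufferDecrease are placeholders of my own):


    public static float[] _sampleBuffer = new float[512];
    float[] _sampleBufferDecrease = new float[512];

    void sampleBuffer()
    {
        for (int g = 0; g < 512; g++)
        {
            // Jump up immediately when a sample gets louder.
            if (_samples[g] > _sampleBuffer[g])
            {
                _sampleBuffer[g] = _samples[g];
            }

            // Ease down by 1/8 of the remaining difference when it gets quieter.
            if (_samples[g] < _sampleBuffer[g])
            {
                _sampleBufferDecrease[g] = (_sampleBuffer[g] - _samples[g]) / 8;
                _sampleBuffer[g] -= _sampleBufferDecrease[g];
            }
        }
    }

The ring of cubes then reads from _sampleBuffer instead of _samples in its Update method.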

To show the difference, I added a row of unbuffered green cubes right behind the blue row. The outside ring can also be seen in action with a buffer.

The result so far looks great. But I want to expand the scene a bit more. Thus far we have seen how audio visualization works, but how far can we push the envelope? The possibilities are pretty much endless. The goal of this project is to immerse the viewer in an engaging and visually stimulating 3D environment where the audio and visuals synchronize with each other. To achieve this I want to add more in-depth and complicated imagery to the environment.

6. Advanced audio visualizations

The scene at the moment looks nice, but it is still too empty for my taste. To make the environment more appealing to the viewer I want to use the eight frequency bands to animate small spheres/particles that move and change color based on their respective frequency band.

6.1 Attraction system

From my research into this topic, I found that tutorials on this particular subject can be hard to come by. Luckily I found a video tutorial series on YouTube that proved a great help in achieving what I had in mind. Along with other online sources and many trips to StackOverflow and the Unity answers forum, I ended up creating the effect that I was looking for (sources can be found below). To get started on the attraction system I created a new temporary scene. When the effect is done I will combine both scenes together.

The first thing that we’ll set up is the attraction system. I created two prefabs: one large sphere (the attractor) and a smaller sphere (the atom), which will respond to the music and create the effect. The direction vector for each atom points towards the position of the attractor. We add a force to the rigidbody in the direction of the attractor with the specified strength of attraction; the higher the attraction, the stronger the force. If the velocity’s magnitude exceeds the maximum magnitude, we cap it at that maximum. This prevents the atoms from moving too fast.

    public class AttractTo : MonoBehaviour
    {
        Rigidbody _rigidbody;
        public Transform _attractedTo;
        public float _strengthOfAttraction, _maxMagnitude;

        void Start()
        {
            _rigidbody = GetComponent<Rigidbody>();
        }

        // Physics forces are applied in FixedUpdate so they stay frame-rate independent.
        void FixedUpdate()
        {
            if (_attractedTo != null)
            {
                // Pull the atom towards the attractor.
                Vector3 direction = _attractedTo.position - transform.position;
                _rigidbody.AddForce(_strengthOfAttraction * direction);

                // Cap the velocity so the atoms can't move too fast.
                if (_rigidbody.velocity.magnitude > _maxMagnitude)
                {
                    _rigidbody.velocity = _rigidbody.velocity.normalized * _maxMagnitude;
                }
            }
        }
    }

Let’s see this code in action. I added one attractor and duplicated a bunch of atoms manually.

Looking cool already! But we are far from finished. I want to spawn 8 attractors in a circle around the center, one for each frequency band. And the atoms should “pulse” to the beat of the music. This means scaling up the atoms based on the value of the current frequency band. They should also change their appearance to add to the overall effect.

6.2 Attractors

I create 8 attractors in a circle around the center. The next step is to have each attractor spawn the atoms around it.

I would like the atoms to change color, increase in scale and, of course, move to the music. To give each set of atoms its own color, I created a gradient that is evaluated at a different position by each attractor. The result is stored in the _sharedColor array, which contains the different colors. The _sharedMaterial is the base material all atoms will receive; later on, we will change the color of this material dynamically in sync with the music.
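
A sketch of what that setup could look like (the prefab reference, spawn radius, base material, and _attractorTransforms array are placeholder names of my own; _sharedColor and _sharedMaterial are the arrays described above):


    // Sketch: spawn 8 attractors in a circle and give each its own color and material.
    public GameObject _attractorPrefab;   // placeholder
    public Gradient _gradient;
    public float _spawnRadius = 30f;      // placeholder
    public Material _baseMaterial;        // placeholder

    Transform[] _attractorTransforms = new Transform[8];
    Color[] _sharedColor = new Color[8];
    Material[] _sharedMaterial = new Material[8];

    void SpawnAttractors()
    {
        for (int i = 0; i < 8; i++)
        {
            // Place each attractor on a circle around the center.
            float angle = i * Mathf.PI * 2f / 8f;
            Vector3 pos = new Vector3(Mathf.Cos(angle), 0, Mathf.Sin(angle)) * _spawnRadius;
            GameObject attractor = Instantiate(_attractorPrefab, pos, Quaternion.identity, transform);
            _attractorTransforms[i] = attractor.transform;

            // Evaluate the gradient at a different position for each attractor.
            _sharedColor[i] = _gradient.Evaluate(i / 7f);

            // A copy of the base (black) material, shared by this attractor's atoms.
            _sharedMaterial[i] = new Material(_baseMaterial);
            _sharedMaterial[i].EnableKeyword("_EMISSION");
        }
    }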

6.3 Atoms

Now, I know there is a lot to unpack here, but don’t worry, I’ll guide you through the code. This code lives inside the for loop from the previous step, so it runs for each of the eight attractors. All variables prefixed with an underscore are declared at the top of the file and can be set in the Unity editor; for brevity, I will omit most of them in the explanation. First, we instantiate an atom prefab; the _countAtom variable keeps track of which atom we are dealing with, and all atoms are stored in _atomArray. The first GetComponent call sets the atom’s point of attraction to the current attractor’s transform. The starting position of the atom is set randomly between the negative and positive value of _randomPosDistance. The atom is given a random scale, and finally its material is set to the base black material.
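
Sketched out, the atom spawning could look something like this, run inside the attractor loop (index i). The prefab reference, the minimum/maximum scale values, and _attractorTransforms come from the previous sketch or are placeholders; the other names follow the description above.


    // Sketch of the atom spawning, run once per atom inside the attractor loop (index i).
    // Assumed fields: _atomPrefab, _minAtomScale, _maxAtomScale (placeholders) plus
    // _amountOfAtomsPerPoint, _atomArray, _countAtom, _randomPosDistance, _atomScaleSet.
    for (int j = 0; j < _amountOfAtomsPerPoint; j++)
    {
        GameObject atom = Instantiate(_atomPrefab, transform);
        _atomArray[_countAtom] = atom;

        // Point of attraction: the current attractor's transform.
        atom.GetComponent<AttractTo>()._attractedTo = _attractorTransforms[i];

        // Random start position between -_randomPosDistance and +_randomPosDistance.
        atom.transform.position = _attractorTransforms[i].position + new Vector3(
            Random.Range(-_randomPosDistance, _randomPosDistance),
            Random.Range(-_randomPosDistance, _randomPosDistance),
            Random.Range(-_randomPosDistance, _randomPosDistance));

        // Random base scale, remembered so ScaleAtoms() can modulate it later.
        float scale = Random.Range(_minAtomScale, _maxAtomScale);
        _atomScaleSet[_countAtom] = scale;
        atom.transform.localScale = new Vector3(scale, scale, scale);

        // Using sharedMaterial means changing _sharedMaterial[i] later recolors every atom at once.
        atom.GetComponent<MeshRenderer>().sharedMaterial = _sharedMaterial[i];

        _countAtom++;
    }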

Let’s look at what we have created so far. As you can see, we have eight attractors surrounded by atoms! To create the audio visualization we need to update the scale and color of the atoms dynamically to the music.

6.4 Atom behavior

For this visualization, the atoms should change color with each ‘pulse’, but only when a certain threshold is met; otherwise, the slightest changes would already trigger an emission. If the value from the frequency band is higher than the threshold, we create a new color. This color takes its RGB values from the attractor’s position on the gradient: the r, g, and b components are multiplied by the band buffer at the respective index (0 through 7) and scaled by a multiplier. The multiplier is there to boost the value to a much more noticeable degree. If the threshold is not met, the color of the atoms is set to black. As a result, the atoms get brighter and more colorful the higher the amplitude in their frequency band.

    void AtomBehaviour()
    {
        for (int i = 0; i < _attractPoints.Length; i++)
        {
            if (_audioBandEmissionthreshold[_attractPoints[i]] >= _thresholdEmission)
            {
                // Tint the emission with the attractor's gradient color, scaled by its band buffer.
                Color _audioColor = new Color(
                    _sharedColor[i].r * AudioVisualizer._bandBuffer[_attractPoints[i]] * _audioEmissionMultiplier,
                    _sharedColor[i].g * AudioVisualizer._bandBuffer[_attractPoints[i]] * _audioEmissionMultiplier,
                    _sharedColor[i].b * AudioVisualizer._bandBuffer[_attractPoints[i]] * _audioEmissionMultiplier, 1);
                _sharedMaterial[i].SetColor("_EmissionColor", _audioColor);
            }
            else
            {
                // Below the threshold: no emission.
                Color _audioColor = Color.black;
                _sharedMaterial[i].SetColor("_EmissionColor", _audioColor);
            }
        }
    }

6.5 Atom scale

Now we will focus on the way the atoms scale with the audio. We loop through each attractor and its atoms, and set the local scale of each atom to a new Vector3 by addressing the atom array. Every atom has a random base scale between a minimum and maximum value; this information is stored in a separate array called _atomScaleSet. We increase the scale by adding the corresponding band buffer, multiplied by a scalar, to the value from _atomScaleSet. Then we increment the count to move on to the next atom, and so on.

    void ScaleAtoms()
    {
        int count = 0;
        for (int i = 0; i < _attractPoints.Length; i++)
        {
            for (int j = 0; j < _amountOfAtomsPerPoint; j++)
            {
                // Base scale plus the (scaled) band buffer of this attractor's frequency band.
                float scale = _atomScaleSet[count] + AudioVisualizer._bandBuffer[_attractPoints[i]] * _audioScaleMultiplier;
                _atomArray[count].transform.localScale = new Vector3(scale, scale, scale);
                count++;
            }
        }
    }

7. Finished product

It is finally time to put everything together! First, I combined both scenes into one. I also added some simple UI to change some of the values, such as the scale multiplier and whether the frequency bands are buffered or not. Have a look (and a listen ;)) yourself! In the video you can see me changing some of the values; notice how the atoms react differently to the music!

8. Conclusion

Working on this project has been a very interesting experience. I hope it can help those who are also interested in this subject get started creating their own audiovisual experiences. I must admit that the project had its ups and downs; in particular, the precise goal of what I wanted to achieve wasn’t clearly defined at the start. Looking back, this project has been an exercise in creating audiovisual imagery. It doesn’t have a predefined use, but it could serve a vast number of different applications, such as background scenery for a rhythm game or VR experiences. The knowledge I gathered during the past weeks may very well contribute to future projects.

9. Addendum

The previous iteration of this project was unfortunately not yet at a sufficient level. For that reason, I will be updating the project this block (05/02/2024 – 12/04/2024) with additional work to bring it up to a sufficient level.

One of the problems with the previous iteration was that the audiovisual elements did not look or feel cohesive enough. I had spent a lot of time creating the audio visualizations without taking into account what their purpose or use case should be. Because of this, the end product didn’t have any noteworthy practical applications. In this addendum, I will outline the steps I took to correct this issue.

9.1 Use Case

The goal of this iteration was to create a use case for the audio visualizations I had made previously. The visual elements make a fun display by themselves, but this alone isn’t interesting enough; ideally, the visuals should be part of something more complex, such as a core mechanic. The concept I came up with was a game centered around the audiovisual components: AudioEvade. The player must navigate an endless tunnel whilst evading obstacles that move to the music, creating a dynamic where the player has to react to the music as they hear it.

9.2 Process

First of all, I needed a protagonist. I went with a small spaceship (IV Art, 2020) that I found for free on the Unity Asset Store; the prefab can be seen below. The main camera is attached to the spaceship so we can follow the ship wherever it goes. The ship also contains an audio source, which will play the music during gameplay.

The spaceship moves forward through the tunnel by itself. The player can control the ship on the vertical and horizontal axes. A single function handles the movement and the collision detection, using Physics.CheckBox().
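
A minimal sketch of what that function could look like (the speed values, box size, and layer mask are placeholders of my own):


    // Sketch of the ship controller: constant forward motion, input on both axes,
    // and a Physics.CheckBox overlap test against the obstacle layer.
    public float _forwardSpeed = 20f;                            // placeholder
    public float _steerSpeed = 10f;                              // placeholder
    public Vector3 _collisionBoxSize = new Vector3(1f, 1f, 2f);  // placeholder
    public LayerMask _obstacleMask;                              // placeholder

    void Update()
    {
        // Constant forward movement plus player input on the horizontal and vertical axes.
        float x = Input.GetAxis("Horizontal") * _steerSpeed;
        float y = Input.GetAxis("Vertical") * _steerSpeed;
        transform.Translate(new Vector3(x, y, _forwardSpeed) * Time.deltaTime);

        // Does the ship currently overlap an obstacle?
        if (Physics.CheckBox(transform.position, _collisionBoxSize * 0.5f,
            transform.rotation, _obstacleMask))
        {
            // Hit: apply the score penalty here.
        }
    }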

The concept calls for an environment for the player to traverse. Keeping with the sci-fi theme of the spaceship, I created an octagon-shaped tunnel. I made the sides transparent so the player can see the background, which adds to the overall theme of the game. The textures were also procured from the Unity Asset Store (Yves Allaira, 2018). The prefab consists of eight rectangles that were shaped to look like the individual panels of the tunnel. The prefab also contains a list of 10 transforms located on the inside of the walls; these will act as spawn points for the obstacles later on.

To create the actual tunnel, I instantiate 15 tunnel prefabs as individual ‘segments’. To save on performance, I opted to move segments to the back of the tunnel as the player passes through them, instead of deleting and instantiating a new segment each time. The first sketch below shows how the tunnel is instantiated.
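
A sketch of the segment instantiation (the prefab reference, segment length, and list are placeholder names):


    // Sketch: instantiate 15 tunnel segments end to end and keep them in order.
    // Requires: using System.Collections.Generic;
    public GameObject _tunnelPrefab;      // placeholder
    public float _segmentLength = 50f;    // placeholder
    List<GameObject> _segments = new List<GameObject>();

    void Start()
    {
        for (int i = 0; i < 15; i++)
        {
            Vector3 pos = new Vector3(0, 0, i * _segmentLength);
            _segments.Add(Instantiate(_tunnelPrefab, pos, Quaternion.identity, transform));
        }
    }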

The code below moves each segment to the back of the tunnel once the player has passed it. I do this by setting its new z position equal to the position of the last segment in the list (plus the length of a segment). Then I remove the segment from the list and insert it back in at the end. This cycle repeats and creates the illusion of an infinite tunnel.
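
Sketched out, the recycling step could look like this, using the list from the previous sketch (the _player reference is a placeholder for the ship’s transform):


    // Sketch: once the player has passed the first segment, move it to the back of the tunnel.
    public Transform _player;   // placeholder reference to the ship

    void Update()
    {
        GameObject first = _segments[0];

        // Has the player fully passed the first segment?
        if (_player.position.z > first.transform.position.z + _segmentLength)
        {
            // New z position: just behind the last segment in the list.
            float lastZ = _segments[_segments.Count - 1].transform.position.z;
            first.transform.position = new Vector3(0, 0, lastZ + _segmentLength);

            // Move it from the front of the list to the end.
            _segments.RemoveAt(0);
            _segments.Add(first);
        }
    }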

This results in a 15-segment-long tunnel that keeps extending as the player flies through it, with each segment teleported outside of the player’s view. Notice I also added a custom skybox (Dogmatic, 2020) depicting an alien planet!

As of now, the player can traverse an endless tunnel with music playing in the background. Now it’s time to add the obstacles. For these, I will repurpose the parametric cubes used in the audio visualization scene from last semester (see Chapter 5: Adding buffers). The goal is to have several obstacles spawn at random positions inside each segment; the list of transforms I mentioned earlier will be used to facilitate this. Each obstacle reacts to one of the frequency bands (0 through 7). The chosen band determines the obstacle’s behavior, namely which element(s) of the song (bass, vocals, synths, etc.) cause it to expand. The sketch below shows how two obstacles can be instantiated at a random position within each segment.
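
A sketch of how this could look (the TunnelSegment component with its _spawnPoints list and the _obstaclePrefab reference are placeholder names; the prefab carries the ParamCube script, so at this point every obstacle still reacts to whatever band is set on the prefab):


    // Sketch: spawn two obstacles per segment at random spawn points along the inner walls.
    public GameObject _obstaclePrefab;   // placeholder, carries the ParamCube script

    void SpawnObstacles(GameObject segment)
    {
        // The segment prefab exposes its 10 spawn transforms (placeholder component name).
        List<Transform> spawnPoints = segment.GetComponent<TunnelSegment>()._spawnPoints;

        for (int i = 0; i < 2; i++)
        {
            Transform point = spawnPoints[Random.Range(0, spawnPoints.Count)];
            Instantiate(_obstaclePrefab, point.position, point.rotation, segment.transform);
        }
    }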

The level now looks as follows: two cubes per segment. But as you can probably tell, the cubes all react to the same frequency band, which makes the level quite repetitive and predictable. There is also no variation in the position of the cubes after they have spawned; they remain in the same spot.

To fix the issues mentioned above, I give one of the cubes a different frequency band and a new random position each time a segment moves to the back of the tunnel.
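
Sketched out, and called right after a segment has been moved to the back of the tunnel (TunnelSegment and its spawn-point list are the same placeholders as before):


    // Sketch: give one of the segment's obstacles a new frequency band and a new spawn point.
    void RandomizeObstacle(GameObject segment)
    {
        List<Transform> spawnPoints = segment.GetComponent<TunnelSegment>()._spawnPoints;
        ParamCube obstacle = segment.GetComponentInChildren<ParamCube>();

        // New band (0 through 7)...
        obstacle._band = Random.Range(0, 8);

        // ...and a new position on the segment's inner walls.
        Transform point = spawnPoints[Random.Range(0, spawnPoints.Count)];
        obstacle.transform.SetPositionAndRotation(point.position, point.rotation);
    }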

With the above code implemented, the game is finished! The last feature I added was a score and penalty for hitting one of the obstacles (top left).

9.3 Reflection

Looking back on this past block, I must say I experienced ups and downs, both in the project and in my personal life. The first three guild meetings didn’t go according to plan because I struggled to find a proper solution direction. I experimented with FMOD and made a simple beat sequencer as well, but both directions proved to be far less work than I had anticipated, which made them impractical to pursue any further. These directions also did not incorporate any of my previous work, which was the main piece of feedback I had received last time. The work presented here came to me after brainstorming about ways to incorporate the audiovisual components as part of a mechanic. This concept was more concrete and applicable than my previous attempts, which really helped me get started because I could define the scope and work towards a prototype. This iteration has, in my opinion, successfully incorporated my previous work to create a use case with merit, something the previous project lacked considerably. That said, the project could be improved upon in future iterations; there is much room left to further explore the concept of an audio-based obstacle game, such as different types of obstacles and more complex movement patterns for them.

10. Resources

  1. Product Documentation – NI. (2023, February 21). NI.com. Retrieved April 6, 2023, from https://www.ni.com/docs/en-US/bundle/labwindows-cvi/page/advancedanalysisconcepts/fft_fundamentals.html
  2. Longman, J. (2022, September 29). Audio Frequency Spectrum Explained. AudioReputation. https://www.audioreputation.com/audio-frequency-spectrum-explained/
  3. Gleeson, A. (2021, September 2). The Audio Frequency Spectrum Explained | Headphonesty. Headphonesty. https://www.headphonesty.com/2020/02/audio-frequency-spectrum-explained/
  4. Jamal Mortimer. (2016, February 10). Audio Buffer and Latency Explained! [Video]. YouTube. https://www.youtube.com/watch?v=94VRFrisKLw
  5. Houben, K. (2021, January 23). Using Processing for Music Visualization. Generative Hut. https://www.generativehut.com/post/using-processing-for-music-visualization
  6. Sample Rate, Bit Depth & Buffer Size Explained. (2022). Focusrite. https://support.focusrite.com/hc/en-gb/articles/115004120965-Sample-Rate-Bit-Depth-Buffer-Size-Explained
  7. Olthof, P. (n.d.). Peer Play. YouTube. https://www.youtube.com/@PeerPlay
  8. Free Music For Creators. (n.d.). https://uppbeat.io/
  9. LANDR. (2019, December 26). Beat and rhythm in music explained [Video]. YouTube. https://youtu.be/F21pS3Wo8ko?si=cNq7g2GxuCf_T8VP