The impact of audio on video game prototypes

A brief exploration of audio affecting player experience and ways for developers to influence this

Brent Schreurs, April 2023

The first time I took part in Game Studio we had to deliver the framework for a game, a kind of giant
prototype if you will. We delivered this prototype, without audio because of scope-related reasons,
and passed the assignment without many problems. I was part of the design team, and one piece of
feedback we got was: “Designing games is designing an experience, and right now I’m missing part of
the experience” (Mulder, 2020). When the time for this current R&D came along, I thought back on
that statement and figured I wanted to find out how big a part of the experience players were
missing.
Now when I say prototype, or prototyping, most of you will probably think of testing certain game
mechanics, UX elements or technical aspects of the game system. Audio will rarely be somebody’s
first thought, and there are good reasons for that. Besides being a discipline of its own (and, since
this is not a school of the arts, arguably outside the scope of this course), audio is by nature
downstream of all other disciplines. It’s difficult to create sound effects for game mechanics you
haven’t thought of, or to have people record voice lines for a non-existent narrative. So, the
questions asked here are: should you bother? Does having audio in a prototype add value, or is that
time better spent on other aspects of the game? And, if it matters, how do we as developers
influence players with audio?

The boundaries of experience

Before we try to quantify the impact of audio in videogames it’s important we set some boundaries
concerning the psychological concepts of experience and immersion. Let’s start with experience,
more notably experience as a conscious event. In this case experience refers to the event or a series
of events itself, and not the knowledge it produces. During such an event some items are presented
to the person. These items may belong to diverse ontological categories such as objects, properties,
relations or events. We speak of experience in the restricted sense; ‘restricted’ here refers to the
nature of these items, more specifically to their being ‘real’ (Gupta, 2012). This means we disregard
things like hallucinations, dreams and illusions. For this piece of research these items would mainly
be the items making up the game; the visuals and audio. This means our experience starts when the
game (or playtest) starts and ends when our game ends.
In gaming we want this experience to take up as much of the player’s current consciousness as
possible, and the manner in which this happens is often called ‘immersion’ (Nacke et al., 2010).
Immersion can be seen as a scale, ranging from engagement through engrossment to total immersion
(Brown and Cairns, 2004). We can try to establish the contributing factors of this immersion using
the SCI model (Ermi and Mäyrä, 2005), which divides immersion into three categories: sensory (S),
challenge-based (C), and imaginative (I). For this paper we will focus on the sensory part, since
this is the category that can be enhanced or influenced by amplifying a game’s audio-visual
components. These components can be divided further by their function in the game.

Sound, form and function, ludology and narratology

Sound can serve many different functions to a player. These can be roughly divided into three
categories: inform, entertain and immerse (Andersen et al., 2021). These distinctions are not hard
lines; a sound can serve multiple functions. Informing sounds are mostly direct feedback on player
actions, such as gun sounds when shooting or footstep sounds while moving a character. Entertaining
sound holds the attention of the player and provides them with a form of fun, while immersing sound
serves to heighten immersion in the given in-game scene. Some see this division as the age-old
ludology vs. narratology debate in gaming, with informing sound being the ludic side, but more
recently this division has become less distinct (Kokonis, 2015). In their article ‘The Reality Paradox’ Richard
Stevens and Dave Raybould add another functional space in their framework to categorize sound:
the range of listeners (Stevens and Raybould, 2015) (see Figure 1). They explain the range of this
framework as follows:

  • Narrative: Audio that draws us into a fictional world or narrative.
  • Ludic: Audio that provides information to help the player achieve, or motivate the player towards
    achieving, mastery.
  • Social: Audio that is heard by all agents/entities in the game.
  • Personal: Audio that is heard only by the player.
Figure 1: soundtype framework

Most of these categories encompass only ‘authentic sound’, meaning sounds that are identifiable
with the real world. From the article: “…the ambient sounds (socio-narrative), voice-over-type
instructional speech directed at the player from their fictional superiors (ego-narrative),
player-generated sounds (such as footsteps) that serve a ludic function to others in alerting them to
the player’s location (socio-ludic), and of course, the instructional, notification, feedback, and
orientation sounds, which are heard only by the player, and which we can now identify as being
‘ego-ludic’.” They argue that it’s the presence of these ego-ludic sounds that makes games sound like
games. This suggests that these are the sounds with the highest impact on player experience, and
thus the sounds to focus on in our test game.
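
The two axes of this framework (function and range of listeners) can be written down as a small lookup. The Python sketch below is only an illustration of that idea: the enum and function names are made up for this post and do not come from the article, and the example sounds are the ones named in the quote above.

```python
from enum import Enum


class Function(Enum):
    """What the sound does for the player (first axis of the framework)."""
    NARRATIVE = "narrative"
    LUDIC = "ludic"


class Audience(Enum):
    """Who can hear the sound (second axis of the framework)."""
    SOCIAL = "socio"    # heard by all agents/entities in the game
    PERSONAL = "ego"    # heard only by the player


def classify(function: Function, audience: Audience) -> str:
    """Combine both axes into a label such as 'ego-ludic'."""
    return f"{audience.value}-{function.value}"


# Example sounds taken from the quoted passage (keys are shorthand, not official terms).
EXAMPLES = {
    "ambient sounds": (Function.NARRATIVE, Audience.SOCIAL),
    "voice-over instructions": (Function.NARRATIVE, Audience.PERSONAL),
    "player footsteps": (Function.LUDIC, Audience.SOCIAL),
    "notification and feedback sounds": (Function.LUDIC, Audience.PERSONAL),
}

for sound, (function, audience) in EXAMPLES.items():
    print(f"{sound}: {classify(function, audience)}")
```

Tagging every sound in a prototype this way makes it easy to see which quadrant is over- or under-represented before deciding where to spend time.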

The prototype

To test how big of an impact sound has on the experience of the player I made a small prototype to
test some assumptions. I chose to make a prototype because I had no other games available in a
playable state. A different solution I briefly explored was to have testers test some of their favourite
games at home; with and without specific audio categories active. Most games give the player some
control over different categories, with separate sliders for the background music, SFX and others.
This would also give me a baseline of sound/no sound differences, but I opted against it for a
few reasons. The first is that these games would have vastly different soundscapes. A
multiplayer game might invest more of its sounds in the socio-ludic sphere, whereas a single-player
game might not have any socio-ludic sounds at all. This, combined with the fact that every game’s
volume sliders may affect sounds differently, led me to believe that making my own prototype would
give me more uniform results.

Version 1 of the prototype

The first version of my prototype was a 2D brick breaker clone. I chose this format because it’s
quick to make and easy to learn, meaning people will know what to do in-game very quickly. This
makes for fast prototyping and easy testing.

Fig. 2: Screenshot of the first prototype
Fig. 3: Sound settings for prototype 1

Fig. 4: The method used to play audio files

While building the soundscape for this prototype and having my teammates playtest it briefly, I realized it might not be the best fit for my research. The main reason is the limited number of sounds available without adding a lot of mechanics: the ball bounce, paddle movement, block breaking and ball reset bring the total to four. Other sounds are possible (e.g. combo sounds, different types of blocks) but require more programming time for minimal added options. This defeats the main purpose of fast prototyping.

Other reasons were the lack of direct feedback moments for player action (the player has only one mechanic available to them: moving the paddle) and the fact that all moral connotations were challenge-based. This means that everything that is ‘good’ or ‘bad’ is based on completing or failing the challenge of clearing the blocks. While this is a minor problem when testing for immersion, it makes testing for any positive or negative affect harder.

The last issue is the practicality of this system. This barebones way of playing audio requires every sound file to be hardcoded in a script before it can be played, which would mean a lot of redundant work when changing audio files or expanding the prototype. It also requires every sound to be adjusted manually in the project to get the equalization right. Instead of changing the sound properties in a prefab or a separate object in our assets, we would need to go to the object that plays the sound and change it there.

Version 2 of the prototype

The prototype I chose to test with is a 2D platformer. I thought this would give me the best combination of different types of sound, timing and context. Adding new sounds can be done quite easily and quickly by adding new enemies or environmental components.

Figure 5: the prototype level

To incorporate audio in this prototype I used assets from Unity’s ‘Open Project 1’ for their highly reusable Scriptable Objects (SO). Pooling the configurations in a single place gives me quicker access to audio mixing and placement in the scene. Using one of these Scriptable Objects makes it possible to save multiple configurations of sound properties and assign them to different sounds. This saves me a lot of time that would otherwise go into opening every sound and altering its pitch or volume when testing or fine-tuning.
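
The idea of pooling sound properties in one shared place can be shown outside of Unity as well. The following is a minimal, engine-agnostic Python sketch of that pattern. It is not the Open Project code and not Unity’s API: `AudioConfig`, `AudioCue`, `playback_settings` and the clip names are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass
class AudioConfig:
    """Shared playback settings, comparable to one reusable configuration asset."""
    volume: float = 1.0
    pitch: float = 1.0


@dataclass
class AudioCue:
    """A sound that points at a shared configuration instead of storing its own."""
    clip_name: str
    config: AudioConfig


def playback_settings(cue: AudioCue) -> dict:
    """Resolve the settings a cue would be played with right now."""
    return {"clip": cue.clip_name, "volume": cue.config.volume, "pitch": cue.config.pitch}


# One configuration shared by several (hypothetical) feedback sounds.
feedback = AudioConfig(volume=0.8, pitch=1.0)
cues = [AudioCue("jump", feedback), AudioCue("coin_pickup", feedback)]

# Tuning the shared configuration once changes every cue that references it.
feedback.volume = 0.5
for cue in cues:
    print(playback_settings(cue))
```

The point of the design is that the sounds hold a reference to their settings rather than a copy, so a single edit during a tuning session reaches every sound that uses that configuration.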

Fig. 6: Scriptable Object example
Figure 3: audio settings used for the prototype
Figure 4: the sounds used

Fig. 7: The method for playing audio

I tested this prototype in a classic A/B style, where each player plays only one version of the prototype. Version A has no sounds whatsoever; version B has ego-ludic sounds with little to no connection to ‘real’ world objects, meaning no sound effect tries to emulate things in nature.

Playtest questions

After testing the prototype, players are given a set of questions. These questions come from the Game Experience Questionnaire (IJsselsteijn et al., 2013), developed by a team at the Technische Universiteit Eindhoven. They devised a set of questions that each relate to one of the following components: Competence, Sensory and Imaginative Immersion, Flow, Tension, Challenge, Negative Affect and Positive Affect. Immersion and Flow will be our focus points and measure of success; they are so closely related that it is hard to tell them apart (Stevens and Raybould, 2015) (Michailidis et al., 2018). The results of the other components are not irrelevant, however, so we will take these into account as well. They can act as a baseline for the other test results: comparing their increase helps to eliminate some randomness. Some of the components (namely Negative and Positive Affect) might be useful later in the research process.

I opted to use the short form (In-game GEQ, 14 questions) instead of the long one (33 questions) because of the length of the prototype. Asking people 33 questions about a prototype that is playable in 1-2 minutes seemed a little excessive. The questions are multiple choice, on a scale of 0 (not at all) to 4 (extremely). In the original playtest session of the non-audio version I used the In-game GEQ as is, which in hindsight yielded too little useful information. In the short form only 2 questions relate to the Immersion component: “I was interested in the game’s story” and “I found it impressive”. This results in a minimal amount of feedback on the component I am most interested in, especially given that the prototype has no story. To combat this I added 2 questions from the full version: “It felt like a rich experience” and “I felt that I could explore things”. A full list of questions can be found at the bottom of this post.
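
To make the scoring concrete: a component score is the average of the answers to the questions belonging to that component. The Python sketch below shows this for a single tester. Treat it as an assumption-laden illustration: only the Immersion pair (questions 1 and 4) is confirmed by the text above, the rest of the question-to-component mapping is how I recall the In-game GEQ scoring guidelines and should be checked against the questionnaire document, and the answers are invented, not real test data.

```python
from statistics import mean

# Question numbers per component (numbering as in Appendix 1).
# Assumption: mapping recalled from the In-game GEQ scoring guidelines; verify before use.
INGAME_GEQ_COMPONENTS = {
    "Competence": (2, 9),
    "Sensory and Imaginative Immersion": (1, 4),
    "Flow": (5, 10),
    "Tension": (6, 8),
    "Challenge": (12, 13),
    "Negative Affect": (3, 7),
    "Positive Affect": (11, 14),
}
# The two immersion questions added from the full questionnaire (15 and 16).
EXTRA_IMMERSION_ITEMS = (15, 16)


def score_components(answers, include_extra=False):
    """Average the 0-4 answers of one tester per component."""
    scores = {}
    for component, items in INGAME_GEQ_COMPONENTS.items():
        if include_extra and component == "Sensory and Imaginative Immersion":
            items = items + EXTRA_IMMERSION_ITEMS
        scores[component] = mean(answers[item] for item in items)
    return scores


# Hypothetical answers of one tester: question number -> answer (0-4).
answers = {1: 1, 2: 3, 3: 0, 4: 2, 5: 1, 6: 1, 7: 0, 8: 0,
           9: 3, 10: 2, 11: 3, 12: 2, 13: 1, 14: 3, 15: 2, 16: 3}

scores = score_components(answers, include_extra=True)
for component, score in scores.items():
    print(f"{component}: {score}")
```

Averaging these per-tester scores over all testers of one version gives the kind of per-component values that can be compared between the two versions.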

Playtest results

The results of the questionnaire are compiled in the bar graph of figure 8; the full results can be found at the bottom of this post. The scores for the prototype with audio are noticeably higher for almost all components. This suggests that players experience more immersion while playing the game when ego-ludic sounds are incorporated. In the case of my prototype, adding ego-ludic sounds raised the Immersion score by about 40%.

Fig. 8: Playtest results

If we take these results at face value, it seems we can answer one of our questions: ego-ludic sounds constitute a sizeable part of a player’s experience, even while playing a prototype.

When taking a closer look at the averages of the answers there are some interesting tendencies and contradictions to be found. The first interesting thing is that almost all categories increased, except Negative Affect (whose decrease you could count as an extra gain in Positive Affect). For example, both Immersion and Flow increased by a very similar amount (Immersion: 0.938 → 1.333, Flow: 0.688 → 1.056). Measured against the audio-version scores that is 30% and 35% respectively; measured against the no-audio baseline it is roughly 42% and 53%. This result supports the argument of the earlier mentioned research that their differences are negligible.
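
How large a percentage looks depends on which score the difference is divided by. The short Python sketch below recomputes the Immersion and Flow changes from the averages given above, once relative to the no-audio score and once relative to the audio score. The function name and its `reference` parameter are made up for this illustration.

```python
def relative_change(before, after, reference="before"):
    """Percentage change from `before` to `after`, relative to the chosen score."""
    base = before if reference == "before" else after
    return (after - before) / base * 100


# Averages taken from the playtest results (no-audio score, audio score).
averages = {
    "Immersion": (0.938, 1.333),
    "Flow": (0.688, 1.056),
}

for component, (no_audio, audio) in averages.items():
    vs_baseline = relative_change(no_audio, audio, reference="before")
    vs_audio = relative_change(no_audio, audio, reference="after")
    print(f"{component}: {vs_baseline:.0f}% of the no-audio score, "
          f"{vs_audio:.0f}% of the audio score")
```

Dividing by the audio score gives about 30% and 35%, while dividing by the no-audio baseline gives about 42% and 53%. The absolute differences (roughly 0.40 and 0.37 points on the 0-4 scale) are close to each other either way.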

Competence and Challenge also increased in score (12% and 18% respectively), which could mean players felt both more challenged and more competent in solving these challenges when presented with audio cues. These increases might also have contributed to the increase of other components; a higher Challenge score (and thus a feeling of a bigger/better challenge) might have influenced Immersion through their relation as seen in the SCI model. The same could be said for the increase and decrease of Positive and Negative Affect respectively. Looking at Challenge and its relationship to Competence, which can be seen as a basic human need (Vahlo and Karhulahti, 2020), fulfilling part of this need might have had a trickle-down effect on the players’ mood and thus influenced the Affect scores.

Looking at the scores for Positive and Negative Affect in isolation, the changes are much larger than I expected, but the increase and decrease are somewhat in line with each other (+25% and -35%). Whether this is a result of adding audio that makes players feel good (through positive valence, see next paragraph), a result of them feeling more challenged through audio cues, or a more basic rule that more feedback is always better regardless of the type, is hard to say.

Having evaluated the averages, we can take a closer look at some of the individual answers. These show that some testers gave answers that seem to contradict each other. In the no-audio version of the prototype tester 3 stated they were ‘Fairly (3)’ bored, while also being ‘Moderately (2)’ challenged. Tester 5 felt ‘Not at all (0)’ successful and ‘Slightly (1)’ frustrated but also scored amongst the highest of the test group on the question “I felt good” (‘Moderately (2)’). The audio version had similar results, with tester 7 feeling ‘Fairly (3)’ successful and impressed, but also ‘Moderately (2)’ frustrated and irritated. Because my questionnaire lacked follow-up questions, I have no way of finding out the reasons for these contradicting feelings, which would have been interesting to know.

This is a personal takeaway for future research: give people the chance to elaborate. Being able to add thoughts could have helped me in drawing conclusions from these results. For example: people were slightly interested in the game’s story in the no-audio version, and this interest increased by almost 40% in the version with audio. Neither version has an apparent narrative, so it’s unclear to me where this interest (and especially the increase in interest) comes from.

Validity

There are, however, some caveats to these results. The first is an issue of control. The questionnaire was a bit different the second time around, as was the testing location. Version 1 (no audio) was tested in a variety of settings: some people played in their own homes, others in public spaces like cafés or bars. Version 2 (with audio) was mostly tested in more private settings, at home or in more secluded public spaces where players could focus on the audio. This makes it difficult to assess whether changes in the answers stem from the manipulation of the prototype or from outside factors in the environment.

The contradictory answers from individual testers further suggest that either not every tester was paying full attention when answering the questionnaire, or that the questionnaire left room for such contradictory feelings without giving testers room to explain them. The latter is at least partially a fault of my testing methods (open-ended questions or a short interview would have mitigated some of these issues), while the former is harder for me to explain.

To minimize error and bias, a new playtest should be performed in which testers get either version of the prototype under the same conditions, whether that be a private or a public setting. After testing they would be presented with a list of questions, both closed and open-ended, followed by a short conversation with the test-givers. This would eliminate most uncertainties, but it is also subject to error caused by human intervention.
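
One way to keep such a follow-up test fair is to let chance, rather than convenience, decide which tester gets which version. The Python sketch below shows a simple balanced random assignment. It is a suggestion, not the procedure used for the playtest in this post, and the tester IDs and function name are hypothetical.

```python
import random


def assign_versions(testers, seed=None):
    """Randomly split testers into version A (no audio) and version B (audio)."""
    rng = random.Random(seed)
    shuffled = list(testers)
    rng.shuffle(shuffled)
    # Alternate after shuffling so the group sizes differ by at most one.
    return {
        "A": shuffled[0::2],
        "B": shuffled[1::2],
    }


testers = [f"tester_{n}" for n in range(1, 11)]  # hypothetical tester IDs
groups = assign_versions(testers, seed=2023)
print(groups)
```

Fixing the seed makes the split reproducible, which helps when the assignment has to be documented alongside the results.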

A different method could be to test the prototypes while the testers’ physiological responses are measured. This would entail measuring their heart rate, blood pressure and breathing, and tracking their eye movement (Aljanaki et al., 2017).

Taking these factors into account, I don’t think the percentage increase is representative as such; it should be seen as the combined result of all these factors. Finding out how big a part each of them played in the final scores is impossible, but even a conservative reading would still leave us with an increase in player immersion. In my opinion this makes it worthwhile to incorporate audio into prototypes.

Further research

The original plan for this R&D was to test a third version of the prototype, using a set of different soundscapes. A soundscape is the total package of sounds in the game, or according to Wikipedia: “… the acoustic environment as perceived by humans, in context.” There is a lot of research regarding emotions and sound (Grekow, 2018) (see paragraph below), but not much regarding sound effects. Some researchers have investigated the physiological responses to environmental noise (Gomez & Danuser, 2004), but they only incorporate sounds from the physical world (e.g. people’s reaction to an audience clapping, or the sound of a passing train). This means that there is plenty to be gained by testing the emotional and physiological responses of people playing videogames with different types of sound effects, especially if those effects are minimally related to the physical world.

My plan was to build some soundscapes around certain emotional themes (positive and negative) and incorporate them in the current prototype. If there is no visual manipulation or change in setting, and the emotional response still varies between these soundscapes, it would suggest that certain audio cues evoke certain emotional responses. Ways to quantify these responses are explained in the paragraph below. Time constraints did not allow me to playtest this properly, but the knowledge might benefit developers looking to add some more targeted audio to their prototype.

Valence, cadence and vibes

In the earlier research we looked at the different functions of sound, but not at how to influence these functions. Merely adding sounds heightens immersion, but it doesn’t ‘guide’ that immersion per se. To find out how to guide this experience in the way we want, we first need to look at the role of emotions and their relation to sound in videogames.

Now, emotions are a psychological field of study on their own, so for the purpose of this text I use the definition given by Inger Ekman regarding emotions in videogames: “Emotions are thus evaluations of a specific (visceral and urgent) type, which signal events of critical importance and relevance to the perceiver.” (Ekman, 2008). These emotions often have a moral connotation, or valence (good vs. bad), and an intensity, or arousal. This is called the dimensional model of emotion. Because this model lacks any context as input, we can further classify emotions as basic and complex. The type of game being played also affects these emotions: abstract games such as Tetris rely more on basic emotions, whereas horror games rely more on complex emotions (Lankoski, 2007). The fear of failure (when dropping a block the wrong way in Tetris) and the fear of a monster are intrinsically different (Power and Dalgleish, 1997). According to Lankoski, empathy plays a big role in the power dynamic between these basic and complex emotions: games with anthropomorphic characters can evoke a wider range of emotions because people can relate to them.

To put this in simpler words: the more abstract your game, the less you need to worry about steering complex emotions. Or: the more abstract your game, the harder it is to use sound to make players feel anything but basic emotions. If you’re creating a horror game based on geometric shapes, it might be hard to use sound to make a player feel the classic disgust/fear we know from seeing ghosts or monsters coming our way.

This, however, does not mean that the more realistic a game is, the more realistic your sounds should be. This is a paradox of sorts: sounds often feel more ‘real’ without actually being real (think of a foley artist walking on corn starch because it sounds more like walking on snow than actual snow does (Ekman, 2008)).

Recommendations and further reading

To conclude this blogpost I have some recommendations and reading material for game developers thinking of incorporating audio in their prototypes. If you’re thinking of prototyping a game consider the following:

  • Add sound, but don’t stress about it. Leaving it out might lead players to respond more negatively, which developers can then misread as the prototype having bad mechanics. Add some audio to prevent this, but don’t spend hours finding or creating the perfect sounds.
  • Not all sound is created equal: categorize the game space where your sounds operate and engineer accordingly. Everybody loves a good soundtrack to accompany their game, but if all your ego-ludic feedback sounds are messed up, chances are people won’t care about the background music.

If you’re interested in sound in videogames and want to know more: check out some of the sources at the end of this post or have a look at the books/movies below that I didn’t use in this post but are still relevant (links in the Resources):

  • The Oxford Handbook of Sound and Imagination, Volume 2
  • Beep: A Documentary History of Game Sound
  • A Composer’s Guide to Game Music

Resources

Andersen, F., Danny, King, C. L., & Gunawan, A. A. (2021). Audio influence on game atmosphere during various game events. Procedia Computer Science, 179, 222–231. https://doi.org/10.1016/j.procs.2021.01.001

Aljanaki A, Yang YH, Soleymani M. Developing a benchmark for emotional analysis of music. PLoS One. 2017 Mar 10;12(3):e0173392. doi: 10.1371/journal.pone.0173392.

Brown, E. & Cairns, P. (2004), ‘A Grounded Investigation of Game Immersion’, in CHI’04 Extended Abstracts on Human Factors in Computing Systems, pp. 1297–1300.

Ermi, Laura & Mäyrä, Frans. (2005). Fundamental Components of the Gameplay Experience: Analysing Immersion. Worlds in Play: Int. Perspectives on Digital Games Research.

Ekman, Inger. (2008). Psychologically motivated techniques for emotional sound in computer games. 20–26.

https://www.researchgate.net/publication/233406205_Psychologically_motivated_techniques_for_emotional_sound_in_computer_games

Gupta, Anil (2012). “An Account of Conscious Experience”. Analytic Philosophy. 53 (1): 1–29. doi:10.1111/j.2153-960X.2012.00545.x.

Gomez, Patrick., Danuser, Brigitta. (2004) Affective and physiological responses to environmental noises and music, in International Journal of Psychophysiology, Volume 53 (2):  91-103. https://doi.org/10.1016/j.ijpsycho.2004.02.002.

Grekow, Jacek. (2018) Audio features dedicated to the detection and tracking of arousal and valence in musical compositions, in Journal of Information and Telecommunication, Volume 2 (3): 322-333. https://doi.org/10.1080/24751839.2018.1463749

IJsselsteijn, W. A., de Kort, Y. A. W., & Poels, K. (2013). The Game Experience Questionnaire. Technische Universiteit Eindhoven

https://pure.tue.nl/ws/portalfiles/portal/21666907/Game_Experience_Questionnaire_English.pdf

Kokonis, Michalis. (2015). Intermediality between Games and Fiction: The “Ludology vs. Narratology” Debate in Computer Game Studies: A Response to Gonzalo Frasca

https://web.archive.org/web/20180725220608/https://www.degruyter.com/downloadpdf/j/ausfm.2014.9.issue-1/ausfm-2015-0009/ausfm-2015-0009.pdf

Lankoski, Petri. (2007). Goals, Affects, and Empathy in Computer Games.

https://www.researchgate.net/publication/200010277_Goals_Affects_and_Empathy_in_Computer_Games

Michailidis, L., Balaguer-Ballester, E., & He, X. (2018). Flow and Immersion in Video Games: The Aftermath of a Conceptual Challenge. Frontiers in psychology, 9, 1682. https://doi.org/10.3389/fpsyg.2018.01682

Nacke, Lennart., Stellmach, Sophie & Lindley, Craig. (2010). Electroencephalographic Assessment of Player Experience: A Pilot Study in Affective Ludology, in Simulation & Gaming. 42.

Power, Mick, and Tim Dalgleish. Cognition and Emotion: From Order to Disorder. Hove: Psychology Press Ltd, 1997.

Stevens, Richard & Raybould, Dave. (2015). The reality paradox: Authenticity, fidelity and the real in Battlefield 4. The Soundtrack. 8. 57-75.

https://www.researchgate.net/publication/286640535_The_reality_paradox_Authenticity_fidelity_and_the_real_in_Battlefield_4

Jukka Vahlo & Veli-Matti Karhulahti. (2020) Challenge types in gaming validation of video game challenge inventory (CHA), in International Journal of Human-Computer Studies, Volume 143. https://doi.org/10.1016/j.ijhcs.2020.102473.

Unity Open Project One

https://github.com/UnityTechnologies/open-project-1

https://en.wikipedia.org/wiki/Soundscape

The Oxford Handbook of Sound and Imagination, Volume 2

https://lib.uva.nl/permalink/31UKB_UAM1_INST/gq32c0/alma9940223786105131

Beep: A Documentary History of Game Sound. Directed by Karen Collins./Beep: Documenting the History of Game Sound

https://lib.uva.nl/permalink/31UKB_UAM1_INST/1hfh82p/cdi_proquest_journals_1990828583

A Composer’s Guide to Game Music (The MIT Press)

Appendix 1: Game Experience Questionnaire

Questions for version 1:

1 I was interested in the game’s story
2 I felt successful
3 I felt bored
4 I found it impressive
5 I forgot everything around me
6 I felt frustrated
7 I found it tiresome
8 I felt irritable
9 I felt skillful
10 I felt completely absorbed
11 I felt content
12 I felt challenged
13 I had to put a lot of effort into it
14 I felt good

Added questions for version 2:

15 It felt like a rich experience
16 I felt that I could explore things