Creating an Interactive Audio Environment
November 14, 1997
Audio in today's interactive entertainment media has progressed far beyond the bleeps of early video games. An object or an environment within a game exhibits a number of complex relationships. A creature may be surprised to see you. A robot's gears get stuck when it tries to move toward you. A diabolical enemy is afraid of the dark. When encountering these elements in a game environment, we expect them to communicate to us through audio in subtle and different ways. Aspects of emotion such as surprise, frustration, admiration, and fear could easily be conveyed through an enhanced and well thought-out object vocabulary.
Our lives are full of an ever-present collage of audio cues that we take for granted. For example, at this "quiet moment," I can hear the cascading sound of a fountain in a pond, the intermittent quacking of ducks and geese, a baby in the background, someone pouring a bucket of water outside, and a plane flying overhead. All of these cues, though subtle and seemingly unimportant, create the ambience of a particular scene, imbuing it with identity and significance. Without these background sounds, or ambiences, our lives would sonically resemble a lunar landscape. Within a game environment, a collection of sound cues such as this reflects the noncausal relationship of the player to the game. The sound space isn't triggered by the player's direct action. Instead, the sound is affected by and reacts to the environmental aspects of the scene that is being conveyed.
When we go to a movie, our emotional response is directly related to the music. The music swells, our anticipation grows, and our adrenaline rushes. The music ebbs, and we feel a calming sensation. This is very easy to convey in a linear medium, where the progression of events and the ending are predetermined; but how do we compose a soundtrack for a game that can follow many paths to many endings? An adaptive soundtrack that responds well to game events is one of the best ways to envelop the player in a game experience.
Audio Object Vocabulary
An audio object vocabulary is a method by which game objects (not necessarily just speaking ones) talk to each other and the player. The methods of communication vary from object to object and from context to context. There are three types of object interaction: direct, indirect, and environmental.
An object communicates directly as a result of direct action on its part. When the ball hits the paddle in the old arcade game Pong, it makes a bleep. This is direct object interaction. Unfortunately, most games haven't explored far beyond this simplistic level of object interaction. Direct communication is important when you want to convey specific audio cues, such as a scream of pain when you shoot a monster, or the creak of a wooden rocker when you push back a rocking chair. In Monolith Productions' Claw, I found it important that every character had something different to say when you interact with him or her (or it), even if it's in combat. For example, a melodramatic character, while dying, would say "I'm dying... I'm dying... I'm dying... I'm dead," with an animation to suit. A more primitive character would emit a squawk, and a more substantial enemy would yell out, "I curse you, Claw" as he falls to his death. When you hit a lounge-lizard-turned-palace-guard merman, he would say, deadpan, "Ouch, that hurt quite a bit."
As always, variety in the audio cues is paramount in ensuring that a set of quotes doesn't become repetitive. From a programming standpoint, that may require a bit more intelligence to pick out the quotes. A buffer with an index to the most recently used quotes helps a lot because it shields the player from experiencing the same "random" set of sounds in rapid succession.
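The recently-used buffer described above can be sketched in a few lines of Python. This is a minimal illustration, not code from Claw: the class name `QuotePicker`, the `history_size` parameter, and the sample quote list are all assumptions made for the example.

```python
import random
from collections import deque


class QuotePicker:
    """Pick random quotes while avoiding the most recently played ones.

    Hypothetical helper for illustration; not taken from any shipped game.
    """

    def __init__(self, quotes, history_size=2, rng=None):
        if not quotes:
            raise ValueError("at least one quote is required")
        self.quotes = list(quotes)
        # Never remember so many quotes that nothing is left to choose from.
        self.recent = deque(maxlen=min(history_size, len(self.quotes) - 1))
        self.rng = rng or random.Random()

    def next_quote(self):
        # Only quotes that are not in the recent-history buffer are eligible.
        candidates = [q for q in self.quotes if q not in self.recent]
        choice = self.rng.choice(candidates)
        self.recent.append(choice)  # the oldest entry drops off automatically
        return choice


if __name__ == "__main__":
    picker = QuotePicker(
        ["Ouch!", "That hurt.", "I curse you, Claw!", "Squawk!"],
        history_size=2,
    )
    for _ in range(6):
        print(picker.next_quote())
```

With a history of two, a quote cannot come back until at least two others have played. A larger history trades randomness for a stronger guarantee against repetition.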
Indirect object interaction occurs when something the player causes to happen in the game makes something else respond sonically. A typical example of this is a "sighting" state for an enemy. When an enemy sees you, and his or her AI changes, a sonic cue that signifies that change may be appropriate. In Monolith's Blood, for example, cultists scream a series of epithets in a terrifying foreign language (created for the game drama) when they spot the player (Figure 2). In Claw, every enemy has something different to say in the "sighting" state. A female boss taunts Claw in a mildly suggestive manner when they come into contact. A goofy bear sailor exclaims "I don't like you" when he sees the player.
Other sonic cues may convey indirect object interaction. Your character may begin breathing heavily when he or she is tired (health is less than some coefficient). Your metal body suit emits a rubbing, squeaky noise that signifies rusting. In addition to sonic cues that help convey complex visual phenomena, certain characters within the game display behaviors that can be conveyed easily through sonic cues, even if they aren't represented visually. Indirect cues can be based on a number of different motivating factors, the rules of which can be determined at the game design stage. For example, in Blizzard's Warcraft II, clicking on an ax-throwing troll more than once causes it to respond with annoyance, even though no animation is being shown. This is highly effective character enhancement.
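Rules like these are easy to express as small pieces of game logic. The sketch below is an assumption-laden illustration, not the way Warcraft II or any other title implements them: the threshold value, the sound names, and the `ClickAnnoyance` class are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlayerState:
    health: float  # 0.0 (dead) to 1.0 (full health)


# Assumed tuning value; the text only says "some coefficient".
TIRED_HEALTH_THRESHOLD = 0.3


def breathing_cue(player):
    """Return the name of a breathing sound, or None if the player is fine."""
    if player.health < TIRED_HEALTH_THRESHOLD:
        return "heavy_breathing"
    return None


class ClickAnnoyance:
    """Escalate a unit's spoken response when it is clicked repeatedly."""

    def __init__(self, responses):
        self.responses = list(responses)  # ordered from polite to annoyed
        self.clicks = 0

    def on_click(self):
        # Clamp to the last (most annoyed) response once we run out.
        index = min(self.clicks, len(self.responses) - 1)
        self.clicks += 1
        return self.responses[index]

    def on_deselect(self):
        self.clicks = 0  # patience resets when the player moves on
```

Because the rules live in data (a threshold, an ordered list of responses), they can be agreed on at the game design stage and tuned without touching the sound assets.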
A character or object in the game may generate a system of audio cues on its own, irrespective of its communication to the player. This is purely a function of a character's existence in its environment. It may be busy chatting to itself or other characters. It may generate a sound or a series of sounds on its own. Our goofy bear sailor from Claw will comment on how hungry he is or where his pet rat might be when he's in an idle state (Figure 3). Depending on where he is in the game, Caleb (the character you play in Blood) may pick from a variety of different show tunes to sing while he's taking a break from the carnage. A thespian tiger from Claw recites different Shakespearean passages as he muses on his own omnipotence.
Environmental communication need not be comic, nor does it need to be vocal. A swishing blade and a humming motor sound signify an industrial fan in Blood, while a phone may be ringing intermittently. A character may pass by an alien hive, with pods emitting a terrifying whine.
Environmental communication is paramount in reinforcing a character or object's existence in the game environment. The character literally comes alive as a personality or physical entity. But as with all different types of object interaction, it's important to remember to keep a consistent set of sounds from character to character. In Claw, I made a decision to use three different idle cues (environmental communication), four different sighting cues (indirect communication), and eight or nine sounds (direct communication) to describe each character sonically. In the end, most characters used more than that average and some used fewer. However, planning the audio object vocabulary ahead of time helped to maximize the use of memory allotted to sound in the game.
The nature of a game object must be relayed in the character of its "voice." It's very easy to screw up the integrity of a character by giving it conflicting visual and aural personalities. However, giving it the right "voice" can greatly enhance a character's personality. A weak character may be depicted through the use of a humorous voice. A stronger character's dramatic persona can be highlighted through the use of a deeper and more resonant voice, as well as a script that relates without question his or her authority.
Tips and Techniques
- Always use professional voice actors. Trained voice actors are professionals who specialize in giving your character the voice it deserves. Whether it is a cartoony or a deep resonant voice, a single talented actor may help develop your ideas for multiple characters and realize them in ways that you haven't conceived. When in doubt about how to find a good voice actor, look to talent agencies and talent search services for help. Moreover, making a trained voice work with the rest of your mix is quite a bit easier than trying to amplify or equalize a weak voice. All sound engineers can attest to this.
- Spend a little more time in sound design. As in all cases, don't just pull sound effects off of a CD. Create sound effects from your own sampled sounds as much as possible. A portable DAT recorder and a good microphone in the field will take you much further than a commercial CD sound library ever could. Nothing kills a unique audio environment more quickly than the phrase, "I've heard that somewhere else before...."
- Collaborate with professional scriptwriters. Writers would jump at the opportunity to write a couple of hundred lines of dialogue for some game characters. The results will definitely be worth the investment.
- Don't be afraid to inflate the vocabulary. Minimize silent time. If you have the space for audio, use it. Set the limit with the programmers and designers early as to your memory budget for audio, and use it wisely.
Ambient sound refers to the sound world that is generated from a player's location in the game space. It is a system of indirect and environmental cues that immerse a player in a particular setting. As in my real-world example, we are surrounded by ambience all of our lives - a complex web of sound. However, ambience is the most underdeveloped side of sound design in interactive media. A game with little or no ambient sound presents little or no connection to how we perceive the outside world with our ears. An ambient sound world might be as simple as a single looping track of forest sounds or a system of sound-producing objects all linked together by their location within a given game environment.
The environment can communicate to the player information important for the game-playing experience. For example, a raven flies by in a forest, making a screeching sound that informs the player that he or she has ventured too far. A swamp makes a menacing gurgling sound, informing the player that he or she shouldn't go there. The sound of a portal opening and closing in the distance informs the player that he or she is close to the level's exit.
Environmental ambiences fully transport a player into the world presented by the game. In Claw, each level has a distinct set of ambient sounds based on the terrain that the main character is encountering. Within a terrain, a single (environmental) looping sound is used (such as the sound of a forest), along with a set of sounds (indirect cues) that are triggered either by Claw's location on the map or by random chance. For example, the sound of a character whistling in a window matches the animation of the character shaving and the background ambience of village noise. When Claw moves into another terrain, the looping ambient sounds cross-fade, and another set of ambient trigger sounds is selected that corresponds to the new terrain.
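The cross-fade between terrain loops can be sketched as a pair of gain curves plus a little bookkeeping. The Python below is only an illustration and makes several assumptions: the equal-power curve is one common choice rather than necessarily what Claw used, and the names `crossfade_gains`, `AmbienceMixer`, and `fade_seconds` are invented for the example.

```python
import math


def crossfade_gains(progress):
    """Return (old_gain, new_gain) for a fade position between 0.0 and 1.0.

    Uses an equal-power curve so that the combined loudness stays roughly
    constant; a linear fade is a simpler alternative.
    """
    t = min(max(progress, 0.0), 1.0)       # clamp out-of-range input
    old_gain = math.cos(t * math.pi / 2)   # 1.0 -> 0.0
    new_gain = math.sin(t * math.pi / 2)   # 0.0 -> 1.0
    return old_gain, new_gain


class AmbienceMixer:
    """Track which terrain loop is playing and fade when the terrain changes."""

    def __init__(self, terrain, fade_seconds=2.0):
        self.current = terrain
        self.previous = None
        self.fade_seconds = fade_seconds
        self.elapsed = fade_seconds  # no fade in progress at start

    def enter_terrain(self, terrain):
        if terrain != self.current:
            self.previous, self.current = self.current, terrain
            self.elapsed = 0.0

    def update(self, dt):
        """Advance time by dt seconds and return {loop_name: gain}."""
        self.elapsed = min(self.elapsed + dt, self.fade_seconds)
        if self.previous is None or self.elapsed >= self.fade_seconds:
            self.previous = None
            return {self.current: 1.0}
        old_gain, new_gain = crossfade_gains(self.elapsed / self.fade_seconds)
        return {self.previous: old_gain, self.current: new_gain}
```

This sketch handles one fade at a time. If the player crosses a second terrain boundary mid-fade, the gains jump, so a production mixer would need to track more than two loops.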
In Blood, I used ambience to enhance the atmosphere, as well as to connote physical environments. In a temple, distant chanting is heard (though the source of the chant is never discovered) (Figure 4). In a narrow hallway, whispers surround the player from all sides. The inclusion of atmospheric elements adds to the spooky and scary nature of the game's look and feel.
Tips and Techniques
- Try to use consistent reverb settings. All sounds within a given environment should share similar reverb settings that place the entire sound world within a consistent acoustic space. Foreground and background elements may stand out from the ambience, but not so far that the player mistakes them for characters or objects that he or she must encounter.
- Make your loops seamless. The looping ambiences in the game need to be smooth and unnoticeable. Large variations in pitch or amplitude will make the loop quite recognizable and annoying after a while. A rhythmic pattern works well (like the sound of crickets), if it's cut perfectly. Also, a longer sound sample will help mask the loop point.
- Avoid loops. Though seamless loops are not an impossibility, it's best to use trigger ambiences whenever possible. Trigger ambiences help mask the loop point, as well as provide overall variety in the ambience. In Kesmai's Air Warrior, I used trigger ambiences to convey the sound world of a World War II airfield. During any given time, an airplane fly-by sound, a vehicle drive-by sound, and an airplane startup sound would be selected and played from a set of 50 or so trigger ambiences. Since these trigger ambiences were selected randomly and played at random times, the sound world was always changing and seldom repetitive. Another method of avoiding loops is to queue similar sounds one after another. A set of three or four sounds that fit seamlessly end-to-end will work well if they are selected to play on a single channel randomly. This helps break up the pattern created by a single looping sound.
- Try to create fine gradations of ambiences. Say we're walking from a forest into a mountain pass. We start out in a deep forest, then walk through a leafy forest, then into a meadow before reaching the mountain pass. If we have a single sound for the forest ambience, no matter how the forest changes, the ambience will remain the same until we change scenery drastically when we reach the mountain pass. However, if we subdivide the forest into three gradations (deep, leafy, meadow), we'd be better able to convey to the listener the transition of environments from forest to mountain pass.
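The two loop-avoidance techniques from the tips above, randomly timed trigger ambiences and chained variations on a single channel, can be sketched as follows. This is a rough Python illustration rather than the Air Warrior implementation; the class and function names, the gap timings, and the sound names in the usage note are assumptions.

```python
import random


class TriggerAmbience:
    """Fire randomly chosen one-shot sounds at random intervals."""

    def __init__(self, sounds, min_gap=2.0, max_gap=10.0, rng=None):
        self.sounds = list(sounds)
        self.min_gap = min_gap
        self.max_gap = max_gap
        self.rng = rng or random.Random()
        self.countdown = self._next_gap()

    def _next_gap(self):
        # Seconds to wait before the next one-shot sound.
        return self.rng.uniform(self.min_gap, self.max_gap)

    def update(self, dt):
        """Advance by dt seconds; return a sound name to play, or None."""
        self.countdown -= dt
        if self.countdown > 0:
            return None
        self.countdown = self._next_gap()
        return self.rng.choice(self.sounds)


def chained_ambience(variations, rng=None):
    """Yield an endless stream of loop variations for one channel.

    Each variation is assumed to join seamlessly with any other, and the same
    variation is never queued twice in a row when there is an alternative.
    """
    rng = rng or random.Random()
    last = None
    while True:
        options = [v for v in variations if v != last] or list(variations)
        last = rng.choice(options)
        yield last
```

A game loop would call `update(dt)` once per frame on something like `TriggerAmbience(["flyby", "driveby", "startup"])` and play whatever name comes back. The channel playing the chained variations would pull the next item from the generator each time the current sample ends.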
The nonlinear medium of computer gaming can lead a player down an enormous number of pathways to an enormous number of resolutions. From the standpoint of music composition, this means that a single piece may resolve in one of an enormous number of ways. Event-driven music engines (or adaptive audio engines) allow music to change along with game state changes. Event-driven music isn't composed for linear playback; instead, it's written in such a way as to allow a certain music sequence (ranging in size from one note to several minutes of music) to transition into one or more other music sequences at any point in time. An event-driven music engine must contain two essential components:
- Control logic - a collection of commands and scripts that control the flow of music depending on the game state.
- Segments - audio segments that can be arranged horizontally or vertically according to the control logic.
In Kesmai's Multiplayer Battletech, control logic determined the selection of segments within a game state and the selection of sets of segments at game state changes. Thus, the control logic was able to construct melodies and bass lines out of one- to two-measure segments following a musical pattern. At game state changes, a transition segment was played, and a whole different set of segments was selected. However, this transition segment was played only after the current set of segments finished playing so as not to interrupt the flow of the music. I selected game states and tracked game state changes based on the player's health relative to the health of the opponent. Overall, I composed 220 one- to two-measure segments that could all be arranged algorithmically by the control logic. What resulted was a soundtrack that was closely coupled with the game-playing experience.
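The control flow described above (derive a game state from relative health, defer any change until the current set of segments has finished, then play a transition segment) might look roughly like the Python below. It is a sketch under stated assumptions, not the Multiplayer Battletech engine: the health thresholds, the state names "disadvantage" and "losing", the segment names, and the `MusicController` class are all hypothetical.

```python
import random


def game_state(player_health, opponent_health):
    """Map relative health to a coarse game state (thresholds are assumed)."""
    diff = player_health - opponent_health
    if diff > 0.3:
        return "winning"
    if diff > 0.0:
        return "advantage"
    if diff > -0.3:
        return "disadvantage"
    return "losing"


class MusicController:
    """Choose the next segment, changing sets only on a set boundary."""

    def __init__(self, segment_sets, transitions, state, rng=None):
        self.segment_sets = segment_sets  # state -> list of segment names
        self.transitions = transitions    # (old, new) -> transition segment
        self.state = state
        self.pending = None               # state change waiting for a boundary
        self.rng = rng or random.Random()

    def on_state_change(self, new_state):
        # Remember the request; never cut into the segment that is playing.
        self.pending = new_state if new_state != self.state else None

    def next_segment(self):
        """Call when the current set of segments has finished playing."""
        if self.pending is not None:
            old, self.state, self.pending = self.state, self.pending, None
            bridge = self.transitions.get((old, self.state))
            if bridge is not None:
                return bridge
        return self.rng.choice(self.segment_sets[self.state])
```

The key design choice is that `on_state_change` only records the request. The music actually changes inside `next_segment`, which the playback code calls at a musical boundary.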
Tips and Techniques
- Music comes first. Remember that no matter how closely your music follows the game play and how interactive it is, if it doesn't gel as a musical composition, you're better off writing a linear score. Always explore all possibilities of transitions from one game state to the next, and see if the music reacts the way you meant it to react. Make sure that you write transition sequences and that the engine is intelligent enough not to change game states midmeasure or midphrase.
- Decouple segments horizontally and vertically. Compose your music so that different segments may be combined end-to-end (horizontally), as well as on top of each other (vertically). This way, you can combine different melody lines with bass lines, use different ornamentation, and so on.
- Don't give away too much information. Sometimes a musical cue might say too much, when it was meant just to highlight the game state change. For example, in a certain game, an upward chord progression always signifies to a player that a starship is on his tail. When working on game state changes, make sure your event-driven music isn't used as an early warning system for the game.
- Define a series of translation tables to track game state changes. For example, in Multiplayer Battletech, a game state change from "winning" to "advantage" implies a losing trend. The music reacts to this change by selecting a different set of segments than it would if the change occurred from "advantage" to "winning."
By composing in a nonlinear fashion, and by having the music react to the player's actions directly and indirectly, we introduce a new level of interactivity. Emotionally, the soundtrack carries the person seamlessly along with the action in much the same way as the static, linear medium of film. In this fashion, music becomes the gateway to the player's emotional response to the game.
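The translation-table tip above can be sketched as a lookup keyed on both the previous and the new game state, so that the direction of the change matters. In this Python illustration only the state names "winning" and "advantage" come from the text; the other state names, the segment-set names, and the helper functions are assumptions.

```python
# Ordered from best to worst for the player. "winning" and "advantage" come
# from the text; the other two names are assumed for the example.
STATES = ["winning", "advantage", "disadvantage", "losing"]

# Translation table: (previous state, new state) -> segment set to play.
# Only a few entries are filled in; the set names are hypothetical.
TRANSLATION_TABLE = {
    ("winning", "advantage"): "advantage_falling",
    ("advantage", "winning"): "winning_rising",
    ("disadvantage", "advantage"): "advantage_rising",
    ("advantage", "disadvantage"): "disadvantage_falling",
}


def trend(previous, new):
    """Classify a state change as 'rising', 'falling', or 'steady'."""
    before, after = STATES.index(previous), STATES.index(new)
    if after > before:
        return "falling"  # moved toward the losing end of the list
    if after < before:
        return "rising"
    return "steady"


def segment_set_for(previous, new):
    """Look up the segment set for a state change.

    Falls back to the plain name of the new state when the table has no
    entry for this particular change.
    """
    return TRANSLATION_TABLE.get((previous, new), new)
```

Here arriving at "advantage" from "winning" selects a different set than arriving at it from "disadvantage", which is the distinction the tip describes.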
Total Immersion through Sound
As game designers and audio producers, we should be constantly aware of the impact that a well thought-out audio environment can have on the product. It can make a graphically simple and uneventful scene become awe-inspiring. Effective use of an audio object vocabulary can enhance the impact a character may have on the game player. Ambient sounds, in all of their variety, can transform a game scene from a virtual one to a believable one. Surreal textures and atmospheric gestures can generate emotional responses in a player as varied as the soundscapes themselves. As games become more and more complex and graphically spectacular, we must not overlook the role of audio in enhancing and completing that feeling of total immersion.