There is a constant tension in game audio that simply does not exist in film audio: the desire to minimize repetitive sounds versus the limitations of the delivery mechanism. If a sound designer working on a film wants every footstep, door slam, gunshot, and telephone ring to be completely unique, it's simply a matter of creating the right number of instances of each sound and laying them into the soundtrack. Similarly, a film composer can decide how original each cue should be and balance that against some degree of repetition for aesthetic impact. Sometimes it makes sense to reuse music; often each cue needs to be original. Either way, the medium itself does not dictate these choices. (The film's schedule, budget, and staff might dictate them, just as they do in games.)
In a game, of course, it would be infeasible to make every instance of a sound unique, if for no other reason than that storing all the possible variations would require too much RAM. But even assuming infinite system resources, there remains the runtime problem inherent in the interactive world of a game: it isn't possible to make every gunshot sound unique if you don't know how many gunshot sounds are needed! This is not to say that variety is impossible, but making every single instance of a sound unique is simply not achievable. Instead, in games, we need to achieve a sense of liveliness and variety in our sounds by other means.
In Blood Wake, the chain gun on the bow of the ship proved to be a particular challenge for the audio team.
In Blood Wake, a recent nautical combat game for the Xbox that I worked on, the "chain gun" sound presented a difficult problem. The player boat almost always has at least one chain gun (and, depending on upgrades, as many as four), and the enemy boats often have them too. We needed a rapid-firing weapon sound that was powerful and impactful, could be listened to for extended periods without becoming irritating, and could support multiple simultaneous playback instances. In addition, since the guns jammed when fired for too long, the sound needed to be able to sputter out and stop. To avoid sounding repetitious, loops would have had to be very long, which would have caused memory problems, and it would have been hard to get the jammed-up sound with loops anyway. So we went with individual shot sounds and had to find some way to make them sound natural.
Chris Hegstrom, the sound designer on the project, came up with a great solution. First, he created individual gunshot sounds that had the kind of punch and power I was looking for. He created two groups of shot sounds, player chain gun and enemy chain gun, with eight variations in each group. I then asked the audio programmer to write a system that would call these sounds in a quasi-random order (random, but weighted to make it less likely to call the same sound twice in a row) and could adjust the playback rate. We also slightly randomized the pitch and volume, and varied the timing of each shot so that it was almost, but not quite, perfectly regular. Then we added another layer of control: when more than one gun was firing, instead of calling another instance of the same system, we simply increased the firing rate and slightly increased the pitch, volume, and timing variations. This avoided the horrible "flanging" problem caused by playing the same sound multiple times at very slightly different times and pitches. Hegstrom spent many painstaking days tweaking all the little values, but once he was done, he had created a very convincing multiple-weapon sound.
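The scheme described above can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not the Blood Wake code: the class and method names (`ChainGunVoice`, `set_gun_count`, `schedule`), the firing rate, the jitter ranges, the 0.1 repeat weight, and the rule for widening the variations as guns are added are all assumptions chosen for clarity. It only decides which sample to trigger, when, and at what pitch and volume; actual playback is left to the audio engine.

```python
import random
from dataclasses import dataclass


@dataclass
class Shot:
    """One scheduled gunshot: which sample to play, when, and how."""
    time: float     # seconds since firing started
    variation: int  # index into the bank of shot samples
    pitch: float    # playback-rate multiplier (1.0 = unchanged)
    volume: float   # linear gain multiplier (1.0 = unchanged)


class ChainGunVoice:
    """Hypothetical single voice that plays every gun of one type.

    Names and default values are illustrative assumptions only.
    """

    def __init__(self, num_variations=8, shots_per_second=12.0,
                 repeat_weight=0.1, pitch_jitter=0.03, volume_jitter=0.08,
                 timing_jitter=0.1, seed=None):
        self.num_variations = num_variations
        self.base_rate = shots_per_second
        self.repeat_weight = repeat_weight  # relative weight of replaying the last sample
        self.base_jitter = (pitch_jitter, volume_jitter, timing_jitter)  # +/- fractions
        self.rng = random.Random(seed)
        self.last = None
        self.set_gun_count(1)

    def set_gun_count(self, guns):
        """More guns: fire faster and widen the variations, but add no voices."""
        guns = max(1, guns)
        self.rate = self.base_rate * guns
        # Assumed rule: 25% more variation per extra gun, capped at 2x so the
        # timing jitter can never shrink the shot interval to zero.
        spread = min(1.0 + 0.25 * (guns - 1), 2.0)
        self.pitch_jitter, self.volume_jitter, self.timing_jitter = (
            j * spread for j in self.base_jitter)

    def _pick_variation(self):
        # Weighted random choice: every sample has weight 1.0 except the one
        # just played, which is much less likely (but still possible).
        weights = [self.repeat_weight if i == self.last else 1.0
                   for i in range(self.num_variations)]
        self.last = self.rng.choices(range(self.num_variations), weights=weights)[0]
        return self.last

    def _jitter(self, amount):
        """Random multiplier in [1 - amount, 1 + amount]."""
        return 1.0 + self.rng.uniform(-amount, amount)

    def schedule(self, duration):
        """Return the shots to trigger over `duration` seconds of firing."""
        shots, t = [], 0.0
        interval = 1.0 / self.rate
        while t < duration:
            shots.append(Shot(t, self._pick_variation(),
                              self._jitter(self.pitch_jitter),
                              self._jitter(self.volume_jitter)))
            # Almost, but not quite, perfectly regular timing.
            t += interval * self._jitter(self.timing_jitter)
        return shots


voice = ChainGunVoice(seed=7)
one_gun = voice.schedule(2.0)    # roughly 24 shots
voice.set_gun_count(4)
four_guns = voice.schedule(2.0)  # roughly four times as many, from the same voice
```

The key design choice mirrors the one in the text: extra guns change the parameters of a single voice (faster rate, wider jitter) instead of starting several overlapping copies of the same sequence, which is what produces the flanging.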
If we had been working on a film, none of that would have been necessary. We would have purpose-built each combat sequence with its own gun sounds, and modulated them to get exactly the character we needed at the moment.
The drive to increase the variety of sounds and reduce repetition often leads to a desire to increase the quantity of sounds. We then face a painful tradeoff: should we include more sounds at lower fidelity (lower sample rate, higher compression, etc.), or fewer sounds of higher fidelity? There are actually pretty good arguments on both sides of this issue, and I've shipped games that favor each approach. But my recent experience gave me new insight in this area.
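As a rough illustration of that tradeoff, the arithmetic is simple: one second of PCM audio costs sample rate times bytes per sample times channels, divided by any compression ratio. The Python sketch below applies this to a hypothetical 4 MB budget of one-second mono sounds. The budget, the sound length, and the flat 4:1 ratio are assumed round numbers, not figures from any game discussed here, and real console codecs have their own ratios.

```python
def bytes_per_second(sample_rate_hz, bits_per_sample, channels=1, compression_ratio=1.0):
    """Storage cost of one second of audio; compression_ratio=4.0 means 4:1."""
    return sample_rate_hz * (bits_per_sample / 8) * channels / compression_ratio


def sounds_that_fit(budget_bytes, seconds_per_sound, **audio_format):
    """How many sounds of a given length fit in a fixed RAM budget."""
    per_sound = seconds_per_sound * bytes_per_second(**audio_format)
    return int(budget_bytes // per_sound)


BUDGET = 4 * 1024 * 1024  # hypothetical 4 MB sound budget

# Fewer sounds at higher fidelity: 44.1 kHz, 16-bit, uncompressed.
hi_fi = sounds_that_fit(BUDGET, 1.0, sample_rate_hz=44100, bits_per_sample=16)
# More sounds at lower fidelity: 22.05 kHz, 16-bit source, 4:1 compression.
lo_fi = sounds_that_fit(BUDGET, 1.0, sample_rate_hz=22050, bits_per_sample=16,
                        compression_ratio=4.0)
print(hi_fi, lo_fi)  # 47 380
```

Halving the sample rate and compressing 4:1 buys roughly eight times as many sounds in the same memory, which is exactly the kind of gain that makes the lower-fidelity option tempting.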
I have long believed that the only measure of success that matters is the overall sound quality of the final product. The quality of any given sound is arguably irrelevant. Nonetheless, while we were producing The Two Towers, we decided to push hard to make each sound as fine as it could be, then take care of the relationships between the sounds at mix time. This approach offered a couple of benefits. First, it forced us to concentrate on what we were trying to accomplish with each sound. We asked ourselves questions like "Is this sound realistic? Is it stylized? Does it have impact? Should it stand out and hold attention on its own, or should it meld into the ambience and the rest of the soundscape?" These decisions forced us to pay attention to the consistency between sounds in a way that we simply wouldn't have had to do if we were building the sounds directly into the mix. Plus, since we had access to a number of sounds from the film production, we wanted to meet or exceed the quality standards that they set at every point.
The result was a set of sounds that required quite a bit of finessing to work together at runtime, because if every sound is just as warm and rich and vibrant as the next, the mix is often one big, muddy mess. Sure enough, with all elements present, the mix didn't work well. So we went through every sound and adjusted it slightly, then listened again, found the new problem areas, and adjusted those in turn. We did this over and over, in dozens of passes. It was very much like a film or music mix: slowly and subtly balancing each sound's volume and equalization until they all worked together just right, while retaining as much of each sound's original character as possible.
When you take many passes over a game's audio, you find spots where unexpected sounds jump to the fore. This is due to the unpredictable nature of a run-time mix. For instance, perhaps the player has killed all the enemies in an area but swings his weapon again anyway, and hears the sound in all its exposed glory. Or maybe the AI chooses to attack with an unexpected combination of enemies. One way or another, the mix is bound to find a way to surprise you, no matter how hard you work to control it.
Having each individual sound produced to such a high standard also had an interesting impact on the production of the game. As we dropped sound clips in, the game began to sound complete very quickly. Because each sound could stand on its own relatively well, even a skeletal set helped bring the game to life sonically much earlier in the development cycle than is typical. This allowed us to get useful feedback early and get the other disciplines bought into what we were doing.
Finally, making each sound as good as it could be in isolation also led us to sacrifice quantity for the sake of quality when forced to make the tradeoff. While there's hardly a moment in The Two Towers when there aren't at least 130 or so sounds loaded into RAM, there are always moments when something that should be making a sound isn't. We decided to use this to our advantage by paying extra close attention to the "point of view" the sound created.
The sound designer, editor, mixer, and director of a film have a great deal of power to affect the audience's perceived POV, increase or decrease its scope, and elicit emotion. In a game, this can be more difficult. Again, because of the unpredictable nature of a run-time mix, it's hard to know what is going to seem important at any given moment. In our case, winnowing the number of sounds down for the sake of fidelity made us think deeply about this problem. Footsteps are boring; they're throwaway sounds, right? Actually no, because at least in the case of the player character, they provide a sense of presence in the world that is key to the player's involvement. Big, showy enemy creature vocalizations are the showcase for sound design, right? Well, not always, because when things get busy on screen, these sounds tend to clutter up the mix and provide little emotional benefit. It turns out that the things that we did to get the game audio to work in The Two Towers are precisely the things that cinema's top sound designers call for.
While very little that I've ever predicted about the future has come true, that won't stop me from speculating. Certainly some general trends have developed along the paths I thought they might (though significantly more slowly than I would have liked), and I feel fairly confident I can extrapolate some useful projections. For one thing, the attention paid to, and emphasis placed on, game sound has increased dramatically over the last ten years, and, predictably, the quality has improved right along with it. It's important to note, though, that this hasn't been because some Hollywood hotshot came in and dictated the "right" way to do things. Game audio has learned many lessons from film audio, and has lots more to learn, but it has also developed into a mature and unique craft of its own. I now believe that film audio can take some lessons from games.
To those who think that the film guys have it easy and that the problems we face in game audio are unique, I highly recommend the following article by Randy Thom (of course, it's also full of great thinking about using audio for storytelling, point of view, etc).
To read more about my team's experiences adapting the Lord of the Rings movie music, sound, and voice into a game environment, check out these articles: