CMP Game Group Presents:

Interactive Audio
By Rob Ross
May 15, 2001

Interactive Audio.

Say this in a crowd of developers and you'll most likely get one of these three responses:

  1. Suddenly, a seemingly calm group of developers turns into an angry, torch-wielding mob, looking at you like you've got bolts in your neck.
  2. Screams of terror erupt and everyone within earshot flees.
  3. Someone bold says "But it sounds like MIDI" while the rest nod in nervous agreement.

Why is it that these two little words, "interactive audio", seem to be near the top of so many developers' bad_word.lst files? Doesn't it seem odd that in an industry called "interactive entertainment" one of the key components in every game lacks this interactivity? If you had the ability to add CD-quality, interactive audio to your game, wouldn't you want to do it? That question is akin to asking if you want tires to come with your new car. If interactive audio is something we want and need in games, then why is it in such limited use? It's not for a lack of desire; we know we need it. In my humble opinion, the reason it's the exception instead of the rule boils down to one thing: Fear.

Where are we?

Until fairly recently, audio has taken a back seat (the trunk may be more accurate) to other technologies in our industry. I don't believe that's necessarily because developers feel audio is unimportant; I hope by now we all understand the important role audio plays in the gaming experience. There simply was never much focus on developing audio technology, because the PC was predominantly a business machine. PC power and graphics technology have been driven by the needs of business applications rather than by entertainment value. But even the television, which is designed solely for entertainment, hadn't seen any significant improvements in audio technology until the 'home theater' craze. For years, our high-end stereo systems sat right next to the TV with little or no thought of interconnection. Music technology has been geared towards, and driven by, the aural experience alone, almost completely separated from the visual experience, even though we all love a good concert. It's fairly easy to see how audio ended up so far behind in its integration in a predominantly visual industry.

Fortunately for all of us, things have been changing, and audio is being recognized as an integral part of the entertainment experience. There is now a plethora of companies working on audio hardware -- sound cards, even high-quality ones, are very inexpensive, and few PCs ship without one. But CD-quality audio previously required far too much storage and processing power to be used in mainstream games. Today, however, faster processors, cheaper RAM, larger hard drives and the proliferation of the CD-ROM drive, coupled with compression and streaming technologies, have finally made quality audio in games a reality. Unfortunately, audio technology is still behind the curve. Not the hardware itself, mind you, but the integration of the latest audio technology at a level commensurate with the rest of the industry's technologies.

Actually, there is quite a bit going on in the audio industry. Manufacturers and sound designers are exploring the latest and greatest advancements, like personal environmental audio settings, positional 3D audio and surround sound. These things are important, but they are not advancing the core technology; they only add bells and whistles to the current technology. We now have the ability to add CD-quality audio to our games, but we need that audio to be as interactive as the other pieces.

Indefinitus Definition
OK, so let's define what I mean by "Interactive Audio". I must preface this definition by telling you that this is what I perceive the term to mean as it pertains to the interactive entertainment industry. I do this because you won't find this term in a dictionary. In fact, even the word "interactive" is only listed as an adjective under "interaction".

Interactive audio is a technology designed to allow specifically created audio, placed in a given application, to react to user input and/or changes in the application environment.
Sounds simple enough, eh? Perhaps it will make more sense with an example. Let's say you are developing a racing game where you drive through various cities. Generally you would have a particular audio track to represent each city and each utility screen. These audio tracks play from start to finish and normally loop over and over as long as the user is in that environment. With interactive audio, you could have the music adapt to changes in the environment. Wouldn't it be better if, while passing through Chinatown in San Francisco, for instance, some ethnic instruments were added to the audio track and then removed as you leave that section of the city? Or the entire theme could transition to one with an ethnic feel and then transition back as you leave. How about decreasing the tempo and changing the instruments and style? You could go from a techno sound to a cool acid jazz as you exit the city and hit the freeway. In movies, the music generally takes on a slightly different role (another difference between our industry and the movie industry, which I'll expound on below). The intent is to create a particular mood or atmosphere relevant to what's happening or what is about to happen. In a perfect situation, we would build tension or suspense and then transition right into the event, guiding the user's emotions. The ability to do these types of things and more, and to do them seamlessly -- this is what I mean by interactive audio. With the way game audio currently works, the audio changes abruptly, if at all, and only at the event. The audio is incapable of being a vehicle to move the player's emotions. Giving it that capability needs to be the next step in game audio advancement.
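To make the racing-game example more concrete, here is a minimal sketch of the idea in Python. It is not DirectMusic's API or any shipping engine; `MusicState`, `InteractiveScore`, `ZONE_MUSIC` and the layer names are hypothetical, and it assumes the game reports zone changes and calls `update()` every frame. The one design point it illustrates is "seamless": a requested change is held until the next bar line instead of cutting the music off mid-phrase.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MusicState:
    """One hypothetical musical configuration: a theme, its tempo and the active layers."""
    theme: str
    tempo_bpm: float
    layers: frozenset[str]


# Hypothetical zone-to-music table for the racing-game example in the text.
ZONE_MUSIC = {
    "downtown": MusicState("techno", 120.0, frozenset({"drums", "bass", "synth"})),
    "chinatown": MusicState("techno", 120.0, frozenset({"drums", "bass", "synth", "erhu", "gongs"})),
    "freeway": MusicState("acid_jazz", 90.0, frozenset({"brushes", "upright_bass", "rhodes"})),
}


@dataclass
class InteractiveScore:
    current: MusicState
    beats_per_bar: int = 4
    origin: float = 0.0                 # time (s) at which the current bar grid started
    pending: MusicState | None = None   # change waiting for the next bar line
    switch_at: float | None = None

    def bar_length(self) -> float:
        """Seconds per bar at the current tempo."""
        return self.beats_per_bar * 60.0 / self.current.tempo_bpm

    def request_zone(self, zone: str, now: float) -> float:
        """Queue the zone's music for the next bar line and return when it will start."""
        bar = self.bar_length()
        bars_elapsed = math.ceil((now - self.origin) / bar)
        self.pending = ZONE_MUSIC[zone]
        self.switch_at = self.origin + bars_elapsed * bar
        return self.switch_at

    def update(self, now: float) -> MusicState:
        """Call every frame; applies a queued change once its bar line is reached."""
        if self.pending is not None and self.switch_at is not None and now >= self.switch_at:
            self.current, self.origin = self.pending, self.switch_at
            self.pending = self.switch_at = None
        return self.current
```

For example, entering Chinatown at 3.1 seconds into a 120 BPM, 4/4 downtown theme queues the change for the bar line at 4.0 seconds, where the extra layers join in. A real engine would also crossfade layers and could pick transition segments, but waiting for a musical boundary is the simplest way to avoid an abrupt switch.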

The term "Adaptive Audio" was coined in the not-too-distant past to describe a method of switching audio tracks in a similar fashion, but the problems associated with this method brought about its demise. The first was the sheer number of audio files needed to accomplish the task, which of course added to the space required to store them. Adaptive Audio also requires constant loading and unloading of large audio files, which causes a great deal of hard drive access, slowing the process down and making the game stutter. And all of those large files being loaded and unloaded tax the processor and memory, which programmers object to, since these resources are needed for so-called "more important tasks" like running the game engine and graphics. Can you blame them? Some might say that Adaptive Audio, or even Reactive Audio, is a better term for what I'm talking about, but I think these labels do not properly convey the idea that we want the audio not only to react to a given situation or adapt to changes in the environment, but also to give a portent of things to come. Since Adaptive Audio has been the label on a different technology, we should stick to interactive audio so as not to confuse things further.

Hey big boy, what's the name on your box?

Let's take a look at what marketing has been up to. Even with all of our intellect, the insidious nature of marketing still seems to ensnare us. We know that buxom, scantily clad women are not going to flock to us because of the brand of beer we drink, but we might buy a particular brand based on this idea anyway.

Marketing is telling us that gamers will flock to a game if a recognizable band is on the box. I think id Software helped make this idea seem reasonable with Quake. Sure, people thought it was really cool that Trent Reznor did the ambient tracks for Quake, and yes, it was a big seller, but I never heard anyone say they bought Quake, or played it any longer, because of that. I think id deserves more credit for the success of Quake than that. How long did it take you to turn the music off or put in your own selection of tunes? Ah, what the heck, it's only money. After all, you have to spend money to make money, right? So let's throw a briefcase full of it at a big-name band to write some tunes for the next title (by the way, don't forget where that money will come from later… your share).

On the surface, this scheme would appear to be a reasonable strategy. The marketing guys' (or gals') words of wisdom? "They use popular bands in Hollywood movies to get the title more notice." This conjures up an image of your game as a big-screen movie with the associated notoriety and revenue. As with most schemes born from the sheer will of marketing, it's only superficial. Once you peel back the top layer and shine an emotion-free light of clarity on this idea, you can see the true nature of the scheme. A movie costs $8.50 to see ($8.50! When did that happen?), not $50. When was the last time you heard someone say "Gee, I'm really not into RTS games, but I dropped 50 bucks on one anyway because 'Big Hair Band' did the music"? Yeah, that's what I thought. It might sell you some soundtracks if you are lucky -- and there is nothing wrong with that -- but it won't sell games.

Let's get back to reality for a moment. Our goal is to get better audio technology to enhance the gaming experience. And I'm sorry but games are NOT like Hollywood movies. Games are interactive and provide a completely different experience and, hopefully, your audience will be involved with your product for more than 90-180 minutes. Currently, the sonic quality available to games is no different than that in a movie or a CD so what can 'Big Hair Band' really offer you? Their name? At what price?

The process of design, delivery and implementation is very different in radio, movies and games. What was the last movie you saw where one 'band' did the entire musical score? There's a reason why you can't recall any. Writing songs for an album is very different from producing a score for a movie or a game. 'Big Hair Band' has spent their entire career perfecting the process of writing music for the radio, not the very different and specialized art of composing for games or movies. What makes you think they suddenly understand the dynamics of composing the entire soundscape for your game? Besides, 'Big Hair Band' probably needs to get another record out to pay back their label for making them a success, so why would they make a detour to spend the required time making game music? Perhaps they are in a slump (why would you want to use them in that case anyway?) and they want a little publicity and some cash. Of course, you'll hear no mention of any of this from those brilliant marketing guys.

And by the way, the only interactive music 'Big Hair Band' knows about is a live performance. So you're going to get some tunes from 'Big Hair Band' that are going to loop over and over and over. There is really no effective way to make them interactive, so the gamer, after hearing the same tracks repeatedly, eventually shuts the music off. At that moment they lose everything the music was meant to bring to the game, like immersion in the environment and a vehicle for a wide variety of emotions. Most people just can't listen to the same songs repeatedly for any length of time without getting sick of them. Have you ever turned on the radio and there it was again, that damn song that was so cool last month but just got played to death? Did you change the station after hearing only the first few seconds? Think about what you're actually going to get for your money and give to your audience for theirs. I know I'd rather have something that enhances the experience instead of some marketing hype.

OK, folks, it's safe to come out now.

So why are developers so afraid of interactive audio? Fear is most commonly associated with a lack of knowledge or understanding. Fear of what we don't understand, don't know or even what we think we know. There are plenty of reasons for developers to be gun shy about interactive audio. Some of you were around to see the failings of Adaptive Audio and everybody has heard MIDI music or played a game with MIDI music. It's left a bad taste in our collective mouths about the quality of the sound. Once it gets out of the safety of our controlled environment and into that cheapo sound card out there, as Sol Rosenberg would put it, "it don't sound so good".

Apart from the very few developers creating their own proprietary music engines, there hasn't been much in the way of interactive audio software available to play around with. You can walk into any music store and find quite a few different audio software packages and some MIDI software products, but no interactive audio products. We've seen a few attempts at interactive audio software in different incarnations at trade shows -- you know what I'm talking about, those wacky programs that generate music at random or let you piece together riffs to form a song. Composers and developers alike hate them because they make some pretty ugly and generic 'music', but they seem to be back in a booth every year. These programs spring to mind when someone says interactive audio, which adds fuel to the fire. Because there is so little information on the subject, a lot of developers just don't know anything about interactive audio, so the bad taste of failed attempts, hearsay and misinformation is left as the basis of the fear.

And the winner is…
The award for the group most responsible for the lack of interactive audio goes to… envelope, please… The Composers! This is the worst part, because we are the ones in a position to drive the technology. It's our job as professionals to keep current on our craft, to develop communication with manufacturers to guide the technology in the direction we want and need, and to inform our employers of new or better technologies. Why are we ignoring other avenues? One problem is that there are few dedicated game audio composers. Some are full-time employees of the developer or publisher and may have other tasks, like programming, to deal with as well. Those who aren't generally have a tight schedule to adhere to and little time to futz around with new technology. Most, however, work in other entertainment fields like TV, movies or radio. They are not very interested in trying to learn or help develop a technology that is useless in other industries, where the audio is linear. As a side note for developers and publishers, you should consider hiring sound designers who care not only about the quality of the audio they produce but also about the advancement of the game industry. We'll all benefit from that.

Another problem is that we are so happy to finally be able to use CD-quality music and sound effects in games that we've gotten tunnel vision. Right now we are enjoying this ability, but we need to pay attention to where we are going, lest we get lost. Eventually we'll put our eyes back on the road only to find that we've fallen seriously behind again. Do we, as audio content providers, want to remain hapless victims, forced to follow along in the wake of the industry instead of helping to steer it? Hey, we all need to eat, but that is short-term gain at the expense of long-term benefit. I know it's easier to give the developer what he's asking for than to try to figure out a new technology, and while it may be quite satisfying to hear such beautiful sounds coming from dimly lit living rooms, bedrooms and offices, it's not the prize. Don't get me wrong, we've made great strides with audio quality in games, and there is definitely a place in almost any game for good old Redbook audio. But there is something even better down the road and we're neglecting it. The true reward is to have all of that and the ability to make it interactive.

Are we there yet?
In the past, all we had to work with were little beeps and whatnot from that tiny speaker inside the PC case. Not much need to worry about quality audio. Then came the sound card, and you could actually play music. And so MIDI came to the PC. Why MIDI? Well, there was no such thing as a CD-ROM drive in the PC, and a hard drive was as big as the PC itself. MIDI was a way around the hardware and software limitations of the time. MIDI was and still is a marvelous technology. While the MIDI standard was a necessary step in PC audio evolution, it still sounded, well… like MIDI. This is because although MIDI became standardized, the quality of the synthesizers and instrument patches that devices use to play it back has had no such standardization. When I say "patches", I'm talking about the sounds a MIDI device uses to replicate instruments, also referred to as 'instrument banks'. These vary greatly in quality from sound card to sound card, and that quality is usually directly related to the price of the sound card. When you're spending big dollars on a PC, adding another $200 for a quality sound card is a lot harder to swallow than $30. As we've all heard a million times (why doesn't it sink in?), "You get what you pay for".

Other technologies have come along, but they were and still are proprietary, and that puts us right back to the problem of a myriad of sound cards in use with varying degrees of compatibility and quality. It would seem that MIDI has seen its day as a viable solution to our interactive audio dilemma. This is unfortunate, because MIDI talks to the computer in a language it can understand, making it fast, programmable and flexible enough to be interactive, and its file size is unbeatable. With the industry moving rapidly towards online gaming, file size is again a major concern. A two-minute stereo audio track at 16-bit/44,100 Hz (CD quality) will be about 20 MB (roughly 10 MB per minute of sound). Try downloading ten or more tracks, as well as the game, at 56K.
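The storage and download figures above are easy to check. The short Python calculation below assumes uncompressed stereo PCM (as on an audio CD) and a best-case 56 kbit/s connection; real 56K modem connections were usually slower, so actual download times would have been longer still. The function names are just for illustration.

```python
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 2   # 16-bit samples
CHANNELS = 2           # stereo, as on an audio CD


def pcm_bytes(seconds: float) -> int:
    """Size of uncompressed CD-quality audio of the given length."""
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


def download_seconds(num_bytes: int, link_bits_per_second: float = 56_000) -> float:
    """Best-case transfer time; assumes the link runs at its full nominal rate."""
    return num_bytes * 8 / link_bits_per_second


track = pcm_bytes(120)  # one two-minute track
print(f"one 2-minute track: {track / 1_000_000:.1f} MB")                      # one 2-minute track: 21.2 MB
print(f"ten tracks at 56K: {download_seconds(10 * track) / 3600:.1f} hours")  # ten tracks at 56K: 8.4 hours
```

That works out to about 10.6 MB per minute, and roughly 50 minutes of best-case download time for every two-minute track, before the game itself is counted.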

Where's that Knight in shining armor?
What we need is a software company to make a program that would allow a small, fast audio file to sound the same on every PC, with CD quality and the ability to be interactive. Then of course they'd have to give it away for free to everyone who owns or buys a PC. Right, that'll happen when Satan is wearing ice skates. Would someone please get the Lord of the Abyss a pair of leggings to go with those red figure skates? Microsoft to the rescue! (I can't believe I actually put that phrase in print.) Who else could pull it off?
Microsoft has developed just what we needed: a program to ensure compatibility among sound cards, the Microsoft Synthesizer, and a program to create the audio content, DirectMusic Producer.

The Microsoft Synthesizer is a DownLoadable Sound (DLS) compatible software substitute for synthesizer hardware. Many sound cards are already DLS compatible, and DLS-compatible software synthesizers are becoming available from other companies as well. So if your sound card isn't already DLS compatible in hardware, the MS Synthesizer steps in to ensure compatibility, and it comes free with the DirectX API. The Microsoft Synthesizer is also installed automatically as part of Internet Explorer, so chances are pretty darn good that most PCs and all game players already have it installed on their systems.

Microsoft also produces a program called the DirectMusic Producer, which uses MIDI and the DLS standard to compose interactive audio. This too is provided free. So we now have the ability to create and implement interactive, CD quality audio at a fraction of the system resources required by linear Redbook audio. Kudos to Microsoft (now if they would only make the interface understandable to musicians, hint, hint…).

I did say CD quality didn't I?
So what is DLS, you ask? DLS is a standard adopted by the MIDI Manufacturers Association in January 1997. A DownLoadable Sound is basically a MIDI instrument created by taking a sample (a WAV file) of a sound from any source, be it a drum, a dog barking or an entire orchestra. That sound is stored in a DLS bank, which can be used exactly like, and in place of, the General MIDI instruments we all know and love (to hate). This means the same sounds you would currently hear in your Redbook audio tracks can be used in a MIDI composition. Instead of a two-minute WAV of a violin solo taking up 20 MB of space, you take a short sample of that violin sound, which will most likely be less than 512K, and make a DLS instrument out of it. MIDI notes then trigger that sound in the composition, and the result is the same two-minute violin solo at a fraction of the size. DLS combines the advantages of digital sampling with the compactness and flexibility of MIDI, and it functions independently of any on-board MIDI instrument sounds already in a sound card. If your sound card isn't already DLS compatible from the manufacturer, the Microsoft Synthesizer handles the processing. You simply send along the DLS collection of instruments with the MIDI composition, and the song sounds the same on every PC. DirectX 8 makes use of the DLS2 standard, which adds many features. You can read more on DLS and DLS2 at the MIDI Manufacturers Association website.
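The core trick, one short recording played back at different pitches by MIDI notes, can be sketched in a few lines of Python. This is a deliberately simplified model, not the DLS file format or the Microsoft Synthesizer: `SampledInstrument` is a hypothetical class, the root key of A4 (MIDI note 69) is an assumption, and real DLS instruments also carry key-range regions, loop points (so a short sample can sustain a long note) and envelopes, all omitted here. It assumes plain resampling in 12-tone equal temperament, where each semitone multiplies the playback rate by 2^(1/12).

```python
from dataclasses import dataclass


@dataclass
class SampledInstrument:
    """Greatly simplified stand-in for a DLS instrument: one recorded sample plus
    the MIDI key it was recorded at. Regions, loop points and envelopes are omitted."""
    name: str
    root_key: int       # MIDI note number at which the sample plays unshifted
    sample_bytes: int   # size of the stored WAV data

    def playback_rate(self, midi_note: int) -> float:
        """Resampling ratio that shifts the sample to midi_note (12-tone equal temperament)."""
        return 2.0 ** ((midi_note - self.root_key) / 12)


# Assumed figures: a violin sample of 512K, recorded at A4 (MIDI note 69).
violin = SampledInstrument("violin", root_key=69, sample_bytes=512 * 1024)

# A short phrase as MIDI note numbers: every note re-triggers the same sample.
phrase = [69, 71, 73, 74, 76]
for note in phrase:
    print(note, round(violin.playback_rate(note), 3))

# Compare against a two-minute, 16-bit, 44.1 kHz stereo recording of the solo.
recorded_solo_bytes = 120 * 44_100 * 2 * 2
print(f"recording is {recorded_solo_bytes / violin.sample_bytes:.0f}x the sample")
```

An octave up (note 81) plays the sample at exactly twice the rate, an octave down (note 57) at half. The MIDI note data that drives the phrase is tiny next to either figure, which is where the size advantage the text describes comes from.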

You might also notice that I have been using the term interactive audio and not interactive music. That's because a DLS instrument can be made from any sound, which means sound effects and voices as well as musical instruments. One of the demonstrations I saw from Microsoft was a sports game sound effect set where the crowd cheered when your team got a hit and booed when the other team got a hit. At the same time, there was an announcer speaking, a vendor hawking his wares and a general crowd ambience. All of these sounds were layered on top of each other as the game events required, without ever having to switch tracks or getting the stutter you experience from loading and unloading an audio track.
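The layering idea in that demo can be illustrated with a small Python sketch. It does not reproduce Microsoft's demo or any DirectMusic interface; `EventMixer`, `Voice` and the event and sound names are hypothetical, and no actual audio is played. What it shows is the structural difference from track switching: game events add short one-shot voices on top of looping ambience that keeps running, so nothing has to be stopped, unloaded and restarted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Voice:
    sound: str
    looping: bool   # looping ambience keeps playing; one-shots end on their own


@dataclass
class EventMixer:
    """Hypothetical event-to-sound layer: events add voices on top of whatever is
    already playing instead of replacing the whole track."""
    event_sounds: dict[str, list[str]]
    voices: list[Voice] = field(default_factory=list)

    def start_ambience(self, sound: str) -> None:
        self.voices.append(Voice(sound, looping=True))

    def on_event(self, event: str) -> None:
        # Unknown events simply add nothing.
        for sound in self.event_sounds.get(event, []):
            self.voices.append(Voice(sound, looping=False))

    def finish_one_shots(self) -> None:
        """Drop one-shot voices, as if they have all played to the end; loops continue."""
        self.voices = [v for v in self.voices if v.looping]

    def playing(self) -> list[str]:
        return [v.sound for v in self.voices]


mixer = EventMixer({
    "home_team_hit": ["crowd_cheer", "announcer_home_hit"],
    "visiting_team_hit": ["crowd_boo"],
})
mixer.start_ambience("crowd_murmur")
mixer.start_ambience("vendor_calls")
mixer.on_event("home_team_hit")
print(mixer.playing())
```

In a real synthesizer each voice would be a DLS instrument triggered by a MIDI note; the point here is only that new sounds join the mix rather than forcing a track change.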

Where have you been all my life?
You might wonder why, if this ability has been around since 1997, everyone doesn't use it. That's a valid question. As I pointed out earlier, the fear factor has kept developers from being interested in learning about it even if information were easily accessible -- which it isn't. Since the DirectMusic Producer is a free program, all of the attention in its creation has gone into the technology and not the user interface. This means it is difficult to learn and use. Musicians are rarely programmers (although when I look around my studio I wonder how I got all of this gear to work together with three PCs) and are therefore not inclined to deal with the problem solving required to figure it all out. In addition, it's not useful in other areas of the music industry, which means it's gotten little attention in the music community. Interactive audio also requires a whole new way of thinking about composing. You can't approach a composition with the traditional linear structure, because changes in the game will dictate that your composition must change. If you've been taught, listened to and created music one way your entire life, it takes serious dedication and focus to learn to look at audio in a completely different way. With the steep learning curve, it's difficult to justify the loss of productivity while you try to get a handle on it. Who'll pay the rent? Then, after you learn it, you have to sell the developers and publishers on the technology. As a free program it generates no revenue, which means it gets no advertising funds. With little available information, it's a hard sell. It's much easier to go with what you know, and what you can sell.

Add up all of these things and you see why interactive audio hasn't taken the industry by storm. The bottom line, however, is that the ability to produce interactive audio is available, and it's an exciting frontier for pioneering musicians and developers who are willing to explore beyond the boundaries. We owe it to our audience and ourselves to move in this direction, and there is really no excuse not to be doing it. Yes, it is more difficult to learn, but I'm sure that learning a programming language, or putting down the pencil and learning to draw with a graphics program, was no piece of cake at first either. So now that you know CD-quality, interactive audio is possible, can you afford not to have it? Right now your competitor is thinking about it.

Copyright 2004 CMP Media Inc. All rights reserved.