
Interactive Music...er, Audio
By Rob Ross
Gamasutra
May 15, 2001
URL: http://www.gamasutra.com/resource_guide/20010515/ross_01.htm
Interactive Audio. Say this in a crowd of developers and you'll most likely get one of these three responses:
Why is it that these two little words "interactive audio" seem to be near the top of so many developers' bad_word.lst files? Doesn't it seem odd that in an industry called "interactive entertainment" one of the key components in every game lacks this interactivity? If you had the ability to add CD quality, interactive audio into your game, wouldn't you want to do it? That question is akin to asking if you want tires to come with your new car. If interactive audio is something we want and need in games, then why is it in such limited use? It's not for a lack of desire; we know we need it. In my humble opinion the reason it's the exception instead of the rule boils down to one thing: Fear.
Where are we?
Until fairly recently audio has taken a back seat (the trunk may be
more accurate) to other technologies in our industry. I don't believe it's
because developers feel audio is unimportant; I hope by now
we all understand the important role audio plays in the gaming experience.
There was simply never much focus directed towards the development of
audio technology because of the PC's predominant use as a business machine.
PC power and graphics technology have been driven by business application
needs rather than entertainment value. But even the television, which is
designed solely for its entertainment value, hadn't seen any significant
improvements in audio technology until the 'home theater' craze. For years,
our high-end stereo systems sat right next to the TV with little or no thought
of interconnection. Music technology has been geared towards and driven
by the aural experience alone and almost completely separated from the visual
experience, even though we all love a good concert. It's fairly easy to
see how audio ended up so far behind in its integration in a predominantly
visual industry.
Fortunately for all of us, things have been changing and audio is being
recognized as an integral part of the entertainment experience. There is
now a plethora of companies working on audio hardware; high quality
sound cards are very inexpensive and few PCs ship without one.
But CD quality audio had previously required far too much storage and processing
power to be used in mainstream games. Today, however, processing speeds,
RAM prices, hard drive capacity and the proliferation of the CD-ROM drive,
coupled with compression and streaming technologies have finally made the
ability to include quality audio in games a reality. Unfortunately, audio
technology is still behind the curve. Not the actual hardware technology
itself, mind you, but the integration of the latest audio technology at
a level commensurate with the rest of the industry technologies.
Actually, there is quite a bit going on in the audio industry. Manufacturers
and sound designers are exploring the latest and greatest advancements like
personal environmental audio settings, positional 3D and surround sound.
These things are important but they are not advancing the core technology,
they are only adding bells and whistles to the current technology. We now
have the ability to add CD quality audio into our games, but we need that
audio to be as interactive as the other pieces.
Indefinitus Definition
OK, so let's define what I mean by "Interactive Audio". I
must preface this definition by telling you that this is what I perceive
the term to mean as it pertains to the interactive entertainment industry.
I do this because you won't find this term in a dictionary. In fact, even
the word "interactive" is only listed as an adjective under "interaction".
Interactive audio is a technology designed to allow specifically created
audio, placed in a given application, to react to user input and/or changes
in the application environment.
Sounds simple enough, eh? Perhaps it will make more sense in an example.
Let's say you are developing a racing game where you drive through various
cities. Generally you would have a particular audio track to represent each
different city and various utility screens. These audio tracks play from
start to finish and normally loop over and over as long as the user is present
in that environment. With interactive audio you could have the music adapt
to changes in the environment. Wouldn't it be better if while passing through
Chinatown in San Francisco, for instance, some ethnic instruments were added
to the audio track and then removed as you leave that section of the city?
Or maybe even transition the entire theme to one with an ethnic feel and
then transition back as you leave that section of the city. How about decreasing
the tempo and changing the instruments and style. You could go from a techno
sound to a cool acid jazz as you exit the city and hit the freeway. In movies
the music generally takes on a slightly different role (another difference
between our industry and the movie industry which I'll expound on below).
The intent is to create a particular mood or atmosphere relevant to what's
happening or what is about to happen. In a perfect situation we would build
tension or suspense and then transition right into the event, guiding the
emotion of the user. The ability to do these types of things and more and
do them seamlessly -- this is what I mean by interactive audio. With the
current way game audio works the audio changes abruptly, if at all, only
at the event. The audio is incapable of being a vehicle to move the players'
emotion. The capability of being a vehicle needs to be the next step in
game audio advancement.
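To make the racing example a little more concrete, here is a minimal Python sketch of the idea, not any shipping engine's API. The zone names, layer names and the LayerMixer class are all hypothetical, invented purely for illustration. The point it tries to show is that entering a new part of the city only changes which layers are wanted, and the gains then drift toward their targets over several update ticks instead of switching abruptly at the event.

```python
from dataclasses import dataclass, field

# Hypothetical zone-to-layer mapping; every name here is illustrative only.
ZONE_LAYERS = {
    "downtown": {"techno_drums", "synth_bass"},
    "chinatown": {"techno_drums", "synth_bass", "erhu"},
    "freeway": {"jazz_drums", "upright_bass"},
}


@dataclass
class LayerMixer:
    fade_step: float = 0.25                      # gain change per update tick
    gains: dict = field(default_factory=dict)    # layer name -> current gain
    targets: set = field(default_factory=set)    # layers that should be heard

    def enter_zone(self, zone: str) -> None:
        """Pick the layers for the new zone; no gain changes yet."""
        self.targets = set(ZONE_LAYERS[zone])
        for layer in self.targets:
            self.gains.setdefault(layer, 0.0)

    def update(self) -> None:
        """Move each layer's gain one step toward 1.0 (wanted) or 0.0."""
        for layer, gain in list(self.gains.items()):
            goal = 1.0 if layer in self.targets else 0.0
            if gain < goal:
                gain = min(goal, gain + self.fade_step)
            elif gain > goal:
                gain = max(goal, gain - self.fade_step)
            self.gains[layer] = gain


mixer = LayerMixer()
mixer.enter_zone("downtown")
for _ in range(4):          # four ticks bring the downtown layers to full gain
    mixer.update()
mixer.enter_zone("chinatown")
mixer.update()              # the extra instrument starts fading in
print(mixer.gains)
```

A real engine would do this at musically sensible boundaries (bar lines, phrase ends) rather than on a fixed tick, but the shape of the problem is the same: the game reports a change, and the audio eases into it.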
The term "Adaptive Audio" was coined some time in the not too distant
past to describe a method of switching audio tracks in a similar fashion,
but the problems associated with this method caused its own demise. The
first was the sheer number of audio files needed to accomplish
the task, which of course added to the space required to store them. Adaptive
Audio requires constant loading and unloading of large audio files, which
causes a great deal of hard drive access, slowing the process down and
causing the game to stutter. Also, all of those large files being loaded
and unloaded tax the processor and memory, which the programmers object
to since these resources are needed for so-called "more important tasks"
like running the game engine and graphics. Can you blame them? Some might
say that Adaptive Audio or even Reactive Audio is a better term for what
I'm talking about, but I think that these labels do not properly convey
the idea that we want the audio to not only react to a given situation or
adapt to the changes in the environment but also to give a portent of
things to come. Since Adaptive Audio has been the label on a different technology,
we should stick to interactive audio so as not to confuse things further.
Hey big boy, what's the name on your box?
Let's take a look at what marketing has been up to. Even with all
of our intellect, the insidious nature of marketing still seems to ensnare
us. We know that buxom, scantily clad women are not going to flock to
us because of the brand of beer we drink, but we might buy a particular
brand based on this idea anyway.
Marketing is telling us that gamers will flock to a game if a recognizable
band is on the box. I think id Software helped set the stage for making
this idea seem reasonable with Quake. Sure, people thought it
was really cool that Trent Reznor did the ambient tracks for Quake
and yes it was a big seller, but I never heard anyone say they bought
Quake or played it any longer because of that. I think id deserves
more credit for the success of Quake than that. How long did
it take for you to turn the music off or put in your own selection of
tunes? Ah what the heck, it's only money. After all, you have to spend
money to make money right? So let's throw a briefcase full of it at
a big name band to write some tunes for the next title (by the way, don't
forget where that money will come from later... your share).
This scheme would appear on the surface to be a reasonable strategy.
The marketing guy's (or gal's) words of wisdom? "They use popular
bands in Hollywood movies to get the title more notice." This conjures
up an image of your game as a big screen movie with the associated notoriety
and revenue. As with most schemes born from the sheer will of marketing,
it's only superficial. Once you peel back the top layer and shine an
emotion-free light of clarity on this idea, you can examine the true nature
of the scheme. A movie costs $8.50 to see ($8.50! When did that happen?)
not $50. When was the last time you heard someone say "Gee I'm
really not into RTS games but I dropped 50 bucks on it anyway because
'Big hair band' did the music"? Yeah, that's what I thought. It
might sell you some soundtracks if you are lucky -- and there is nothing
wrong with that -- but it won't sell games.
Let's get back to reality for a moment. Our goal is to get better audio
technology to enhance the gaming experience. And I'm sorry but games
are NOT like Hollywood movies. Games are interactive and provide a completely
different experience and, hopefully, your audience will be involved
with your product for more than 90-180 minutes. Currently, the sonic
quality available to games is no different than that in a movie or a
CD so what can 'Big Hair Band' really offer you? Their name? At what
price?
The process of design, delivery and implementation is very different
in radio, movies and games. What was the last movie you saw where one
'band' did the entire musical score? There's a reason why you can't
recall any. Writing songs for an album is very different than producing
a score for a movie or a game. 'Big Hair Band' has spent their entire
career focused on perfecting the process of writing music for the radio,
not on the very different and specialized art of composing for games
or movies. What makes you think they suddenly understand the dynamics
of composing the entire soundscape for your game? Besides 'Big Hair
Band' probably needs to get another record out to pay back their label
for making them a success, so why are they making a detour to spend
the required time making game music? Perhaps they are in a slump (why
would you want to use them in this case anyway?) and they want a little
publicity and some cash. Of course, you'll hear no mention of any of
this from those brilliant marketing guys.
And by the way, the only interactive music 'Big Hair Band' knows about
is a live performance. So you're going to get some tunes from 'Big Hair
Band' that are going to loop over and over and over. There is really
no effective way to make it interactive, so the gamer, after hearing
the same tracks repeatedly, eventually shuts the music off. At that
moment they lose everything that the music was meant to bring to the
game, like immersion in the environment and a vehicle for a wide variety
of emotion. Most people just can't listen to the same songs repeatedly
for any length of time without getting sick of them. Have you ever turned
on the radio and there it was again, that damn song that was so cool
last month but just got played to death? Did you change the radio station
after only hearing the first few seconds of that song? Think about what
you're actually going to get for your money and give to your audience
for theirs. I know I'd rather have something that enhances the experience
instead of some marketing hype.
Ok folks, it's safe to come out now.
So why are developers so afraid of interactive audio? Fear is most commonly
associated with a lack of knowledge or understanding. Fear of what we
don't understand, don't know or even what we think we know. There are
plenty of reasons for developers to be gun shy about interactive audio.
Some of you were around to see the failings of Adaptive Audio and everybody
has heard MIDI music or played a game with MIDI music. It's left a bad
taste in our collective mouths about the quality of the sound. Once
it gets out of the safety of our controlled environment and into that
cheapo sound card out there, as Sol Rosenberg would put it, "it
don't sound so good".
Apart from a very few developers creating their own proprietary music
engines, there hasn't been much in the way of interactive audio software
available to play around with. You can walk into any music store and
find quite a few different audio software packages and some MIDI software
products, but no interactive audio products. We've seen a few attempts
at interactive audio software in different incarnations at trade shows
-- you know what I'm talking about, those wacky programs that generate
music at random or let you piece together riffs to form a song. Composers
and developers alike hate them because they make some pretty ugly and
generic 'music', but they seem to be back in a booth every year. These
programs spring to mind when someone says interactive audio, which adds
fuel to the fire. Due to a lack of information on the subject, a lot
of developers just don't know anything about interactive audio so we
are left with the bad taste of failed attempts, hearsay and misinformation
as the basis of the fear.
And the winner is...
The award for the group most responsible for the lack of interactive
audio goes to, "envelope please..." The Composers! This
is the worst part because we are the ones in the position to drive the
technology. It's our job as professionals to keep current on our craft,
develop communication with manufacturers to guide the technology in
the direction we want and need and inform our employers of new or better
technologies. Why are we ignoring other avenues? One problem is that
there are few dedicated game audio composers. Most have either been
full time employees of the developer or publisher who may have other
tasks like programming to deal with as well. If that's not the case,
they generally have a tight schedule to adhere to and little time to
futz around with new technology. Mostly, however, they work in other
entertainment fields like TV, movies or radio. They are not very interested
in trying to learn or help develop a technology that is useless in other
industries where the audio is linear. As a side note for developers
and publishers, you should consider hiring sound designers who care
not only about the quality of the audio they produce but also for the
advancement of the game industry. We'll all benefit from that.
Another problem is that we are so happy to finally be able to use CD
quality music and sound effects in games that we've gotten tunnel vision.
Right now we are enjoying this ability but we need to pay attention
to where we are going, lest we get lost. Eventually we'll put our eyes
back on the road only to find out that we've fallen seriously behind
again. Do we, as audio content providers, want to remain hapless victims?
Forced to follow along in the wake of the industry instead of helping
to steer it? Hey, we all need to eat, but that is the short-term gain
at the expense of the long-term benefit. I know it's easier to give
the developer what he's asking for than to try to figure out a new
technology, and it may be quite satisfying to hear such beautiful
sounds coming from dimly lit living rooms, bedrooms and offices, but
it's not the prize. Don't get me wrong, we've made great strides with
the audio quality in games and there is definitely a place in almost
any game for good old Redbook audio. But there is something even better
down the road and we're neglecting it. The true reward is to have all
of that and the ability to make it interactive.
Are we there yet?
In the past, all we had to work with were little beeps and whatnot
from that tiny speaker inside the PC case. Not much need to worry about
quality audio. Then along came the sound card and you could actually
play music. And so MIDI was born. Why MIDI? Well, there was no such
thing as a CD and a hard drive was as big as the PC itself. MIDI was
developed as a way around the hardware and software limitations of the
time. MIDI was and still is a marvelous technology. While the MIDI standard
was a necessary step in PC audio evolution it still sounded, well...
like MIDI. This is because although MIDI became standardized, the quality
of the synthesizer and instrument patches used by the device to play
it back has had no such standardization. When I say "patches"
I'm talking about the sounds MIDI uses to replicate instruments, also
referred to as 'instrument banks'. These all vary greatly in quality
from sound card to sound card. That quality is usually directly related
to the price of the sound card. When you're spending big dollars on
a PC, adding another $200 for a quality sound card is a lot harder to
swallow than $30. As we've all heard a million times (why doesn't it
sink in?), "You get what you pay for".
Other technologies have come along, but they were and still are proprietary
in nature, and that puts us right back to the problem of having a myriad
of sound cards in use with varying degrees of compatibility and quality.
It would seem that MIDI has seen its day as a viable solution to our
interactive audio dilemma. This is unfortunate because MIDI talks to
the computer in a language it can understand, making it fast, programmable
and flexible enough to be interactive, and its file size is unbeatable.
With the industry moving rapidly towards online gaming, file size is
again a major concern. A two-minute audio track at 16-bit/44,100 Hz (CD
quality) will be about 20 MB (10 MB per minute of stereo sound). Try downloading
ten or more tracks, as well as the game, at 56K.
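For anyone who wants to check the arithmetic, here is a small Python sketch. It assumes uncompressed stereo PCM and an idealized 56 kbps link that actually delivers 56,000 bits every second with no protocol overhead, which real dial-up connections rarely manage, so the download time should be read as a best case. The function names are mine, chosen only for this illustration.

```python
def pcm_bytes(seconds, sample_rate=44_100, bits=16, channels=2):
    """Size in bytes of uncompressed PCM audio of the given length."""
    return seconds * sample_rate * (bits // 8) * channels


def download_seconds(num_bytes, link_bits_per_second=56_000):
    """Best-case transfer time over an idealized link (assumption: no overhead)."""
    return num_bytes * 8 / link_bits_per_second


track_bytes = pcm_bytes(120)                 # one two-minute track
track_seconds = download_seconds(track_bytes)

print(f"{track_bytes / 2**20:.1f} MB per track")                        # 20.2 MB per track
print(f"{track_seconds / 60:.1f} minutes per track")                    # 50.4 minutes per track
print(f"{10 * track_seconds / 3600:.1f} hours for ten tracks")          # 8.4 hours for ten tracks
```

So the rough figure of 10 MB per minute holds up, and ten tracks would tie up a modem for the better part of a working day before the game itself has even started downloading.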
Where's that Knight in shining armor?
What we need is some software company to make a program that
would allow a small, fast audio file to sound the same on every PC,
with CD quality and the ability to be interactive. Then of course they'll
have to give it away for free to everyone who owns or buys a PC. Right,
that'll happen when Satan is wearing ice skates. Would someone please
get the Lord of the Abyss a pair of leggings to go with those red figure
skates? Microsoft to the rescue! (I can't believe I actually put that
phrase in print). Who else could pull it off?
Microsoft has developed just what we needed: a program to ensure compatibility
among sound cards, the Microsoft Synthesizer, and a program to create
the audio content, DirectMusic Producer.
The Microsoft Synthesizer is a DownLoadable Sound (DLS) compatible software
substitute for synthesizing hardware. Many sound cards are already DLS
compatible and DLS-compatible software synthesizers are becoming available
through other companies as well. So if your sound card isn't already
hardware DLS compatible, the MS Synthesizer will substitute to ensure
compatibility and it comes free with the DirectX API. The Microsoft
Synthesizer is also installed automatically as a part of Internet Explorer
so chances are pretty darn good that most PCs and all game players already
have it installed on their systems.
Microsoft also produces a program called the DirectMusic Producer, which
uses MIDI and the DLS standard to compose interactive audio. This too
is provided free. So we now have the ability to create and implement
interactive, CD quality audio at a fraction of the system resources
required by linear Redbook audio. Kudos to Microsoft (now if they would
only make the interface understandable to musicians, hint, hint...).
I did say CD quality didn't I?
So what is DLS you ask? DLS is a standard adopted by the MIDI Manufacturers
Association in January of 1997. A DownLoadable Sound is basically a
MIDI instrument created by taking a sample (a WAV file) of the sound
from any source, be it a drum, a dog barking or an entire orchestra.
That sound is stored in a DLS bank, which can be used exactly like and
in place of the General MIDI instruments we all know and love (to hate).
This means the same sounds you would currently hear in your Redbook
audio tracks can be used in a MIDI composition. Instead of a two minute
WAV of a violin solo taking up 20 MB of space you take a short sample
of that violin sound which will most likely be less than 512K and make
a DLS instrument out of it. A MIDI note then triggers that sound in
the composition and the result is the same two minute violin solo at
a fraction of the size. DLS combines the advantages of digital sampling
with the compactness and flexibility of MIDI and functions independently
from any on-board MIDI instrument sounds already in a sound card. If
your sound card isn't already DLS compatible from the manufacturer,
the Microsoft Synthesizer handles the processing. You simply send along
the DLS collection of instruments with the MIDI composition and the
song sounds the same on every PC. DirectX 8 makes use of the DLS2 standard,
which adds many features. You can read more on DLS and DLS2 at the MIDI
Manufacturers Association website.
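The size argument can be sketched in a few lines of Python. This is only an illustration of the general sample-plus-notes idea; it is not the DLS file format and not the Microsoft Synthesizer's API. The SampledInstrument class, the 8 bytes assumed per note event and the little list of notes are all hypothetical. The repitching formula is the ordinary equal-temperament ratio: each semitone away from the recorded note changes the playback speed by a factor of the twelfth root of two.

```python
from dataclasses import dataclass


@dataclass
class SampledInstrument:
    name: str
    sample_bytes: int   # size of the single recorded sample
    root_note: int      # MIDI note number at which the sample was recorded

    def playback_rate(self, note: int) -> float:
        """Speed ratio that repitches the sample to the requested note."""
        return 2.0 ** ((note - self.root_note) / 12)


# One short violin sample, using the 512K figure from the text as its size.
violin = SampledInstrument("violin", sample_bytes=512 * 1024, root_note=69)

# A made-up phrase as (MIDI note, duration in seconds) pairs.
solo = [(69, 0.5), (71, 0.5), (72, 1.0), (81, 2.0)]

EVENT_BYTES = 8  # assumption: rough storage cost of one note event
score_bytes = len(solo) * EVENT_BYTES
total_bytes = violin.sample_bytes + score_bytes

for note, duration in solo:
    print(note, duration, round(violin.playback_rate(note), 3))
print(total_bytes, "bytes for the sample plus the notes")
```

The thing to notice is that the note list grows with the number of notes, not with the number of seconds of sound, so a longer solo adds bytes rather than megabytes.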
You might also notice that I have been using the term interactive audio
and not interactive music. The reason is that a DLS instrument can
be comprised of any sound, which means sound effects and voices as well
as musical instruments. One of the demonstrations I saw from Microsoft
was a sports game sound effect set where the crowd cheered when your
team got a hit and booed when the other team got a hit. At the same
time there was an announcer speaking, a vendor hawking his wares and
a general crowd ambience. All of these sounds were layered on top of each
other as needed by the game events without ever having to switch tracks
or getting the stutter you experience from loading and unloading an
audio track.
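I don't know how that demonstration was built internally, so the following Python sketch is just one plausible way to get the same behavior. The sound names, the event table and the SoundLayers class are hypothetical. The assumption that matters is that every sound is already resident in a bank when play starts, so a game event only adds a layer on top of the ambience and nothing is loaded or unloaded in the middle of play.

```python
# Hypothetical sound names for the ballpark scene described above.
AMBIENT = ["crowd_ambience", "announcer", "vendor"]

# Hypothetical mapping from (event, team) to the sound it triggers.
EVENT_SOUNDS = {
    ("hit", "home"): "crowd_cheer",
    ("hit", "away"): "crowd_boo",
}


class SoundLayers:
    def __init__(self, bank):
        self.bank = set(bank)          # everything is resident up front
        self.active = list(AMBIENT)    # ambient layers play continuously

    def on_event(self, kind, team):
        """Layer the matching sound over whatever is already playing."""
        sound = EVENT_SOUNDS.get((kind, team))
        if sound in self.bank and sound not in self.active:
            self.active.append(sound)
        return sound

    def finished(self, sound):
        """Drop a one-shot layer when it ends; ambience keeps running."""
        if sound in self.active and sound not in AMBIENT:
            self.active.remove(sound)


layers = SoundLayers(AMBIENT + ["crowd_cheer", "crowd_boo"])
layers.on_event("hit", "home")
print(layers.active)   # ambience plus the cheer, all sounding together
layers.finished("crowd_cheer")
layers.on_event("hit", "away")
print(layers.active)   # ambience plus the boo
```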
Where have you been all my life?
You might wonder why, if this ability has been around since
1997, everyone doesn't use it. That's a valid question. As I pointed
out earlier, the fear factor has kept developers from being interested
in learning about it even if there was information easily accessible
-- which there isn't. Since the DirectMusic Producer is a free program,
all of the attention given to its creation has been on the
technology and not the user interface. This means it is difficult to
learn and use. Musicians are rarely programmers (although when I look
around my studio I wonder how I got all of this gear to work together
with three PCs) and therefore not inclined to deal with the problem
solving required to figure it all out. In addition, it's not useful
in other areas of the music industry, which means it's gotten little
attention in the music community. Interactive audio also requires a
whole new way of thinking about composing. You can't approach a composition
in the traditional linear structure because changes in the game will
dictate that your composition must change. If your entire life you've
been taught, listened to and created music one way, it takes serious
dedication and focus to learn to look at audio in a completely different
way. With the steep learning curve, it's difficult to justify the loss
of productivity while you try to get a handle on it. Who'll pay the
rent? Then, after you learn it you have to sell the developers and publishers
on the technology. As a free program it generates no revenue, which
means it gets no advertising funds. With little available information,
it's a hard sell. It's much easier to go with what you know, and what
you can sell.
Add up all of these things and you see why interactive audio hasn't
taken the industry by storm. The bottom line however is that the ability
to produce interactive audio is available and it's an exciting frontier
for pioneering musicians and developers who are willing to explore beyond
the boundaries. We owe it to our audience and ourselves to move in this
direction and there is really no excuse not to be doing it. Yes it is
more difficult to learn but I'm sure that learning a programming language
or putting down the pencil and learning to draw with a graphics program
was no piece of cake at first either. So now that you know CD quality,
interactive audio is possible, can you afford not to have it? Right now
your competitor is thinking about it.
Copyright © 2004 CMP Media Inc. All rights reserved.