Welcome to the second in a series of articles on game
technology from the XNA group at Microsoft. Last month’s article by Frank
Savage discussed features of the newly released XNA Game Studio 2.0, the XNA group’s managed
code SDK for community-built games. This month, we turn our attention to the
work the XNA team has been doing for native
code development in the area of audio technology as we discuss the new
cross-platform audio library for Windows and Xbox 360: XAudio2.
A Bit of History
A bit of history first: The year is 1995. An Intel Pentium
processor running at 133MHz is bleeding-edge and ISA is the standard PC
interface. “Google” means “10 to the hundredth power,” Bill Clinton is in
his first term and “Braveheart” wins the Oscar for best picture. In September
of that year, Microsoft releases a new technology for Windows games called
DirectX.
Since then, DirectX, and its audio component DirectSound,
have allowed game programmers to write to a single API set that works across
any hardware with a DirectX-compatible driver. The DirectX model continues to
be used today and is by far the most popular native-code game development
Back when DirectX began, DirectSound provided two important
functions. First, it provided a simple software mixer, allowing a game running
on Intel 486 and Pentium-class CPUs to play and mix sounds in software. Second,
it provided a means for games to easily take advantage of any
DirectX-compatible audio cards to offload audio processing from the CPU,
particularly 3D audio processing. The buffer metaphor used by DirectSound and
the on-board sound memory reflected the slow ISA bus and the architecture of
the sound cards of the day.
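
To make the idea of a software mixer concrete, here is a minimal sketch of what "mixing sounds in software" means at its core: summing the samples of every playing sound and clamping the result. This is not DirectSound's actual implementation. The `Voice` type and `MixVoices` function are hypothetical names invented for illustration, the audio is assumed to be 16-bit mono PCM, and floating-point accumulation is used for readability where a 486-era mixer would more plausibly have used integer math.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One playing sound: 16-bit mono PCM samples plus a linear gain.
// (Hypothetical type for illustration; not a DirectSound structure.)
struct Voice {
    std::vector<std::int16_t> samples;
    float gain;  // 1.0f leaves the volume unchanged
};

// Mix all voices into one output buffer of `frames` samples.
// Samples are summed in a wider type, then clamped to the 16-bit range
// so that loud passages saturate instead of wrapping around.
std::vector<std::int16_t> MixVoices(const std::vector<Voice>& voices,
                                    std::size_t frames) {
    std::vector<float> accum(frames, 0.0f);
    for (const Voice& v : voices) {
        // A voice shorter than the output simply contributes silence afterward.
        const std::size_t n = std::min(frames, v.samples.size());
        for (std::size_t i = 0; i < n; ++i) {
            accum[i] += v.gain * static_cast<float>(v.samples[i]);
        }
    }

    std::vector<std::int16_t> out(frames);
    for (std::size_t i = 0; i < frames; ++i) {
        const float clamped = std::max(-32768.0f, std::min(32767.0f, accum[i]));
        out[i] = static_cast<std::int16_t>(clamped);
    }
    return out;
}
```

The inner loop runs once per sample per voice, which is why the number of concurrent sounds was a real budget item on the CPUs of the day.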
The Industry Evolves
Over the following dozen years, some major changes occurred
in game audio. First and most obvious, the power of the main CPU in PCs and
game consoles increased dramatically. In addition to the clock speed increases
from 133 MHz to 3 GHz, the nature of the CPUs themselves changed. With the
introduction of MMX and SSE/SSE2, a single instruction could operate on several
data values at once, so the work done per clock cycle also increased. Audio
processing is particularly well-suited to take advantage
of these parallel vector processing architectures.
The cherry on top, from a
processing-power perspective, is the move to multi-core, hyper-threaded
systems. Add it all up, and the amount of CPU processing available for audio
has increased by nearly a factor of 100 since DirectSound was first introduced.
The end result: today’s CPUs are more than capable of creating extremely
compelling game audio all on their own.
The second change in the game audio industry is the nature
of game audio itself. When DirectSound was introduced, game audio was generally
quite simplistic: when a game event happened, such as a gunshot, the game would
play a gunshot wave that was loaded into a DirectSound buffer. The game could
set pitch and volume, and also define a roll-off curve with distance by using
the DirectSound3D API—but that’s about as fancy as things got. Sixteen or so
concurrent sounds were considered plenty.
Today, sound designers and composers have moved well beyond
the simplistic notion of just playing a wave file in response to a game event. Sounds are now often composed of
multiple wave files, played simultaneously, each with its own pitch and
volume. Imagine an explosion sound that is not just a simple recording of an
explosion, but rather a combination of the initial low boom, a higher-pitched
crack, and a longer tail.
By combining these individual elements at run-time
instead of pre-mixing them into a single wave file, a game can create much more
variety in sound effects by separately varying the pitch and volume of each
component as it’s played back during the game. A more extreme example is found in
modern racing games, which can use as many as 60 waves for each car.
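
The run-time layering described above can be sketched in a few lines. This is only an illustration of the idea, not code from any shipping engine or from XAudio2: the `Layer` and `LayerInstance` types, the `TriggerComposite` function, and the wave names used with them are all hypothetical. Each time the composite sound is triggered, every layer gets its own randomly chosen pitch and volume within a designer-specified range.

```cpp
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// One element of a composite sound, with the ranges over which its
// pitch and volume may vary. (Hypothetical type for illustration.)
struct Layer {
    std::string wave;  // hypothetical asset name, e.g. "boom"
    float minPitch;    // playback-rate ratio; 1.0 is the recorded pitch
    float maxPitch;
    float minVolume;   // linear gain
    float maxVolume;
};

// The concrete settings chosen for one playback of a layer.
struct LayerInstance {
    std::string wave;
    float pitch;
    float volume;
};

// Pick a fresh pitch and volume for every layer each time the sound fires,
// so that no two playbacks of the composite sound are exactly alike.
template <class Rng>
std::vector<LayerInstance> TriggerComposite(const std::vector<Layer>& layers,
                                            Rng& rng) {
    std::vector<LayerInstance> result;
    result.reserve(layers.size());
    for (const Layer& l : layers) {
        std::uniform_real_distribution<float> pitch(l.minPitch, l.maxPitch);
        std::uniform_real_distribution<float> volume(l.minVolume, l.maxVolume);
        result.push_back({l.wave, pitch(rng), volume(rng)});
    }
    return result;
}
```

A game would call `TriggerComposite` with, say, three layers for the boom, crack, and tail, then hand each resulting instance to whatever audio API actually plays the wave at the chosen pitch and volume.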
We’ve certainly come a long way from the “one sound equals
one wave file” notion of game sounds! In addition to composite sounds, environmental modeling—the emulation of
audio environments—is now a staple in game sound. This requires flexible
reverberation for room simulation, as well as filtering for occlusion and
Though DirectSound supported some of these new audio needs through
its property set extension mechanism, its underlying
architecture didn’t have the flexibility to support them all by itself. To meet
the changing needs of game audio developers, DirectSound was enhanced to
provide basic digital signal processing (DSP) support in the 2000 release of
DirectX 8. Although DirectSound 8 allowed a developer to add software DSP
effects to a DirectSound buffer, the overall buffer-based architecture of
DirectSound remained essentially intact.