Welcome to the second of a series of articles on game technology from the XNA group at Microsoft. Last month’s article by Frank Savage discussed features of the newly released XNA Game Studio 2.0, the XNA group’s managed code SDK for community-built games. This month, we turn our attention to the work the XNA team has been doing for native code development in the area of audio technology as we discuss the new cross-platform audio library for Windows and Xbox 360: XAudio2.
A bit of history first: The year is 1995. An Intel Pentium processor running at 133 MHz is bleeding-edge and ISA is the standard PC interface. “Google” means “10 to the hundredth power,” Bill Clinton is in his first term as president, and “Braveheart,” released that year, will go on to win the Oscar for best picture. In September of that year, Microsoft releases a new technology for Windows games called DirectX.
Since then, DirectX, and its audio component DirectSound, have allowed game programmers to write to a single API set that works across any hardware with a DirectX-compatible driver. The DirectX model continues to be used today and is by far the most popular native-code game development environment.
Back when DirectX began, DirectSound served two important functions. First, it provided a simple software mixer, allowing a game running on Intel 486 and Pentium-class CPUs to play and mix sounds in software. Second, it gave games an easy way to offload audio processing, particularly 3D audio processing, from the CPU to any DirectX-compatible audio card. The buffer metaphor used by DirectSound, along with its reliance on on-board sound memory, reflected the slow ISA bus and the architecture of the sound cards of the day.
Over the following dozen years, some major changes occurred in game audio. First and most obvious, the power of the main CPU in PCs and game consoles increased dramatically. In addition to the clock speed increases from 133 MHz to 3 GHz, the nature of the CPUs themselves changed. With the introduction of the MMX and SSE/SSE2 instruction sets, a single instruction could operate on several data values at once, so the amount of work done per clock cycle also increased. Audio processing is particularly well-suited to take advantage of these parallel vector processing architectures.
The cherry on top, from a processing-power perspective, is the move to multi-core, hyper-threaded systems. Add it all up, and the amount of CPU processing available for audio has increased by nearly a factor of 100 since DirectSound was first introduced. The end result: today’s CPUs are more than capable of creating extremely compelling game audio all on their own.
The second change in the game audio industry is the nature of game audio itself. When DirectSound was introduced, game audio was generally quite simplistic: when a game event happened, such as a gunshot, the game would play a gunshot wave that was loaded into a DirectSound buffer. The game could set pitch and volume, and also define a roll-off curve with distance by using the DirectSound3D API—but that’s about as fancy as things got. Sixteen or so concurrent sounds were considered plenty.
Today, sound designers and composers have moved well beyond the simplistic notion of just playing a wave file in response to a game event. Sounds are now often composed of multiple wave files, played simultaneously, each with their own pitch and volume. Imagine an explosion sound that is not just a simple recording of an explosion, but rather a combination of the initial low boom, a higher-pitched crack, and a longer tail.
By combining these individual elements at run-time instead of pre-mixing them into a single wave file, a game can create much more variety in sound effects by separately varying the pitch and volume of each component as it’s played back during the game. A more extreme example is found in modern racing games, which can use as many as 60 waves for each car.
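To make the idea of run-time layering concrete, here is a minimal sketch in plain C++ that mixes several component waves into one output buffer, each with its own pitch and volume. It is not XAudio2 or DirectSound code: the `Layer` struct and the `mixLayers` function are hypothetical names invented for this illustration, the audio is assumed to be mono floating-point samples, and pitch is applied with simple linear-interpolation resampling.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One component of a composite sound (hypothetical type for illustration).
struct Layer {
    std::vector<float> samples; // mono samples, nominally in [-1, 1]
    float pitch;                // playback-rate ratio; 1.0 = original pitch
    float volume;               // linear gain
};

// Mixes all layers into a single mono buffer of `frames` samples.
// Pitch is applied by stepping through the source at `pitch` samples per
// output frame and linearly interpolating between neighboring samples.
std::vector<float> mixLayers(const std::vector<Layer>& layers, std::size_t frames) {
    std::vector<float> out(frames, 0.0f);
    for (const Layer& layer : layers) {
        assert(layer.pitch > 0.0f);
        const std::size_t n = layer.samples.size();
        for (std::size_t i = 0; i < frames; ++i) {
            const double pos = static_cast<double>(i) * layer.pitch;
            const std::size_t i0 = static_cast<std::size_t>(pos);
            if (i0 >= n) break; // this layer has finished playing
            const std::size_t i1 = (i0 + 1 < n) ? i0 + 1 : i0;
            const float frac = static_cast<float>(pos - static_cast<double>(i0));
            const float s = layer.samples[i0]
                          + frac * (layer.samples[i1] - layer.samples[i0]);
            out[i] += layer.volume * s; // sum into the mix
        }
    }
    return out;
}
```

In a game, the explosion described above might be three such layers (boom, crack, and tail), with the pitch and volume of each nudged by a small random amount every time the sound is triggered, so that no two explosions sound exactly alike.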
We’ve certainly come a long way from the “one sound equals one wave file” notion of game sounds! In addition to composite sounds, environmental modeling—the emulation of audio environments—is now a staple in game sound. This requires flexible reverberation for room simulation, as well as filtering for occlusion and obstruction effects.
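As a rough illustration of the filtering involved, occlusion is commonly approximated by low-pass filtering a sound so that it seems muffled by the obstacle between source and listener. The sketch below uses a basic one-pole low-pass filter in plain C++. The `OnePoleLowPass` class and `applyOcclusion` function are hypothetical names chosen for this example, not part of any Microsoft API, and a production engine would likely use a more capable filter.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One-pole low-pass filter: y[n] = y[n-1] + a * (x[n] - y[n-1]),
// with the coefficient a = 1 - exp(-2*pi*cutoff/sampleRate).
class OnePoleLowPass {
public:
    OnePoleLowPass(float cutoffHz, float sampleRateHz) {
        assert(cutoffHz > 0.0f && sampleRateHz > 0.0f);
        const float kTwoPi = 6.2831853f;
        a_ = 1.0f - std::exp(-kTwoPi * cutoffHz / sampleRateHz);
    }

    // Processes one input sample and returns the filtered sample.
    float process(float x) {
        y_ += a_ * (x - y_);
        return y_;
    }

private:
    float a_ = 0.0f; // smoothing coefficient in (0, 1)
    float y_ = 0.0f; // previous output (filter state)
};

// Returns a muffled copy of `in`; a lower cutoff simulates heavier occlusion.
std::vector<float> applyOcclusion(const std::vector<float>& in,
                                  float cutoffHz, float sampleRateHz) {
    OnePoleLowPass filter(cutoffHz, sampleRateHz);
    std::vector<float> out;
    out.reserve(in.size());
    for (float x : in) {
        out.push_back(filter.process(x));
    }
    return out;
}
```

A game might lower the cutoff frequency as more geometry comes between the sound source and the listener, and raise it again as the path clears.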
Although DirectSound supported some of these new audio needs through its property set extension mechanism, its underlying architecture didn’t have the flexibility to support them all by itself. To meet the changing needs of game audio developers, DirectSound was enhanced to provide basic digital signal processing (DSP) support in the 2000 release of DirectX 8. Although DirectSound 8 allowed a developer to add software DSP effects to a DirectSound buffer, the overall buffer-based architecture of DirectSound remained essentially intact.