Mixing and Surround Sound

Digital Audio Mixing
By Robert Basler
Gamasutra
December 18, 1998
Vol. 2: Issue 49

If you look at the specification for WAV files, you will find that the format allows a wide variety of sample sizes, sample rates, and compression techniques.

Generally, the bigger the sample size and the higher the sample rate, the better the sound quality. For reference, a standard CD player plays audio at a 44.1 kHz sample rate with a 16-bit sample size. We are only going to handle two formats, 8-bit unsigned and 16-bit signed samples, neither using compression. These are the most common types of WAV file you will find on Intel-family PCs. If you run across WAV files that use compression, you'll have to use an audio conversion utility to convert them to 8- or 16-bit uncompressed format.

WAV files are standard RIFF format files and, as such, are divided into chunks. There are a number of possible chunk types in a WAV file. We are only interested in two: the Format chunk and the Data chunk. Skip any unrecognizable chunk types.
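
The chunk-skipping rule can be sketched in C as follows. This is not the article's listing 1; it is a minimal illustration that works on a file already loaded into memory, and the names `read_le32` and `find_chunk` are hypothetical. It relies on two properties of RIFF files: sizes are stored little-endian, and a chunk with an odd size is followed by one pad byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: scan an in-memory RIFF/WAVE image for the chunk
 * whose four-character ID matches `id` ("fmt " or "data"). Returns a
 * pointer to the chunk body and stores its size in *size_out, or returns
 * NULL if the image is not RIFF/WAVE or the chunk is absent.
 */
const uint8_t *find_chunk(const uint8_t *file, size_t file_len,
                          const char *id, uint32_t *size_out)
{
    size_t pos = 12;                      /* skip "RIFF", size, "WAVE" */

    assert(file != NULL && id != NULL && size_out != NULL);
    if (file_len < 12 || memcmp(file, "RIFF", 4) != 0 ||
        memcmp(file + 8, "WAVE", 4) != 0)
        return NULL;

    while (file_len - pos >= 8) {
        uint32_t size = read_le32(file + pos + 4);
        if (size > file_len - pos - 8)
            return NULL;                  /* truncated or corrupt chunk */
        if (memcmp(file + pos, id, 4) == 0) {
            *size_out = size;
            return file + pos + 8;
        }
        /* Unrecognized chunk: skip its body plus the pad byte that
           follows an odd-sized chunk. */
        pos += 8 + (size_t)size + (size & 1u);
        if (pos > file_len)
            return NULL;
    }
    return NULL;
}
```

A loader would call `find_chunk` once with "fmt " and once with "data", and treat a NULL result from either call as an unusable file.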

The Format chunk contains all the information you need about the audio data in the WAV file. One field of this chunk that didn't work as expected was the sample rate value. Rather than being nice, neat 11,025, 22,050 or 44,100 values, I found that the values in this field varied all over the place. I ended up modifying the WAV loading routine shown in listing 1 to accept ranges for the different playback sample rates the engine allows.
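
Accepting a range of header values for each playback rate might look like the sketch below. The nominal rates of 11,025, 22,050 and 44,100 Hz and the 5% tolerance are assumptions made for illustration; the article does not state the ranges its loader uses, and `snap_sample_rate` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define RATE_UNSUPPORTED 0u

/*
 * Map the sample rate read from the WAV header onto one of the engine's
 * playback rates. Returns RATE_UNSUPPORTED if the value is not within
 * the tolerance of any nominal rate.
 * ASSUMPTION: the +/-5% window is illustrative only.
 */
uint32_t snap_sample_rate(uint32_t header_rate)
{
    static const uint32_t nominal[] = { 11025u, 22050u, 44100u };
    size_t i;

    for (i = 0; i < sizeof nominal / sizeof nominal[0]; i++) {
        uint32_t tolerance = nominal[i] / 20u;   /* 5% of the nominal rate */
        if (header_rate >= nominal[i] - tolerance &&
            header_rate <= nominal[i] + tolerance)
            return nominal[i];
    }
    return RATE_UNSUPPORTED;
}
```

With this window, a header value of 11,000 is played at 11,025 Hz, while 8,000 and 48,000 are rejected rather than played noticeably off pitch.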

Field                 Type    Value
Chunk ID              DWORD   0x61746164
Chunk Size            DWORD   (size of sample data following)
Channel 0, Sample 0
Channel 1, Sample 0
Channel 0, Sample 1
Channel 1, Sample 1
Fig 1. Sample WAV Data Chunk Layout


The Data chunk contains sound data made up of one or more channels of interleaved samples, as in figure 1. I handle one- or two-channel WAV files. The mixing engine treats stereo WAV files as a single channel, mixing the left and right channels together before doing additional mixing and spatial positioning. To accommodate premixed stereo audio for background music and cutscene soundtracks, I added a special flag value that can be used when opening a new track. This flag activates two special mixing functions that allow stereo data to pass through the engine with volume adjustment only.
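
Collapsing interleaved stereo into a single channel can be sketched as below. The article does not say whether the engine sums or averages the two channels, so this example assumes averaging, which keeps the result inside the 16-bit range. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Collapse interleaved stereo frames (L0 R0 L1 R1 ...) into mono.
 * ASSUMPTION: the channels are averaged; the article does not specify
 * how its engine combines them.
 */
void downmix_stereo_to_mono(const int16_t *interleaved, size_t frames,
                            int16_t *mono)
{
    size_t i;

    assert(interleaved != NULL && mono != NULL);
    for (i = 0; i < frames; i++) {
        int32_t left  = interleaved[2 * i];
        int32_t right = interleaved[2 * i + 1];
        mono[i] = (int16_t)((left + right) / 2);   /* always fits in 16 bits */
    }
}
```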

All mixing is done in a 16-bit buffer using 16-bit signed samples. In theory, we could mix in 8-bit unsigned format, but using such a limited range is just asking for overflow. Having the addition overflow while mixing is a real problem because when values go out of range, the amplifier and speakers have no way of knowing, so they try to reproduce the waveform that the data stream represents. In the best case this results in occasional pops, but in extreme cases it can generate static that will have you diving for the volume control. I chose the combination of a 16-bit buffer and volume control that lets you know the maximum volume of any effect in advance. This gives good sound quality, uses the full range of possible sample values, and reduces the chance of overflow.
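
One way to use volume levels that are known in advance is to check the worst case before starting a track. The sketch below is an interpretation, not the article's code: it assumes 64 volume levels numbered 0 to 63, a full-scale source, and a deliberately conservative budget. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VOLUME_LEVELS 64   /* ASSUMPTION: levels run from 0 to 63 */

/* Worst-case magnitude of a full-scale 16-bit sample played at `level`. */
int32_t peak_at_level(int level)
{
    assert(level >= 0 && level < VOLUME_LEVELS);
    return (32768 * level) / VOLUME_LEVELS;
}

/*
 * Returns 1 if the tracks can be added together without leaving the
 * 16-bit range even when every track peaks at the same instant.
 * The test is conservative: it budgets against +32767 only.
 */
int mix_fits_in_16_bits(const int *levels, size_t track_count)
{
    int32_t worst = 0;
    size_t i;

    for (i = 0; i < track_count; i++)
        worst += peak_at_level(levels[i]);
    return worst <= 32767;
}
```

Real effects rarely peak together, so an engine could reasonably allow a larger total than this check does and accept an occasional pop.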

Since I wanted to support 8-bit WAV files for input, and output on old 8-bit sound cards, I needed to be able to convert from 8-bit samples to 16-bit for mixing and then convert the mixed data back to 8-bit for output. You can see an assembly implementation of the 16-bit to 8-bit conversion in listing 2. To convert 8-bit unsigned samples to 16-bit signed you invert the high bit of the 8-bit sample, move that result into the high 8 bits of your new 16-bit sample and zero the low 8 bits as in figure 2. To convert 16-bit samples to 8-bit you just reverse the process: move the high 8 bits of the 16-bit sample into your new 8-bit sample and flip the high bit.

1011 0101            8-bit input
0011 0101 0000 0000  16-bit output
Fig 2. Converting 8-bit samples to 16-bit
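
The two conversions can be written in portable C as a sketch like the following. It is not the assembly of listing 2, and the function names are hypothetical. Subtracting 128 from an unsigned 8-bit value is arithmetically the same as flipping its high bit and reading the result as signed, which avoids relying on implementation-defined casts.

```c
#include <stdint.h>

/* 8-bit unsigned -> 16-bit signed: flip the high bit, move the result
   into the high byte, leave the low byte zero. */
int16_t sample_8_to_16(uint8_t s)
{
    return (int16_t)(((int)s - 128) * 256);
}

/* 16-bit signed -> 8-bit unsigned: take the high byte, flip its high bit. */
uint8_t sample_16_to_8(int16_t s)
{
    return (uint8_t)((((uint16_t)s) >> 8) ^ 0x80u);
}
```

The value in figure 2 round-trips as expected: `sample_8_to_16(0xB5)` gives 0x3500, and `sample_16_to_8(0x3500)` gives 0xB5 again.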

One limitation of this sound engine is that all of the imported WAV files have to have the same sample rate. I experimented with real-time conversion of the sample rate (resampling) of WAV files for a project I did porting a speech synthesis engine to the Windows platform. Standard Windows multimedia cannot play 8 kHz audio. Of course, Murphy's Law applied and the speech synthesizer was designed for 8 kHz playback. I spent a week trying a variety of resampling algorithms. What I found was that the algorithms that were fast enough to work in real time introduced a whispery echo to everything they converted, and the ones that sounded good took 30 seconds to convert each second of audio. I ended up using CoolEdit to resample the speech synthesizer's source data from 8 kHz to 11 kHz and took a 200K memory hit. This actually turned out to be a better solution than expected, since running the audio data through CoolEdit cleaned up a lot of pops and clicks in the original source audio, making a better sounding synthesized voice in the end.

Another question that comes up quickly when designing a sound mixer is what to play when there are no tracks to play and the sound card is still asking for audio data. In theory, you could send continuous blocks of any constant sample value and get no sound output, but 16-bit sound effects usually start and end at sample values of zero. Sending the sound hardware blocks of zeros therefore avoids the "pop" that would otherwise degrade sound quality when a track starts.
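
Filling an idle output block with silence can be sketched as follows. For 16-bit signed output, silence is zero. For 8-bit unsigned output, the equivalent value is the midpoint 0x80, which is what a 16-bit zero becomes after conversion to 8-bit. The function name and signature are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Fill an output block with silence for the given output sample size.
 * 16-bit signed silence is 0x0000; 8-bit unsigned silence is 0x80.
 */
void fill_silence(void *buffer, size_t bytes, int bits_per_sample)
{
    assert(buffer != NULL);
    assert(bits_per_sample == 8 || bits_per_sample == 16);
    memset(buffer, bits_per_sample == 8 ? 0x80 : 0x00, bytes);
}
```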

Assuming all of our source WAV files are recorded at full volume, we can adjust their volume by multiplying each sample value by the fraction of full volume at which we want the sound to be played. We use a linear scale for volume since it is easy to implement, but a logarithmic scale could easily be built on top of the linear scale.

For 8-bit samples, use a lookup table for volume control. 256 values x 2 bytes per value x 64 volume levels gives a table 32K in size that magically converts 8-bit unsigned samples to 16-bit signed, volume-adjusted samples. We calculate this table when we start the audio system as shown in listing 3. The table is arranged so that all sample values for each volume level are grouped together. This lets the assembly routines that do the mixing use the input sample to index the 512-byte table for the current volume level and mix the volume-adjusted sample with the output.
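
A table with this layout could be built as in the sketch below. This is not listing 3. It assumes that levels run from 0 (silent) to 63 (full volume) on a linear scale, and the identifiers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define VOLUME_LEVELS 64

/* One 512-byte row (256 entries x 2 bytes) per volume level: 32K total. */
static int16_t volume_table[VOLUME_LEVELS][256];

/* Fill the table once at audio start-up.
   ASSUMPTION: level 63 is full volume and the scale is linear. */
void build_volume_table(void)
{
    int level, s;

    for (level = 0; level < VOLUME_LEVELS; level++) {
        for (s = 0; s < 256; s++) {
            /* 8-bit unsigned -> 16-bit signed, then scale by the level. */
            int32_t full = (s - 128) * 256;
            volume_table[level][s] =
                (int16_t)(full * level / (VOLUME_LEVELS - 1));
        }
    }
}

/* Convert and volume-adjust one 8-bit sample with a single lookup. */
int16_t scale_8bit_sample(uint8_t sample, int level)
{
    assert(level >= 0 && level < VOLUME_LEVELS);
    return volume_table[level][sample];
}
```

Because each row holds every sample value for one level, a mixing loop only needs the row's base address and the raw input byte to fetch the adjusted sample.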

Volume control for 16-bit samples is a bit trickier. The table technique would need 8 megabytes of RAM (65536 values x 2 bytes per value x 64 volume levels), clearly more than the average memory budget will allow. Instead I bit the bullet and did a multiply and shift for each sample. To preserve as many bits of accuracy as possible, I do a 32-bit multiply and shift with the sample in the low 16 bits of a 32-bit register. I had originally thought to use lookup tables for this routine as well, but by the time I added up all the memory lookups and adds, a simple multiply and shift came out faster.
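
In C, the multiply and shift might be sketched as below. It assumes the same 64 levels (0 to 63) and a shift of 6 bits, so level 63 plays at 63/64 of full scale; the article does not give the exact scaling, and the function name is hypothetical. The largest product, 32768 x 63, is roughly two million and fits easily in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale a 16-bit sample by level/64 using a 32-bit multiply and shift.
 * ASSUMPTION: levels are 0..63. Right-shifting a negative value is
 * implementation-defined in C; mainstream compilers perform an
 * arithmetic shift, which this sketch relies on.
 */
int16_t scale_16bit_sample(int16_t sample, int level)
{
    int32_t product;

    assert(level >= 0 && level < 64);
    product = (int32_t)sample * level;   /* widen before multiplying */
    return (int16_t)(product >> 6);      /* divide by 64 */
}
```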

Mixing two or more digital audio tracks is the easy part of the mixing library: simply add the samples from each track together. Intuition might indicate that one should average the samples, but this just doesn't sound right.
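
Additive mixing into a 16-bit buffer can be sketched as follows. The addition itself is what the text describes. The clamp is an extra safety net added for this example, not something the article mentions; the engine described here relies on volume control to stay in range. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Add `count` samples of `track` into the mix buffer `out`.
 * The sum is formed in 32 bits so it cannot wrap, then clamped to the
 * 16-bit range (the clamp is this sketch's addition).
 */
void mix_into(int16_t *out, const int16_t *track, size_t count)
{
    size_t i;

    assert(out != NULL && track != NULL);
    for (i = 0; i < count; i++) {
        int32_t sum = (int32_t)out[i] + track[i];
        if (sum > INT16_MAX) sum = INT16_MAX;
        if (sum < INT16_MIN) sum = INT16_MIN;
        out[i] = (int16_t)sum;
    }
}
```

Starting from a buffer of zeros and calling `mix_into` once per active track produces the finished block.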

To simplify the coding of the audio mixing routine as much as possible, I split the function into two sections: setup and the assembly-optimized mixing routine. To make the assembly section easier to code, I broke it into subfunctions, selecting the appropriate subfunction based on the input and output data formats when the audio track is started. The input formats are 8- or 16-bit, mono or stereo, and the output formats are 16-bit mono or 16-bit stereo, which gives eight separate assembly-optimized mixing functions. Since these functions mix stereo input channels together before mixing with the output buffer, and I wanted to allow playback of cutscenes with stereo soundtracks, I added two mixing functions for 8- or 16-bit stereo inputs that do not mix the two source channels together before they mix them with the output.
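
Choosing one of the ten routines when a track starts could look like the sketch below. The struct, the enumerator names, and the rule that pass-through requires stereo output are all assumptions made for illustration. In a real engine each enumerator would index a table of function pointers to the mixing routines.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int in_bits;       /* 8 or 16 */
    int in_channels;   /* 1 or 2 */
    int out_channels;  /* 1 or 2; output is always 16-bit */
    int passthrough;   /* nonzero: keep stereo input channels separate */
} MixFormat;

typedef enum {
    MIX_8M_TO_M, MIX_8M_TO_S, MIX_8S_TO_M, MIX_8S_TO_S,
    MIX_16M_TO_M, MIX_16M_TO_S, MIX_16S_TO_M, MIX_16S_TO_S,
    MIX_8S_PASS, MIX_16S_PASS,
    MIX_NONE           /* unsupported combination */
} MixRoutine;

/* Pick the mixing routine for a track from its input and output formats. */
MixRoutine select_mix_routine(const MixFormat *f)
{
    int index;

    assert(f != NULL);
    if ((f->in_bits != 8 && f->in_bits != 16) ||
        (f->in_channels != 1 && f->in_channels != 2) ||
        (f->out_channels != 1 && f->out_channels != 2))
        return MIX_NONE;

    if (f->passthrough) {
        /* ASSUMPTION: pass-through only makes sense stereo to stereo. */
        if (f->in_channels != 2 || f->out_channels != 2)
            return MIX_NONE;
        return f->in_bits == 8 ? MIX_8S_PASS : MIX_16S_PASS;
    }

    /* The eight standard routines are laid out as a 2 x 2 x 2 grid. */
    index = (f->in_bits == 16 ? 4 : 0) +
            (f->in_channels == 2 ? 2 : 0) +
            (f->out_channels == 2 ? 1 : 0);
    return (MixRoutine)index;
}
```

Doing the selection once at track start keeps format tests out of the per-sample inner loop.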

It is a sad fact of life that some computers have the left and right channels backwards. Rather than make the gamer move the speakers around, it is easy to offer the option to swap the left and right channels after mixing. This conversion is a matter of swapping the order of the two channels' samples. Listing 4 shows an assembly implementation of the algorithm. It adds a small performance hit, but it will eliminate a few support calls. The best part is that this conversion doesn't affect surround encoding.
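
For readers without the download, the swap can be sketched in C as follows. This is an illustration rather than a translation of listing 4, and the function name is hypothetical. It operates in place on the mixed, interleaved 16-bit stereo buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exchange the left and right sample of every interleaved stereo frame. */
void swap_stereo_channels(int16_t *interleaved, size_t frames)
{
    size_t i;

    assert(interleaved != NULL);
    for (i = 0; i < frames; i++) {
        int16_t left = interleaved[2 * i];
        interleaved[2 * i] = interleaved[2 * i + 1];
        interleaved[2 * i + 1] = left;
    }
}
```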

[In the interest of conserving editorial space, code listings are available for download here.]