Mixing and Surround Sound

By Robert Basler
Gamasutra, December 18, 1998, Vol. 2: Issue 49

If you look at the specification for WAV files,
you will find that the format allows a wide variety of sample sizes, sample
rates, and compression techniques.
The Data chunk contains sound data composed of one or more channels of interleaved samples, as in figure 1. In this case, I handle one- or two-channel WAV files. The mixing engine treats stereo WAVs as a single channel, mixing the left and right channels together before doing additional mixing and spatial positioning. To accommodate premixed stereo audio for background music and cutscene soundtracks, I added a special flag value that can be used when opening a new track. This flag activates two special mixing functions that allow stereo data to pass through the engine with volume adjustment only.

All mixing is done in a 16-bit buffer using 16-bit signed samples. In theory, we could mix in 8-bit unsigned format, but using such a limited range is just asking for overflow. Overflow during mixing is a real problem: when values go out of range, the amplifier and speakers have no way of knowing, so they try to reproduce the waveform that the data stream represents. In the best case this results in occasional pops, but in extreme cases it can generate static that will have you diving for the volume control. I chose the combination of a 16-bit buffer and a volume control that lets you know the maximum volume of any effect in advance. This gives good sound quality, uses the full range of possible sample values, and reduces the chance of overflow.

Since I wanted to support 8-bit WAV files for input, and output on old 8-bit sound cards, I needed to be able to convert from 8-bit samples to 16-bit for mixing and then convert the mixed data back to 8-bit for output. You can see an assembly implementation of the 16-bit to 8-bit conversion in listing 2. To convert 8-bit unsigned samples to 16-bit signed, you invert the high bit of the 8-bit sample, move that result into the high 8 bits of your new 16-bit sample, and zero the low 8 bits, as in figure 2. To convert 16-bit samples to 8-bit, you just reverse the process: move the high 8 bits of the 16-bit sample into your new 8-bit sample and flip the high bit.
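For readers without the assembly listings at hand, the two conversions described above can be sketched in C. This is only an illustration, not the article's listing 2: the function names `sample_u8_to_s16` and `sample_s16_to_u8` are made up for this example, and the 8-to-16 direction is written as arithmetic (subtract 128, multiply by 256) because that gives the same result as the bit flip and shift while staying portable C.

```c
#include <stdint.h>

/* 8-bit unsigned -> 16-bit signed.
   Subtracting 128 has the same effect as flipping the high bit and reading
   the byte as signed; multiplying by 256 moves the result into the high
   8 bits and leaves the low 8 bits zero. */
static int16_t sample_u8_to_s16(uint8_t s)
{
    return (int16_t)(((int)s - 128) * 256);
}

/* 16-bit signed -> 8-bit unsigned.
   Take the high 8 bits of the sample, then flip the high bit. The cast to
   uint16_t keeps the shift well defined for negative samples. */
static uint8_t sample_s16_to_u8(int16_t s)
{
    return (uint8_t)((((uint16_t)s) >> 8) ^ 0x80u);
}
```

Note that the round trip from 8-bit to 16-bit and back is lossless, while going from 16-bit to 8-bit simply discards the low 8 bits of each sample.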
One limitation of this sound engine is that all of the imported WAV files have to have the same sample rate. I experimented with real-time conversion of the sample rate (resampling) of WAV files for a project in which I ported a speech synthesis engine to the Windows platform. Standard Windows multimedia cannot play 8 kHz audio. Of course, Murphy's Law applied and the speech synthesizer was designed for 8 kHz playback. I spent a week trying a variety of resampling algorithms. What I found was that the algorithms that were fast enough to work in real time introduced a whispery echo to everything they converted, and the ones that sounded good took 30 seconds to convert each second of audio. I ended up using CoolEdit to resample the speech synthesizer's source data from 8 kHz to 11 kHz and took a 200K memory hit. This actually turned out to be a better solution than expected, since running the audio data through CoolEdit cleaned up a lot of pops and clicks in the original source audio, making a better-sounding synthesized voice in the end.

Another question that comes up quickly while designing a sound mixer is what to play when there are no tracks to play and the sound card is still asking for audio data. In theory, you could send continuous blocks of any constant sample value and get no sound output, but 16-bit sound effects usually start and end at sample values of zero. Thus, sending the sound hardware blocks of zeros avoids the degradation of sound quality caused by a "pop" when a track starts.

Assuming all of our source WAV files are recorded at full volume, we can adjust their volume by multiplying each sample value by the fraction of full volume at which we want the sound to be played. We use a linear scale for volume since it is easy to implement, but a logarithmic scale could easily be built on top of the linear scale. For 8-bit samples, use a lookup table for figuring volume control.
256 values x 2 bytes per value x 64 volume levels corresponds to a table 32K in size that magically converts 8-bit unsigned samples to 16-bit signed, volume-adjusted samples. We calculate this table when we start the audio system, as shown in listing 3. The table is arranged so that all sample values for each volume level are grouped together. This lets the assembly routines that do the mixing use the input sample to index the 512-byte table for the current volume level and mix the volume-adjusted sample with the output.

Volume control for 16-bit samples is a bit trickier. The table technique would need 8 megabytes of RAM (65536 values x 2 bytes per value x 64 volume levels), clearly more than the average memory budget will allow. Instead I bit the bullet and did a multiply and shift for each sample. To preserve as many bits of accuracy as possible, I do a 32-bit multiply and shift with the sample in the low 16 bits of a 32-bit register. I had originally thought to use lookup tables for this routine as well, but by the time I added up all the memory lookups and adds, a simple multiply and shift came out faster.

Mixing two or more digital audio tracks is the easy part of the mixing library: simply add the samples from each track together. Intuition might indicate that one should average the samples, but this just doesn't sound right.

In order to simplify the coding of the audio mixing routine as much as possible, I split the function into two sections: setup, and the assembly-optimized mixing routine. To make the assembly section easier to code, I broke it into subfunctions, selecting the appropriate subfunction based on the input and output data formats when the audio track is started. The input formats are 8- or 16-bit mono or stereo, and the output formats are 16-bit mono or 16-bit stereo, which gives eight separate assembly-optimized mixing functions.
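The volume table, the 16-bit multiply and shift, and the additive mix can be sketched in C as follows. This is not the article's listing 3 or its assembly routines, and several details are assumptions made for the example: the names `volume_table`, `build_volume_table`, `scale_s16` and `mix_add` are hypothetical, level 63 is treated as full volume, the 16-bit gain is held as a 16.16 fixed-point value, and the mix clamps to the 16-bit range. The article relies on volume levels known in advance to avoid overflow rather than on clamping.

```c
#include <stdint.h>

#define VOLUME_LEVELS 64  /* levels 0..63; 63 = full volume (assumption) */

/* 64 levels x 256 entries x 2 bytes = 32K. Entries are grouped by volume
   level, so each level owns one contiguous 512-byte sub-table. */
static int16_t volume_table[VOLUME_LEVELS][256];

/* Fill the table once at start-up: each entry is the 8-bit unsigned input
   converted to 16-bit signed and scaled linearly by level/63. */
static void build_volume_table(void)
{
    for (int level = 0; level < VOLUME_LEVELS; level++) {
        for (int s = 0; s < 256; s++) {
            int32_t full = (s - 128) * 256;  /* 8-bit unsigned -> 16-bit signed */
            volume_table[level][s] =
                (int16_t)(full * level / (VOLUME_LEVELS - 1));
        }
    }
}

/* 16-bit volume control: one 32-bit multiply and one shift per sample.
   gain is 16.16 fixed point in the range 0..65536. Right-shifting a
   negative value is implementation-defined in C; mainstream compilers
   perform an arithmetic shift, which is what this sketch relies on. */
static int16_t scale_s16(int16_t sample, int level)
{
    int32_t gain = ((int32_t)level << 16) / (VOLUME_LEVELS - 1);
    return (int16_t)(((int32_t)sample * gain) >> 16);
}

/* Mix by adding, not averaging. The clamp is an addition of this sketch
   to keep an out-of-range sum from wrapping around. */
static int16_t mix_add(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}
```

With this layout, mixing an 8-bit sample `s` at volume `level` into a 16-bit output sample `out` would be `out = mix_add(out, volume_table[level][s])`, and the 16-bit case would be `out = mix_add(out, scale_s16(sample, level))`.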
Since these functions mix stereo input channels together before mixing with the output buffer, and I wanted to allow playback of cutscenes with stereo soundtracks, I added two mixing functions for 8- or 16-bit stereo inputs that do not mix the two source channels together before they mix them with the output.

It is a sad fact of life that some computers have the left and right channels backwards. Rather than make the gamer move the speakers around, it is easy to offer the option to swap the left and right channels after mixing. This conversion is a matter of swapping the order of the two channels' samples. Listing 4 shows an assembly implementation of the algorithm. It adds a small performance hit, but it will eliminate a few support calls. The best part is that this conversion doesn't affect surround encoding.

[In the interest of conserving editorial space, code listings are available for download here.]
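As a rough C equivalent of the left/right swap described above (not the article's listing 4), the sketch below assumes a mixed output buffer of interleaved 16-bit samples in left, right order. The function name `swap_stereo_channels` and the `frames` parameter (the number of left/right pairs) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Swap left and right in place. buf holds `frames` interleaved pairs:
   L0 R0 L1 R1 ... becomes R0 L0 R1 L1 ... */
static void swap_stereo_channels(int16_t *buf, size_t frames)
{
    for (size_t i = 0; i < frames; i++) {
        int16_t left = buf[2 * i];
        buf[2 * i] = buf[2 * i + 1];
        buf[2 * i + 1] = left;
    }
}
```

Running it once over the finished mix buffer, just before the buffer is handed to the sound card, keeps the cost to a single pass per block.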
Dolby Pro-Logic Encoding