There are three main components in XAudio2: source voices, submix voices, and a mastering voice.
A source voice is most analogous to a DirectSound buffer. You create a source voice when you want to play a sound. You can set parameters on a voice, such as pitch and volume, and specify the volume levels for each speaker for surround effects. You can also dynamically place arbitrary software-based DSP effects on a source voice.
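To make the per-speaker volume idea concrete, here is a minimal sketch of how one mono sample could be spread across a set of speakers. It is a toy model and not the XAudio2 API: the function `PanMonoSample`, the constant `kSpeakers`, and the choice of a six-speaker (5.1) layout are all assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model only: hypothetical names, not the XAudio2 API.
constexpr std::size_t kSpeakers = 6;  // assumes a 5.1 layout
using SpeakerLevels = std::array<float, kSpeakers>;

// Scale one mono sample by the voice volume, then by each speaker's level,
// producing one output sample per speaker.
SpeakerLevels PanMonoSample(float sample, float volume,
                            const SpeakerLevels& levels) {
    assert(volume >= 0.0f);  // this sketch does not model phase inversion
    SpeakerLevels out{};
    for (std::size_t i = 0; i < kSpeakers; ++i) {
        out[i] = sample * volume * levels[i];
    }
    return out;
}
```

Setting one level to 1.0 and the rest to 0.0 places the sound entirely in a single speaker; intermediate values position it between speakers.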
Once you create a source voice, you point it to a piece of sound data in memory and play it. (This data can be in a variety of formats: PCM, XMA, or xWMA for Xbox; PCM, ADPCM, or xWMA for Windows.) A source voice can also point to no source data, if it contains software for direct generation of audio data. By default, the output is sent to the speakers via the mastering voice, but a source voice can also send its output to one or more submix voices.
A submix voice is much like a source voice, with two differences. First, the input to a submix voice is not a piece of sound data in memory, but rather the output of a source voice or another submix voice. Second, a submix voice can have multiple inputs, which it mixes together before processing.
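The mix-then-process order can be illustrated with a small sketch. This is a toy model under stated assumptions, not the XAudio2 API: `Buffer`, `MixInputs`, and `ApplyGain` are hypothetical names, buffers hold a single channel, and a plain gain stands in for whatever DSP effect the submix voice would host.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: hypothetical types, not the XAudio2 API.
using Buffer = std::vector<float>;  // one channel of samples

// Sum every input buffer sample by sample, the way a submix voice combines
// its inputs before any effect runs on the combined signal.
Buffer MixInputs(const std::vector<Buffer>& inputs, std::size_t frames) {
    Buffer mix(frames, 0.0f);
    for (const Buffer& in : inputs) {
        assert(in.size() == frames);  // every input must cover the same frames
        for (std::size_t i = 0; i < frames; ++i) {
            mix[i] += in[i];
        }
    }
    return mix;
}

// Stand-in for a DSP effect: a plain gain applied to the aggregate mix.
void ApplyGain(Buffer& mix, float gain) {
    for (float& s : mix) {
        s *= gain;
    }
}
```

Calling `MixInputs` first and `ApplyGain` second mirrors the order described above: the effect sees one combined signal rather than each input separately.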
As with source voices, you can insert arbitrary software DSP effects into a submix voice—in this case, the DSP will process the aggregate mix of all the inputs. Submix voices also have built-in filters, and can be panned to the speakers just like a source voice can. Submix voices are very useful for creating complex sound effects from multiple wave files. They can also be used to create audio submixes—for example a sound effects mix, a dialog mix, a music mix, and so on—in the way that professional mixing consoles have buses. Submix buses are also used for global effects, such as a global reverb.
The final component is the mastering voice. There is only one mastering voice, and its job is to create the final N-channel (stereo, 5.1, 7.1) output to present to the speakers. The mastering voice takes input from all the source voices and submix voices, combines them, and prepares them for output. As with source voices and submix voices, software DSP effects can be placed on the mastering voice. Typically, a 5.1 mastering limiter or global EQ is inserted into the mastering voice for that final, polished sound.
The following figure shows a simple XAudio2 graph playing two sounds with an environmental reverb. The top two source voices are playing sound data to create a single composite sound that is routed to the submix voice.
From the submix voice, a 5.1 send goes to the mastering voice and a mono send goes to another submix voice that hosts a global reverb effect. 3D panning for the composite sound is performed on the first submix voice. The bottom source voice plays a single sound; its 5.1 output goes to the mastering voice, and it also has a mono send to the global reverb. Of course, many more options are possible, but this shows a common case.
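The routing just described can be written down as a small graph, which may help when reading the prose without the figure. The sketch below is a toy model, not the XAudio2 API: `Voice`, `Send`, `BuildExampleGraph`, and `CountInputs` are hypothetical names. Two details are assumptions rather than facts from the description: the top two source voices are given mono sends into the composite submix, and the reverb submix is given a 5.1 send to the mastering voice.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: hypothetical types, not the XAudio2 API.
struct Send {
    std::size_t target;  // index of the destination voice
    int channels;        // 6 for a 5.1 send, 1 for a mono send
};

struct Voice {
    std::string name;
    std::vector<Send> sends;  // empty for the mastering voice
};

constexpr std::size_t kSourceA = 0, kSourceB = 1, kSourceC = 2;
constexpr std::size_t kComposite = 3, kReverb = 4, kMaster = 5;

// Build the example graph: two sources feed a composite submix, a third
// source plays on its own, and both paths also feed a global reverb.
std::vector<Voice> BuildExampleGraph() {
    std::vector<Voice> g(6);
    g[kSourceA]   = {"source A", {{kComposite, 1}}};  // mono is an assumption
    g[kSourceB]   = {"source B", {{kComposite, 1}}};  // mono is an assumption
    g[kSourceC]   = {"source C", {{kMaster, 6}, {kReverb, 1}}};
    g[kComposite] = {"composite submix", {{kMaster, 6}, {kReverb, 1}}};
    g[kReverb]    = {"reverb submix", {{kMaster, 6}}};  // assumed send
    g[kMaster]    = {"mastering voice", {}};
    return g;
}

// Count how many voices send their output to the given target voice.
std::size_t CountInputs(const std::vector<Voice>& graph, std::size_t target) {
    assert(target < graph.size());
    std::size_t n = 0;
    for (const Voice& v : graph) {
        for (const Send& s : v.sends) {
            if (s.target == target) ++n;
        }
    }
    return n;
}
```

In this model the reverb submix receives two mono inputs, and the mastering voice receives three 5.1 inputs: the composite submix, the single source, and (by assumption) the reverb return.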
DSP effects in XAudio2 are performed using software audio processing objects (xAPOs). An xAPO is a lightweight wrapper for audio signal processing combined with a standard method for getting and setting appropriate DSP effects parameters.
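As a rough illustration of "signal processing plus a standard way to get and set parameters", the sketch below defines a tiny effect interface and one effect that implements it. It is deliberately simplified and hypothetical: `Effect`, `GainEffect`, and their method names are assumptions for this example and do not reproduce the real xAPO interfaces.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified interface in the spirit of an xAPO.
// It is not the real xAPO interface.
class Effect {
public:
    virtual ~Effect() = default;
    virtual void Process(std::vector<float>& samples) = 0;  // in place
    virtual void SetParameter(float value) = 0;
    virtual float GetParameter() const = 0;
};

// A trivial effect: multiplies every sample by a gain factor.
class GainEffect : public Effect {
public:
    void Process(std::vector<float>& samples) override {
        for (float& s : samples) {
            s *= gain_;
        }
    }
    void SetParameter(float value) override {
        assert(value >= 0.0f);  // this sketch disallows negative gain
        gain_ = value;
    }
    float GetParameter() const override { return gain_; }

private:
    float gain_ = 1.0f;  // unity gain until a parameter is set
};
```

Because the host only talks to the `Effect` interface, a reverb or filter could be swapped in for `GainEffect` without changing the code that drives it.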
Since xAPOs are cross-platform, it is easy to write software-based audio DSP effects that can be run on both Windows and Xbox 360. Typical software effects might include reverb, filtering, echo or other effects, but can also include physical modeling synthesis, granular synthesis, or any kind of wacky audio DSP you might come up with! You can write processor-specific optimizations for Xbox and Windows, but that’s not required.