Introducing the Interactive XMF Audio File Format
May 29, 2003 Page 1 of 2
The creation and implementation of interactive audio elements is frequently frustrating and stressful. The programmers sometimes wind up making decisions that the audio artists should be making, and worse, sometimes the audio artists wind up [gulp] programming!
Referring to this situation, game composer Michael Land said, "Back in 1995, I was talking with The Fat Man at GDC about the challenges of creating and implementing good game audio, and the metaphor that really summed it up for me was 'good fences make good neighbors'." What Land meant was that when the programmers get to write code and don't have to make artistic decisions, and when sound artists can focus their attention on creating great music and sound design without having to write code, the result is a harmonious union of technological and artistic efforts. This is how truly great game audio is produced.
Interactive XMF is an emerging standard that will help put creative and technical control in the hands of the right people. Developed by the MIDI Manufacturers Association (MMA), XMF (eXtensible Music Format) is a new, low-overhead, meta file format for bundling collections of data resources in one or more formats into a single file. The Interactive Audio Special Interest Group (IA-SIG) has taken on the task of defining an Interactive XMF file format, and is also overseeing the development of a platform-independent software entity called a Soundtrack Manager that utilizes this format to facilitate the development and implementation of highly interactive audio.
The Evolution of Game Audio
In the beginning there was "beep". Some programmer wrote that one. It was well implemented, too. It did the job. It helped make that dot hitting those paddles seem more realistic.
Then came two "boops", a "beep", and a "pffft", what game composer David Govett used to call "three string players with bad tone and a really good one-armed drummer." And that led to actual music composition, sometimes even by composers, but also frequently done by someone who could both compose and program, a.k.a. the "comprogrammer" or "programposer". But in all cases, a programmer was needed to code the music. Game music in those days was usually delivered as note lists or on manuscript paper. This resulted in the composer being completely at the mercy of the programmer (a scary notion). But the upside for the composer was that the skills he needed to create this music were solely conventional composing skills and the ability to work within very specific parameters. A composer needed no special software or recording gear. The boatload of labor involved in getting the sound into the game was all done by the programmer.
These days game composers and sound designers (which I'll call "audio artists") create linear audio using (more or less) the same methods that are used for film, television, and music CDs. This has been the case since the early '90s. But lately the quantity and fidelity of these linear audio snippets have grown to the point that audio artists no longer feel restrained by game technology. WAV files and their like have, for the most part, replaced the need to deliver audio work via manuscript paper, various flavors of MIDI-related formats, and console-specific formats. As a result, audio artists are breathing a collective sigh of relief because they can now be fairly confident that their work will play back within games as they intended it to sound. Quite a bit of programming effort and attention has been devoted to achieving this audio consistency and realism.
But consistency and realism were relatively easy to achieve - a linear path of progression could be defined and followed. Start with "beep" created by a programmer and end with something that sounds like it came off a new CD. The end product, CD-quality audio, was a known quantity. Everyone knew where they were going, and everyone went there. The same cannot be said about interactive audio implementation. There is no existing example in any other field that can be cited as a goal.
But the development of tools to implement non-linear audio has received little attention, and what work has been done in this area has primarily been proprietary. So in one sense, the evolution of the art of composing music for games has barely begun. Game composer George Sanger, a.k.a. The Fat Man, put it this way: "With all respect to the creators of great innovative interactive scores, I feel confident in saying that our industry shows generally a sense of feeling its way through the dark passages and unlit corridors and dim tunnels and other such analogies of these, our early years. To get any sense of what great sound can be made, one would have to cross the line into 'legitimate music' and read the writings of John Cage. There is no equivalent in this industry's body of literature to Schoenberg's analyses of melodic writing, of repetition and variation, of surprise and satisfaction, that takes into even slight account what happens to music when a twelve-year-old boy is constantly shuffling the pages of the score."
In every game that's ever been made, snippets of linear audio have been triggered by game-related events, and sometimes vice versa. On the surface, coming up with a method for implementing interactive audio seems fairly simple and straightforward. But it's proven to be a tough nut to crack. A handful of game development houses have spent large amounts of time and money creating very nice proprietary tools that let their in-house audio teams implement interactive audio in games. The LucasArts iMUSE system is one example. But only three or four people ever used iMUSE, and at this point it is no longer that company's main audio implementation tool.
George "The Fat Man" Sanger
Sanger says, "This situation is reflected to greater and lesser degrees at a handful of studios, some large, some small, always more or less following the same path, always more or less winding up in the same tar pit. The path goes like this: The development team realizes that there is a problem with audio in that, fundamentally, there is no system or tool to implement [interactive audio]. They look around, maybe, to see if a commercial tool exists, and it doesn't. They build their own [tools] without leveraging any of the work that's been done a dozen times before, without experiencing any of the benefits of the other million dollars and ten years that have been invested in this issue. Once their tool is made, a few composers experience the benefits of the tool (after what certainly would be a hellish debugging period - the reader should pause to reflect on just how bad this might be), make a few games, and leave the company. Regardless of his background, the next composer needs to be trained or his work cannot benefit the games that this company is making. And because the company's administrators insist that the tool be proprietary, the game producers live in a constant state of frustration that nobody outside the company can be trained in its use and there is always a shortage of qualified sound designers and interactive composers."
At the Game Developers Conference this year, some solutions to this problem were finally presented. At least four new audio integration tools were shown, by Microsoft Xbox, Sony, Creative Labs, and Sensaura. Unfortunately, only one of them is cross-platform, and there is still no standard file format to let these tools exchange information and to leverage audio artist and audio programmer experience.
The Evolution of Interactive XMF
In the beginning there was RMF. That was the proprietary file format that the Beatnik audio engine used, primarily for web audio applications. It was packaged as the Beatnik Player, a plug-in for web browsers. Because the folks at Beatnik wanted an open standard replacement and a next generation version of the RMF file format to get into other markets (such as the mobile device market, which is standards-based), they proposed an XMF working group to the MIDI Manufacturers Association (MMA). Beatnik had two reasons for choosing the MMA:
- All of the applications that they envisioned at the time used Standard MIDI Files.
- The MMA had experience defining such a standard, as it had developed the open-standard DLS formats for portable wavetable instrument definitions.
The MMA working group, which consisted of representatives from Beatnik, IBM, Sun, Line 6, Yamaha, and many others, expanded on the basic concept of RMF, incorporated existing open technologies and invented some new ones, and called the result XMF. They created a more flexible file structure, made the Metadata system more robust and flexible, and created a standard mechanism for block operations like resource encryption and data compression. Chris Grigg of Beatnik, who is the father of XMF, explains, "When you put all that together you basically have a container technology that can be the basis for any standardized or even proprietary file format. It's like a file format construction kit."
At the time it was developed, there were two immediate needs that XMF addressed. The first was the need to replace RMF with a format that combined MIDI scores and custom instruments, so that audio sounded exactly as the composer intended. The second need that XMF initially addressed was that of providing an open standard format for web applications and mobile devices. Developers and the Internet/open-source community wanted this so they could write their own implementations.
The MMA published the XMF specification in October of 2001. Since then, several companies have adopted the technology for their own proprietary file formats. For example, Creative Labs used it in their interactive audio tool, ISACT. These implementations are really using just the container technology part of XMF. The other part of XMF, standardized file formats, has taken longer - standards efforts typically do. But now the pot is starting to boil. And certainly one of the most interesting applications bubbling to the surface now is the development of IXMF by the IA-SIG.
The notion of a standard file format for audio integrator tools started as a burr in the britches of a couple of legendary game composers, Michael Land and George Sanger. A few years back at Project Bar-B-Q (the annual think-tank event for game audio), Land and Sanger discussed the grim situation created by proprietary integrator tools. "That discussion led us to see that there was a single element missing that might allow the destructive pattern of the past to change. The missing link was identified as a standard file format that would contain sounds, compositions, and rules of interactivity," says Sanger.
In subsequent years, Project Bar-B-Q work groups focused on integrator tool issues. But it wasn't until 2001 that a standard file format was formally addressed. That was the year that the XMF specifications were finalized. On the first workgroup day of Project Bar-B-Q that year, Chris Grigg presented XMF to the attendees, and afterwards discussed it with Sanger. For Sanger, the planets aligned that day. It became apparent to him that XMF could be used as the basis for the interactive audio standard file format for which he and Land had been pining all those many years. Fortunately, Grigg had planned that for XMF all along, and he designed the file format accordingly. As a result, the initial development moved at lightning speed.
"After spending a day and a half in some group or other [at Project Bar-B-Q], I had a dream about how XMF could be used for interactive applications," said Grigg. "So I went off and formed what's called a 'rogue group' and was joined by some fairly amazing people, like Larry the O, The Fat Man, Rob Rampley, Bob Starr and Steve Horowitz. In less than a day, we banged out a rough concept and mocked up an editor. On the strength of that work, an IA-SIG working group was formed, and we've been going ever since."
The IXMF IA-SIG working group (IXWG) consists of game developers, tool developers and audio artists. The members are Chris Grigg, George Sanger, Martin Wilde, Linda Law, Michael Land, Peter McConnell, Brad Fuller, Kurt Heiden, Ron Kuper, Clint Bajakian, Guy Whitmore, Peter Clare, Brian Schmidt, Andrew Ezekiel Rostaing, Steve Horowitz, and Alistair Hirst.
"Work on the IXMF specification is not complete, but it's getting close," says Grigg. "The IA-SIG working group is reviewing a detailed design that we completed over the winter and we should have something for developers and artists to look at this Fall. The spec mainly focuses on the file data format, but more importantly, implicit in that is a model for an advanced, data-driven, run-time soundtrack manager."
How IXMF Files Are Used
It's probably easier to understand the IXMF file format after first gaining an understanding of the system in which it is used. So here we go.
There are three aspects to game audio that must somehow work together in order for everything to function in a game as the designers and audio artists intended: the platform, the game (or audition application or editor), and audio content. With IXMF, the audio content is bundled with all of the information that describes how that content is to be used in the game.
Since IXMF is a cross-platform solution, some platform-independent middleware is needed. This middleware is called the Soundtrack Manager. The Soundtrack Manager manages the performance of the soundtrack and all of the audio content resources that combine to create the soundtrack. The Soundtrack Manager can be specific to a single game, group of games, development house, ad infinitum. It supports the same advanced interactive audio feature set on any platform, while also allowing access to platform-specific features.
The Soundtrack Manager receives high-level requests for interactive audio services from the game and handles them by coordinating the operation of multiple, platform-specific, low-level media players. It supplies these players with sound media stored in the IXMF media files, and controls the players via a small set of simple audio commands that are passed to system-specific Playback APIs via an Adapter Layer. It can also send information back to the game via callbacks or shared variables.
For each platform that will host the game, an Adapter Layer for that platform must be written to communicate between the Soundtrack Manager and the platform's native APIs. So the Adapter Layer code is platform specific, while the Soundtrack Manager code and audio content are platform independent.
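To make that split concrete, here is a minimal sketch in Python. It is not taken from the IXMF specification, which was still in review at the time of writing. The class names (PlaybackAdapter, DesktopAdapter, ConsoleAdapter), the two commands, and the strings standing in for native API calls are all hypothetical. The point is only that the Soundtrack Manager code stays the same while the adapter changes per platform.

```python
from abc import ABC, abstractmethod


class PlaybackAdapter(ABC):
    """Hypothetical Adapter Layer interface: one subclass per platform."""

    @abstractmethod
    def play(self, media_id: str) -> str: ...

    @abstractmethod
    def set_volume(self, media_id: str, volume: float) -> str: ...


class DesktopAdapter(PlaybackAdapter):
    # The returned strings stand in for calls to a made-up native API.
    def play(self, media_id):
        return f"desktop_api.start({media_id})"

    def set_volume(self, media_id, volume):
        return f"desktop_api.gain({media_id}, {volume})"


class ConsoleAdapter(PlaybackAdapter):
    # A different made-up API that wants volume as an integer from 0 to 127.
    def play(self, media_id):
        return f"console_api.voice_on({media_id})"

    def set_volume(self, media_id, volume):
        return f"console_api.voice_level({media_id}, {int(volume * 127)})"


class SoundtrackManager:
    """Platform-independent: it only ever talks to the adapter interface."""

    def __init__(self, adapter: PlaybackAdapter):
        self.adapter = adapter

    def start_music(self, media_id, volume):
        return [
            self.adapter.play(media_id),
            self.adapter.set_volume(media_id, volume),
        ]
```

Porting the game to a new platform would then mean writing one more adapter subclass, with no change to the manager or to the audio content.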
Three terms that are used in conjunction with IXMF should be defined at this point: "media chunk", "cue request", and "cue". A media chunk is any piece of playable media data. It can be an entire audio file, a defined contiguous region of an audio file, a Standard MIDI File, or a defined contiguous region within a Standard MIDI File. The continuous soundtrack is built by stringing media chunks together, and sometimes by layering them. A cue request is an event that the game signals to the Soundtrack Manager, and to which the Soundtrack Manager responds with a corresponding action designed by the audio artist at authoring time. That action is called a cue. A cue can contain any combination of services or operations that the Soundtrack Manager can perform. In most cases a cue will contain a playable soundtrack element, but it may also be used to perform other Soundtrack Manager functions that don't result in something audible, such as setting a variable, loading media, or executing a callback to the game.
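One way to picture these three terms is as plain data. The Python sketch below is an illustration only. The field names, the file names, and the CUES table are invented for the example and say nothing about how an IXMF file actually stores this information.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class MediaChunk:
    """A whole file, or a contiguous region of one (hypothetical layout)."""
    source: str                    # an audio file or a Standard MIDI File
    start: Optional[float] = None  # region start in seconds; None means whole file
    end: Optional[float] = None    # region end in seconds; None means whole file


@dataclass
class Cue:
    """The action the audio artist designed for one cue request."""
    chunks: list = field(default_factory=list)         # strung together in order
    set_variables: dict = field(default_factory=dict)  # a non-audible operation

    @property
    def is_audible(self) -> bool:
        return bool(self.chunks)


# Cue request name (signaled by the game) -> cue (authored by the artist).
CUES = {
    "enter_cave": Cue(chunks=[
        MediaChunk("cave_intro.wav"),
        MediaChunk("cave_loop.wav", start=0.0, end=32.0),
    ]),
    "boss_defeated": Cue(set_variables={"tension": 0}),
}


def resolve(cue_request: str) -> Cue:
    """Look up the cue that answers a given cue request."""
    return CUES[cue_request]
```

Here "enter_cave" resolves to a cue with two media chunks to string together, while "boss_defeated" resolves to a cue that only sets a variable and plays nothing.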
The Soundtrack Manager controls the audio playback by providing, at a minimum, the following functionality in response to cue requests:
- Responding to game sound requests by playing appropriate sound media, sometimes influenced by game state
- Constructing continuous soundtrack elements from discrete media chunks, whether via static play lists or dynamic rules
- Dynamically ordering or selecting which media chunks get played, sometimes influenced by game state, sometimes to reduce repetition
- Mixing and/or muting parallel tracks within media chunks
- Providing continuous, dynamic control of DSP parameters such as volume, pan, and 3D spatial position, sometimes influenced by game state, sometimes to reduce repetition
- Controlling how media is handled, including how it is stored and how it is played back
- Handling callbacks.
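As an example of one item on this list, selecting media chunks in a way that reduces repetition could be as simple as never playing the same chunk twice in a row. The Python sketch below assumes exactly that rule. The NoRepeatSelector name and the rule itself are illustrative choices, not something the IXMF design prescribes.

```python
import random


class NoRepeatSelector:
    """Picks a random chunk each time, never the one that just played."""

    def __init__(self, chunk_names, seed=None):
        # Drop duplicate names so "different from the last one" is always possible.
        self._names = list(dict.fromkeys(chunk_names))
        if len(self._names) < 2:
            raise ValueError("need at least two distinct chunks to avoid repeats")
        self._rng = random.Random(seed)
        self._last = None

    def next_chunk(self):
        candidates = [name for name in self._names if name != self._last]
        self._last = self._rng.choice(candidates)
        return self._last


# Hypothetical footstep variations an audio artist might supply.
footsteps = NoRepeatSelector(["step_a.wav", "step_b.wav", "step_c.wav"])
sequence = [footsteps.next_chunk() for _ in range(8)]
```

A dynamic rule that is also influenced by game state could be built the same way, by filtering the candidate list on a game variable before choosing.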
While a game is running, the flow will go something like the following. An event will occur or a condition will arise in the game, and the game will recognize that it needs to send a cue request to the Soundtrack Manager. The Soundtrack Manager will access the appropriate playable sound media, along with its interactivity data, and play the media according to its artist-specified playback parameters. It does this by passing instructions to the Adapter Layer, which will in turn pass instructions through to the playback API. The interactivity data associated with the media that just played may also include instructions for the Soundtrack Manager to pass data back to the game, which the obedient Soundtrack Manager dutifully performs.
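The round trip just described can be sketched in a few lines of Python. Everything here is a stand-in. The cue dictionary keys, the adapter callable, and the two lists that play the roles of the playback API and the game are assumptions made for the example, not part of any published IXMF interface.

```python
class SoundtrackManager:
    """Hypothetical manager: maps cue requests to artist-authored cues."""

    def __init__(self, cues, adapter, game_callback):
        self.cues = cues                    # interactivity data bundled with the content
        self.adapter = adapter              # callable standing in for the Adapter Layer
        self.game_callback = game_callback  # lets the manager talk back to the game

    def cue_request(self, name):
        cue = self.cues[name]
        # Play the media according to its artist-specified parameters.
        self.adapter("play", cue["media"], cue["volume"])
        # The interactivity data may also ask for data to go back to the game.
        if "notify_game" in cue:
            self.game_callback(cue["notify_game"])


played = []    # stands in for the platform's playback API
messages = []  # stands in for the game's callback handler

manager = SoundtrackManager(
    cues={
        "door_opened": {"media": "creak.wav", "volume": 0.8},
        "level_won": {
            "media": "fanfare.mid",
            "volume": 1.0,
            "notify_game": "fanfare_started",
        },
    },
    adapter=lambda command, media, volume: played.append((command, media, volume)),
    game_callback=messages.append,
)

# The game recognizes two events and signals a cue request for each.
manager.cue_request("door_opened")
manager.cue_request("level_won")
```

After these two requests, the playback stand-in has received both play commands, and the game has received one message, from the cue whose data asked for it.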