DirectMusic is a complete overhaul of the way that Windows plays music. It replaces the basic code that Windows applications use to get MIDI data out of a file, through the computer, and to the output device. It's a completely rewritten and rethought system, all the way down to what noises come out and how. Has it arrived too late? Now that so many games are using Red Book CD audio and other streaming mechanisms, is DirectMusic enough to make MIDI relevant again to game developers? I think so - no matter how you currently handle music in your games, DirectMusic is definitely worth checking out.
DirectMusic starts out by addressing the major problems of Windows' old MidiOut API, such as shaky timing and limited real-time control. It offers consistent playback of custom sound sets using an open standard, Downloadable Sounds Level 1 (DLS1). On top of that, DirectMusic opens more than one door to achieving adaptive musical scores in games.
Like other SDKs from Microsoft, DirectMusic will try to cover many bases, not all of them related to games. Most observers agree that it has some great solutions for background music on web sites. Are DirectMusic's approaches relevant to game development? Its depth makes for a massive API, with hundreds of pages of documentation. Is it too complex to use on a project with a deadline?
This article is an overview of a big piece of work that is still in alpha, so don't look at it as a review. It's more of a look at what DirectMusic is and what it isn't, to help you get some idea of whether it fits your needs. The SDK should be in beta as you read this, and will be released as part of DirectX 6.1 late in the year.
History: Way Back When Down South...
Back in the late 1980s and early 1990s, there lived in Atlanta, Ga., a team of imaginative, talented music programmers called the Blue Ribbon Soundworks. They made a MIDI sequencer for the Amiga called Bars & Pipes which was so innovative that some people still keep an Amiga around just to run it.
By 1994, Blue Ribbon's main focus was a technology called AudioActive, part of which saw the light of day in music-generating programs such as SuperJam and Audiotracks Pro. AudioActive was an API and toolset that generated MIDI music performances on the fly by using data types called styles and personalities. At AudioActive's heart was a toolset for breaking compositions into their component parts and an engine for putting them back together.
To design this system, Blue Ribbon examined the way that real performers in various musical genres make the decisions that affect the progress of a piece. The system bore some conceptual similarity to musical Markov chains, in which each note has a weighted probability of going to each other note. But AudioActive was quite a bit more complex, subtle, and, in true musician form, more subjective. In some cases, it was able to create very convincing performances.
From Atlanta to Redmond
This pedigree was the first thing that I, and many other developers, heard about Microsoft's new music system. It made us skeptical from the outset. Microsoft's developer-hype literature still emphasizes DirectMusic's real-time music generation aspects to such a degree that it looks like AudioActive: The Sequel. Despite automatic music's enormous gee-whiz factor for us computer music types, I couldn't help but feel that its real-world use would simply be one more way for a skinflint producer to avoid paying for professional composition.
But over the past three years, plenty has happened. The trademark "AudioActive" is now used for an MPEG-2 audio player from Germany. Blue Ribbon Soundworks was purchased by Microsoft in late 1995, and its principals moved from Atlanta to Redmond. Its development lead, Todor Fay, continues to spend many days in trade group meetings small and large, listening to what people who make and support music for interactive products want and discussing his team's ideas.
Fay apparently doesn't like to say no. DirectMusic incorporates a truly frightening number of features, including some of the features that developers have been asking for in a music API. It includes an evolved, 100-percent rewritten version of what used to be AudioActive. It also includes hooks for replacing, adding, or modifying any component in the entire system with whatever music generator or filter you or a third party might come up with on your own. It gives applications access to MIDI and other control data in real time.
The fact remains that this open architecture was written with a certain approach to adaptive music at its core. It's an odd thing to find in a Microsoft API: a highly involved music recombiner and regenerator - a neat thing, but not the right solution for everybody. Sometimes, in poking through the SDK with a different need in mind, a developer will be mystified and frustrated by some of the approaches and some of the omissions. However, DirectMusic does try to offer solutions for those who don't wish to use the System Formerly Known as AudioActive. Thus far, Microsoft's publicists have done themselves and developers a disservice by giving the impression that the interactive music engine is the core reason to look at DirectMusic. This is definitely not true.
In its alpha stage, DirectMusic is such a big package that many people's first impression is that it's just too complex to use on a typical project. One of the things I set out to do in researching this article was to see if there were reasonably simple paths to solutions for common problems buried in the more than 300 pages of API documentation. Microsoft needs to surface those paths if it wants to sell DirectMusic to the game development community; as DirectMusic approaches beta (early July), the company is rewriting the documentation with this approach in mind.
For most people who make games, DirectMusic's headline features are DLS support that is open to hardware acceleration, and MIDI with over a million channels and rock-solid timing. It's a big package, consisting of several major parts that operate on different levels. It's not necessary to use or even understand all of the components to make good use of the parts you need.
For starters, DirectMusic replaces Windows' MidiOut technology with a new model. DirectMusic's MIDI support has subsample timing accuracy, allows flexible selection of output ports (including third-party creations), and lets applications inspect, filter, and modify MIDI data as it comes out. The release version will also multiply MIDI 1.0's 16 channels by a healthy 65,536, for a total of 1,048,576 discrete channels (called pChannels within DirectMusic).
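The channel arithmetic above can be sketched in a few lines of C++. This is only an illustration of the numbers: the names PortAddress, SplitPChannel, and JoinPChannel are invented for this sketch and are not part of the DirectMusic API, and the assumption that a pChannel index splits evenly into a zero-based group plus a MIDI 1.0 channel is mine, not something the SDK documentation is quoted as saying.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the channel arithmetic; not DirectMusic API names.
constexpr std::uint32_t kMidiChannelsPerGroup = 16;     // MIDI 1.0 limit
constexpr std::uint32_t kChannelGroups        = 65536;  // multiplier quoted in the text
constexpr std::uint32_t kTotalPChannels =
    kMidiChannelsPerGroup * kChannelGroups;             // 1,048,576

struct PortAddress {
    std::uint32_t group;        // zero-based group index (assumed layout)
    std::uint32_t midiChannel;  // 0..15 within that group
};

// Split a flat pChannel index into a group and a MIDI 1.0 channel.
inline PortAddress SplitPChannel(std::uint32_t pChannel) {
    assert(pChannel < kTotalPChannels);
    return PortAddress{pChannel / kMidiChannelsPerGroup,
                       pChannel % kMidiChannelsPerGroup};
}

// Inverse mapping: rebuild the flat pChannel index.
inline std::uint32_t JoinPChannel(PortAddress address) {
    assert(address.group < kChannelGroups);
    assert(address.midiChannel < kMidiChannelsPerGroup);
    return address.group * kMidiChannelsPerGroup + address.midiChannel;
}
```

Under this assumed layout, pChannel 17 would land on the second MIDI channel of the second group, which is the kind of bookkeeping an application would otherwise have to do by juggling multiple 16-channel ports.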
The biggest single claim that DirectMusic has to making MIDI relevant again is its support for DLS. According to its developers, the bundled Microsoft Software Synthesizer was using 0.12 percent of the CPU per voice on a Pentium II 266MHz MMX as of late June. These numbers will get a bit worse when reverberation is added (reverberation wasn't included in the API as of this writing, but is scheduled to happen before final release). Under the Win32 Driver Model in Windows 98 and Windows NT, this is open to hardware acceleration by PCI-bus sound cards.
DirectMusic includes a Roland-made General MIDI/GS sound set. However, the really great thing about DLS is that it opens up MIDI in games to a variety of techniques for using samplers that electronic musicians have built up over the years. These range from basic wavetable-style techniques (but with any choice of sample data) to sampling entire musical phrases and triggering them via MIDI commands.
If MIDI is a dirty word for many game developers, it's not because of MIDI itself, which is simply a control mechanism and has no intrinsic sonic quality, good or bad. It's because of the inconsistent, usually low-quality, fixed sample sets in the built-in synthesizer ROMs on most sound cards. DLS lets MIDI go back to being a timing, control, and note-triggering mechanism, as opposed to being a synonym for crappy-sounding game music. When MIDI is freed up to do its stuff, it can provide the granularity, malleability, and reaction time needed to make music react to what goes on in an interactive world.
As I mentioned, DirectMusic was conceptually built up from its specialized music-digesting system, the most controversial and confusing part of the SDK. This system is good at its original purpose, but that's not the whole story. It has a big side benefit: a way to play and control segments of MIDI data, apply tempo maps and data filters, and concatenate them into other segments at musically appropriate junctures (Figure 1). APIs such as the Miles Sound System, HMI's Sound Operating System, and DiamondWare's STK have already been doing this sort of thing (and more) under Windows despite MidiOut's limitations. All of these SDKs' developers are likely to be able to do more interesting things more reliably under DirectMusic.
Figure 1: DirectMusic Screen Shot
Segments, Tracks and Tools
Figure 2: Track Segment Structure
DirectMusic's essential playback unit is the track. Tracks are contained inside segments (Figure 2). Typical examples of tracks and segments would include:
Figure 3: MIDI Segment Structure
For those who wish to do complex things with music that can't be done with the built-in generation system, DirectMusic is built to be extended. For starters, tracks and segments are extensible data types. Because they are the core playback units, they will let Microsoft and third-party vendors address any fundamental complaints from developers.
DirectMusic also incorporates objects called Tools, which are intended to be easy for developers or third parties to write. These sit in what's called a tool graph, which makes all tools present cooperate with one another. A tool can operate on just one logical chunk of music (a segment) or can process the entire output. If DirectMusic catches on, expect to see scads of tools written to plug its holes, such as a MIDI channel and note mute mask, a MIDI echo, a velocity modifier, a quantizer/dequantizer, and so on.
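To make the idea of cooperating tools concrete, here is a minimal sketch in standard C++ of a chain of filters applied to note events, using two of the tools suggested above (a channel mute and a velocity modifier). Everything in it is hypothetical: NoteEvent, Tool, ToolChain, MakeChannelMute, and MakeVelocityScale are names made up for illustration, and a simple ordered list stands in for whatever structure the real tool graph uses.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical note message; the real SDK's message types are not modeled.
struct NoteEvent {
    std::uint32_t pChannel;
    std::uint8_t  note;
    std::uint8_t  velocity;
};

// A tool may modify the event in place; returning false drops the event.
using Tool = std::function<bool(NoteEvent&)>;

// Stand-in for a tool graph: tools run in the order they were added.
class ToolChain {
public:
    void Add(Tool tool) { tools_.push_back(std::move(tool)); }

    // Returns true if the event survived every tool in the chain.
    bool Process(NoteEvent& event) const {
        for (const Tool& tool : tools_) {
            if (!tool(event)) return false;
        }
        return true;
    }

private:
    std::vector<Tool> tools_;
};

// Drops every event on one channel.
inline Tool MakeChannelMute(std::uint32_t mutedChannel) {
    return [mutedChannel](NoteEvent& event) {
        return event.pChannel != mutedChannel;
    };
}

// Scales velocity by a factor, clamped to the range 1..127.
inline Tool MakeVelocityScale(double factor) {
    return [factor](NoteEvent& event) {
        double scaled = event.velocity * factor;
        if (scaled < 1.0) scaled = 1.0;
        if (scaled > 127.0) scaled = 127.0;
        event.velocity = static_cast<std::uint8_t>(scaled);
        return true;
    };
}
```

An application would add MakeChannelMute(9) and MakeVelocityScale(0.5) to a ToolChain and pass each outgoing event through Process. Because each tool sees only the event and not the other tools, small single-purpose tools from different authors can be combined freely, which is presumably the point of the design.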
For hardware vendors who want to extend the API to include new capabilities, DirectMusic provides a mechanism called the property set. Each of these is tied to a Globally Unique Identifier (GUID), and each gets its own list of individual properties, indexed from 0. A given property index for a given GUID is always the same. For example, let's say that a developer has built an interface and drivers to hook a real siren to the parallel port. In order to integrate the device's API into DirectMusic, the developer would publish the GUID of the "DirectSiren," along with its indexed property set. An application supporting DirectSiren could then use DirectMusic's IKsPropertySet interface to see whether or not the DirectSiren's DeafeningAirRaid property is available.
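The lookup described here boils down to a two-part key: a GUID that names the property set, and an index within it. The sketch below models only that idea in portable C++. It does not use or reproduce the real IKsPropertySet interface; PropertySetRegistry, Publish, and IsSupported are invented names, the GUID bytes are made up, and placing DeafeningAirRaid at index 0 is an assumption for the sake of the example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Simplified 128-bit identifier standing in for a real GUID.
using Guid = std::array<std::uint8_t, 16>;

// Hypothetical registry of which property indices each set supports.
class PropertySetRegistry {
public:
    // A driver announces that it supports one property of one set.
    void Publish(const Guid& setId, std::uint32_t propertyIndex) {
        sets_[setId].insert(propertyIndex);
    }

    // An application asks whether a (GUID, index) pair is available.
    bool IsSupported(const Guid& setId, std::uint32_t propertyIndex) const {
        auto it = sets_.find(setId);
        return it != sets_.end() && it->second.count(propertyIndex) != 0;
    }

private:
    std::map<Guid, std::set<std::uint32_t>> sets_;
};

// Made-up identifier for the article's imaginary DirectSiren device.
const Guid kDirectSirenSet = {{0xD1, 0x5E, 0xC7, 0x51}};

// Property indices are fixed per GUID; index 0 here is an assumption.
enum DirectSirenProperty : std::uint32_t {
    kDeafeningAirRaid = 0
};
```

The useful property of this scheme is that an application never needs a new header or interface to talk to new hardware: it needs only the published GUID and index, and it can degrade gracefully when the query comes back negative.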
Programming: A Smorgasbord of COM Objects
DirectMusic consists of 24 distinct COM objects. This lets developers use only the portions they need. For example, if you just want MIDI output, you don't need to incur the overhead of DLS or the learning curve of any interactive music code.
It also means that developers can replace entire sections of the system with ones that meet their needs. The idea is to make an architecture robust enough that third-party vendors of related products and tools will have a much easier time, and won't need to reinvent the wheel in order to support the code they really want to provide. For example, Headspace is making a version of its web-based music player/generator Beatnik that integrates the DirectMusic API.
The DMS Loader
At DirectMusic's technological core lies the Loader, responsible for locating, loading, and registering objects. It was designed with low-bandwidth applications in mind, so it strives for efficiency.
To use the Loader, generally the first step is to set a search directory. This isn't required; an object can be referenced by full path name. URLs are not yet supported. Once a search directory is set, the Loader can search for objects using its ScanDirectory method and enumerate them in a database of their names and GUIDs.
The Loader's caching system relies upon this database: if an application asks for the same object twice (even in different locations), and if that object is in the database, it doesn't need to be loaded a second time. Caching is enabled for all objects by default, but can be turned off and on with the Loader's EnableCache method. For a balance between conserving RAM and avoiding repeated loads of the same object, an application must make smart use of the CacheObject, ReleaseObject, and EnableCache methods. Of course, there will be cases where caching is not good - browsing through tons of instruments in a DLS editing application, for example.
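The caching behavior described above can be modeled with a small stand-in class. This is a sketch of the concept only, not the Loader's actual interface: SketchLoader and MusicObject are invented, the method names merely echo those mentioned in the text, objects are keyed by name instead of GUID, and the choice to empty the cache when caching is switched off is an assumption.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Placeholder for any loadable object (a segment, a DLS collection, ...).
struct MusicObject {
    std::string name;
};

// Hypothetical loader that hands back shared instances when caching is on.
class SketchLoader {
public:
    // Caching starts enabled, matching the default described in the text.
    void EnableCache(bool enabled) {
        cacheEnabled_ = enabled;
        if (!enabled) cache_.clear();  // assumption: disabling empties the cache
    }

    // Returns the cached instance if there is one; otherwise loads it.
    std::shared_ptr<MusicObject> GetObject(const std::string& name) {
        if (cacheEnabled_) {
            auto it = cache_.find(name);
            if (it != cache_.end()) return it->second;
        }
        std::shared_ptr<MusicObject> object = LoadFromDisk(name);
        if (cacheEnabled_) cache_[name] = object;
        return object;
    }

    // Drops one object from the cache so its memory can be reclaimed.
    void ReleaseObject(const std::string& name) { cache_.erase(name); }

    // Number of simulated disk loads, exposed for demonstration only.
    int LoadCount() const { return loadCount_; }

private:
    // Simulated disk read; a real loader would parse a file here.
    std::shared_ptr<MusicObject> LoadFromDisk(const std::string& name) {
        ++loadCount_;
        return std::make_shared<MusicObject>(MusicObject{name});
    }

    bool cacheEnabled_ = true;
    int loadCount_ = 0;
    std::map<std::string, std::shared_ptr<MusicObject>> cache_;
};
```

Asking this loader for the same name twice costs one load and returns the same instance; after EnableCache(false), every request loads again. That is the trade-off a DLS editor browsing hundreds of instruments would want to make in favor of RAM.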
Once this database exists, an application can use the Loader's EnumObject method to show all objects of any class or classes in the database and then make an instance of the object (without duplicating data) using the GetObject method.