Film Standards: Internal Consistency
The notion of consistency in levels from the beginning of a film to the end may go without saying. However, in addition to the overall output level of the in-game sound, there is a great deal of internal consistency that game developers need to pay more attention to.
More often than not, in-game cutscenes will have different output levels from those of the in-game sounds, perhaps because they were outsourced and arrived at the last minute at different levels. They may also have a completely different surround sound configuration or routing: in-game dialogue may be sent to the left and right speakers, while in the cutscenes dialogue comes from the center speaker only.
In film, this would be like reel one having different dialogue levels to reel two, and reel three having dialogue panned left and right, rather than center. Having an internal consistency to the mix of a video game, through all the different modes of storytelling and gameplay, is one of many areas that a full in-game mix pass on the content at the end of production can greatly improve.
It takes time, metering, a calibrated reference level listening environment, a good ear and very careful attention to detail to do this effectively.
Above: An example of internal inconsistency in a video game. This 5.1 surround sound waveform from a recent next-gen title shows the Left, Right, Center, LFE, Left Surround and Right Surround channels from top to bottom, with time running from left to right. The first section is a cinematic cutscene, the middle (louder) section is an in-game battle, and the final section is a second cinematic cutscene.
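As a rough illustration of the metering side of this, the sketch below compares the average level of two sections of audio. It assumes samples are floats normalized to the range -1.0 to 1.0, and it uses plain RMS as a crude stand-in for a proper loudness meter; the function names are hypothetical and not taken from any particular tool.

```python
import math

def rms_dbfs(samples):
    """Return the RMS level of a block of samples in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    # 10 * log10 of the mean square equals 20 * log10 of the RMS value
    return 10.0 * math.log10(mean_square)

def level_mismatch_db(section_a, section_b):
    """How many dB louder section_b is than section_a (negative if quieter)."""
    return rms_dbfs(section_b) - rms_dbfs(section_a)

# Hypothetical data: a quiet cutscene followed by a louder battle
cutscene = [0.1, -0.1] * 100
battle = [0.2, -0.2] * 100
print(round(level_mismatch_db(cutscene, battle), 2))
```

A large mismatch between a cutscene and the gameplay around it is the kind of internal inconsistency a final mix pass would aim to catch, though a real pass would rely on calibrated listening and proper loudness metering rather than a single number.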
Having examined some of the basic techniques and concepts of film mixing and how they can be mapped over to video game mixing, let's now examine some of the features that are currently specific to mixing video games.
On top of the basic mixing features that linear motion-picture dubbing mixers use, there are vast arrays of completely different techniques that are exposed by video game mixing engines. Some parameters that movie sound mixers don't get access to are the fall-off curves for sounds over 3D distance. These fall-off values, often also called 'max min distance', form as much a part of mixing as volume does.
In film, mixers are always trying to 'fake' the distance of a sound from the camera by using volume automation. In a game, once the fall-off values are tuned, the volume rises and falls automatically based on the position of the listener relative to the sound-emitting object in the 3D space.
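A minimal sketch of a distance fall-off might look like the following. It assumes a simple linear curve between a minimum distance (full volume) and a maximum distance (silence); real engines typically offer several curve shapes, and the function names here are hypothetical.

```python
import math

def listener_distance(listener_pos, emitter_pos):
    """Straight-line distance between the listener and a sound emitter."""
    return math.dist(listener_pos, emitter_pos)

def falloff_gain(distance, min_distance, max_distance):
    """Linear gain (0.0 to 1.0) for a 3D sound at the given distance.

    Assumes a linear fall-off: full volume inside min_distance,
    silent beyond max_distance, and a straight ramp in between.
    """
    if distance <= min_distance:
        return 1.0
    if distance >= max_distance:
        return 0.0
    return 1.0 - (distance - min_distance) / (max_distance - min_distance)

# The gain follows the listener automatically as positions change
d = listener_distance((0.0, 0.0, 0.0), (30.0, 0.0, 0.0))
print(falloff_gain(d, min_distance=10.0, max_distance=50.0))
```

Tuning the two distances per sound is the mixing decision; after that, no per-moment automation is needed.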
Another parameter that is set up automatically for 3D sounds in games is the amount of the sound that is sent to an environmental reverb, which depends on distance from the listener. Again, this is something that film mixers have to carefully fake via automation to aux reverb channels.
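The distance-dependent reverb send can be sketched in the same spirit. This assumes the send level rises linearly from zero at a minimum distance to a ceiling at a maximum distance; the `max_send` value and the function name are illustrative assumptions, not settings from any specific engine.

```python
def reverb_send(distance, min_distance, max_distance, max_send=0.8):
    """Fraction of a sound's signal routed to the environmental reverb bus.

    Assumes a linear rise with distance: dry up close, wetter far away.
    """
    t = (distance - min_distance) / (max_distance - min_distance)
    t = min(1.0, max(0.0, t))  # clamp to the 0..1 range
    return max_send * t

# A distant sound receives more reverb than a nearby one
print(reverb_send(12.0, 10.0, 50.0))
print(reverb_send(45.0, 10.0, 50.0))
```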
At this point it is probably wise to break out two distinct branches of mixing technology and techniques that have emerged, 'passive' and 'active'. Garry Taylor, Audio Manager at Sony Cambridge, usefully delineated these two categories at a lecture he gave at the Develop conference in 2008.
Passive Mixing Techniques
Passive techniques rely on values which, once set up, attenuate parameters, volumes or filters of the content 'automatically'; I think of this as 'auto-mixing'. These techniques can either duck given channels by a particular fixed amount each time an event occurs, or can 'read' the volume amounts that are being played through a particular channel and attenuate the other channels by an equivalent amount (similar to side-chain compression used in radio broadcasts).
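The two ducking approaches can be contrasted in a short sketch. The threshold, depth and limit values below are illustrative assumptions, and the function names are hypothetical; a production system would also smooth the gain over time with attack and release settings, which is omitted here.

```python
def db_to_gain(db):
    """Convert a decibel value to a linear gain multiplier."""
    return 10.0 ** (db / 20.0)

def fixed_duck_gain(event_active, duck_db=-6.0):
    """Duck by a fixed amount whenever the triggering event is active."""
    return db_to_gain(duck_db) if event_active else 1.0

def sidechain_duck_gain(key_level_db, threshold_db=-30.0, depth=0.75, max_duck_db=12.0):
    """Duck in proportion to the measured level of a 'key' channel.

    key_level_db: current level of the channel being 'read' (e.g. dialogue).
    The further the key rises above the threshold, the more the other
    channels are attenuated, up to max_duck_db of reduction.
    """
    over_db = key_level_db - threshold_db
    if over_db <= 0.0:
        return 1.0  # key is quiet, so no ducking
    reduction_db = min(over_db * depth, max_duck_db)
    return db_to_gain(-reduction_db)

# Music is ducked harder as the dialogue channel gets louder
for dialogue_db in (-40.0, -26.0, -6.0):
    print(dialogue_db, round(sidechain_duck_gain(dialogue_db), 3))
```

The fixed version is predictable and easy to tune, while the level-following version adapts to how loud the triggering content actually is.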
3D volume fall-off curves of positional sounds and occlusion filtering settings of 3D sounds all factor into passive mixing too. Passive mixing can produce more subtle volume attenuation, depending on how drastically the attenuation values are set, and is, at its core, the setting up of rules and parameters that allow the system to react and work within those rules. These systems are often implemented, and work well, in first person shooter games, which have a single, unchanging point-of-view for the entire game.
Active Mixing Techniques
This describes systems which allow greater control over sound parameters and the ability to completely override a passive system for a specific moment in time. These overrides often take the form of mixer snapshots in which parameters at the channel or bus level are redefined and then returned to normal once the event has finished.
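One way to picture a snapshot system is as a stack of temporary overrides layered on top of the default mix. The sketch below is a simplified assumption of how such a system could be structured; the `Mixer` class, bus names and gain values are all hypothetical, and real implementations usually blend between states over time rather than switching instantly.

```python
class Mixer:
    """Default bus gains plus a stack of named snapshot overrides."""

    def __init__(self, bus_gains_db):
        self.base = dict(bus_gains_db)  # the normal (passive) mix
        self.snapshots = []             # active overrides, oldest first

    def push_snapshot(self, name, gains_db):
        """Apply an override for the buses listed in gains_db."""
        self.snapshots.append((name, dict(gains_db)))

    def pop_snapshot(self, name):
        """Remove an override so affected buses return to normal."""
        self.snapshots = [s for s in self.snapshots if s[0] != name]

    def current_mix(self):
        """Resolve the mix: later snapshots win over earlier ones."""
        mix = dict(self.base)
        for _, gains in self.snapshots:
            mix.update(gains)
        return mix

mixer = Mixer({"dialogue": 0.0, "music": -6.0, "sfx": 0.0})
mixer.push_snapshot("slow_motion", {"sfx": -18.0, "music": 0.0})
print(mixer.current_mix())  # override in effect during the event
mixer.pop_snapshot("slow_motion")
print(mixer.current_mix())  # back to the default mix
```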
Using these systems, a particular sound, or group of sounds, can be made deliberately louder or quieter at a particular moment, which gives sound designers the ability to make artistic decisions about sounds outside of a passive system.
This allows for a very different point-of-view to be articulated from that defined by a passive system: changing filtering, pitch, reverberation or DSP values, or moving the listener position and changing any fall-offs, in ways that sit outside of a notion of objective reality and are more in line with cinematic sound design. These techniques are more likely to appear in games that seek to emulate 'Hollywood' sound design or have special 'sound moments'.
Some developers only need to use passive systems, and this is usually because their games only require a single point-of-view that has no need for any kind of overrides to their model of 'reality'. However, all games are different, and a combination of a passive system with an active system offers deeper creative control and opportunity for aesthetic use of sound in video games: the loudest sound isn't necessarily always the most important.
This is not to say that a passive system cannot be artistically tweaked or weighted, but that there exist greater possibilities for sound perspective manipulation and design by introducing elements of an active system.
Video game mixing, then, can make use of a combination of both of these systems in real-time: systems of carefully defined rules and parameters (passive), as well as deliberate overrides of the mix for 'sound moments' based on special game events (active).