Game Mixing: The Next Five (to Ten) Years
I was going to call this section "The Next Five Years", but looking back over the last five years, I am reminded how glacial progress in these areas can seem. If all of this happened in the next three years, our jobs as sound designers would be awash with exciting new opportunities and endless high-quality sonic possibilities, but wait... slow down, slow down...
I would like to explore some of the areas in which I see video game mixing heading over the next few years. There are certainly a great many opportunities, and many ways to go about integrating new features and techniques. In many respects it is the types of games themselves that will push these requirements. As visuals become richer, and as developers gain more control over them in the coming years, we will undoubtedly see an increased focus on sound.
Establishing a Reference Listening Level for Games
Work is already underway in this area. For some years now there has been no clear recommended reference listening level at which to mix a game. Developers might mix at 85dB, as for theatrical film releases, or at 79dB, as for DVD remixes or TV, or simply match the output of a competing game.
The first is designed specifically for films heard in a theatre. The second is designed for remixing for the home environment, chiefly so that dialogue, which is clearly audible when played loud in a theatre, is not lost at lower home playback levels. For this reason a home entertainment mix typically has slightly less dynamic range than a theatrical mix, which has a great deal. Both approaches depend entirely on the content being played back at the level it was mixed at: 85 or 79dB respectively.
Common sense would suggest that games should match the output levels of DVD movies, since both are played in the home. Games also tend to have much longer stretches of loudness, or action, than movies, which typically follow a story dynamic of dialogue, action, dialogue. In racing and action games in particular, the dynamic stays intense for far longer, which strengthens the argument for establishing 79dB as the reference listening level for games.
The higher the reference listening level, the more dynamic range a mix can have, and the quieter certain sounds can be made for dramatic effect. The lower the reference level, the louder the mix's output levels will be. Currently, games are incredibly loud and their output levels are badly mismatched, not only from console to console but from game to game. Even games released by the same studio have inconsistent output levels.
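The link between reference level and output level comes down to simple arithmetic. The Python sketch below assumes, hypothetically, that the monitors are calibrated so that a reference signal at -20 dBFS plays back at the chosen reference listening level. This is a common convention, but treat the exact figure as an assumption. The sketch then asks what digital level a sound needs in order to be heard at 85dB SPL in each case. The function names are illustrative only.

```python
# Assumption: monitors are calibrated so that a reference signal at this
# digital level (e.g. pink noise) plays back at the reference listening level.
CALIBRATION_DBFS = -20.0


def content_dbfs(target_spl: float, reference_spl: float,
                 calibration_dbfs: float = CALIBRATION_DBFS) -> float:
    """Digital level (dBFS) a sound needs in order to be heard at target_spl
    (dB SPL) when the room is calibrated to reference_spl."""
    return calibration_dbfs + (target_spl - reference_spl)


def headroom_db(level_dbfs: float) -> float:
    """Distance from a sound's level up to digital full scale (0 dBFS)."""
    return 0.0 - level_dbfs


def amplitude_ratio(delta_db: float) -> float:
    """Convert a level difference in dB to a linear amplitude ratio."""
    return 10 ** (delta_db / 20)


if __name__ == "__main__":
    for ref in (85.0, 79.0):
        level = content_dbfs(target_spl=85.0, reference_spl=ref)
        print(f"ref {ref:.0f} dB: sound sits at {level:.1f} dBFS, "
              f"{headroom_db(level):.1f} dB headroom")
    print(f"6 dB is an amplitude ratio of about {amplitude_ratio(6.0):.2f}")
```

Under these assumptions, mixing at 79dB pushes the same sound about 6dB closer to full scale, which is roughly double the amplitude. That leaves less headroom for dynamic range, which is the trade-off described above.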
As mentioned earlier, there is even inconsistency within a single game, with cinematics and gameplay playing back at different levels. Once a recommended standard is published, it will be much easier to know where a game's output levels need to be.
Enrichment of Software Tools, Both Third-Party and Proprietary
In-house tools and third-party solutions will converge on a core feature set that is solid, robust and reliable enough to ship many games. Additional systems and add-ons will then be developed on top of this core. Audiokinetic's Wwise has a particular focus on interactive mixing technology and has already established a proven basic mixing structure with its bus ducking and bus hierarchy.
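The sketch below gives a rough illustration of bus hierarchy and bus ducking. Buses are modelled as a tree. Each bus's effective volume is the sum, in dB, of its own volume, any ducking applied to it, and its parents' volumes. This is not Wwise's API or implementation. The `Bus` class, the `update_ducking` function and the attenuation figures are hypothetical, and ducking is applied instantly rather than with fade times.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)
class Bus:
    """A mixer bus in a simple tree (hypothetical model, not Wwise's API)."""
    name: str
    volume_db: float = 0.0
    parent: Bus | None = field(default=None, repr=False)
    # Buses this bus ducks while it is playing, with the attenuation in dB.
    ducks: list[tuple[Bus, float]] = field(default_factory=list, repr=False)
    active_voices: int = 0
    duck_db: float = 0.0  # attenuation currently applied to this bus by ducking

    def effective_db(self) -> float:
        """Own volume plus ducking, plus every parent's contribution up to the master."""
        total = self.volume_db + self.duck_db
        if self.parent is not None:
            total += self.parent.effective_db()
        return total


def update_ducking(buses: list[Bus]) -> None:
    """Recompute ducking: each playing bus attenuates its targets.

    When several buses duck the same target, the deepest attenuation wins
    rather than the attenuations adding up.
    """
    for bus in buses:
        bus.duck_db = 0.0
    for bus in buses:
        if bus.active_voices > 0:
            for target, attenuation_db in bus.ducks:
                target.duck_db = min(target.duck_db, -attenuation_db)


if __name__ == "__main__":
    master = Bus("master")
    music = Bus("music", volume_db=-3.0, parent=master)
    sfx = Bus("sfx", parent=master)
    dialogue = Bus("dialogue", parent=master, ducks=[(music, 6.0), (sfx, 3.0)])
    buses = [master, music, sfx, dialogue]

    dialogue.active_voices = 1  # a line of dialogue starts
    update_ducking(buses)
    print(music.effective_db(), sfx.effective_db())

    dialogue.active_voices = 0  # the line ends
    update_ducking(buses)
    print(music.effective_db(), sfx.effective_db())
```

Because volumes are summed down the tree, turning down a parent such as the master moves every child bus with it. Ducking only changes the targets while the ducking bus is actually playing.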
The more developed third-party tools become, the more pressure there is on in-house tools to compete with them and to offer the same features, if not more. This climate pushes in-house technology to become more agile and versatile, which in turn drives further innovation that eventually spreads to the wider industry.
I asked Simon Ashby, product director of Wwise at Audiokinetic, to share some of his thoughts on both the limitations and future directions of mixing for video games:
"The barrier to a high-quality mix in games is mainly caused by the fact that we are still not really good at storytelling in our games. We have trouble mapping and controlling the emotional response of the player in order to pace the story with the right intensity progression and a larger emotional palette. We still ask the end user to execute repetitive actions, and because of this, several games end up offering a monotonous experience. As long as games are produced this way, the mix quality will remain inferior to that which films achieve, despite the quality of the tools.
Wwise offers both active and passive mixing systems to the developer. Passive mixing is achieved through effects such as the peak limiter or the auto-ducking system. The active system, on the other hand, is represented by the 'state mechanism', which operates like mixer snapshots with custom interpolation settings between them. The event system also offers active mixing through a series of actions, such as discrete volume attenuation, LPF and effect bypass, which can be applied to any object in the project.
Video game mixes have further complexities, as the game experience can last between four and 10 times longer than a movie, and there are far more unpredictable assets to mix. The main complexity remains the interactivity, where the mixer has to take into account various different styles of gameplay; the soundtrack emerging from a single game played by a Rambo-kamikaze gamer is very different from the one produced by a stealth type of gamer, even though it is the same game using the same ingredients.
In terms of new mixing features for Wwise, we usually don't reveal our mid and long term plans since we cannot vouch for the future. That said, you can be sure that we have a series of new features in our roadmap covering both passive and active systems that will help bring mixing technology for games to a previously unforeseen pro level."
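The 'state mechanism' Simon describes above can be sketched in a few lines. It works like mixer snapshots with interpolation between them. This is a hypothetical Python illustration, not Wwise's API. A snapshot is assumed to be a mapping from bus names to volumes in dB, and both snapshots are assumed to list the same buses. The optional `curve` argument stands in for custom interpolation settings. The names `SnapshotMixer`, `linear` and `smoothstep` are illustrative.

```python
from typing import Callable, Dict

# Hypothetical representation: bus name -> volume in dB.
Snapshot = Dict[str, float]


def linear(t: float) -> float:
    """Straight-line transition."""
    return t


def smoothstep(t: float) -> float:
    """An S-shaped curve that eases in and out of the transition."""
    return t * t * (3.0 - 2.0 * t)


class SnapshotMixer:
    """Blends from the current mix towards a target snapshot over time."""

    def __init__(self, initial: Snapshot) -> None:
        self.current: Snapshot = dict(initial)
        self._start: Snapshot = dict(initial)
        self._target: Snapshot = dict(initial)
        self._duration = 0.0
        self._elapsed = 0.0
        self._curve: Callable[[float], float] = linear

    def set_state(self, target: Snapshot, duration_s: float,
                  curve: Callable[[float], float] = linear) -> None:
        """Start a transition to `target`, beginning from wherever the mix is now."""
        self._start = dict(self.current)
        self._target = dict(target)
        self._duration = max(duration_s, 0.0)
        self._elapsed = 0.0
        self._curve = curve
        if self._duration == 0.0:
            self.current = dict(target)  # zero duration means an instant cut

    def update(self, dt_s: float) -> Snapshot:
        """Advance the transition by dt_s seconds (e.g. once per game frame)."""
        if self._duration > 0.0:
            self._elapsed = min(self._elapsed + dt_s, self._duration)
            t = self._curve(self._elapsed / self._duration)
            self.current = {
                bus: self._start.get(bus, 0.0)
                + (level - self._start.get(bus, 0.0)) * t
                for bus, level in self._target.items()
            }
        return self.current


if __name__ == "__main__":
    mixer = SnapshotMixer({"music": 0.0, "sfx": 0.0})
    # e.g. entering a hypothetical "stealth" state: pull the music down over 2 seconds
    mixer.set_state({"music": -12.0, "sfx": 0.0}, duration_s=2.0)
    for _ in range(4):
        print(mixer.update(0.5))
```

Each transition starts from the current mix rather than from the previous snapshot. This is a design choice of the sketch: if the game changes state halfway through a blend, the mix does not jump.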
Dedicated Mix Time at the End of Production
In many cases, the mix of a game is a constant iterative process that runs throughout development, with perhaps some dedicated time at the end of the project for final tweaks. That time is often very short, because the game's beta date falls so close to the gold master candidate date. I expect it to grow as mixing technology improves, and as the need for sound to keep iterating after design and art have finished their tweaks becomes better understood.
Taking games off-site to be mixed, or into dedicated in-house facilities, will also increase the amount of time needed to complete a mix. Who should and shouldn't attend the mix is not yet fully understood either. One thing is for sure, though: it is often at least a two-person job. Lots of questions and doubts come up in a mix (is something too loud? is it loud enough?), and being able to bounce these questions off another set of ears is a very important sanity check.