A large part of the consistency puzzle for the mix was on the voice content side, and this was solved by choosing a single dialogue director / studio for all our major story voice assets. Rob King at Green Street Studios did an incredible job of championing a consistent direction, approach and energy level from the actors, not to mention the invaluable technical consistency of using the same mics and preamps wherever we recorded.
This "pre-pre-mix" work was the ideal foundation upon which to get the dialogue sounding absolutely consistent and dynamic from in-game to cutscene (Rob handled both story and in-game dialogue).
All the non-radio-processed dialogue was dropped straight into the game from the recording sessions -- no EQ, no normalization, just as clean as we heard it in the sessions. Rather than maximizing and compressing all the dialogue before it went into the game, we decided to simply mix around the dynamic and RMS levels of the VO recordings using the in-game mixing engine, without obliterating the WAV assets with digital limiters.
This also kept the dynamic range of the recordings consistent, something I feel comes through clearly in the game, and it served the horizontal consistency of the dialogue across the whole game well.
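The bookkeeping this approach implies can be sketched as follows. The helper names, the -23 dBFS target and the amplitudes are my own illustration, not the game's actual engine code: the recording is never touched, only the game-side fader moves.

```python
import math

def rms_dbfs(samples):
    """RMS level of a mono float block, in dBFS."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(max(mean_sq, 1e-12))

def mixer_gain_db(measured_dbfs, target_dbfs=-23.0):
    """Gain the runtime mixer would apply so the untouched recording
    sits at the target level in the mix; the WAV itself is never limited."""
    return target_dbfs - measured_dbfs

# A quiet line vs. a loud one: the assets stay clean, only the
# game-side fader differs (amplitudes and target are illustrative).
quiet = [0.05 * math.sin(2 * math.pi * 220 * n / 48000) for n in range(4800)]
loud  = [0.70 * math.sin(2 * math.pi * 220 * n / 48000) for n in range(4800)]
quiet_gain = mixer_gain_db(rms_dbfs(quiet))   # positive: pushed up in the mix
loud_gain  = mixer_gain_db(rms_dbfs(loud))    # negative: pulled down
```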
Because of this, whenever we recorded a pickup session, the new content sounded as though it had been part of the exact same original session. Radio-processed dialogue was handled in-house via the dialogue build pipeline (fig 1), which incorporated VST chain processing as part of the dialogue build.
This also helped ensure consistent VST presets were used across in-game and cutscene dialogue, as I could use the exact same speakerphone etc. presets when building up the cutscene sessions as the ones I'd used in the dialogue tools. It also guaranteed that all localized assets received exactly the same processing as the English version, so there was no pressure on our localization team to re-create the sound of our plug-in processing chains on their localized deliveries.
Fig 1. The VST Radio Processing Pipeline in our dialogue tools allowed consistency not only between cutscenes and in-game dialogue (by using the same presets), but also between all the localized assets and the English.
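As a rough illustration of the build-time idea: one named effect-chain preset, applied identically to every language's copy of a line. The stages below are crude stand-ins, not real VST plug-ins, and every name here is hypothetical.

```python
def radio_eq(samples):
    """Stand-in for a band-limiting EQ stage: a one-pole high-pass."""
    out, lows = [], 0.0
    for s in samples:
        lows = 0.6 * lows + 0.4 * s      # track the low-frequency content
        out.append(s - lows)             # and subtract it
    return out

def soft_clip(samples, drive=2.0):
    """Stand-in for a radio-distortion stage."""
    return [max(-1.0, min(1.0, drive * s)) for s in samples]

# One preset name -> one ordered chain, shared by the in-game dialogue
# build, the cutscene sessions, and every localized language.
PRESETS = {"speakerphone": [radio_eq, soft_clip]}

def build_line(samples, preset_name):
    for stage in PRESETS[preset_name]:
        samples = stage(samples)
    return samples

english   = build_line([0.1, -0.2, 0.3, -0.4], "speakerphone")
localized = build_line([0.2, -0.1, 0.4, -0.3], "speakerphone")  # same chain, nothing re-created
```

Because the chain lives in the build pipeline rather than in a DAW session, a localized delivery only has to supply clean recordings; the processing is applied mechanically at build time.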
During the final mix, a lot of the dynamic mixing in the game was focused on getting the mid-range sound effects, and to a lesser extent the music, out of the way of pertinent mission-relevant dialogue. In terms of vertical consistency, we employed a quickly implementable three-tier dialogue ducking structure.
For this, we had three different categories of dialogue ducking for different kinds of lines in different contexts. Firstly, a "regular" dialogue duck, simply called "mission_vo", which reduced fx and music sufficiently to have the dialogue clearly audible in most gameplay circumstances.
Secondly, a more subtle ducking effect called "subtle_vo", which very gently pulled sounds down by a few dB, barely noticeably. This was usually applied in moments when music and intensity were generally very low and we didn't want an obvious ducking effect applied.
Finally, we had an "xtreme_vo" (yeah, we spell it like that!) snapshot, which catered for all the moments when the action was so thick and dense that music and effects were filling the spectrum and the regular VO duck just wasn't pulling enough out for lines to be audible. By applying these three dialogue-centric ducking mixers, we could mix quickly for a variety of dialogue scenarios and contexts without touching the overall levels of the assets themselves.
Fig 2. AudioBuilder: An example of the state-based dialogue ducking mixer snapshot used to carve out space in the soundtrack for dialogue. Also provides a good general look at our mixing features. Bus-tree hierarchy on the left, snapshots highlighted in blue to the left, fader settings in the center, meters below and panning overrides to the far right.
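The three-tier structure can be modelled as mixer snapshots keyed by name. The snapshot names come from the text, but the bus names, dB offsets and selection logic below are only illustrative:

```python
# dB offsets per bus for each ducking snapshot (values are made up).
SNAPSHOTS = {
    "subtle_vo":  {"sfx": -2.0,  "music": -3.0},
    "mission_vo": {"sfx": -6.0,  "music": -8.0},
    "xtreme_vo":  {"sfx": -12.0, "music": -15.0},
}

def pick_snapshot(mission_critical, mix_density):
    """mix_density: rough 0..1 estimate of how full the soundtrack is."""
    if not mission_critical:
        return "subtle_vo"
    return "xtreme_vo" if mix_density > 0.8 else "mission_vo"

def apply_snapshot(bus_gains_db, name):
    """Offset the live bus gains; a real engine would interpolate the change."""
    duck = SNAPSHOTS[name]
    return {bus: g + duck.get(bus, 0.0) for bus, g in bus_gains_db.items()}

buses = {"sfx": 0.0, "music": -2.0, "dialogue": 0.0}
ducked = apply_snapshot(buses, pick_snapshot(True, 0.9))
# sfx and music are pulled down hard; the dialogue bus is untouched
```

The point of the snapshot approach is visible in the last line: the assets and the dialogue bus never change, only the competing buses move, and by how much depends on which tier the line falls into.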
Scott Morgan explains the music mixing process:
The music mixing process consisted of first organizing and arranging the original composition's Nuendo sessions into groups to be exported as audio stems. Stems generally consisted of percussion, strings, horns, synths, string effects and misc effects. Because the score was originally composed with MIDI and samples or synths, each group was rendered to a 24-bit, 48kHz stereo audio file that was imported back into the session for mixing purposes.
Each section of music that corresponded to an event in the game was individually exported, stem by stem. Rendering the MIDI/sample source out into audio content allowed for additional fine-tuning of EQ and volume automation per group, and also allowed the loop points to be fine-tuned to smooth out any clicks or pops that would have been present in the premixes. Sessions were prepped late in the day before mixing the next morning in the mix studio. A typical morning would see the mix of 2-4 missions' worth of music, depending on their complexity.
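One common way to smooth a loop seam, sketched under the assumption that each stem was rendered slightly past its loop end: crossfade the natural continuation into the loop's head so the wrap point is click-free. This is a generic technique, not necessarily the exact method used on these stems.

```python
import math

def smooth_loop(render, loop_len, fade_len=256):
    """render continues past the loop end; blend that continuation into
    the first fade_len samples so end -> start wraps without a click."""
    loop = render[:loop_len]
    carry = render[loop_len:loop_len + fade_len]  # what naturally follows the end
    for i in range(fade_len):
        t = i / fade_len
        g_in, g_out = math.sin(t * math.pi / 2), math.cos(t * math.pi / 2)
        loop[i] = g_out * carry[i] + g_in * loop[i]  # equal-power crossfade
    return loop
```

At the wrap, the first output sample is exactly the sample that followed the loop's last one in the original render, so the waveform is continuous across the seam; by the end of the fade the original head material is fully restored.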
One mission's worth of music was mixed at a time. A mixing pass consisted of first adjusting the individual layers to predetermined levels and applying standard EQ settings on each stem (e.g. a low shelf on the percussion to bring everything below 80Hz down by about 4dB). Once this balancing pass was done, each section got specific attention in terms of adjusting the relationship between the stems. This was done to bring out certain aspects of the music and allow it to breathe a bit more dynamically.
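That low-shelf move can be expressed with the standard Audio-EQ-Cookbook (RBJ) shelf formulas; this is the textbook filter, not the studio's actual plug-in:

```python
import math

def low_shelf_coeffs(fs, f0, gain_db, slope=1.0):
    """RBJ cookbook low-shelf biquad coefficients, normalized so a0 = 1."""
    A  = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / 2 * math.sqrt((A + 1 / A) * (1 / slope - 1) + 2)
    c, k = math.cos(w0), 2 * math.sqrt(A) * alpha
    b = [A * ((A + 1) - (A - 1) * c + k),
         2 * A * ((A - 1) - (A + 1) * c),
         A * ((A + 1) - (A - 1) * c - k)]
    a = [(A + 1) + (A - 1) * c + k,
         -2 * ((A - 1) + (A + 1) * c),
         (A + 1) + (A - 1) * c - k]
    return [x / a[0] for x in b], [x / a[0] for x in a]

def biquad(samples, b, a):
    """Direct Form I filter over a mono float block."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out

# The percussion-stem move from the text: shelf everything below
# ~80Hz down by about 4dB, at a 48kHz sample rate.
b, a = low_shelf_coeffs(48000, 80.0, -4.0)
```

A quick sanity check on these coefficients: the filter's gain at DC should sit at -4dB while the gain at Nyquist stays at unity, which is exactly the shape of a low shelf.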
At the end of each mixing session, each section was rendered out and these mixes were then used to replace the existing premixes in the game. The music was then checked in the context of the mission itself and crudely balanced against the sound effects and dialogue. This mission was now considered pre-mixed and ready for a final contextual game mixing pass.