Using our proprietary, in-house audio tool, AudioBuilder, designers and coders can collaborate on building custom interfaces for tuning all aspects of audio in the game. This is done by way of a graph/patch interface that works similarly to Reaktor or Max/MSP.
Our ambience graph/patch was the collaborative effort of one of our audio programmers, Steven Scherrelies, and myself. Steven deserves a lot of credit for his diligence in designing and implementing the code and the UI of the system.
The ambience interface went through many iterations before arriving at the final system used in Prototype. The end result is a system heavily tailored to Prototype's open-world, free-roaming nature.
Some aspects of the interface are set by the sound designer in real time, such as overall volume, cross-fade curves, and the maximum volume of any individual stream. A variety of parameters are also exposed to allow tuning of roll-off times, smoothing factors, channel leaks, and so on, allowing for subtle "massaging" of the resulting sound.
Other parameters in the patch are controlled at runtime by the audio code, such as the positional weighting of the sounds in the quadraphonic matrix and the cross-fade amounts between zones. The audio code takes input from the AI and other game systems to determine how many objects are in a given quadrant of a sphere around the listener, and calculates what the volume level should be per channel.
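The per-channel volume calculation described above can be sketched roughly as follows. This is an illustrative reconstruction, not Radical's actual code: the function name, the two-object-per-volume-step density cap, and the simple front/left quadrant test are all assumptions.

```python
def quadrant_volumes(listener_pos, object_positions, max_per_quadrant=20):
    """Count AI objects in each quadrant around the listener and map the
    counts to per-channel volumes (0..1) for FL, FR, RL, RR.

    Illustrative sketch only; the real system weights a quadraphonic matrix
    with more nuance than a hard quadrant test.
    """
    counts = [0, 0, 0, 0]  # front-left, front-right, rear-left, rear-right
    lx, ly = listener_pos
    for ox, oy in object_positions:
        dx, dy = ox - lx, oy - ly
        front = dy >= 0
        left = dx < 0
        if front and left:
            counts[0] += 1
        elif front:
            counts[1] += 1
        elif left:
            counts[2] += 1
        else:
            counts[3] += 1
    # Clamp each count against a designer-tuned maximum density.
    return [min(c / max_per_quadrant, 1.0) for c in counts]
```

A denser quadrant simply opens its channel further, so the crowd sound naturally follows where the crowd actually is.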
The same raycasting used for the procedural reverb is used for the ambience to determine how big the spheres should be, by detecting walls and surfaces. This prevents crowds from being heard through walls or other obstructions.
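The raycast-clamping idea amounts to shrinking the sampling sphere in any direction where geometry gets in the way. A minimal sketch, where `cast_ray` stands in for a hypothetical engine raycast call (not a real API):

```python
def occluded_radius(listener_pos, direction, max_radius, cast_ray):
    """Clamp the density-sampling radius in one direction to the first
    obstruction, so objects behind a wall don't count toward the ambience.

    cast_ray(origin, direction, max_distance) is assumed to return the hit
    distance, or None if nothing was hit within max_distance.
    """
    hit_distance = cast_ray(listener_pos, direction, max_radius)
    return hit_distance if hit_distance is not None else max_radius
```

Objects beyond the clamped radius are simply excluded from the quadrant counts, which is what keeps a crowd on the far side of a wall silent.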
The system is purely reactive. It has no memory or sense of direction; it responds to the input from the AI and other game systems immediately, with no discretion. Because of this, value smoothing is crucial to the end result being perceived as transparent and fluid. The smoothing algorithm is essentially that of a low-pass filter, the basic parameters of which are exposed in the UI of the patch for tuning purposes.
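That kind of low-pass smoothing is typically a one-pole filter applied to each control value. A minimal sketch (the class name and default coefficient are assumptions, but the recurrence is the standard form):

```python
class Smoother:
    """One-pole low-pass smoother for control values such as channel volumes.

    The coefficient is the kind of parameter exposed in the patch UI:
    closer to 1.0 means a slower, smoother response; closer to 0.0 means
    the output tracks the raw input more immediately.
    """

    def __init__(self, coeff=0.95):
        self.coeff = coeff
        self.value = 0.0

    def update(self, target):
        # Classic recurrence: y[n] = a * y[n-1] + (1 - a) * x[n]
        self.value = self.coeff * self.value + (1.0 - self.coeff) * target
        return self.value
```

Run per frame against the raw per-channel volumes, this is what turns the system's instantaneous, memoryless reactions into fades the player perceives as natural.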
The overall output of the ambience system is then bussed to our mixer system, which allows overall control of ambience levels within the game's main mix state. This mixer-based control allows the fading of ambience to occur during cinematics, or the filtering of the ambient sound during special game modes like sensory powers.
This system works fairly well, with a couple of notable exceptions. First, pedestrians don't have an "idle" state. An idle layer could have contributed a lot, but we were already pushing the limit on channel count, and making an appropriate-sounding idle crowd layer proved more difficult than expected.
Another issue was that crowds were not reactive enough. Because we resorted to fading the layers, crowds never "burst" with fear; they only grew slowly into it.
Also, interleaving the ambiences means that all elements of the ambience are locked together in time: if a car honk occurs two minutes into the file and a pedestrian's scream three seconds later, that pattern will repeat every time those channels are both turned up at that point in the loop.
This leads to predictability, which is never a great attribute for ambience. Lastly, this system does not handle interiors very well, so interiors were handled with completely different four-channel ambiences and no object grouping.
Advantages of this system include runtime performance: the interleaved, 18-channel file reduces disk seeks considerably compared to running the elements of the ambience as independent streams.
A secondary advantage to the interleaving is that zones can cross-fade procedurally, rather than based on a trigger volume event. This means the player or listener's position in the world can be used to determine the cross-fade amount of, say, the park and the city, rather than a zone boundary which triggers a preset cross-fade between two independent streams.
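The position-driven cross-fade described above can be sketched as a simple blend between two zone gains. This is a hedged illustration: the boundary coordinates and the equal-power curve are assumptions, not the shipped implementation.

```python
import math

def crossfade_gains(player_x, boundary_start, boundary_end):
    """Equal-power cross-fade between two zones (e.g. park and city),
    driven continuously by the player's position rather than fired once
    by a trigger-volume event.

    Returns (gain_a, gain_b) for the two zones' interleaved channels.
    """
    span = boundary_end - boundary_start
    # Normalized blend: 0.0 fully in zone A, 1.0 fully in zone B.
    t = min(max((player_x - boundary_start) / span, 0.0), 1.0)
    gain_a = math.cos(t * math.pi / 2)  # zone A fades out
    gain_b = math.sin(t * math.pi / 2)  # zone B fades in
    return gain_a, gain_b
```

Because the blend is recomputed every frame from position, walking halfway into the transition zone and stopping leaves the mix halfway blended, something a one-shot trigger cross-fade cannot do.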
On the whole, the biggest advantage is the dynamics of the system. As cars, people, and infected creatures come and go in the game space, so does the sound. This contributes to an ever-present sense that the game world is alive, fluid, and bustling, as a true representation of New York City should be.
This system is best suited to an open-world game with high densities of objects and characters, where the world can change dramatically and quickly at any given time.
If we were to tackle the same issues again, I would record custom walla with large groups of actors in an outdoor space, and record a wider variety of New York sounds, capturing populations of cars and people from greater distances. This would improve the clarity and separation of background and midground ambiences and increase the overall quality of the content. It would also allow a greater emotional range in the crowds themselves and a true sense of the city as a character in the game.