We decided late in production that, for the system to sound really convincing, the midground groups for pedestrian and infected crowds needed to be divided into two sizes.
Because the sizes of the groups vary dramatically in the game from small to really large, and because volume alone does not represent density very well (a crowd sound of 40 people faded to a quarter the amplitude does not sound like a crowd of 10 people), two layers of crowd sounds, one smaller and one larger, proved the way to go.
For reasons of economy, we made the smaller groups mono; each is positioned in the quad matrix according to the averaged location of the group's members. The larger groups are stereo, split left and right, but still weighted in each quad speaker according to the number of group members in that quadrant.
In terms of placement, the code not only balanced the overall level of each layer based on the density of objects in the world but also weighted the volume per channel according to the averaged position of those objects.
This adds a sense of orientation to the crowds and gives the listener a sense of the groups' direction. This is particularly useful for locating infected hordes in the game, as they pose a potential threat to the player character.
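As a rough illustration of the two placement schemes described above, the sketch below computes per-speaker weights for a small mono group from its averaged position, and per-quadrant weights for a large group from head counts. It is not the game's code: the speaker layout, the function names, and the cosine-based panning law are all assumptions made for the example.

```python
import math

# Hypothetical quad layout: +x is the listener's right, +y is straight ahead.
SPEAKERS = {
    "front_left": (-1.0, 1.0),
    "front_right": (1.0, 1.0),
    "rear_left": (-1.0, -1.0),
    "rear_right": (1.0, -1.0),
}

def average_position(members):
    """Mean (x, y) of a non-empty list of listener-relative positions."""
    n = len(members)
    return (sum(x for x, _ in members) / n, sum(y for _, y in members) / n)

def mono_quad_gains(members):
    """Gains for a small mono group, panned toward its averaged position."""
    cx, cy = average_position(members)
    length = math.hypot(cx, cy)
    if length == 0.0:
        # Group is centred on the listener: spread evenly (squares sum to 1).
        return {name: 0.5 for name in SPEAKERS}
    dx, dy = cx / length, cy / length
    raw = {}
    for name, (sx, sy) in SPEAKERS.items():
        # Cosine of the angle between group and speaker directions,
        # clamped so speakers facing away from the group get no signal.
        raw[name] = max((dx * sx + dy * sy) / math.sqrt(2.0), 0.0)
    # Normalise so total power stays constant wherever the group sits.
    norm = math.sqrt(sum(g * g for g in raw.values()))
    return {name: g / norm for name, g in raw.items()}

def count_quad_weights(members):
    """Weights for a large group: fraction of members in each quadrant."""
    counts = {name: 0 for name in SPEAKERS}
    for x, y in members:
        side = "right" if x >= 0 else "left"
        depth = "front" if y >= 0 else "rear"
        counts[f"{depth}_{side}"] += 1
    total = len(members)
    return {name: c / total for name, c in counts.items()}
```

In this sketch a stereo layer would send its left channel to the two left speakers and its right channel to the two right speakers, each scaled by the corresponding quadrant weight.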
While the main background layers were composed primarily of custom recordings from New York itself, the midground layers were complete constructions. Because we already had a multitude of individual recordings of actors screaming and panicking, I decided to build a custom "walla generator" patch using Cycling '74's Max/MSP.
The patch is very simple. It takes an input folder of sounds and provides some basic options: the timing of events with a random gap variation range, the number of channels to distribute to, a pitch range, a volume range, and an overall EQ. It can also host a VST plug-in for reverb or any other desired external effect.
Through editing the timer, the density of the grouped content could then be tuned. The output of the patch was recorded and used directly as one of the layers of the 18-channel ambience, either as a stereo pass for the larger groups or a mono layer for the smaller groups. In total, there were about 400 individual reaction files used as the source for the crowd panic layers.
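The original patch was built in Max/MSP and is not reproduced here. As a loose Python analogue of the controls described above, the following sketch generates a list of playback events with a randomized gap, channel, pitch, and volume. The function name, parameter names, and units are assumptions for illustration, and the EQ and plug-in stages are left out.

```python
import random

def schedule_walla(files, duration, base_gap, gap_variation,
                   channels, pitch_range, volume_range, seed=None):
    """Return a time-ordered list of randomized playback events.

    base_gap and gap_variation are in seconds; a smaller base_gap
    produces a denser crowd. pitch_range and volume_range are
    (low, high) tuples in whatever units the playback engine expects.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # The "timer": a base interval plus a random variation.
        gap = base_gap + rng.uniform(-gap_variation, gap_variation)
        t += max(gap, 0.01)  # never step backwards or stall
        if t >= duration:
            break
        events.append({
            "time": t,
            "file": rng.choice(files),
            "channel": rng.randrange(channels),
            "pitch": rng.uniform(*pitch_range),
            "volume": rng.uniform(*volume_range),
        })
    return events

# Example: a ten-second stereo pass with one event roughly every half second.
events = schedule_walla(["scream_01.wav", "panic_02.wav"], 10.0,
                        base_gap=0.5, gap_variation=0.2, channels=2,
                        pitch_range=(-2.0, 2.0), volume_range=(0.5, 1.0),
                        seed=1)
```

The file names in the example are placeholders. Lowering `base_gap` is the equivalent of editing the timer to thicken the crowd.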
This same process was used for the infected mobs. In this case, the input files were the infected creature sounds created by our sound designer, Cory Hawthorne, mixed with some additional death screams and panic from some of the actors to provide a sense of a mob gone mad with infection.
Increased reverb and EQ were applied to the distant groups versus the smaller groups, in addition to the run-time procedural reverb that was applied during the game to all three tiers of ambience.
All the ambience was sent through a procedural reverb system. This is true not only of the midground layers but also of the background. Through a system of ray casting, the physical space around the listener was analyzed in real time, and the reverb parameters were set to match the size of the space that the listener was in.
When the player enters a tunnel in Central Park, for example, the system detects an enclosed space of a certain size and dynamically sets the reverb parameters. The sound of the park's birds and other ambient sounds is passed through the bigger reverb to give the illusion that the sounds are no longer arriving directly at the listener, but are reflected first, mimicking what would happen in the real world.
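The idea of deriving reverb settings from ray casts can be sketched as follows. This is a minimal illustration, not the game's system: the parameter names, the linear decay mapping, and every constant except the approximate speed of sound in air (343 m/s) are assumptions chosen for the example.

```python
def reverb_params_from_rays(hit_distances, max_distance=50.0):
    """Estimate reverb settings from rays cast outward from the listener.

    hit_distances holds one entry per ray: the distance in metres to
    the surface it hit, or None if it hit nothing.
    """
    hits = [d for d in hit_distances if d is not None and d < max_distance]
    if not hits:
        # No nearby surfaces: treat as open air and leave the signal dry.
        return {"wet": 0.0, "decay_s": 0.0, "pre_delay_ms": 0.0}
    # Fraction of rays that hit something: 0 = open field, 1 = enclosed.
    enclosure = len(hits) / len(hit_distances)
    mean_dist = sum(hits) / len(hits)
    # Larger spaces decay for longer (illustrative linear map, clamped).
    decay_s = min(0.3 + 0.1 * mean_dist, 4.0)
    # Round-trip time to the average surface at roughly 343 m/s.
    pre_delay_ms = 2.0 * mean_dist / 343.0 * 1000.0
    return {"wet": enclosure, "decay_s": decay_s, "pre_delay_ms": pre_delay_ms}

# A narrow tunnel: every ray hits a wall about three metres away.
tunnel = reverb_params_from_rays([3.0] * 8)
# Open parkland: no ray hits anything within range.
park = reverb_params_from_rays([None] * 8)
```

Here the tunnel yields a fully wet mix with a short decay, while the open park stays dry. A production system would likely smooth these values over time to avoid audible jumps when the listener crosses a threshold.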
Similarly, there is a procedural filter roll-off applied to sounds when the player/listener moves up in the world. When the player climbs to the rooftops, the sounds from ground level (crowds, traffic, etc.) are first run through a low-pass filter to remove the high frequencies, then cross-faded with the rooftop ambience to give a seamless distance fall-off and transition between the vertical "zones" in the game.
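One way to picture the vertical transition is as a single height parameter driving both the ground layer's filter cutoff and a cross-fade between the two ambiences. The sketch below assumes an equal-power cross-fade and a logarithmic cutoff sweep; the heights, the cutoff frequencies, and the function name are all hypothetical values chosen for illustration.

```python
import math

def vertical_mix(height, ground_height=0.0, roof_height=30.0,
                 max_cutoff_hz=20000.0, min_cutoff_hz=800.0):
    """Map listener height to a ground-layer cutoff and cross-fade gains."""
    span = roof_height - ground_height
    # Normalised height: 0 at street level, 1 at rooftop level.
    t = min(max((height - ground_height) / span, 0.0), 1.0)
    # Sweep the cutoff on a log scale so the change sounds even.
    cutoff = max_cutoff_hz * (min_cutoff_hz / max_cutoff_hz) ** t
    # Equal-power cross-fade: the squared gains always sum to 1.
    ground_gain = math.cos(t * math.pi / 2.0)
    roof_gain = math.sin(t * math.pi / 2.0)
    return {"ground_cutoff_hz": cutoff,
            "ground_gain": ground_gain,
            "roof_gain": roof_gain}

halfway = vertical_mix(15.0)
```

Halfway up, both layers play at about 0.707 of full gain and the ground layer's cutoff has dropped to about 4 kHz under these assumed settings.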
The same system of filtering is applied to groups of pedestrians or infected in the distance. If an infected zone is heavily populated but not immediately close to the listener, the amplitude of the infected group layer may be turned up while the layer is filtered, so that it sounds populated but distant.
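To make the "loud but filtered" idea concrete, the sketch below scales a layer by a density value while low-pass filtering it according to distance. It uses a standard one-pole low-pass filter as a stand-in; the actual filter type used in the game is not stated, and the cutoff range, maximum distance, and function names are assumptions.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate=48000.0):
    """Simple one-pole low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out = []
    y = 0.0
    for x in samples:
        y += a * (x - y)
        out.append(y)
    return out

def distant_group_layer(samples, density, distance, max_distance=200.0):
    """Level follows density (0..1); brightness falls with distance."""
    t = min(max(distance / max_distance, 0.0), 1.0)
    # Hypothetical sweep from 12 kHz (near) down to 500 Hz (far).
    cutoff = 12000.0 * (500.0 / 12000.0) ** t
    filtered = one_pole_lowpass(samples, cutoff)
    return [density * s for s in filtered]
```

With these settings, a dense group far away keeps its full level (density near 1) but loses most of its high-frequency content, which is what lets it read as a large crowd at a distance rather than a small crowd nearby.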