Audio Prototyping with Pure Data

The Speech Patch

Stitched (or chained) speech is an area of game audio that requires only simple tools to play back a generated sentence. Often, however, a fairly complicated set of logic determines which sentence to generate.
The Speech patch prototype supports a three-section stitched sentence, with up to three options for each section. This functionality is insufficient for most real-world applications, but it is kept simple here for the purposes of illustration. The patch could be expanded to allow a variable number of options and sections using run-time generation methods similar to those of the Sample Bank patch.
The Speech patch allows a composer to quickly check whether a stitched sentence sounds "weird" (for example, because of an incorrect stitch gap length or uneven emphasis). It works by randomizing the possibilities for each section and playing back the generated sentence. The sentence can easily be tuned in real time by using an external wave editor to change the stitch gap length or to substitute different recorded takes. One nice feature of the Speech patch is that it outputs the name of each phrase, so it is possible to see whether the file names have the right data and what the entire sentence should be. It also graphically displays the sample data of each part, so the data can be inspected quickly without opening the file in an external wave editor.
The patch is filled with the appropriate data by parsing the spreadsheet, which is in the following format:

<ID> <section 1 number of choices> <section 1 WAV file option 1> <section 1 WAV file option 2> <section 1 WAV file option 3> <section 2 number of choices> ... <section 3 WAV file option 3>
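As a rough illustration of this row format outside Pure Data, the Python sketch below parses one row and picks a random option per section. It is not the patch's own implementation: the function names `parse_speech_line` and `generate_sentence` are hypothetical, and it assumes whitespace-separated tokens, one sentence per row, and `_` as the placeholder for an unused option slot.

```python
import random

SECTIONS = 3      # the prototype supports three sections
MAX_OPTIONS = 3   # with up to three options per section
BLANK = "_"       # assumed placeholder for an unused option slot


def parse_speech_line(line):
    """Parse one row into (sentence_id, list of option lists per section)."""
    tokens = line.split()
    expected = 1 + SECTIONS * (1 + MAX_OPTIONS)
    if len(tokens) != expected:
        raise ValueError(f"expected {expected} tokens, got {len(tokens)}")

    sentence_id = int(tokens[0])
    sections = []
    pos = 1
    for _ in range(SECTIONS):
        count = int(tokens[pos])
        if not 1 <= count <= MAX_OPTIONS:
            raise ValueError(f"bad number of choices: {count}")
        # Keep only the first `count` slots; the rest are blank fillers.
        options = tokens[pos + 1 : pos + 1 + count]
        if BLANK in options:
            raise ValueError("blank slot listed as a valid choice")
        sections.append(options)
        pos += 1 + MAX_OPTIONS
    return sentence_id, sections


def generate_sentence(sections, rng=random):
    """Pick one option per section at random, in section order."""
    return [rng.choice(options) for options in sections]
```

Calling `generate_sentence` repeatedly on a parsed row and joining the result with spaces gives a quick text preview of the sentences the patch could stitch together.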
In Figure 8, the OpenOffice spreadsheet is converted to the "speechlist.txt" text file as:

1 3 Vancouver Detroit New_York 1 is_ahead_by _ _ 3 one two three

The first section chooses among the three options "Vancouver", "Detroit" and "New York". The underscore in "New_York" is necessary for Pure Data to treat it as a single symbol. The second section is fixed at a single choice, the phrase "is ahead by", with underscores filling the two unused option slots. The final section has the three choices "one", "two" or "three".

When a sentence is selected, each section randomly loads one of its options into an array. When the sentence is played, each section plays in sequence, with the bang following the chain as each section starts, just like the bouncing ball of yesteryear.

Future expansions to this patch could include naming the phrases separately from their file names, which would allow a more rigorous file-naming convention while keeping human-readable output for the content of each phrase. The ability to test each option of each section in order could also easily be added to allow checking of all the data. The number of sections and the number of possible selections for each section should also be expanded, possibly through the use of multiple spreadsheets to keep track of large repeated lists of options such as player names and numbers.

Crowd Engine Patch
Complex sounds whose behaviors are highly dependent on the current game state (such as crowd noise or vehicle engines) are among the most difficult areas of sound generation in game audio systems. With the crowd engine prototype patch, I've produced a simple three-sample crossfading crowd model which uses the intensity of the crowd as the modulating input. Synthesis techniques such as subtractive synthesis or physical modeling could also be used to achieve more realistic effects, but they will not be covered here. The volume tables describe a crossfade that transitions smoothly from the low sample at low intensity to the high sample at high intensity. The pitch also rises as the intensity increases, which adds more dynamics to the sound.
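The article does not give the actual table values, so the following Python sketch only illustrates one plausible shape for them. It assumes an equal-power crossfade with the middle sample peaking at an intensity of 0.5, intensity normalized to the range 0 to 1, and a linear pitch curve; the function names and the 1.0 to 1.2 pitch range are hypothetical.

```python
import math


def _clamp01(x):
    """Limit x to the normalized intensity range [0, 1]."""
    return min(max(x, 0.0), 1.0)


def crowd_gains(intensity):
    """Return (low, mid, high) sample gains for a given crowd intensity.

    Assumed shape: equal-power crossfade from low to mid over [0, 0.5]
    and from mid to high over [0.5, 1]. The patch's tables may differ.
    """
    x = _clamp01(intensity)
    if x <= 0.5:
        t = x / 0.5
        return (math.cos(t * math.pi / 2), math.sin(t * math.pi / 2), 0.0)
    t = (x - 0.5) / 0.5
    return (0.0, math.cos(t * math.pi / 2), math.sin(t * math.pi / 2))


def crowd_pitch(intensity, min_ratio=1.0, max_ratio=1.2):
    """Return a playback-rate ratio that rises linearly with intensity."""
    return min_ratio + (max_ratio - min_ratio) * _clamp01(intensity)
```

With an equal-power curve the squared gains always sum to one, which keeps the perceived loudness roughly constant through the transition; in the patch itself the tables can be redrawn by hand, so any curve shape could be auditioned.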
The samples are loaded from the message in the "initPatch" subpatch, and all the file-size parameters are set when the patch is first opened. The sample data can be changed interactively by clicking on the "read" message to reload it. The crossfade tables, the pitch curve tables and their ranges can also easily be changed by opening a table and editing its data or parameters by hand. This allows the composer to tune the crowd samples and their behaviors interactively, in real time.
It would be valuable to add the notion of one-shot overlays for whistles, chants and shouts to make the crowd more believable. Another improvement would be to add LFO variation to the volume and pitch to make them less static. Granular effects could also be used to make the samples sound less static. The same model could also be used to generate vehicle engine noise (such as a car or boat engine) by swapping in engine-noise samples and exchanging the notion of intensity for RPM.

A definite problem with this pitch-scaling, multi-sample crossfade model is the "chipmunk effect." The goal is to shift only the fundamental pitch and its related overtones, as though the crowd were raising the pitch of their voices as the intensity increases. However, when the samples are shifted up in pitch, all other unrelated frequency information, such as formants and reverb, is shifted up as well, which can make the crowd sound like a bunch of chipmunks. As the audio processors in game platforms improve, they may support more real-time frequency-domain processing techniques to overcome this and similar problems.

Crowd noise is a highly complex sound defined by many individuals in a complex reverberant space. Engine noise is produced by thousands of moving parts which change every fraction of a second. As game audio platforms increase in power, prototyping will become more important for testing and refining models which simulate complex acoustic phenomena such as these.