This is a love song. A love song to video game music. A love song to video game music that spends a lot of time pointing out that video game music would do well to iron its shirt, shower every day, and would it kill it to maybe shave every once in a while?
This piece is directed toward those who make, compose for, and/or enjoy the cinematic game experience common to most triple-A and an increasing number of indie titles. In many places it touches on elements common to all video games, but the purpose is not to play the nagging Jewish mother to two-man developers about how they should be more like their big brother who graduated summa cum laude and landed a big contract with Activision and will probably cure cancer someday.
The purpose is to help producers communicate with their composers, help composers hone their craft, and help the end consumer become more educated about the potential value of game music.
Too long has video game music been relegated to a dusty corner of gamers' minds. Sure, we all have fond memories of chiptunes and our favorite melodies, but video game music has typically been viewed as a background soundtrack, not something that plays directly into the visual elements. Just look at how many games allow you to import or stream your own music while you play.
This is a shame. Music can have a tremendous impact on the mood, feel, and emotion that a game's visuals are trying to convey. A shift in the music can take the exact same visual scene in two completely different directions. I've always liked this example to show how a different score can change things up:
Video games come in many forms and serve many purposes as entertainment -- Ninja Gaiden on hard mode clearly scratches a different itch than FarmVille -- but I think it is safe to say that the majority of triple-A and otherwise popular games are taking a more cinematic, story-focused approach. What was the last FPS you played that didn't have a story component, however preposterous the premise? The visual techniques reflect this: effects that emulate real camera behavior, like light bloom, lens flare, focal shift, and even film grain, are all very common in the modern game.
Video games are unique within this A/V field in a number of ways, one of the most obvious being that the pacing and even the order of events can be dictated by the player. Writing for this sort of uncertainty definitely presents problems that any video game developer needs to consider. However, as games become more scripted, planned, and emotionally impactful, game composers would do well to study the centuries of experience other mediums can provide. Reinventing the wheel is not something we want to do here.
The focus on cinematic visuals and storytelling becomes increasingly obvious as we look into just how much straight-up non-interactive cinematic storytelling can be found in games. Oh sure, there might be a "press X to not die" moment sprinkled here or there, but when you strip out the real gameplay you are often left with a long sequence of cut-scenes that rivals the length of major movies.
For instance, the Batman: Arkham City cutscene playlist on YouTube runs just north of 2 hours and 30 minutes, longer than the majority of motion pictures. Gears of War 3's comes in at 1 hour 43 minutes. Xenoblade Chronicles? North of five hours, beating even the extended edition of Return of the King. Even completely disregarding player-driven gameplay, there are entire movies contained inside today's games.
Unfortunately, video game developers and players themselves don't often see this connection. Corners are cut, sacrifices are made, flat-out wrong practices are repeated time and again, and the gaming media looks upon it and proclaims it good. Games have made great strides lately with a more cinematic approach to storytelling, but it's sad to see a crucial piece of that puzzle so often neglected. The Final Fantasy series, Dragon Age, and Mass Effect have begun to understand lighting, blocking, cinematography, and the like, and have used them to great effect -- but what about the music?
All visual media are more closely related than some would think. Film, TV, advertising, and games all share many traits, and music publishers often treat them in very similar ways. Though each presents unique advantages and challenges, all can be summed up with two simple, tiny words: to picture.
This is the essence of all visually oriented music. Video games have long been a valid medium for telling an intriguing story, and scoring "to picture" has proven over the centuries to be the best companion to that kind of storytelling. Our reaction to music is often deeper and more subconscious than our visual analysis. At its best, music enhances and deepens our understanding of what our eyes tell us, sometimes adding to it directly, sometimes revealing another facet or wrinkle that we didn't see.
With all the cinematic focus on visual elements, why wouldn't we take a cinematic approach to game music?
Before we discuss using nerdy, cerebral musical philosophies to guide game scoring, perhaps a quick overview of some basic techniques is in order. Frankly, many games fail to get even these right. The essential problem is that you can't just write music and expect it to work with the picture.
Our ears are specifically tuned to speech frequencies, and working around that can be difficult. Guess where melodies (and music in general) sound best to our ears? That's right: the exact same range as speech. Think about the last time you tried to hold a conversation with the radio tuned to a pop station. Did you notice how much you had to turn it down to hear the other person? Now try talking with the first 90 seconds or so of this playing in the background:
This piece was written specifically to accommodate the human voice. Can you keep the volume much louder than you could with the pop music? You should be able to, quite comfortably. It's about space.
For the purposes of visual media discussion, diegesis refers to anything that is "of the world" represented on-screen. So if there's a scene in a smoky cabaret and the music is being played by a jazz band in that cabaret, that's diegetic. "Bohemian Rhapsody" is diegetic within Wayne's World, as the characters are obviously aware that the music is coming out of their radio and are reacting to it. (This article talks a bit more about that and other important concepts.) Almost all film and video game music is mimetic (more commonly called non-diegetic), meaning that it isn't music that exists in the world with the characters but has instead been added for the sake of the audience. It's important to understand why that matters: diegesis is king.
If you ever study opera you will quickly see that the entire orchestra basically exists to support the singer(s). The vocalist is diegetic, or in the world, and must be skirted around carefully by the music, which is mimetic, or outside of the world. Therefore, any composer worth their salt must write around the diegetic part of the story because that's the part that's actually telling the story. In more traditional musical settings like opera this is quite easy to accomplish, as the melody lines are clearly written out in musical notation. In other mediums, it may not be as obvious. This doesn't mean that the concept can be ignored, however.
Even simple conversations have pitch. Great stage actors can have up to a three-octave speaking range; that range is how emotion is carried through the voice. Try speaking in a monotone and see how much you are able to convey. Erich Korngold, one of the great early film composers, was famous for writing diegetic film dialogue out in musical notation and then scoring around it, as he had earlier done with opera librettos. While this may not be a necessary step, a basic understanding of the frequency ranges used by male and female speakers, and of how to avoid writing scores in those same ranges, is well within reach of any composer. As an example, take this clip from The Adventures of Robin Hood, one of Korngold's most famous scores.
Notice that the instrumentation takes up the entire spectrum of sound at the beginning, but right as the vocals enter, 18 seconds in, the instruments part. The strings move higher, the bass drops a bit lower, and everything in between falls out. It's a virtual parting of the waters to make room for the voices in their proper register. Just to show it's not a fluke, the score does the same thing at 0:43.
I can't think of a single game that really nails this concept, which surprises me. It's not especially difficult; one just has to be aware of it. It's a sad commentary that the first thing I typically do when I load up a video game is turn the "voice" slider to maximum and the SFX and music sliders down considerably, because the score and mix have no concept of how to work around the vocals and simply barrel over them.
In addition to keeping the frequency range in mind, the composer must also identify the other ways a score can muddy the text, and avoid them. In the gaming world this most often manifests as a score so busy that it's distracting: too many notes, too fast a tempo, too much of everything. An expert composer knows when the score should come forward and when it should fade into the background, moving back and forth with the natural breaths of the narrative.
An excellent example of this comes from the trailer for Conan the Barbarian, a film that was rather terrible from a cinematography and plot standpoint but had an absolutely outstanding score by Basil Poledouris. Watch the whole thing:
Notice how Poledouris actually alters the music for the lines of dialogue, yet it still works on its own as a song? The high strings and choir drop out and some low brass comes in to punch up James Earl Jones' dialogue, and it works very well. The instrumentation itself lends itself to the interplay of dialogue and action scenes.
Now for the ugly side of that coin. Sometimes people opt for the lazy way out and dump the problem on the mixing engineer, who handles it by "ducking" the music whenever a vocal track is present; ducking means pulling back the music's volume to make room for the vocal to be heard. For examples of constant ducking, look at, well, a lot of trailers, as they tend to have busier, more "intense" music. We'll use Uncharted 3 as a recent example:
Starting at about 40 seconds into the launch trailer, notice the up-down-up-down-up-down as the speech pops in and out. It's distracting and annoying.
Creating audio space is like arranging a bunch of 3D bubbles. Whatever is in the center on the X- and Y-axes and at the front of the Z (depth) axis is going to grab the most attention, and that should always be the vocals. Ducking is a cheap way of pushing the music back on the Z-axis, but that constant shifting is noticeable; the far preferable method is to make space on the front plane of X and Y around the dialogue. It is quite possible to write around dialogue, as thousands upon thousands of hours of opera and film scoring will attest. Why shouldn't video games do this as well?
A beat, in scoring terms, refers to a particular visual point of action that should be accented. It can be a hard cut in the footage, a punch, just about anything. Beats can be accented in the music in many ways, typically depending on the requirements of the visuals. Here's a quick example that runs the gamut:
The low percussion (likely an udu) as the dragonfly lands on her nose is a beat. The sitar note as it flies away is another. The harp glissando for the reveal of Wonderland is another. The cymbal roll for discovering the caterpillar is yet another, and the harsh low brass almost immediately after, as his expression sours, is another still. Twenty-five seconds in and we've already hit five beats. This is a fairly common pace for higher-energy sections and trailers. Notice that each of these has a different effect, but all come together to add interest and impact to what's happening on screen.
It is possible to create music for a beat-heavy visual without using beats, but then it's up to the foley and sound designers to pick up your slack. See here for a quintessential example:
The opposite effect, hard transitions and visual beats without any aural punch at all, feels so unnatural that I can't even find good examples of it; no one does it.
I've used the example of a trailer here because this technique is rarely seen in games, despite the proliferation of cut-scenes in the modern game. Occasionally one can find a single beat, say a cue that builds to a big crescendo, but considering that many scenes have potential beats numbering well into the double digits, it is a woefully underused idea.