The Uneasy Merging of Narrative and Gameplay

By Ara Shirinian

[Developer Ara Shirinian (NightSky, The Red Star) picks apart the gameplay/narrative question by examining how games handle cinematic interactivity, how movies handle fight sequences, and how XEODesign's Nicole Lazzaro's list of gameplay emotions applies to one medium and not the other.]

Video games are incredibly powerful and sophisticated. Despite all of its history and baggage (all those WWII rifles and Pokémon are no insignificant burden), the video game is arguably the one medium that can be considered a container for all other media that came before it.

Consider that back in the '70s, video games were generally conveyances for electronic gameplay and little else. When you bought a game for your Atari 2600, it went without saying that what you got was a system of rules, a goal that challenged you (with rare exceptions), and an interface to play within.

Video games in 2009 still largely feature those same essential ingredients, but technological developments over the past four decades have allowed our games to contain myriad other methods of expression that most of us take for granted.

The changes have not been quantitative -- they are qualitative, and they have exploded the sense of what a video game can be so much that the original point of the "video game" may not even be applicable in many cases.

The games of the '70s could not adequately convey the expansiveness of the landscape unfolding before you in a DiRT 2 rally, the mystery of first setting foot in BioShock's ruined utopia, or the sheer Tolkien-esque volume of lore told through Oblivion's in-game books.

They couldn't express the aural subtleties of Batman: Arkham Asylum, the passive-aggressive manipulations of your host in Portal, or the seething tension between Snake and Ocelot in the Metal Gear Solid series.

In a quiet and unassuming way, for better or worse, the video game of today has evolved beyond just abstract gameplay and into a generalized entertainment medium that can contain imagery, audio, and text of almost any kind. Indeed, we have already reached the point where the quality and category of exposition are more limited by how we choose to allocate our resources and our ingenuity than by any hard technological constraint.

Strictly speaking, two forms of media that video games are best (and uniquely) suited to express are visual narratives (like film), and gameplay (which specifically is a subset of human-computer interaction). Now most people agree that film is better-suited to expressing straight narrative than a game is. But gameplay is a unique quality of video games, and video games are also quite well-suited to expressing narrative, technically speaking -- they have most all the capabilities that film does.

So video games are the only game in town if you want gameplay, and they are pretty darn good at expressing anything we have done in the medium of film. So it's not surprising that many of the brightest game developers have been trying their darnedest to combine them in elegant ways -- to unify the two media, if you will.

A few years ago there was an outburst of media exposure around the prospect of inducing players to cry. From Neil Young, then EA Los Angeles' General Manager, "One of the things that's really important for us is answering the question that our company was founded on: 'Can a computer game make you cry?' ... That's an answer, he said, [Steven] Spielberg can help EA answer."

Soon after, designer David Jaffe revealed that he was in fact working on the very same problem with one of his game concepts. "One of them is to be the most emotional video game ever made. The end goal is that players at the end of the game are actually choked up -- if not crying -- because we've done our job so well."

Hideo Kojima, who has concentrated strongly on narrative with his Metal Gear Solid games, has also expressed a desire to integrate those elements into gameplay more effectively than what he and others have been able to accomplish:

"Halo, BioShock -- I see their approach and I think they are brilliant in some ways, but I still feel they still lack a kind of a deeper storyline, or the expression of the feelings of the characters. I do have plans of how I should approach this and get around it."

"In MGS4, yes, I put everything in the cut sequences, which I kind of regret to some extent, because maybe there is a new approach which I should think about. I'm always thinking about it -- making it interactive but at the same time telling the story part and the drama even more emotionally. I would like to take that approach, which I am still working on."

On the face of it, it's a logical progression and combination. You just watch film. But you play games, and anything expressed in film can also be contained within a game, so the narrative that you actually get to play must be the next holy grail of gaming, right?

But why haven't we achieved that perfect synthesis of gameplay and narrative yet? Why have there always been compromises and stilted combinations of the two? Are we too naive, or just not smart enough as game developers to figure it out? Or is it something else?

To find out, first we have to evaluate what we have already accomplished in this arena, and then we have to look closely at each medium by itself, to see if there is anything that makes the combination a thornier concern than just whipping together peanut butter and chocolate.

You Got Gameplay in My Narrative... or Narrative in My Gameplay?

Narrative and gameplay are arguably the two most prominent forms of expression that can be found in a modern video game. But before we wade into the deep end of the pool, let's get precise about what we mean by narrative and gameplay respectively.

Gameplay is relatively simple to define for our purposes. Any action in the game that occurs by way of interaction between the player and the game, we call gameplay.

Generically speaking, a narrative is just the description of a series of events. At its most atomic level, virtually any visual component of a game can carry a sort of narrative with it, telling you something about its context and history.

When we normally talk about narrative in the context of games, we mean things like cutscenes, plots, relationships between characters, and stories that are expressed continuously through the course of a game.

However, a scene of two characters arguing about where to go next is as valid a narrative as simply looking at a graphic of a bright blood-stained and nicked sword.

The main difference is one of complexity and sophistication. In the latter example, the narrative is expressed as a single visual: I see a sword, it has recently drawn blood, it has struck a very hard object. In the former example, the narrative is communicated via visuals, like setting and choice of dress, but also via actor performance, which includes all kinds of details like gesticulation, subtle body language, voice, and so on.

There's an important implication here -- if a game has any discernable graphics at all, there is always some sort of story, or narrative, being told to you (or at least being interpreted by you) as you play. Indeed, humans are really good at constructing narratives about what happened to that bloody, nicked sword, even if one wasn't offered or intended. Even pure action/zero story games cannot be wholly divorced from narrative, because the player will always extract meaning, and thereby a story, from graphics.

The point is that there exist very different levels and techniques of narrative expression, and each comes with its corresponding set of prerequisites that must be satisfied before the player (or viewer) is able to apprehend a game's narrative.

You only need to look at the sword for a few seconds to understand that it has likely just killed somebody and that it has been through some rough times. But now consider our example of two characters arguing about where to go next. Before we can understand how and what and why they are arguing, as well as the nature of their relationship, we have to observe them for a much longer period of time, and we cannot interrupt their exchange prematurely.

The kind of narrative being communicated to the player, if the communication is to be successful, carries along with it some requirements. If these requirements aren't satisfied, then the narrative element cannot be successfully conveyed.

This is why most games don't let you kill characters that are essential to the expression of certain narratives. If we jump in and kill one of the two characters who have just begun arguing, then we will never learn why they are arguing with each other. We have just destroyed the narrative's mode of communicating to the player by way of third-party conversation.

Since the idea of narrative is just so broad, for the purposes of this discussion we need to be very specific about exactly what kind of narrative we are talking about.

It's clear that games have spent a lot of time in this rudimentary combination of gameplay and visual narrative. When I play Gradius, there is a narrative going on even when there is no substantial story to speak of. I'm controlling a ship, I'm in space, I'm shooting the bad guys. I'm going to shoot all the bad guys I see until I get to the baddest one, and then I'm going to shoot that one too. It's not the most interesting narrative, but there it is.

So the apparent desire of the industry and many of its luminaries is not to strictly combine gameplay and visual narrative -- we have been doing that ever since we could make something look like an embattled sword instead of a line of white pixels. The desire is in fact to elegantly combine the richest expressions of gameplay with the richest expressions of narrative without a compromise of either.

For our purposes, when we talk about narrative henceforth, we mean the kind of narrative that's more substantial than sprite or background graphics, the kind that's slower to digest -- conversational narrative, narrative that involves characters, their relationships and dispositions towards each other, and the like.

Let's take a survey of how some of the best attempts at combining gameplay and substantial narrative have fared thus far. We'll limit our scope to scenarios where gameplay and narrative as we defined it actually occur simultaneously.

How Does Unrestricted Narrative Affect Gameplay?

Back in 1983, Dragon's Lair was one of the very first and best-known attempts to mate a cinematic style of narrative with gameplay. Thanks to the fact that the entire game was essentially run by a laserdisc player, this arcade game was able to produce astonishing visuals which would surpass the abilities of "normal" video games for over a decade.

Unfortunately, the limits of its technology meant that gameplay had to be reduced to an even more rudimentary level than other games of its time. A non-interactive video sequence would play, and at its end the player would have to press some input -- either a direction or "attack."

If the wrong input was pressed, the player would see a death animation. If the right input was pressed, the player got to see the next video sequence. Since the game was designed with little consideration for the player's read of the action, the correct input was usually a guess on the player's part; the only route to successful play was blind trial and error.

Sixteen years later, the Dreamcast game Shenmue was released, featuring a special game mode called the "Quick Time Event" or QTE. During a QTE, buttons would appear on-screen at various intervals.

If you pressed the corresponding button on the controller quickly enough, your character would do the corresponding action. It's structurally identical to Dragon's Lair, except that explicit cues appear on-screen.

What's gained by designing a game this way? Normally, good design principles say that the developer should take special care to present the action so that the player can see what they are doing, what the enemy is doing, as well as accurately apprehend the spatial relationships between all gameplay-relevant elements in the scene.

With this extreme abstraction of player inputs, the developer gains the freedom to display the action as cinematically as possible with impunity, because the action has become so abstract that the player doesn't need to know what exactly their avatar is doing, or what the enemy is doing, or how a change in the rules in this sequence might impact action in another part of the game. In fact, we get to throw out all of the difficult design considerations that we would have to tackle with conventional gameplay.


Something valuable is also sacrificed as a side effect of this ultimate abstraction of control. Since we threw out all of the difficult gameplay design, we necessarily decoupled the gameplay from the action so much that the player in fact does not need to pay attention to anything other than the on-screen cues of what button to press. If we displayed just a black background, in place of a cinematic sequence, the nature of the interaction would remain unchanged.
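To make that decoupling concrete, here is a minimal sketch of how a QTE-style prompt might be resolved. It is an illustration only, not code from any of the games discussed; the names Cue, resolve_qte and play_sequence, and the idea of a fixed response window, are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """One on-screen prompt (hypothetical structure)."""
    button: str    # the button displayed to the player
    window: float  # seconds the player has to respond


def resolve_qte(cue: Cue, pressed: str, reaction_time: float) -> bool:
    """Return True if the prompt was passed.

    Note what is absent from the signature: nothing about the avatar,
    the enemy, or the cinematic playing behind the prompt.
    """
    return pressed == cue.button and reaction_time <= cue.window


def play_sequence(cues, responses, background="cinematic"):
    """Count how many prompts are passed before the first failure.

    `responses` is a list of (pressed, reaction_time) pairs.
    `background` is accepted only to show that it is never read.
    """
    passed = 0
    for cue, (pressed, reaction_time) in zip(cues, responses):
        if not resolve_qte(cue, pressed, reaction_time):
            break  # wrong or late input: the failure animation would play
        passed += 1
    return passed
```

Calling play_sequence with background="black screen" gives exactly the same result as calling it with the default, which is the point: the visuals are not an input to the interaction.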

God of War, released six years after Shenmue, employs a virtually identical system at certain gameplay moments. While the cinematic quality of these sequences is breathtaking, and certainly leaps and bounds above what either of our previous examples could accomplish, on the gameplay level we see the exact same kind of decoupling and rudimentary interaction.

Inis' Gitaroo Man, released in 2002, offered highly cinematic narrative as a background, operating in concert with a sophisticated rhythm-action gameplay system in the foreground. This is an interesting example because even though the gameplay feels very connected to the narrative, the two were totally decoupled, in the sense that the narrative didn't contain any information that would help you play the game effectively -- like the previous example, gameplay interaction was unaffected by anything going on in the background.

However, the background was affected by the gameplay, and this resulted in a very nice illusion that made two independent layers feel like one. The gameplay patterns were scripted to match the music, and the background narrative was also scripted to coincide with the music track. The player character and enemies would also pose and act according to your inputs, although you didn't need to notice or observe them at all in order to play the game. The developer's DS game, Elite Beat Agents, uses many of the same techniques.

What we can learn from these examples, at least with this technique, is that the developer gains the freedom to do anything they want with their game camera, and indeed the background action, at the cost of severely confining the quality of gameplay to a "press this now" type of interaction.

In other words, when we allow unrestricted cinematic narrative expression during gameplay, the gameplay itself must be either severely limited or decoupled from the narrative.

How Does Restricted Narrative Affect Gameplay?

Suppose we have a narrative we want to convey, and we want to do it during gameplay, but we decide to restrict the means we have to express it -- possibly by limiting it to a single type of media. Let's examine some different ways games have dealt with this type of situation.

In BioShock, narrative expression was frequently limited to the auditory channel, in the form of audio logs that the player would pick up and optionally listen to throughout the game. When narrative is limited to audio, it means that all of the other available channels remain open for communicating gameplay.

Since visual language is arguably the most important method of facilitating gameplay, this means that the player can apprehend the auditory narrative without much interference or limitations on game action.

As long as the audio from the narrative does not interfere with whatever audio cues might be crucial to gameplay, both the narrative and gameplay can live quite nicely alongside each other.

However, we find an important tradeoff even in this exchange. You cannot express the same things in the same ways in pure audio as you can if you also had control of the visual dimension. You only have volume, duration and tone to express the quality of an explosion.

You can't depict a character climbing onto the Hydra's head except by literal narrative account ("I'm climbing onto the Hydra's head!"). There are all kinds of details that cannot be adequately expressed when your narrative is limited to audio.

Some games that also limit narrative to auditory information do it in the context of a character that is actually in the game to varying degrees.

In several of the Ace Combat games, narrative is conveyed to the player via radio communications from a special radar aircraft that is essentially uninvolved in the gameplay action, as well as from friendly and enemy aircraft that do engage the player directly.

This case is interesting because the game also relaxes the idea of how radio communication actually works in order to present a more compelling narrative. Regardless of friend or foe or radio source, you get to hear all characters' communications all the time. Unlike in BioShock, the auditory narrative takes place in the present rather than in the past.

Elaborating on this theme a step further, in Call of Duty 4: Modern Warfare, narrative takes place in the form of an NPC soldier who operates alongside you, gives you commands, and warns you about the obstacles you're about to face. Because of this, the narrative is meshed with gameplay more tightly than in BioShock or Ace Combat, as now we have a visual, physical source of narrative in close proximity that actually performs in concert with gameplay.

Even in this situation, our narrative control is limited. Because the player has full control over the camera, we cannot author any camera work as part of the narrative expression. When you're engaging the enemy, you don't have much time to look at what your fellow soldier is doing. Call of Duty 4 tends to place more emphasis on listening to what your soldier friend says when there are enemies in your face, and then tends to emphasize the visual channel when there aren't.

Similarly, Half-Life 2 allows the player to stay in game perspective and control during cutscenes, instead of wresting the camera away. There are two non-obvious tradeoffs here.

First, the developer has to carefully entice the player to look in the intended direction so they can see all the action. This can be accomplished by limiting the directions of interest through the design of the physical space (if you're in a hallway, there are only two interesting directions to look in, and the player will tend to favor the direction they are already heading in).

Second, the game must establish a grammar of trust in the game world by not throwing in a crazy enemy who will backstab the player while they're watching two characters converse. Even though the interface is unchanged from gameplay scene to narrative scene, the player needs a hint as well as this consistency of grammar so they "know" their attention should be on a conversation instead of walking around the room's corners looking for items and enemies.

In either case, you can never guarantee that the player will actually view what they are supposed to.

From these examples, we notice that when narrative expression is restricted in some ways, gameplay can be expressed much more richly.

How Do Narrative and Gameplay Get in the Way of Each Other?

Let's go back to the last example, where I described the value of establishing a grammar of trust in narrative situations where the player actually has some control. There is also quite a bit of value in narrative deception and surprise.

Let's say we really want the player to experience the effect of comfortably watching two characters talk to each other, and then feel the sting of deception as something suddenly attacks them from behind. In film, this is a totally valid and oft-used technique. In games, we need to proceed much more carefully. But why?

In a film, the audience has no control over what happens. The director of the film, if so inclined, can relish the freedom to backstab the viewer over and over with impunity. The viewer is trained to fear and expect the possibility of a backstab, and yet they can do nothing to avoid it.

Now let's consider what happens in a game. We put the player in a fully controllable cutscene that appears to be innocuous, where the only element of interest is two characters talking. The player has their guard down, and views the conversation. Without warning, we surprise the player by producing a backstabbing enemy that may actually kill them.

So the developer gets the intended payoff, but now we have trained the player to be wary of consequential surprises, and this expectation will carry forward with disastrous consequences, all because the player can control themselves in the narrative.

The next time the player enters a cutscene, their attention will be on the possible enemy hiding in the dark, rather than the narrative of the present. Any hope remaining that the player will pay full attention to the narrative has evaporated. The price of this fearful effect on the player is the loss of their trust, and this fear will not diminish until many more cutscenes are experienced without incident.

Most games handle this type of scenario by working in the deception in such a way that it is unavoidable -- inconsequential in gameplay terms -- and thus restricted to the narrative. In this way the player gets surprised, but then realizes, "oh, I was supposed to die," thereby compartmentalizing the action into bins called "stuff that happens in the cutscene" and "stuff that happens in the game."

Interestingly, this effect works completely against the intent to blur or altogether remove the line between interaction and narrative. But it's less because of our inability to be "smart enough" as developers, and more because of an inherent side effect of training the player, and then giving them control.

Let's assume that you don't like getting punched in the face. If you're going to watch the hypothetical film John Punches Daniel in the Face 80 Times, the director can decide that Daniel will not fight back, even against the viewer's wishes, because the intent is to make viewers know what it feels like to get punched in the face 80 times in rapid succession.

But in the game, John Punches Daniel in the Face 80 Times, where you play as Daniel and have a limited life bar (and some actual net negative consequence in game terms to getting punched), you will undoubtedly do all that you can to avoid getting punched. The only way for us to make you feel all 80 punches in succession is to freeze you there in some way. But if you're frozen, then we're watching a film, and not playing a game.

There are just some things that you cannot do to a person when they have control over the situation. Translated into our specific context, there are just some things you can do in narrative but cannot do in gameplay. So the next logical question would be, "Are there things that work as gameplay that are impossible to accomplish as narrative?"

Not surprisingly, with a little investigation we find that the answer to this question is also yes. In Why We Play Games, Nicole Lazzaro describes eight categories of emotion that are commonly experienced in games. Some of these, like Disgust or Fear, can be experienced with equal intensity whether the medium is gameplay or cinema. However, there are a few that are pretty exclusively in the realm of gameplay, because they involve the exertion of effort as a necessary prerequisite.

We already know that strict narrative requires that the observer has no say in the outcome of events. As we've shown, this limitation is not inherently negative, since the thriller John Punches Daniel in the Face 80 Times could not exist without it. This single distinguishing element also has important implications -- not only can you not control the outcome, you are also not allowed to exert significant effort in any way. Indeed, you don't even have any way to exert effort in the narrative's world, even if you wanted to.

So what does this mean? Two gameplay emotions, "Kvell" and "Fiero," can only exist as a possible consequence of the exertion of effort on the part of the player -- exertion that can never be accomplished within a pure narrative.

Kvell is a verb, but also a feeling. It's a feeling of pride, especially over the accomplishments of one's children. You don't have to make your kids play video games to feel kvell, but when you are proud that you taught your close friend a secret Street Fighter combo that he then uses to win a tournament, that's kvell.

When we saw Mr. Miyagi teach Daniel over the course of The Karate Kid, we felt good when Daniel finally beat the Cobra Kai in the karate tournament, but we can't feel kvell over it, since we didn't teach him. Kvell can also be experienced in games where the player has control over training, instructing and cultivating their minions -- such as Oddworld and Pokémon.

Fiero, on the other hand, is the specific feeling of success after one has undertaken significant effort to accomplish something. When you beat a difficult boss after several failures or finally figure out a challenging puzzle, that feeling of victory you experience is Fiero. Fiero junkies love to play insanely difficult and unfair games, because it's all about the payoff for them. The intense satisfaction gained by surmounting challenges in such games far outweighs any amount of torture the game designer unwittingly springs upon them.

To return to our Karate Kid example, we might feel happy or relieved when Daniel wins the tournament. But we didn't expend any effort to win any tournaments ourselves, so the meaning of victory for us is not the same as it was for Daniel-san. Narrative can describe Daniel-san's expression of Fiero, but it cannot make us feel the way he did. The only chance you have of giving the player such a feeling is through gameplay.

How Cinema Disrupts Continuity to Enhance Storytelling, and How Continuity Is Essential to Gameplay

Another area where gameplay and narrative find conflict on a regular basis is that of continuity and the precision of expression of action.

The domain of continuity is one fundamental way to look at the differences between narrative and gameplay. In gameplay, the precise continuity of events matters, whereas in narrative, it does not. In fact, in narrative, precise continuity actually gets in the way of effectively expressing the story. Because of this, film editors routinely create impossible sequences of events in order to make the story more compelling.

We're not talking about "impossible" as in unrealistic or fantastic, like dodging bullets, but "impossible" in the sense of continuity and time compression, as we'll see shortly. As viewers we also routinely accept this at face value because over the years and years of exposure to film, we intuitively gain an understanding of the grammar of the camera as a vehicle for storytelling, and how it can jump back and forth in time and space.

Conversely, in gameplay, if continuity is not expressed in an absolutely precise way, it interferes with the player's ability to make decisions and even comprehend what is going on in the first place.

Let's look at the example of a one-on-one fight. In Rocky IV, the climactic battle between Rocky and Drago lasts a grueling 15 rounds. In the course of the battle, first we see Drago totally dominate Rocky, living up to his reputation as an invincible machine.

Gradually Rocky comes back, and the match gets increasingly ugly as the tide swings back and forth again and again until Drago finally relents. There are several elements of interest here, and it's worthwhile to study the entire sequence if you have access to it (try searching on YouTube for "Rocky vs Drago").

First, let's consider the action of the camera. The match starts out with a nearly "live" camera for the first few rounds, where most all of its attention is on Rocky and Drago. For at least the 10 middle rounds, the sequence becomes a montage. Time is compressed, as we see the graphics for each subsequent round fly across the screen not minutes apart, but closer to seconds.

We see the camera cut away to characters in the audience, even some kids watching TV, at just the exact right moment so we can see their reactions. The camera also seems to cut back (even during non-montage moments) to the fight at the exact right instant to see a great punch wind up and hit. We also see dual-camera shots that are separated by both time and space where the boxers are fighting on the right half of the screen, and Rocky or Drago is sitting in the corner between rounds on the left side.

Second, let's consider the action of the boxers. We also do not have a precise sense of their actual physical state or their performance, outside of what the film explicitly tells us for dramatic effect. We don't know exactly how close either fighter really is to toppling, and in fact the story takes advantage of us by making it seem like it must be over now for sure, only to see Rocky come back with inhuman strength again and again.

It doesn't matter how many jabs Rocky took to the face or how bad Drago's cut really was. What does matter is the emotional content of those elements, and that's why they are utilized in film. Consider that many of the camera cuts showing punches traded back and forth could have been swapped for different cuts altogether without changing the feel of the sequence one bit.

All of these techniques make the narrative more compelling, and they all involve slicing and dicing the actual order of events (if they even existed) for the sake of storytelling. This is nothing special in terms of cinema, but the key here is that all these techniques are only possible because the "player" has no control over what is happening.

To have the camera cut completely away for several moments, totally blinding your view of an ongoing battle with an opponent, is unacceptable in an interactive context (certainly, Rocky and Drago are not surreptitiously resting when the camera isn't on them).

Swapping one series of punches for another works because nobody is keeping track of exactly how hurt each fighter is and how that affects their stamina over the course of the match. The audience doesn't know (or often care) about that atomic level of detail, and thus the filmmaker is allowed to take advantage of it. But in gameplay, the player does care, because each atomic event has a definite consequence according to the rules of the game.

As our gameplay version of the same example, let's consider a typical match in a one-on-one game such as Street Fighter IV. We can't cut away to a crowd in the middle of a match to enhance excitement, because the player has full control over their character.

At best, any such attempt would be disruptive. We can't stylistically "fast forward" through a long match because the player is actually creating the continuity of the match as it happens. And most of all, we do care to keep track of exactly how each hit and miss happens, even the non-dramatic ones, because precise spatial relationships are what determine who gets hit, and each character can only sustain so much damage before going down.
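A toy model shows why a game cannot shuffle its punches the way a film editor can. The sketch below is purely illustrative and is not how Street Fighter IV or any real fighting game is implemented; the function simulate_match, the fighter labels "p1" and "p2", and the damage values are all made up for the example.

```python
def simulate_match(hits, health=100):
    """Apply hits in order and report the result.

    `hits` is a list of (target, damage) pairs. The match ends the
    moment either fighter's health reaches zero, so the order in
    which the same hits land decides who wins.
    Returns (winner or None, remaining health per fighter).
    """
    hp = {"p1": health, "p2": health}
    for target, damage in hits:
        hp[target] = max(0, hp[target] - damage)
        if hp[target] == 0:
            winner = "p2" if target == "p1" else "p1"
            return winner, hp
    return None, hp


# The same four punches, with only the last two swapped.
as_played = [("p1", 60), ("p2", 70), ("p1", 40), ("p2", 30)]
re_edited = [("p1", 60), ("p2", 70), ("p2", 30), ("p1", 40)]
```

Under these assumed numbers, simulate_match(as_played) ends with "p2" winning, while simulate_match(re_edited) ends with "p1" winning. In a film, that swap would be an invisible editing choice; in a game, it changes the outcome.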

Ironically enough, the very techniques that make cinema compelling end up breaking gameplay, and the very elements that are essential to gameplay are inconsequential in cinema.


The purpose of this piece was neither to cheerlead for ostensible gameplay purists who appear to hate story in their games, nor to root for those who view video games as a wonderful medium for storytelling. If you find yourself in either of these camps, and even if you don't, I hope that I've been able to show that narrative and gameplay are less like peanut butter and chocolate, to be swirled together only if we could find the right temperature and mixer, and more like spoons and knives.

There are some things that both media can do well, but there are also many things that only narrative can accomplish, and still other things that only gameplay can accomplish. Like any specialized tools, the appropriate one should be selected for the job, and selecting the wrong tool can be clumsy at best and catastrophic at worst.

Don't just utilize narrative and gameplay in your products according to typical convention. Equally senseless is the idea to use both simultaneously with the blind hope that the outcome will somehow be better than just having one or the other. Be deliberate in how you select and architect the medium that's best at conveying what you want the player to experience.

From this perspective, the idea of getting a player to cry is not some kind of mystical holy grail; it's done the same way that movies (as well as narrative-heavy games) have been doing it all these years. On the other hand, if you're trying to get the player choked up purely through gameplay, you may have as much difficulty as you would trying to make a viewer feel Fiero from watching a movie.

Copyright © UBM Tech, All rights reserved