Gamasutra: The Art & Business of Making Games

March 9, 2009

Room 23

Submitted By: Patrick Delaney

The next major innovation in video game technology will be the computer's recognition of the player's speech coupled with the mathematical decoding of human emotion in vocals and the believable synthesis of emotion and nuance in the human voice.

A relatively recent addition to the arsenal of gadgets owned by gamers is the microphone. Whether clipped to the ear or controller, attached to the side of a pair of gaming headphones, or built into a laptop or hand-held gaming device, gaming microphones are now cheap, powerful, and easy to integrate into a game's design.

Though right now they are only used to converse with other players or play inane mini-games, there is a rich vein of potential within the art of conversation.

Imagine: dialogue trees no longer mapped out in blocks of text but influenced through variables determined by emotion, inflection, and word choice. Imagine transforming a friendly soldier AI into a living, breathing entity whose propensity towards aiding the player is determined by whether or not the player took the time to talk to him after a vicious firefight.

Imagine shattering the wall between player and game by eliminating typed dialogue options, erasing structural distinctions between character classes, and providing unique interactions based on the individual player's temperament and mood.

While any genre of game could be revolutionized with this technology, the most visible improvement would be within games in which the player's interactions with NPCs play a dramatic and dynamic role in determining the player's experience. Though this applies to several genres, including RPG, Action, and Adventure games, I believe the genre that would benefit most is survival horror.

Video Game Title: Room 23

Operating System: Any with microphone support and enough CPU power to perform layers of complex calculations on the fly.

Required Hardware: Microphone

Room 23 is a first-person survival horror video game fashioned in the vein of such brutal horror games as Condemned: Criminal Origins, Manhunt, and the Silent Hill series. Gameplay consists of first-person melee combat, high-tension puzzle solving, and dynamic interaction with NPCs. In the game, players find themselves trapped in a sadistic modern labyrinth of concrete, metal, and human viscera.

Each room yields fresh new traps designed to weed out the weak in the most brutal fashion possible. Gibbering humanoid monsters populate the hallways between rooms. They are the unlucky ones: the labyrinth didn't kill them, it just drove them insane.

There's a horrible logic at work, some greater design controlling this machine. But who cares? Survival is all that matters right now and you'll never make it on your own. To survive you must team up with the people you meet as you descend into the Labyrinth, each with his own personality, secrets, and motivations.

This is where the conversation technology comes in: like the cult-classic sci-fi-horror film Cube, the determining factor of whether the player makes it out alive is not his individual capability but the relationships he forms with the people he meets. Not only does vocal interaction between player and NPC result in a more visceral experience, it increases tension by removing conveniently visible dialogue trees.

In their place are invisible networks of cause and effect, leaving the player in the dark as to all the possibilities. A given NPC may only have two dialogue results: friendly or hostile. Yet with the dialogue branches obscured, the player will always second-guess his choice: did he say the right thing? Was there another solution?

Removing visible dialogue options also helps prevent the player tactic of quick-saving before a conversation and then testing the results of each dialogue option, essentially removing any tension from the conversation.

While this sort of interaction may sound near impossible, it's really quite simple provided you structure the situation properly. Case in point: the player meets an NPC named "Mark" outside a room. Mark is in mild shock and looks ready to snap. To talk to him the player must initiate the conversation by talking into the microphone instead of tapping a button.

If the player is inside the "conversation vicinity" and the computer detects words or phrases such as "hello", "what's wrong", and so on, the camera focuses on Mark and conversation is initiated. With vocal communication, no dialogue options appear and the screen is left blank.
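The trigger described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the radius value, the function name, and the idea that a speech recognizer has already turned the player's audio into a text transcript are all hypothetical, and the phrase list holds only the two examples given in the pitch.

```python
import math

# Hypothetical values: the pitch names a "conversation vicinity" and two
# sample greetings, but gives no radius and no complete phrase list.
CONVERSATION_RADIUS = 3.0
GREETING_PHRASES = ("hello", "what's wrong")


def should_start_conversation(player_pos, npc_pos, transcript):
    """Return True when the player is near the NPC and has said a greeting.

    `transcript` is assumed to be text produced by a speech recognizer.
    """
    if math.dist(player_pos, npc_pos) > CONVERSATION_RADIUS:
        return False
    text = transcript.lower()
    # Plain substring matching; a real game would need sturdier matching.
    return any(phrase in text for phrase in GREETING_PHRASES)
```

A call such as should_start_conversation((0, 0), (1, 1), "Hello there") would let the camera turn to Mark, while the same words spoken from across the room would be ignored.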

Mark demands to know who the player is and what this place is. As the player responds by talking into the microphone, the voice-analysis software searches for key words and phrases such as "help", "calm", "it's ok", or "shut up", "don't know", "get lost", etc. At the same time, the player's tone and emotion are monitored and converted into computable variables: an angry tone combined with rapid speech influences the "emotional variables" in a "negative" way, while a calm, slow tone influences them in a "positive" way.
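As a rough sketch of how those two signals might become numbers, consider the Python below. Everything beyond the six example phrases is an assumption made for illustration: the function names, the 0-to-1 anger estimate (imagined as the output of some voice-analysis component), and the thresholds are hypothetical, not part of any existing system.

```python
# The six example phrases from the pitch; a real list would be far longer.
POSITIVE_PHRASES = ("help", "calm", "it's ok")
NEGATIVE_PHRASES = ("shut up", "don't know", "get lost")


def phrase_score(transcript):
    """Count positive phrases minus negative phrases in the transcript."""
    text = transcript.lower()
    positive = sum(text.count(p) for p in POSITIVE_PHRASES)
    negative = sum(text.count(p) for p in NEGATIVE_PHRASES)
    return positive - negative


def tone_score(anger, words_per_second, fast_rate=3.0):
    """Map tone and pace to -1, 0, or +1.

    `anger` is a hypothetical 0..1 estimate; `fast_rate` is an assumed
    threshold for "rapid word succession".
    """
    if anger > 0.5 and words_per_second > fast_rate:
        return -1  # angry and rapid: negative influence
    if anger <= 0.5 and words_per_second <= fast_rate:
        return 1   # calm and slow: positive influence
    return 0       # mixed signals: no influence either way
```

Adding the two scores together gives one possible form for the "emotional variables" that the rest of the conversation depends on.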

This positive or negative data, combined with word choice, allows the computer to steer the dialogue in a specific direction. In this case, if the result is positive, Mark will befriend the player and help him solve the trap. If it is negative, Mark will refuse to talk to the player or may even attack him. It is possible for the player to exit the conversation at any point by using phrases such as "good-bye", "got to go", "so long", and so on. At no point does any hint as to the player's full range of responses appear before the player.
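The steering logic in this example is small enough to sketch directly. The Python below assumes the emotional data has already been reduced to a single running total (positive or negative); that reduction, the state names, and the "undecided" fallback for a total of zero are all hypothetical choices made for illustration.

```python
# Exit phrases taken from the example above.
EXIT_PHRASES = ("good-bye", "got to go", "so long")


def next_state(transcript, emotion_total):
    """Pick Mark's next conversational state.

    `emotion_total` is an assumed running sum of word-choice and tone
    scores; its exact form is not specified in the pitch.
    """
    text = transcript.lower()
    if any(phrase in text for phrase in EXIT_PHRASES):
        return "ended"      # the player may leave at any point
    if emotion_total > 0:
        return "friendly"   # Mark befriends the player and helps
    if emotion_total < 0:
        return "hostile"    # Mark refuses to talk or attacks
    return "undecided"      # assumption: keep the conversation going
```

Because the function returns only a state and never a list of options, nothing about the available responses has to be shown to the player.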

It is in Mark's responses to the player that believable synthesis of the human voice is key. For example, if we want Mark to respond in a unique manner to the player's words we will need to make small adjustments to word choice and inflection: if the player says "Don't worry, tell me what's going on", we would want Mark to respond along the lines of "I don't know". Trying to think of every response, much less getting an actor to recite each, would be nearly impossible. Even when it is done, as with the indie game Façade, the result is stilted and unnatural.

Instead, Mark's words and emotion would have to be generated on the fly based on the computer analyzing the player's aforementioned emotional data as well as recognizing key groups of words at the end of the player's sentence. Again, good scripting is essential: it is possible to say "I don't know" in any number of ways, be it angry, sad, flirty, rude, or nonchalant. By controlling the environment as well as the tone of Mark's initial words to the player, a sea of possibilities can be channeled into a handful of manageable options: there's no reason to make Mark cheerful or glib if he's on the verge of a nervous breakdown.
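One way to picture this narrowing is as a filter over possible deliveries of the same line. The Python below is a sketch only: the condition name, the mapping from condition to allowed deliveries, and the rule for picking among them are hypothetical, and the actual voice synthesis is left out entirely.

```python
from dataclasses import dataclass

# The ways of saying "I don't know" listed above.
ALL_DELIVERIES = ("angry", "sad", "flirty", "rude", "nonchalant")

# Hypothetical: deliveries that suit each scripted NPC condition.
ALLOWED_BY_CONDITION = {
    "near_breakdown": ("angry", "sad"),
}


@dataclass
class SpokenLine:
    text: str      # words to synthesize
    delivery: str  # emotional tone for the synthesized voice


def choose_reply(condition, player_emotion):
    """Choose how Mark says "I don't know" given his condition."""
    allowed = ALLOWED_BY_CONDITION.get(condition, ALL_DELIVERIES)
    # Assumed rule: a hostile player provokes anger, otherwise sadness.
    preferred = "angry" if player_emotion < 0 else "sad"
    delivery = preferred if preferred in allowed else allowed[0]
    return SpokenLine("I don't know", delivery)
```

With Mark scripted as "near_breakdown", the flirty and nonchalant readings are never candidates, which is the channeling of possibilities described above.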

It's all a matter of leading the player to dialogue "choke points" that come down to a specific set of pre-defined pathways. If the environment and initial interaction are properly calculated, synthesized dialogue can actually cut down on development time. Not only can it be changed to accommodate development changes, it eliminates the need to hire new actors for each project or to hire one actor to play multiple characters (as was painfully obvious in Fallout 3).

Synthetic voices and free-form dialogue will probably never replace the human voice, just as computer graphics will never capture the grit of on-site filmmaking. That doesn't mean they can't open up countless plot nuances that would otherwise be impossible to account for.

Even now games try to provide the player with magical, unique experiences that inspire them to fill forums with breathless tales of daring escapades. By their very nature, unscripted conversations conform to the individual. That means more excited players, more hype generated, more units moved, and a stronger argument for sequels.

Unscripted conversations won't reinvent the wheel by themselves, but they will make jaded veterans and doe-eyed newbies excited about the wheel all over again.
