Don’t Let Bad Lip Sync Break the Spell
by Ben Serviss on 06/13/13 09:15:00 am   Expert Blogs   Featured Blogs

The following blog post, unless otherwise noted, was written by a member of Gamasutra’s community.
The thoughts and opinions expressed are those of the writer and not Gamasutra or its parent company.

 

This article originally appeared on dashjump.com.

Your helicopter touches down on the deck. You’ve arrived at the outskirts of a magnificent city of the future – gleaming skyscrapers pierce distant clouds, flying robots of all kinds swerve in airborne traffic patterns, and a sunset so real you can’t believe it coats the entire scene in a rosy glow.

A policeman in ultra-sleek SWAT gear approaches to tell you the details of your new assignment here in Next Generation Consoleberg – but all you can focus on is his mouth, his lips moving up and down like some kind of animatronic fish. In an instant you’re pulled out of the game, and the spell of immersion is broken.

Like most technical advancements in videogames, lip sync has been in a continual state of evolution. Yet even as overall graphic fidelity has evened out in the current generation, the quality of character lip sync still varies significantly between studios and games. The problem may seem minor, but is deceptively substantial: As games’ graphical quality increases, so do consumer standards, making even slightly off lip sync orders of magnitude more noticeable and jarring.

Elizabeth’s disjointed lip sync dampens the immersion in Bioshock Infinite.

Take Bioshock Infinite. Irrational’s luscious, sprawling world is so dense with architectural flourishes and environmental storytelling that whenever Elizabeth opens her mouth to initiate the Player/Companion AI Bonding Process™, her mediocre lip sync actively detracts from the fantastic world the developers took such pains to create.

In general, unless a developer makes targeted, laborious efforts to address lip sync and head off these disruptions, the results are likely to drag down the overall quality of the illusion.

But if only studios that pour time, money and energy into lip sync and performance capture at the level of Quantic Dream with Heavy Rain or Team Bondi with L.A. Noire can secure performances that live up to the standard, should anyone else even bother?

Fortunately, there are plenty of ways to achieve compelling results without turning your studio into an animation house. The key lies in emphasizing your studio’s strengths instead of overreaching your capabilities. Some examples of how games have handled this challenge:

Place dialog sequences strategically. Silent Hill 2 is regarded as a paragon of complex, effective storytelling, yet technical barriers at the time prevented it from achieving realistic lip sync. The developers were nevertheless able to craft a fantastic experience by working around these restrictions whenever possible.

Silent Hill 2’s introductory sequence smartly works around limitations with lip sync.

In the first few minutes of the game, the player’s character James stares at himself in a mirror, rubs a hand over his face, and sighs wordlessly – all in close-up. Only after the game pulls back to a wide, top-down perspective does he begin a longer voiceover that sets the stage for the game.

Use establishing cinematics to plant representations of characters in the player’s mind. The first Silent Hill faced even greater technical obstacles on PlayStation 1 hardware, but the developers elegantly circumvented these with brief CG cinematics introducing each character as they appeared – with zero voiceover.

Instead, the short snippets placed images of more fully-formed character models into the player’s head to use as a reference when the game reverted to the simpler, PS1-powered in-game models.

The first Bioshock adopted a similar strategy. By partially obscuring the first few characters the player interacted with, the developers painted an incomplete picture for the player to finish, giving no opportunity for merely adequate lip sync to taint the illusion.

"A lot of the things that we can crunch numbers on in a simulation, we do that on the computer. But a lot of other things that the computer is not well-suited for, we actually run that in the player’s imagination." –Will Wright

Simulate in the player’s imagination, not the game. As Will Wright has famously said, there is much to be gained from choosing which parts of the experience to overtly include in the game and which parts to leave up to the player’s imagination.

For example, Thomas Was Alone nudges players to work with the game in creating realistic personalities for its crude square and rectangle-shaped characters by leveraging humanity’s innate tendency to anthropomorphize when given the slightest cue.

Thomas Was Alone relies on suggestions from the narrator to help turn shapes into memorable characters.

Of course, older, less technically advanced games have used these techniques for years to draw players into a world. Establishing cinematics and animations, avatar portraits and plain ol’ text worked exceptionally well for RPGs and story-driven games long before 3D graphics became the standard.

Please Sync Responsibly

With a new batch of high-powered consoles on the way, developers will be eager to show off what they can do at the helm of the latest technology. Yet the more closely a studio’s quality target for lip sync matches its actual capabilities, the better the results will be.

And for smaller studios, there is plenty of success to be had by playing to your strengths. As evidenced by Bastion, Braid and Thomas Was Alone, it’s entirely possible to create amazing, memorable experiences with nothing but text, maybe some voice, and the player’s imagination.

Ben Serviss is a freelance game designer working in commercial, social, educational and indie games. Follow him on Twitter at @benserviss.


Comments


Robert Marney
profile image
Now I'm wondering if there is a middleware tool that will generate mouth animations based on speech-to-text of your audio files...

Jakub Majewski
profile image
Yes, there is. More than one, in fact, and you can choose between tools that generate animations based on speech, and others that generate based on text (both have their advantages and disadvantages).
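
To give a very rough idea of what the text-driven variety does under the hood, here's a toy sketch in Python. Everything in it - the phoneme-to-viseme table, the fixed timing constant, the shape names - is an invented simplification for illustration, not any real middleware's API; actual tools use full phonetic dictionaries and align mouth shapes against the recorded audio.

```python
# Toy sketch of text-driven viseme generation. The phoneme-to-viseme
# table and the fixed timing constant are invented simplifications;
# real tools use full phonetic dictionaries and align against audio.

# Coarse mapping from phonemes to mouth shapes (visemes).
PHONEME_TO_VISEME = {
    "AA": "open", "AE": "open", "AH": "open", "HH": "open",
    "B": "closed", "M": "closed", "P": "closed",
    "F": "teeth_on_lip", "V": "teeth_on_lip",
    "OW": "round", "UW": "round",
    "EH": "wide", "IY": "wide",
    "L": "tongue_up",
}

DEFAULT_VISEME = "neutral"
SECONDS_PER_PHONEME = 0.08  # crude fixed timing; real tools derive it from audio


def visemes_for_phonemes(phonemes):
    """Turn a phoneme sequence into (start_time, viseme) keyframes."""
    keyframes = []
    t = 0.0
    for ph in phonemes:
        viseme = PHONEME_TO_VISEME.get(ph, DEFAULT_VISEME)
        # Collapse runs of the same viseme so the mouth doesn't pop.
        if not keyframes or keyframes[-1][1] != viseme:
            keyframes.append((t, viseme))
        t += SECONDS_PER_PHONEME
    keyframes.append((t, DEFAULT_VISEME))  # return to rest at the end
    return keyframes


if __name__ == "__main__":
    # "hello" as a rough phoneme sequence: HH-EH-L-OW.
    for time, viseme in visemes_for_phonemes(["HH", "EH", "L", "OW"]):
        print(f"{time:.2f}s -> {viseme}")
```

Real tools layer co-articulation, easing and hand-tweaked exceptions on top of a pass like this, which is why the generated output is a starting point rather than a finished performance.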

Sergio Rosa
profile image
I use Face Robot in Softimage for lipsync, the results are pretty neat, and you have absolute control over the output.
Jump to the 7-ish minute mark http://www.youtube.com/watch?v=4Zugp9W620s

IMO there's one more problem with lipsync. Viewers shouldn't even be looking at the mouth if the rest of the character is animated well enough (eye expressions, body and such). If viewers focus way too much on the mouth, then the mouth is not the problem, the rest of the animation is.

I haven't played Bioshock Infinite, so I can't say anything about it based on the article.

Heather Adams
profile image
Sergio, I'd like to agree with you, that if everything else is beautiful no one should be paying attention to the lip sync, but I think it's important to note a few things about that.

If you have a character with a face, people will always be looking at that face, especially when there's dialogue going on, because that's how we most relate to the character.
In fact, when people see something new for the first time, such as a scribble or a compilation of shapes and/or colors, they will most likely find a face somewhere in the mixture.
So, if you're already looking at the face, it's 10 times more likely that you pick up on poor lip syncing.

Another reason why it's important to keep lip syncing as fleshed out and beautiful as everything else in the scene is that the eye will always be drawn to the thing that's different.
If you have every part of a character moving (or everything in a scene moving) except one, your eye will be drawn to the one that isn't moving. If everything has a color except one thing (or vice versa), your eye will be drawn to the odd man out.
Likewise, if you have everything moving beautifully until your character opens his mouth, especially if you're already looking at the face, you're going to notice that the way the mouth is moving is off. Even if you're staring at the character's eyes, the mouth will still be within your peripheral vision. Even if you don't notice it at first, you'll still get the feeling that something isn't quite right.
And really, that's enough. You don't have to be able to pinpoint where the problem is to know that you feel like something's different.

That's just my two cents.

Christiaan Moleman
profile image
@Robert: Procedural lip-sync is a time saver at best. It still needs to be tweaked by hand if you want it to be good.

@Sergio: Research shows people look at eyes first and mouth second out of all the features on a face, so getting lip-sync right is still pretty important. Really, if any part of the face is off it's difficult to see an NPC as a character and not some kind of nightmare puppet pretending to be human.

Eyes are definitely the key though. No excuse for static unblinking zombie eyes.

Merc Hoffner
profile image
@ Sergio

I completely disagree. Seeing lips and having them match up with audio is pretty important to accurate perception of speech, more so for adults than children, and more so in noisy scenarios. See http://en.wikipedia.org/wiki/McGurk_effect

Mismatching visual cues can actually reduce intelligibility of speech. Which is also why it's somewhat harder to understand someone with a thick beard, harder to understand someone facing away from you and harder to understand someone over the phone in a noisy environment. It's oddly common in videogames for subtitles to be enabled by default - the funny thing is, I end up relying on them disappointingly often, perhaps because developers subconsciously (or consciously) know that speech perception is artificially marred in games, or the sound mixing is poor or both. And when I'm reading subtitles, I'm definitely not taking in the finer animation.

Which brings us to another point. Animation is usually NOT where your eyes should be. In general communication we look at the face. Ancillary movement is noticeable by its presence or absence but in a naturalistic communication we don't consciously pick up on the details. Perceptually it should fade into the background as an atmospheric tool - an important one no doubt, and certainly one that's nuanced, but over-the-top movement to draw your attention to where the money's being spent appears hackneyed and jarring.

I call this the crazy hands effect. Actors in talking scenes often don't know what to do with their hands, both as a consequence of self-consciousness and the difference between the tasks real people perform while speaking and the artificial staging of a made-up situation. As a result, actors often keep their hands inappropriately still, or over-wave for emphasis, neither of which looks particularly convincing - one seeming artificial, the other seeming mad and unprofessional. To combat this, judicious cinematographers and editors use strategic close shots and cropping to minimize the effects, and give the actors specific things to do with their hands - like driving, gardening, holding tea, etc.

Videogames seem to suffer from this HORRIBLY, often with very wide continuous shots, inappropriate over-movement, waving, nodding and hand signing - as if to say - here's where that mo-cap money went. It's so out of place that it makes the animation distracting rather than atmospheric; more like a pantomime than a film. More Mo-Cap money should be spent on people standing around doing nothing - I swear it's more important to immersion than we think.

Nick Harris
profile image
This information on Supermarionation in Captain Scarlet may be of interest:

http://www.tvcentury21.com/marc/superm-hows-whys.html

Gerard Gouault
profile image
The worst lip-synch I have ever seen is in Impire: none.
The characters never even open their mouths.

Alexander Ageno
profile image
It's sort of interesting how bad lip syncing sometimes goes almost unnoticed. Uncharted 2 and 3 weren't perfect, yet I found myself caring about Drake and the gang.

I mean, take a look at Mass Effect. The lip synching is horrendous in those games, yet I never at once doubted any of the characters.

Vytautas Katarzis
profile image
You know, Dark Souls comes to mind here. Why? Because it had almost zero lip sync (most characters didn't move their mouths when they spoke), but it didn't break immersion at all. If anything, it gave the game a surreal feel. Perhaps bad lip sync is worse than none?

Christiaan Moleman
profile image
It's worth noting that good lip-sync doesn't necessarily mean *detailed* lip-sync. The Muppets have better facial animation than most games. It's all about hitting the right shapes at the right time.

Performance capture has the major downside that it is 100% linear, which is not so useful for interactive purposes. Unless you somehow break it down into atoms of expression that can be recombined dynamically...
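
To make that last idea concrete, here's a minimal sketch of what recombining "atoms of expression" could look like: each viseme stored as a pose over a few blend shapes, interpolated at runtime instead of played back linearly. The shape names and weights are invented for illustration, not taken from any real facial rig.

```python
# Minimal sketch of dynamic recombination: viseme "atoms" stored as
# blend-shape poses, evaluated at arbitrary times from sparse keyframes
# instead of playing back a fixed, linear performance capture.
# Shape names and weights are invented for illustration.

VISEME_ATOMS = {
    "neutral": {"jaw_open": 0.0, "lip_pucker": 0.0, "lip_stretch": 0.0},
    "open":    {"jaw_open": 0.9, "lip_pucker": 0.0, "lip_stretch": 0.2},
    "round":   {"jaw_open": 0.4, "lip_pucker": 0.8, "lip_stretch": 0.0},
    "wide":    {"jaw_open": 0.3, "lip_pucker": 0.0, "lip_stretch": 0.9},
}


def blend(pose_a, pose_b, t):
    """Linearly interpolate between two poses, t in [0, 1]."""
    return {shape: (1 - t) * pose_a[shape] + t * pose_b[shape]
            for shape in pose_a}


def pose_at(keyframes, time):
    """Evaluate blend-shape weights at an arbitrary time from sparse
    (start_time, viseme) keyframes -- the recombination step."""
    for (t0, v0), (t1, v1) in zip(keyframes, keyframes[1:]):
        if t0 <= time < t1:
            alpha = (time - t0) / (t1 - t0)
            return blend(VISEME_ATOMS[v0], VISEME_ATOMS[v1], alpha)
    return VISEME_ATOMS[keyframes[-1][1]]  # past the end: hold last pose


if __name__ == "__main__":
    keys = [(0.0, "neutral"), (0.2, "open"), (0.4, "round"), (0.6, "neutral")]
    print(pose_at(keys, 0.3))  # halfway between "open" and "round"
```

Because the keyframes are just data, dialogue assembled at runtime (branching conversations, procedural barks) can drive the same pose set, which is exactly what a linear capture can't do.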

