|
In my preceding article, "Read My Lips: Facial Animation Techniques," I left
off with a short list of the visemes I would need to represent speech realistically.
However, I am now left with the not insignificant problem of determining
exactly how to display these visemes in a real-time application.
It may seem as if this is purely an art problem, better left to your art
staff. Or, if you are a one-person development team, at least left to
the creative side of your brain. However, your analytical side needs
to inject itself in here a bit. This is one of those early production
decisions you read about so much in the Postmortem column that can make
or break your schedule and budget. Choose wisely and everything will
work out great. Choose poorly and your art staff or even your own brain
will throttle you.
Decisions, Decisions
For the final result, I want a 3D real-time character that can deliver various
pieces of dialog in the most convincing manner possible. Thanks to the
information learned last month, I know I can severely limit the amount
of work I need to do. I know that with 13 visemes, or visual phoneme
positions, I can reasonably represent most sounds I expect to encounter.
I even have a nice mapping from American English to my set of visemes.
Most other languages could probably be represented by these visemes
as well, but could require a different mapping table.
From this information I can expect that if I can reasonably represent these
13 visemes with my character mesh, then continuous lip-synch should
be possible. So the problem really comes down to how I construct and
manipulate those meshes.
Viseme-Based Methods
Certainly, the obvious method for creating these 13 visemes is to generate 13
versions of my character head mesh, one to represent each viseme. I can then
use the morphing techniques I discussed in my column “Mighty Morphing
Mesh Machine,” in the December 1998 issue of Game Developer, to
interpolate smoothly between different sounds.
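The interpolation between two viseme targets can be sketched as a per-vertex linear blend. This is a minimal Python sketch, not the code from the morphing column; `lerp_mesh` and the tiny two-vertex meshes are hypothetical names and data chosen only for illustration.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]

def lerp_mesh(source: List[Vertex], target: List[Vertex], t: float) -> List[Vertex]:
    """Blend every vertex of `source` toward `target`.

    t = 0.0 returns the source shape, t = 1.0 returns the target shape.
    """
    if len(source) != len(target):
        raise ValueError("morph targets must have the same vertex count")
    return [
        tuple(s + (e - s) * t for s, e in zip(v0, v1))
        for v0, v1 in zip(source, target)
    ]

# Hypothetical two-vertex "meshes" standing in for two viseme targets.
viseme_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
viseme_b = [(0.0, 2.0, 0.0), (1.0, 4.0, 0.0)]

# Halfway through the transition from one sound to the next.
halfway = lerp_mesh(viseme_a, viseme_b, 0.5)
```

In a real-time loop, `t` would be driven by how far playback has advanced between the two sounds.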
 |
|
Figure 1. The “l” viseme as seen at the start of the word “life.”
|
Modeling the face to match the visemes is pretty easy. Once the artist has the
base mesh created, each viseme can be generated by deforming the mesh
any way necessary to get the right target frame. As long as no vertices
are added or deleted and the triangle topology remains the same, everything
should work out great. Figure 1 shows an image of a character displaying
the “L” viseme, as in the word “life.” The tongue is behind the top
teeth, slightly cupped, leaving gaps at the side of the mouth, and the
teeth are slightly parted.
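The constraint on vertex count and triangle topology is easy to check when targets are imported. The following is a rough sketch under the assumption that each mesh is stored as a vertex list plus a list of triangle index triples; `is_valid_morph_target` is a hypothetical helper, not part of any particular tool.

```python
from typing import List, Tuple

Vertex = Tuple[float, float, float]
Triangle = Tuple[int, int, int]

def is_valid_morph_target(
    base_vertices: List[Vertex],
    base_triangles: List[Triangle],
    target_vertices: List[Vertex],
    target_triangles: List[Triangle],
) -> bool:
    """Return True if the target can be morphed against the base mesh.

    The target may move vertices freely, but it must keep the same
    number of vertices and exactly the same triangle index list.
    """
    same_count = len(base_vertices) == len(target_vertices)
    same_topology = base_triangles == target_triangles
    return same_count and same_topology

# Hypothetical single-triangle base mesh and two candidate targets.
base_v = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
base_t = [(0, 1, 2)]

moved_v = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (0.0, 1.0, 0.0)]  # deformed only
extra_v = base_v + [(1.0, 1.0, 0.0)]                           # vertex added

ok_target = is_valid_morph_target(base_v, base_t, moved_v, base_t)
bad_target = is_valid_morph_target(base_v, base_t, extra_v, base_t)
```

Running a check like this at export time catches a broken target before it ever reaches the game.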
Sounds pretty good so far. Just create 13 morph targets for the visemes in
addition to the base frame and you’re done. Life’s great, back to physics,
right? Well, not quite yet.
Suppose, in addition to simply lip-synching dialog, your characters must express
some emotion. You want them to be able to say things sadly, or speak
cheerfully. We need to add an emotional component to the system.
Adding Some Heart to the Story
At first glance, it may seem that you can simply add some additional morph
targets for the base emotions. Most people describe six basic emotions.
Here they are with some of their traits. (See Goldfinger under “For
Further Info” for photo examples of the six emotions.)
1. Happiness: Mouth smiles open or closed, cheeks puff, eyes narrow.
2. Sadness: Mouth corners pull down, brows incline, upper eyelids
droop.
3. Surprise: Brows raise up and arch, upper eyelids raise, jaw drops.
4. Fear: Brows raise and draw together, upper eyelids raise, lower
eyelids tense upwards, jaw drops, mouth corners go out and down.
5. Anger: Inner brows pull together and down, upper eyelids raise,
nostrils may flare, lips are closed tightly or open exposing teeth.
6. Disgust: Middle portion of upper lip pulls up exposing teeth, inner
brows pull together and down, nose wrinkles.
There are variations of these emotions, such as contempt, pain, distress,
and excitement, but you get the idea. Very distinct versions of these six
will get the message across.
The key thing to notice about this list is that many of these emotions directly
affect the same regions of the model as the visemes. If you simply layer
these emotions on top of the existing viseme morph targets, you can
get an additive effect. This can lead to ugly results.
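To see why plain layering misbehaves, here is a small sketch that assumes each morph target is stored as per-vertex offsets (deltas) from the base mesh and that layers are simply summed. The function name and the numbers are hypothetical; they only illustrate the additive effect on a vertex that both targets move.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def layer_additive(
    base: List[Vec3],
    deltas: List[List[Vec3]],
    weights: List[float],
) -> List[Vec3]:
    """Add every weighted delta layer on top of the base mesh."""
    result = [list(v) for v in base]
    for layer, weight in zip(deltas, weights):
        for vertex, delta in zip(result, layer):
            for axis in range(3):
                vertex[axis] += weight * delta[axis]
    return [tuple(v) for v in result]

# One hypothetical jaw vertex; negative y means the jaw moves down.
base = [(0.0, 0.0, 0.0)]
viseme_delta = [(0.0, -0.25, 0.0)]   # viseme parts the teeth slightly
emotion_delta = [(0.0, -1.0, 0.0)]   # emotion drops the jaw

# Both layers at 100 percent: the jaw ends up lower than either target asks for.
layered = layer_additive(base, [viseme_delta, emotion_delta], [1.0, 1.0])
```

With both layers at full strength the jaw vertex lands at y = -1.25, past either target on its own.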
 |
|
Figure 2. A very surprised “l” viseme.
|
For example, let me start with the “L” sound from before and blend in a
surprised emotion at 100 percent. The “L” sound moves the tongue up
to the top set of teeth and parts the mouth slightly. However, the surprise
target drops the jaw even farther but leaves the tongue alone. This
combination blends into the odd-looking character you see in Figure 2.
This problem really becomes apparent when the two meshes are actually fighting
each other. For example, the “oo” viseme drives the lips into a tight,
pursed shape while the surprise emotion drives the lips apart. Nothing
pretty or realistic will come out of that combination.
When I ran into this issue a couple of years ago, the solution was tied to
the weighting. By assigning a weight or priority to each morph target,
I can compensate for these problems. I give the “oo” viseme priority
over the surprise frame. This will suppress the effect that the surprise
emotion has over shared vertices.
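One way such a priority scheme could work is sketched below. The exact rule is an assumption, since the text does not spell it out: wherever the high-priority target moves a vertex, the low-priority target's contribution on that vertex is scaled by one minus the high-priority weight. `blend_with_priority`, the epsilon threshold, and the sample data are all hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

def blend_with_priority(
    base: List[Vec3],
    high_delta: List[Vec3],
    low_delta: List[Vec3],
    high_weight: float,
    low_weight: float,
    epsilon: float = 1e-6,
) -> List[Vec3]:
    """Blend two delta targets, letting the high-priority one win on shared vertices."""
    result = []
    for b, hi, lo in zip(base, high_delta, low_delta):
        # A vertex is "shared" if the high-priority target actually moves it.
        hi_moves = any(abs(c) > epsilon for c in hi)
        # Assumed rule: suppress the low-priority layer on shared vertices.
        lo_scale = low_weight * (1.0 - high_weight) if hi_moves else low_weight
        result.append(
            tuple(bc + high_weight * hc + lo_scale * lc
                  for bc, hc, lc in zip(b, hi, lo))
        )
    return result

# Hypothetical data: vertex 0 is a lip corner touched by both targets,
# vertex 1 is a brow touched only by the surprise target.
base = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
oo_delta = [(-0.5, 0.0, 0.0), (0.0, 0.0, 0.0)]
surprise_delta = [(0.5, -1.0, 0.0), (0.0, 0.25, 0.0)]

blended = blend_with_priority(base, oo_delta, surprise_delta, 1.0, 1.0)
```

In this sketch the lip corner follows the “oo” shape only, while the brow still receives the full surprise offset, so the emotion reads on the parts of the face the viseme does not use.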
|