In this article, I'm going to describe Talking Heads, our facial
animation system, which uses parsed speech and a skeletal animation
system to reduce the workload involved in creating facial
animation on large-scale game projects. SCEE's Team Soho is
based in the heart of London, surrounded by a plethora of
postproduction houses. We have always found it difficult to
find and keep talented animators, especially with so many
appealing film projects being created on our doorstep here
in Soho.
The
Getaway is one of SCEE's groundbreaking in-house projects.
It is being designed by Team Soho, the studio that brought
you Porsche Challenge, Total NBA, and This
Is Football. It integrates the dark, gritty atmosphere
of films like Lock, Stock and Two Smoking Barrels
and The Long Good Friday with a living, breathing,
digital rendition of London. The player will journey through
an action adventure in the shoes of a professional criminal
and an embittered police detective, seeing the story unfold
from the perspectives of two completely different characters,
each with their own agenda.
The
Getaway takes place in possibly the largest environment
ever seen in a video game; we have painstakingly re-created
over 50 square kilometers of the heart of London in blistering
photorealistic detail. The player will be able to drive across
the capital from Kensington Palace to the Tower of London.
But the game involves much more than just racing: the player
must leave their vehicle and enter buildings on foot to commit
crimes ranging from bank robberies to gang hits.
So, with a huge project such as The Getaway in development
and too few talented animators available, we decided to
create Talking Heads, a system that would drastically cut
the number of man-hours spent on tedious lip-synching.
Breaking It Down
The
first decision to be made was whether to use a typical blend-shape
animation process or to use a skeleton-based system. When
you add up the number of phonemes and emotions required to
create a believable talking head, you soon realize that blend
shapes become impractical. One character might have a minimum
of six emotions, 16 phonemes, and a bunch of facial movements
such as blinking, breathing, and raising an eyebrow. Blend
shapes require huge amounts of modeling, and also huge amounts
of data storage on your chosen gaming platform.
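To make the storage argument concrete, here is a rough back-of-the-envelope comparison. Only the emotion and phoneme counts come from the figures above; the vertex count, the number of extra facial movements, the joint count, and the per-value sizes are illustrative assumptions, not measurements from The Getaway.

```python
# Rough storage comparison: blend shapes vs. poses on a facial skeleton.
# Every constant except EMOTIONS and PHONEMES is an assumption.

EMOTIONS = 6          # from the article
PHONEMES = 16         # from the article
EXTRA_MOVES = 3       # assumed: blink, breath, eyebrow raise
VERTICES = 1000       # assumed vertex count for one head
JOINTS = 30           # assumed size of the facial skeleton
BYTES_PER_FLOAT = 4


def blend_shape_bytes(targets, vertices):
    """Each blend-shape target stores one xyz offset per vertex."""
    return targets * vertices * 3 * BYTES_PER_FLOAT


def joint_pose_bytes(poses, joints):
    """Each pose stores translate, rotate and scale (9 floats) per joint."""
    return poses * joints * 9 * BYTES_PER_FLOAT


targets = EMOTIONS + PHONEMES + EXTRA_MOVES
shape_cost = blend_shape_bytes(targets, VERTICES)
pose_cost = joint_pose_bytes(targets, JOINTS)

print(f"{targets} targets as blend shapes: {shape_cost} bytes per head")
print(f"{targets} targets as joint poses:  {pose_cost} bytes per head")
```

The estimate ignores the per-vertex skin weights that a skeleton needs, but those are stored once per head rather than once per expression.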
The
skeleton-based system would also present certain problems.
Each joint created in the skeleton hierarchy has to mimic
a specific muscle group in the face.
"If
you want to know exactly which muscle performs a certain action,
then you won't find an answer in Gray's Anatomy. The experts
still haven't defined the subject of facial expression. Though
psychologists have been busy updating our knowledge of the
face, anatomists have not." -- Gary Faigin, The Artist's
Complete Guide to Facial Expression
Most
information on the Internet is either too vague or far too
specialized. I found no one who could tell me what actually
makes us smile. The only way forward was to work with a mirror
close at hand, studying my own emotions and expressions. I
also studied the emotions of friends, family, work colleagues,
and people in everyday life. I have studied many books on
facial animation and, over the years, attended many seminars.
I strongly recommend a book by Gary Faigin, The Artist's Complete
Guide to Facial Expression. If you can, try to catch Richard
Williams in one of his three-day master classes; his insight
into animation comes from working with the guys who created
some of the best Disney classics.
Building Your Head
Only part of the face is used in most expressions. The areas
around the eyes and brows and around the mouth contain the
greatest number of muscle groups, and they change the most
when we form an expression. We look at these two areas
first and gather most of our information from them. Although
other areas of the face do move (the cheeks in a smile, for
example), 80 percent of an emotion is portrayed through these
two areas.
Neutral
positions. We can detect changes in a human face because we
understand when a face is in repose. We understand the positions
of the brow and the mouth, and how wide the eyes are. These
elements are constant from face to face. This is true whether
or not we are familiar with a person's face at rest (see Figure
1).
These observations changed the way we built our models: we added
greater detail around the eyes and the mouth. Simulating the muscle
rings seen in anatomy books allowed for greater movement in the
face at these points.
The
proportions of the face are the key to building a good head.
Get this right and you are well on the way to creating realistic
facial animation. Asymmetry is another goal to strive for
when modeling your heads. Do not create half a head and flip
it across to create the other half. The human head is not
perfectly symmetrical.
Study of facial proportions by Leonardo da Vinci.
There
are many rules concerning facial proportions. The overall
shape of the head is governed by a simple rule: The height
of the skull and the depth of the skull are nearly the same.
The average skull is only two-thirds as wide as it is tall.
The human head can be divided into thirds: forehead to brow;
brow to base of nose; and base of nose to chin. The most consistent
rule is that the halfway point of the head falls in the middle
of the eyes. Exceptions to this are rare. A few other general
rules:
The width of the nose at the base is the same as the width of an eye.
The distance between the brow and the bottom of the nose governs the height of the ear.
The width of the mouth is the same as the distance between the centers of the pupils.
The angle between the top lip and the bottom lip is 7.5 degrees.
The bottom of the cheekbones is the same height as the end of the nose.
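Several of these rules are simple ratios, so they can be turned into a quick sanity check on a head model. The sketch below is only an illustration: the function name, the measurement keys, and the 10 percent tolerance are hypothetical, and the measurements would have to be taken from your own model.

```python
def check_head(m, tolerance=0.1):
    """Return the names of the proportion rules that the measurements break.

    `m` is a dict of measurements in any consistent unit (hypothetical keys).
    Two values count as matching when they differ by at most `tolerance`
    times the larger of the two.
    """
    def close(a, b):
        return abs(a - b) <= tolerance * max(abs(a), abs(b))

    rules = {
        "depth ~ height": close(m["skull_depth"], m["skull_height"]),
        "width ~ 2/3 height": close(m["skull_width"], m["skull_height"] * 2 / 3),
        "eyes at half height": close(m["eye_height"], m["skull_height"] / 2),
        "nose width ~ eye width": close(m["nose_width"], m["eye_width"]),
        "mouth width ~ pupil distance": close(m["mouth_width"], m["pupil_distance"]),
    }
    return [name for name, ok in rules.items() if not ok]


# Made-up measurements for a plausible head.
good_head = {
    "skull_height": 24.0, "skull_depth": 23.0, "skull_width": 16.0,
    "eye_height": 12.0, "nose_width": 3.2, "eye_width": 3.0,
    "mouth_width": 6.0, "pupil_distance": 6.2,
}
# The same head, but far too wide.
wide_head = dict(good_head, skull_width=22.0)

print(check_head(good_head))
print(check_head(wide_head))
```

A check like this only flags heads that stray far from the averages; a scanned actor may legitimately break a rule.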
The
heads for The Getaway all stem from one model. This
head contains the correct polygon count, animation system
and weighting. We scan actors using a system created by a
company called Eyetronics, a very powerful and cost-effective
scanning process. A grid is projected onto the face of the
person you wish to scan, and photographs are taken. These photographs
are passed through the software and converted into 3D meshes.
The software sews the meshes together, and you end up
with a perfect 3D model of the person you scanned. At the
same time, it creates a texture map and applies it to the
model.
Then
the original head model, the one that contains the correct
polygon count and animation, is morphed into the shape of
the scanned head. Alan Dann, an artist here at SCEE, wrote
proprietary in-house technology to morph the heads inside
Maya. The joints in the skeleton hierarchy are proportionally
moved to compensate for the changes in the head. We are left
with a model that meets the stipulated in-game requirements
but looks like the actor we wish to see in the game.
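The morphing tool itself is proprietary, so the following is not that tool. It is a minimal sketch of one way joints could be moved proportionally, assuming each joint keeps the same relative position inside the head's bounding box. The function names and the bounding-box approach are assumptions made for illustration.

```python
def bounding_box(points):
    """Return (min_corner, max_corner) of a list of xyz tuples."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def remap_joints(joints, base_verts, target_verts):
    """Move joints so they keep their relative place in the new head.

    `joints` maps a joint name to its xyz position in the base head.
    """
    b_min, b_max = bounding_box(base_verts)
    t_min, t_max = bounding_box(target_verts)
    remapped = {}
    for name, pos in joints.items():
        new_pos = []
        for axis in range(3):
            extent = b_max[axis] - b_min[axis]
            # Normalised position of the joint inside the base head's box.
            u = (pos[axis] - b_min[axis]) / extent if extent else 0.5
            new_pos.append(t_min[axis] + u * (t_max[axis] - t_min[axis]))
        remapped[name] = tuple(new_pos)
    return remapped


# Toy data: the scanned head is twice as wide and three times as deep.
base = [(0.0, 0.0, 0.0), (2.0, 2.0, 2.0)]
scanned = [(0.0, 0.0, 0.0), (4.0, 2.0, 6.0)]
moved = remap_joints({"jaw": (1.0, 1.0, 1.0)}, base, scanned)
print(moved["jaw"])
```

A production tool would more likely follow the local deformation of the mesh around each joint than a single global box, but the idea of preserving relative position is the same.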
1,500-polygon model used for high-res in-game and medium
resolution cutscenes.
The
Getaway heads are designed with an incredible level of detail.
We use a 4,000-polygon model for extreme close-ups in the
real-time cut scenes. The highest-resolution in-game model
is 1,500 polygons, which includes tongue, teeth, eyelashes,
and hair.
The
skeleton hierarchy also has levels of detail; we remove
joints as the characters move farther away from the camera.
Eventually only three joints remain, enough to rotate the
head and open the mouth using the jaw.
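A distance-based lookup is one simple way to express this kind of skeleton level of detail. In the sketch below the distance thresholds, the intermediate level, and the joint group names are all assumptions; only the final three-joint level reflects the description above, and even there the choice of neck, head, and jaw is my reading of it.

```python
# Hypothetical joint groups for the full facial skeleton.
FULL = ["neck", "head", "jaw", "brow", "eyelids", "eyes",
        "nose", "cheeks", "lips", "tongue"]

# (maximum camera distance, joint groups kept), nearest level first.
LOD_LEVELS = [
    (5.0, FULL),
    (15.0, ["neck", "head", "jaw", "brow", "eyelids", "lips"]),
    (float("inf"), ["neck", "head", "jaw"]),
]


def active_joints(distance):
    """Return the joint groups to animate at a given camera distance."""
    for max_distance, joints in LOD_LEVELS:
        if distance <= max_distance:
            return joints
    return LOD_LEVELS[-1][1]


for d in (2.0, 10.0, 100.0):
    print(d, active_joints(d))
```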
Creating the Skeleton
The skeleton
hierarchy was created based on the above study. Two main joints are used
as the controls: the neck and the head. The "neck" is the base,
the joint that is constrained to the skeleton of the character model.
This joint can either be driven by constraints or take motion capture data
copied across from the character model. This gives us the point
at which we have seamless interaction between the head and body. The "head"
joint controls slight head movements: shaking and nodding, random
head motions, and positions taken up in different expressions. For example,
the head leans forward during anger or downward when sad. This is the joint that
all other joints spring from; it's used as the controlling joint. Wherever
it goes, the rest of the joints go. The other joints, which relate to specific
muscle groups of the face, are:
Six joints control the forehead and eyebrows.
Three control each eye, one in each eyelid and one for the eye itself.
Two joints, one on either side of the nose.
Two joints control each cheek.
Two joints on either side of the jaw.
Three joints in the tongue.
Four joints control the lips.
Front
and side views of the facial animation system, showing the skeleton
hierarchy.
The idea
behind this mass of joints is that they simulate certain muscle groups.
The muscles of the face are attached to the skull at one end. The other
end is attached straight to the flesh or to another muscle group. This
is different from muscles in the body, which are always attached to a
bone at both ends. In theory, simulating a muscle contraction should be a
simple case of animating the scales of our joints.
Unfortunately this is not the case, as there are actually hundreds of
muscles that all interact. To achieve realistic expression we
had to rotate, scale, and translate the joints.
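The hierarchy described above can be written down as a small data structure in which every joint carries its own translate, rotate, and scale values. This is a sketch only: the joint names are hypothetical, every facial joint is parented directly to the head for simplicity, and the jaw is assumed to have two joints in total (one per side), which is one reading of the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Joint:
    name: str
    parent: Optional[str]
    translate: tuple = (0.0, 0.0, 0.0)
    rotate: tuple = (0.0, 0.0, 0.0)   # Euler angles in degrees
    scale: tuple = (1.0, 1.0, 1.0)


# Joint counts per muscle group, following the list above.
FACE_GROUPS = {
    "brow": 6,
    "eye_left": 3, "eye_right": 3,
    "nose": 2,
    "cheek_left": 2, "cheek_right": 2,
    "jaw": 2,       # assumed reading: one joint on either side
    "tongue": 3,
    "lips": 4,
}


def build_face_rig():
    """Build a name -> Joint table with 'neck' as root and 'head' below it."""
    rig = {"neck": Joint("neck", None), "head": Joint("head", "neck")}
    for group, count in FACE_GROUPS.items():
        for i in range(count):
            name = f"{group}_{i}"
            rig[name] = Joint(name, "head")
    return rig


rig = build_face_rig()
# An expression touches all three channels, not just scale.
rig["lips_0"].translate = (0.0, -0.2, 0.1)
rig["lips_0"].rotate = (0.0, 0.0, 5.0)
rig["lips_0"].scale = (1.1, 1.0, 1.0)
print(len(rig), "joints")
```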
Weighting
How do you
go about assigning an arbitrary head model to this skeleton? The original
skinning of the character took two whole days of meticulous weighting
with Maya's paint weights tool.
I didn't
want to repeat this for every head. Joe Kilner, a programmer here at SCEE
who was writing the animation system with me, came up with a MEL
(Maya Embedded Language) script that would copy weights from one model to another.
The script saved out the weights of the vertices, identifying each vertex by two properties:
its normal direction and its UV coordinates. This enabled us to export
weights from one head and import them onto another.
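The original tool was a MEL script, which is not reproduced here. The following Python sketch only illustrates the matching idea under my own assumptions: each target vertex takes the weights of the source vertex whose UV coordinates are closest, with the normal direction used as a secondary term. The data layout, the cost function, and the `normal_weight` factor are all hypothetical.

```python
import math


def match_cost(src, dst, normal_weight=0.1):
    """Lower is better: UV distance plus a penalty for differing normals.

    Normals are assumed to be unit length.
    """
    du = src["uv"][0] - dst["uv"][0]
    dv = src["uv"][1] - dst["uv"][1]
    dot = sum(a * b for a, b in zip(src["normal"], dst["normal"]))
    return math.hypot(du, dv) + normal_weight * (1.0 - dot)


def transfer_weights(source_verts, target_verts):
    """Copy joint weights from the best-matching source vertex."""
    result = []
    for dst in target_verts:
        best = min(source_verts, key=lambda src: match_cost(src, dst))
        result.append(dict(best["weights"]))
    return result


source = [
    {"uv": (0.1, 0.1), "normal": (0.0, 0.0, 1.0), "weights": {"jaw_0": 1.0}},
    {"uv": (0.9, 0.9), "normal": (0.0, 1.0, 0.0),
     "weights": {"brow_0": 0.7, "head": 0.3}},
]
target = [
    {"uv": (0.12, 0.1), "normal": (0.0, 0.0, 1.0)},
    {"uv": (0.88, 0.9), "normal": (0.0, 1.0, 0.0)},
]
copied = transfer_weights(source, target)
print(copied)
```

The brute-force search is quadratic in the number of vertices, which is acceptable for heads of a few thousand polygons but would need a spatial index for denser meshes.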
For this
to work, we had to make sure that all of our head textures conformed to
a particular fixed template. The added bonus is that we can
apply any texture to any head. The template also made it easier to create
our face textures.
Emotions and the Face
Research
has shown that people recognize six universal emotions: sadness, anger,
joy, fear, disgust, and surprise. Other expressions
are more ambiguous. If you mix the above expressions together,
people offer differing opinions on what they suggest. Physical states
such as pain, sleepiness, passion, and physical exertion also tend to be harder
to recognize. So if you wish to make sure that the emotion you are trying
to portray is recognized, you must rely on the overall attitude or animation
of the character. Shyness, for example, is created with a slight smile
and downcast eyes. But this could be misinterpreted as embarrassment or
self-satisfaction.
Emotions
are closely linked to each other. Worry is a less intense form of fear,
disdain is a mild version of disgust, and sternness is a mild version
of anger. Blending the six universal emotions, or using lesser
versions of the full emotions, gives us all the nuances of the human face.
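On a joint-based face, this kind of blending can be expressed as a weighted sum of per-joint offsets from the neutral pose. The sketch below assumes that representation; the joint names and offset values are made up for illustration, and only three of the six emotions are filled in.

```python
# Per-emotion joint offsets from the neutral pose (made-up values).
EMOTION_POSES = {
    "fear":    {"brow_0": (0.0, 0.4, 0.0), "lips_0": (0.0, -0.2, 0.0)},
    "disgust": {"nose_0": (0.0, 0.2, 0.0), "lips_0": (0.1, 0.1, 0.0)},
    "anger":   {"brow_0": (0.0, -0.3, 0.0), "jaw_0": (0.0, 0.0, 0.1)},
}


def blend_emotions(weights):
    """Sum the weighted offsets of each emotion, joint by joint.

    `weights` maps an emotion name to an intensity, usually 0.0 to 1.0.
    """
    result = {}
    for emotion, w in weights.items():
        for joint, offset in EMOTION_POSES[emotion].items():
            current = result.get(joint, (0.0, 0.0, 0.0))
            result[joint] = tuple(c + w * o for c, o in zip(current, offset))
    return result


# Worry as a less intense form of fear.
worry = blend_emotions({"fear": 0.5})
# A mixed expression: mild anger with a trace of disgust.
mixed = blend_emotions({"anger": 0.4, "disgust": 0.2})
print(worry)
print(mixed)
```

Offsets here are translations only; a fuller version would blend rotation and scale as well, since the joints are rotated, scaled, and translated to form an expression.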