Gamasutra: The Art & Business of Making Games
Read My Lips: Facial Animation Techniques
April 6, 2000
 

Science Break

The field of linguistics, specifically phonetics, compares phonemes according to their actual physical attributes. The grouping does not really concentrate on the visual aspects, as sounds rely on things going on in the throat and in the mouth, as well as on the lips. But, perhaps this can help me organize the phonemes a bit.

Sounds can be categorized according to voicing, manner of articulation (airflow), and the places of articulation. There are more, but these will get the job done. As speakers of English, we automatically create sounds correctly without thinking about what is going on inside the mouth. Yet, when we see a bad animation, we know it doesn’t look quite right although we may not know why. With the information below, you will be equipped to know why things look wrong. Now for some group participation. This is an interactive article. Go on, no one is looking. The categories we want to examine are:

Voiced vs. Voiceless. Put your hand on your throat and say something. You can feel an intermittent vibration. Now say, “p-at, b-at, p-at, b-at,” emphasizing the initial consonant. In some sounds the vocal cords vibrate together ([b] is voiced) and in others the vocal cords are apart ([p] is voiceless). Looking at the face, though, there is no visual difference between voiced and voiceless sounds. This makes reducing sounds into one viseme a no-brainer: any pair of sounds that differs only in voicing can be reduced to the same viseme. In English, that eliminates eight phonemes.

Nasal vs. Oral. Put your fingers on your nose. Slowly say “momentary.” You can feel your nose vibrating when you are saying the “m.” Some sounds are said through the nasal cavity, but most are said through the oral cavity. These are also not visibly different. So again, we have an automatic reduction in phonemes. Each of the three nasal sounds in English can be folded into the viseme of its oral counterpart.

Manners of Speech. Sounds can also be differentiated by the amount of opening through the oral tract. These also do not offer a visible clue, but are very important for categorizing phonemes. Sounds that have complete closure of the airstream are called stops. Sounds that have a partially obstructed closure and turbulent airflow are called fricatives. A sound that combines a stop and a fricative is called an affricate. Sounds that have a narrowing of the vocal tract, but no turbulent airflow, are called approximants. And then there are sounds that have relatively no obstruction of the airflow; these are the vowels.

Figure 2. Side cut-out view of places of articulation.

Places of Articulation. This involves where the sound is being made in the mouth. This is where the visible differences occur. There are several places of articulation (see Figure 2) involving the lips, teeth, tongue, and stuff in the back of the mouth (the palate, velum, and glottis) for the consonants. Vowel placement is based on the relative height of the tongue and whether the tongue is more front or back in the mouth. A differentiating factor not listed in Chart 1 is lip rounding. This is not associated with any particular place of articulation and will be addressed below. Whew.

As I said, there are 35 phonemes in my dialect of American English. You may have more. Chart 1 is a summary of these phonemes. Read the chart from the front of the mouth to the back of the mouth. Try saying each of the words that illustrate the phoneme that is in bold. Have a look in the mirror and see what is going on as well as feel what is going on inside the head. By using the distinction of voicing and oral/nasal, we have already eliminated 11 phonemes. Let’s continue the reduction of phonemes into the usable visemes.
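The arithmetic behind that count can be sketched in a few lines of Python. The phoneme labels here are hypothetical ASCII stand-ins rather than the symbols used in Chart 1, and the pairings are the usual English voiced/voiceless and nasal/oral counterparts, which I am assuming are the ones the chart has in mind.

```python
# Hypothetical ASCII labels for phonemes; Chart 1 may use different symbols.
# Each voiced consonant maps to the voiceless sound made with the same mouth shape.
VOICED_TO_VOICELESS = {
    "b": "p",
    "d": "t",
    "g": "k",
    "v": "f",
    "dh": "th",  # thy -> thigh
    "z": "s",
    "zh": "sh",  # vision -> shy
    "jh": "ch",  # jive -> chime
}

# Each nasal maps to the oral stop made at the same place of articulation.
NASAL_TO_ORAL = {"m": "b", "n": "d", "ng": "g"}


def visual_class(phoneme):
    """Collapse a phoneme onto a representative that looks the same on the face."""
    phoneme = NASAL_TO_ORAL.get(phoneme, phoneme)
    return VOICED_TO_VOICELESS.get(phoneme, phoneme)


eliminated = len(VOICED_TO_VOICELESS) + len(NASAL_TO_ORAL)
remaining = 35 - eliminated  # phonemes still to be sorted into visemes
```

Under these assumptions, `visual_class("m")`, `visual_class("b")`, and `visual_class("p")` all land on the same representative, which is the bilabial case worked through in the next section.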

Take It to the Limit

According to the chart, there are three bilabials, which are sounds made with both lips. They are [b], [p], and [m]. According to Figures 3a, 3b, and 3c, they have different attributes inside the mouth. B and P differ only in that B makes use of the vocal cords and P does not. The M sound is voiced like the B sound, but it is nasal. The cool thing about these sounds is that while there are differences inside the mouth, visually there is no difference. If you look in a mirror and say “buy,” “pie,” and “my,” they all look identical. We have reduced three phonemes into one viseme.

Chart 1. American English phoneme summary chart.

While you’re working, remember that you are thinking with respect to sounds (phonemes), not letters. In many cases a single phoneme is spelled with multiple letters. So, if we go through Chart 1, we can continue to reduce the 35 phonemes into 13 visemes. For the most part, the visemes are categorized along the lines of the Places of Articulation (with the exception of [r]).

Take a look at the following listing of visemes. It describes the look of each phoneme in American English. The only phoneme not listed is [h]. “In English, ‘h’ acts like a consonant, but from an articulatory point of view it is simply the voiceless counterpart of the following vowel.” (Ladefoged, 1982:33-4). In other words, treat [h] like the vowel that comes after it.

Visemes

1. [p, b, m] - Closed lips.

2. [w] & [boot] - Pursed lips.

3. [r*] & [book] - Rounded open lips with corner of lips slightly puckered. If you look at Chart 1, [r] is made in the same place in the mouth as the sounds of #7 below. One of the attributes not denoted in the chart is lip rounding. If [r] is at the beginning of a word, then it fits here. Try saying “right” vs. “car.”

4. [v] & [f] - Lower lip drawn up to upper teeth.

5. [thy] & [thigh] - Tongue between teeth, no gaps on sides.

6. [l] - Tip of tongue behind open teeth, gaps on sides.

7. [d, t, z, s, r*, n] - Relaxed mouth with mostly closed teeth and the pinkness of the tongue visible behind the teeth (tip of tongue on ridge behind upper teeth).

8. [vision, shy, jive, chime] - Slightly open mouth with mostly closed teeth and corners of lips slightly tightened.

9. [y, g, k, hang, uh-oh] - Slightly open mouth with mostly closed teeth.

10. [beat, bit] - Wide, slightly open mouth.

11. [bait, bet, but] - Neutral mouth with slightly parted teeth and slightly dropped jaw.

12. [boat] - Very round lips, slightly dropped jaw.

13. [bat, bought] - Open mouth with very dropped jaw.

To see how helpful this information can be when animating a face, take a word like “hack.” It has four letters, three phonemes, and only two visemes (13 and 9 in the listing).
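As a rough illustration, the listing can be turned into a lookup table and the “hack” example run through it. This is only a sketch: the viseme numbers come from the listing, but the phoneme labels (such as `ae` for the vowel in “bat” or `q` for the catch in “uh-oh”) and the function name `to_visemes` are hypothetical, and the sketch assumes [h] is always followed by a sound that appears in the table.

```python
# Viseme numbers follow the listing above; the phoneme labels are
# hypothetical ASCII stand-ins chosen for this sketch.
VISEME_GROUPS = {
    1: ["p", "b", "m"],
    2: ["w", "uw"],                  # uw = boot
    3: ["r_initial", "uh"],          # [r] at the start of a word; uh = book
    4: ["v", "f"],
    5: ["dh", "th"],                 # thy, thigh
    6: ["l"],
    7: ["d", "t", "z", "s", "r", "n"],
    8: ["zh", "sh", "jh", "ch"],     # vision, shy, jive, chime
    9: ["y", "g", "k", "ng", "q"],   # ng = hang; q = the catch in uh-oh
    10: ["iy", "ih"],                # beat, bit
    11: ["ey", "eh", "ah"],          # bait, bet, but
    12: ["ow"],                      # boat
    13: ["ae", "ao"],                # bat, bought
}

PHONEME_TO_VISEME = {p: v for v, group in VISEME_GROUPS.items() for p in group}


def to_visemes(phonemes):
    """Map a phoneme sequence to viseme numbers, merging adjacent repeats."""
    result = []
    for i, p in enumerate(phonemes):
        if p == "h":
            # [h] is not in the listing: borrow the viseme of the next sound.
            if i + 1 >= len(phonemes):
                continue  # a trailing [h] has nothing to borrow from
            p = phonemes[i + 1]
        v = PHONEME_TO_VISEME[p]
        if not result or result[-1] != v:
            result.append(v)
    return result


hack = to_visemes(["h", "ae", "k"])
print(hack)  # [13, 9]
```

The table holds the 34 phonemes in the listing, with [r] entered twice for its two positions; [h], handled separately, brings the total to 35.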

Say that you don’t have enough space to include 13 visemes and whatever emotions you want expressed. Well, by using Chart 1 and the listing of visemes, you can make logical decisions about where to cut. For example, if you only have room for 12 visemes, you can combine visemes 5 and 6, or 6 and 7. For 11 visemes, continue by combining visemes 7 and 9. For 10, combine visemes 2 and 3. For 9, combine 8 with the new viseme 7/9. For 8, combine 11 and 13.
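One way to picture that cutting order is as a list of merge steps applied until the budget is met. The sketch below is an assumption-laden illustration, not a prescribed method: the function name `reduction_map` is made up, the first step arbitrarily picks the 6-and-7 option over 5-and-6, and which viseme of each pair survives is my own choice.

```python
# Merge steps in the order suggested above. Each pair (keep, drop) folds
# viseme `drop` into viseme `keep`. Which member of a pair is kept is an
# arbitrary choice made for this sketch.
MERGE_STEPS = [
    (7, 6),    # down to 12 visemes (the text also allows 5 and 6 here)
    (7, 9),    # down to 11
    (2, 3),    # down to 10
    (7, 8),    # down to 9: viseme 8 joins the combined 7/9
    (11, 13),  # down to 8
]


def reduction_map(budget):
    """Return {original viseme: viseme to display} for a budget of 8 to 13."""
    if not 8 <= budget <= 13:
        raise ValueError("this sketch only covers budgets from 8 to 13")
    mapping = {v: v for v in range(1, 14)}
    for keep, drop in MERGE_STEPS[: 13 - budget]:
        # Redirect everything currently shown as `drop` to `keep`.
        mapping = {
            original: (keep if shown == drop else shown)
            for original, shown in mapping.items()
        }
    return mapping


# With room for only 9 mouth shapes, visemes 6, 8, and 9 all display as 7.
nine = reduction_map(9)
```

Keeping the steps in a list makes it easy to try a different cutting order and compare how the results look on the model.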

If I were really pressed for space, I could keep combining and drop this list down further. Most drastic would be three frames (Open, Closed, and Pursed as in boot) or even a simple two frames of lip flap open and closed. In this case you would just alternate between opened and closed once in a while. But that isn’t very fun or realistic, is it?
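For comparison, here is what those two extremes might look like in code. The article names the three frames but does not say which visemes belong to each, so the grouping below (closed for viseme 1, pursed for the rounded-lip visemes 2, 3, and 12, open for everything else) is purely an assumption, as are the function names and the `hold` parameter.

```python
# Assumed grouping for a three-frame mouth; the text names the frames
# (Open, Closed, Pursed) but not which visemes go where.
CLOSED = {1}          # [p, b, m]
PURSED = {2, 3, 12}   # rounded lips: [w], boot, initial [r], book, boat


def three_frame(viseme):
    """Pick one of three mouth frames for a viseme number from the listing."""
    if viseme in CLOSED:
        return "closed"
    if viseme in PURSED:
        return "pursed"
    return "open"


def lip_flap(frame_count, hold=2):
    """Two-frame fallback: alternate open and closed, ignoring the audio."""
    return [
        "open" if (i // hold) % 2 == 0 else "closed"
        for i in range(frame_count)
    ]


# "hack" is visemes 13 then 9, which both collapse to the open frame.
hack_frames = [three_frame(v) for v in (13, 9)]
flap = lip_flap(6)
```

Notice that with three frames the whole word “hack” never closes the mouth, which is a good hint at how much detail is lost.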

