In creating our new VR project Séance: The Unquiet, we set out to evolve cinematic storytelling for virtual reality, delivering something as close to the experience of watching a movie as we could while using the medium of VR to its best advantage.
For a hundred years filmmakers have crafted every shot to tell audiences where to look: from framing and composition to lighting and depth of field, every technique of film has been used to better guide the eye to the most important elements on the screen.
What we’ve done with Séance is to completely reverse that: we have crafted each scene dynamically to be aware of where the audience chooses to look. It’s a fundamental evolution in cinematic storytelling that requires the expertise and technologies of videogame development to pull off. (In our case, we used the Unreal game engine.) From a grandfather clock that sounds different when you look at it to an intricately designed crescendo of music and sound design that rises dynamically in pitch and volume with the angle of your head as you turn to discover a ghost behind you, Séance shapes its experience to the audience’s gaze. The result is a new kind of storytelling for a new kind of medium.
In this series of blog posts I'll summarize these and other techniques of cinematic storytelling we have evolved and utilized in our project. If you want to see some of them in action, we have released a free five-minute preview of Séance for the Oculus Rift and the HTC Vive.
Simply put, character presence is the feeling that you are sharing space with another human being -- or at least a human character. It's a key part of creating powerful immersion in VR when using characters in your experience. When we started this project, we took a look at a lot of VR demos that featured characters, such as Coffee Without Words by Tore Knabe, and concluded that this was something we had to get right. As a result we made a significant investment in the creation and usage of character presence in VR because we believe this is the single most important aspect of doing VR storytelling well.
Any narrative audience needs to connect emotionally with the characters, and the immersive nature of VR is particularly unforgiving of characters executed at a lower level of production quality. Some VR games that are otherwise really fun and impressive just fall down when it comes to watching an animated character trying to deliver a performance.
We made a handful of choices early on that we believe are paying off in creating character presence: stylized art direction, motion capture, character head tracking, and proximity.
It's pretty well understood in videogames and filmmaking that the quest for photorealistic CGI has led straight into the uncanny valley. Advances in rendering skin, hair, clothing, and eyes have all been impressive but it's still really hard to make credible photorealistic characters, especially in videogames where we're using realtime rendering engines that cannot achieve the same quality as pre-rendered CGI such as that of Pixar movies.
The problem is made worse in VR for two reasons. Performance constraints prevent us from using our best lighting and shadow technology, and seeing an animated character up close on a VR screen really exposes the quality limitations we're used to living with in videogames. An animated character in a mid-range videogame looks fine on your monitor or TV, but the same character in VR standing right in front of you looks like a horribly crude puppet.
We sidestepped this issue by embracing a slightly stylized look for our characters. One of the key inspirations for our character artist, Charlie Baker, was the first Dishonored game. That game's approach resulted in characters with visual personality who could deliver dramatic performances while still having a stylized look.
The main character seen in our Séance preview, Colonel Iain Munro, is the result of that thinking:
Our approach is serious enough that Munro can convey emotion and be a credible dramatic character, but stylized enough to look great even in VR, all without risking the uncanny valley.
The art direction of the character only gets us so far. The other challenge for characters in VR is animation, and our solution to this is motion capture. There are two main reasons why we took this approach.
The first is that we simply need a very large quantity of animation to do cinematic storytelling. With four characters and an expected running time of about twenty minutes, that's approximately eighty minutes of animation. The only way we could achieve that scale of production on our budget was with motion capture.
But the other reason is that we wanted the level of credible human-motion detail that mocap could deliver even under the immersive proximity you get in VR. We can deliver real facial expressions that communicate emotion. Even the subtle movement of a person shifting their weight while standing is both accurately captured and apparent to the audience in VR.
We have no doubt a skilled team of cinematic animators could accomplish this. But for a small team, motion capture was the way to go.
For Séance we invested in both a body motion-capture solution from OptiTrack (using 32 Prime 13 cameras) and a facial motion-capture solution from Faceware (the GoPro Headcams) to ensure we could capture the subtleties of movement and expression. We also invested in wireless lavalier microphones for each headcam to record dialogue. This combined solution enables us to capture four performers simultaneously on a soundstage, letting our actors play their scenes together and really react to each other as they would on a movie set. People speak differently when they move than when they're standing still. They speak differently when talking to someone than when just talking into a microphone. This combination afforded us the ability to capture more natural performances.
We did a test shoot at a local soundstage, shown below. (The various curtains are there to improve the audio quality of our recordings.) Note that we built our prop table out of PVC pipe with a mesh top so the cameras could shoot through it to record the markers on seated performers:
While we need a soundstage for a full shoot, we set up a smaller version in our offices suitable for one or two performers at a time. This room, which we call the Holosuite, has been great for pickups and single-character shoots. (For example, we shot the motion capture for our demon character in here.) Our sound designer, Keith Sjoquist, also acoustically treated the room so it doubles as a sound booth for recording dialogue, as shown below:
The monitor is connected to the computer of our technical animator, Douglas Connors, who designed and assembled our whole motion-capture solution. Motion-capture suits and Faceware helmets are a far cry from costume and makeup. As an added bonus, MotionBuilder and OptiTrack work together to deliver live feedback on the monitor in our Holosuite. Performers can see themselves on screen driving the models, which can be a wonderful way for them to get into character. We can also make immediate adjustments to better align body motions with character motions.
Rigging a character for motion-capture animations is a complex task. We worked with Centaur Digital in Chennai, India, to rig our characters as their team had good mocap experience. The result was a rig with 93 joints in the face and 82 in the body. We were initially concerned about how performant such a rig would be in Unreal, but it ran smoothly even with multiple characters on screen in VR.
Our control UI in Maya for facial animation
Individual joint controls for the face and the face tracking map
The lion's share of the animation for our characters was handled through Maya. It effectively acted as our hub, as every animation was eventually connected there. For our body motion capture, the animation was mostly handled in MotionBuilder and then exported to Maya. Facial motion capture was also handled in Maya, albeit on a separate rig, which allowed us to animate the face on a static body. In the end, body and face animation were married into finalized pieces, which were then sent to Unreal. To say it was a complex process is a mild understatement.
When we recorded our performers' faces, we used Faceware's tracking program. Faceware specifically says its software doesn't need markers on the performer's face. We used them anyway. The markers aren't there for the software; they're there for the user who needs to fine-tune the facial animations to deliver the intended performance. They maintain consistency, especially in areas without many natural reference points, such as the cheeks.
Faceware makes a fairly robust program that didn't flinch at the sheer mass of data we fed it. Animating a face is a tricky business. The closer to actual human movement you get, the closer you get to the uncanny valley. The trick seemed to be smoothing out the animations enough to fall just short of being creepy.
While motion capture is our primary solution, it's not the only one. Some key character moments were created from scratch by our lead animator, Travis Howe. Travis animated our ghost and also handled our main character's body movements when sitting at the table, where the hand and finger gestures were particularly tricky. For that scene, Travis animated the body while we used the Faceware system to capture the facial performance and lip sync.
We also animated the animator! For our demon character who appears briefly outside a window, Travis suited up and performed the character in motion capture. Travis is a brilliant character animator and he was able to deliver a strong physical performance for this imposing role.
In the first installment of this series, I briefly described how we developed a feature to have an animated character look directly at the audience when appropriate. Here's some more detail on how we implemented this.
The hard part of this feature was doing it successfully within a longer animation. So if we had a ten-second animation, and we wanted the character to look at the audience between the three- and six-second marks, we had to modify the animation on the fly to achieve the effect.
Our first attempt involved using an aim-blend animation to get the character's head to face the audience. We created a set of keyframes for the acceptable extremes of movement for the head and eyes and then blended between the actual animation and the keyframes that were relevant given the angle to the audience. Unfortunately, this caused the character to tilt his head at a weird angle as the animation played. The problem was that the root of the character in Unreal was offset from the mesh, which was being moved around the environment by the animation. This offset between the root and the mesh resulted in bad behaviors when trying to blend keyframes into the animations. The result was both unusable and rather unnerving. (Since we released our preview, we believe we have found a way to compensate for this offset so that the root and the animated mesh remain aligned. Our investigation into this solution is still in progress.)
Programmer Michael Robbins decided to tackle the problem more directly by overriding the skeleton bone positions manually. During the character’s update, if head tracking is enabled, Michael's code determines where the head and eyes are facing thanks to a trio of sockets he added to the skeleton. Then his code finds out where the audience's headset is located and calculates how far off the head and eyes are from pointing at the headset. Finally, he runs that through an interpolation function and stores those values as the desired pitch and yaw deltas for each eye and the head.
During the animation update those desired values are read out and the deltas are applied to each bone. Since we are interpolating the deltas, there is a smooth transition when the character goes from following the animation to tracking the audience and back again. This feature allows the animators to add events in their animation blueprint to turn player tracking on and off instead of having to modify the animation itself.
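The two steps above -- computing how far off the head is from pointing at the headset, then interpolating the applied deltas so tracking eases in and out -- can be sketched outside the engine like this. Everything here is a hypothetical stand-in: the real version runs inside Unreal's animation update with engine types (FVector, FRotator, and an interpolation helper along the lines of FMath::FInterpConstantTo), and these names are ours, not the project's.

```cpp
#include <cmath>

// Simple 3D vector stand-in for Unreal's FVector.
struct Vec3 { double x, y, z; };

// Yaw/pitch offset in degrees needed to rotate a socket's forward
// axis so it points at a target (the audience's headset).
struct AimDelta { double yaw, pitch; };

constexpr double kPi = 3.14159265358979323846;

static double YawOf(const Vec3& v)   { return std::atan2(v.y, v.x) * 180.0 / kPi; }
static double PitchOf(const Vec3& v) { return std::atan2(v.z, std::hypot(v.x, v.y)) * 180.0 / kPi; }

// Step 1: from the socket's position and forward direction, work out
// how far off (in yaw and pitch) it is from pointing at the headset.
// A full implementation would also handle yaw wrap-around at +/-180.
AimDelta ComputeAimDelta(const Vec3& socketPos, const Vec3& socketForward,
                         const Vec3& headsetPos) {
    Vec3 toHeadset { headsetPos.x - socketPos.x,
                     headsetPos.y - socketPos.y,
                     headsetPos.z - socketPos.z };
    return { YawOf(toHeadset) - YawOf(socketForward),
             PitchOf(toHeadset) - PitchOf(socketForward) };
}

// Step 2: each frame, move the applied delta toward the desired delta
// at a bounded rate. Because the applied value can never jump, the
// character eases smoothly from following the animation into tracking
// the audience and back again when tracking is turned off (desired = 0).
double InterpTowards(double current, double desired,
                     double degreesPerSecond, double deltaTime) {
    double step = degreesPerSecond * deltaTime;
    double diff = desired - current;
    if (std::fabs(diff) <= step) return desired;
    return current + (diff > 0 ? step : -step);
}
```

In this sketch, the smoothing is what lets the animators simply toggle tracking on and off from blueprint events: switching the desired delta between the computed aim and zero produces a gradual blend rather than a snap.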
Here's a sample of Michael's Unreal blueprint for head tracking. This snippet governs the right eye. Click to enlarge:
Our solution for head tracking only works within fairly narrow limits. If Séance were a roomscale VR project, our character could not reasonably follow all the possible audience positions. While we aren't going to roomscale with this project, we expect that using an IK system to support upper-body movement during head tracking would go a long way toward making this feature more robust.
Simply put, our characters look their best close up. Initially we intended the Munro character to make his entrance from the foyer at the far end of the great hall in which our Séance preview takes place. He was to step around the corner and deliver a dramatic monologue, then cross the room still speaking until he reached the table where the audience sits. In a movie you could do this easily, starting with an establishing shot to show the character entering and then a closer shot to better deliver his performance.
Of course, in VR we weren't going to zoom the camera or cut to different shots due to the nausea and discomfort that would result for many people. And when our character was that far away, his entire face was just a few pixels high on the little screen in the VR headset. The impact of the performance was completely lost.
We kept bringing Munro closer and closer until we found a good distance to introduce him, close enough that you can really read his face and his body language, but just far enough away that his entrance can be a surprise and he can use his whole body to communicate.
We then brought him even closer by seating him at the table for another scene. We began that scene by having him lean into the audience's view as he takes a seat, because we found it produces an interesting effect on the audience. People lean back a little bit or are sometimes even startled because the character is almost intruding on their personal space. It's a fascinating and instinctive reaction and we thought it was worth using to build that sense of character presence. When you do something in VR that produces a physical or emotional reaction, you should consider it carefully -- if it's not too unpleasant, it can be really useful in grounding the audience in their virtual body and helping them relate to the characters.
We have tested the Séance preview with hundreds of people. One way we know our investment in character presence is paying off is in how they react to the second scene of Munro sitting at the table across from you. Often the audience will mirror his body posture just as anyone does when talking to people in real life. When he gestures, they look in that direction. These kinds of natural responses to an animated character are how we know this is working.
We believe our preview of Séance demonstrates how you can deliver strong character presence in VR. In our next and final installment, we're going to pull all of our topics together to analyze the most powerful moment in our preview of Séance: the ghost.