Mahalo Daily: MD194
CG Emily, Image Metrics, and the Uncanny Valley
Leah D'Emilio drops by Santa Monica-based Image Metrics, a motion-capture company on the cutting edge of rendering realistic faces, whose work has been used on feature films like Harry Potter and the Order of the Phoenix and video games like Grand Theft Auto IV. Leah also dons the headcam used in performance-based facial animation and discusses the Uncanny Valley.
Mahalo Daily: MD203
Virtual Reality, HDR, Photogrammetry at ICT
Leah D'Emilio learns about a whole new approach to visual effects in film. Dr. Paul Debevec, innovator of HDR photography and creator of photogrammetry used in "The Matrix," takes us through USC's Institute for Creative Technologies to explore the future in virtual filmmaking and special effects.
Building the Specular Normal Map:
Computing the vector halfway between the reflection vector and the view vector yields a surface normal estimate for the face based on the specular reflection. Here we see the face's normal map visualized in the standard RGB = XYZ color map.
The four images of the specular reflection under the gradient illumination patterns let us derive a high-resolution normal map for the face. If we look at one pixel across this four-image sequence, its brightness in the X, Y, and Z images divided by its brightness in the fully-illuminated image uniquely encodes the direction of the light stage reflected in that pixel.
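The two steps above (ratio images encoding the reflected light direction, then the half-vector with the view direction) can be sketched roughly as follows. This is a minimal illustration, not the lab's actual pipeline: the function names, the fixed frontal view vector, and the assumption that the gradient images are already specular-only are all simplifications for clarity.

```python
import numpy as np

def specular_normal_map(img_x, img_y, img_z, img_full,
                        view=np.array([0.0, 0.0, 1.0])):
    """Estimate per-pixel normals from four specular gradient-illumination images.

    img_x, img_y, img_z, img_full: float arrays of shape (H, W) holding the
    specular reflection under the X, Y, Z gradient patterns and under full
    illumination. The frontal view vector is an illustrative assumption.
    """
    eps = 1e-8
    full = np.maximum(img_full, eps)
    # Dividing each gradient image by the fully-lit image encodes the
    # direction of the light stage reflected in that pixel; remap the
    # [0, 1] gradients to [-1, 1] direction components.
    r = np.stack([img_x / full, img_y / full, img_z / full], axis=-1)
    r = 2.0 * r - 1.0
    # The surface normal is the vector halfway between the reflection
    # direction and the view direction.
    h = r + view
    return h / np.maximum(np.linalg.norm(h, axis=-1, keepdims=True), eps)

def normals_to_rgb(n):
    # Standard RGB = XYZ visualization: map [-1, 1] components to [0, 255].
    return ((n * 0.5 + 0.5) * 255.0).astype(np.uint8)
```

A pixel reflecting light from straight ahead (ratio images 0.5, 0.5, 1.0) yields the frontal normal (0, 0, 1), which the color map renders as the familiar bluish tone of flat regions in a normal map.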
This image of Emily is also lit by all of the light stage lights, but the orientation of the polarizer has been turned 90 degrees, which allows the specular reflections to return. You can see a sheen across her skin, and the reflections of the lights are now evident in her eyes.
This image shows the combined effect of specular reflection and subsurface reflection; to model the facial reflectance we would really like to observe the specular reflection all on its own. To do this, we can simply subtract the diffuse-only image from this one.
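The subtraction described above is simple in code. A minimal sketch, assuming both photographs are registered, linear-intensity float arrays (the function and argument names are illustrative):

```python
import numpy as np

def isolate_specular(parallel_img, cross_img):
    """Recover the specular-only image.

    parallel_img: photo with the polarizer oriented to pass specular
    reflection (diffuse + specular combined).
    cross_img: cross-polarized photo containing only the diffuse
    (subsurface) reflection.
    """
    # Clamp at zero: sensor noise can push the difference slightly negative.
    return np.clip(parallel_img - cross_img, 0.0, None)
```

Because subsurface scattering depolarizes light, the diffuse component appears equally in both photographs and cancels in the difference, leaving only the polarization-preserving surface reflection.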
Scanning Emily's Teeth:
We did one more piece of 3D scanning for the Emily project: we scanned a plaster cast of Emily's teeth provided by Image Metrics, adapting our 3D scanning techniques to work with greater accuracy in a smaller scanning volume.
Lifelike Animation Heralds New Era For Computer Games
'Emily' will set a new precedent for photo-realistic characters in video games and films, says her creator, Image Metrics
By Jonathan Richards
Extraordinarily lifelike characters are to begin appearing in films and computer games thanks to a new type of animation technology. Emily - the woman in the above animation - was produced using a new modelling technology that enables the most minute details of a facial expression to be captured and recreated.
She is considered to be one of the first animations to have overleapt a long-standing barrier known as the 'uncanny valley' - the perception that animated characters become less convincing, and even unsettling, as they approach but fail to fully achieve human likeness.
Researchers at a Californian company which makes computer-generated imagery for Hollywood films started with a video of an employee talking. They then broke the facial movements down into dozens of smaller movements, each of which was given a 'control system'.
The team at Image Metrics - which produced the animation for the Grand Theft Auto computer game - then recreated the gestures, movement by movement, in a model.
The aim was to overcome the traditional difficulties of animating a human face, for instance that the skin looks too shiny, or that the movements are too symmetrical.
"Ninety per cent of the work is convincing people that the eyes are real," Mike Starkenburg, chief operating officer of Image Metrics, said.
"The subtlety of the timing of eye movements is a big one. People also have a natural asymmetry - for instance, in the muscles in the side of their face. Those types of imperfections aren't that significant but they are what makes people look real."
Previous methods for animating faces have involved putting dots on a face and observing the way the dots move, but Image Metrics analyses facial movements at the level of individual pixels in a video, meaning that the subtlest variations - such as the way the skin creases around the eyes - can be tracked.
"There have always been control systems for different facial movements, but say in the past you had a dial for controlling whether an eye was open or closed, and in one frame you set the eye at 3/4 open, the next 1/2 open, and so on," Mr Starkenburg said. "Now, for instance, you could be controlling the movement in the top 3-4mm of the right side of the smile."
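The "dial per movement" idea Starkenburg describes can be pictured as a set of named, keyframed channels that are interpolated per frame. This is only an illustrative sketch of the concept; the channel names, values, and interpolation are invented and bear no relation to Image Metrics' actual software:

```python
def evaluate_channel(keys, frame):
    """Linearly interpolate a channel from a sorted list of (frame, value) keyframes."""
    if frame <= keys[0][0]:
        return keys[0][1]
    if frame >= keys[-1][0]:
        return keys[-1][1]
    for (f0, v0), (f1, v1) in zip(keys, keys[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)
            return v0 + t * (v1 - v0)

# One dial per movement, from coarse (whole-eye openness) down to fine
# regional controls (a few millimetres of one side of the smile).
rig = {
    "right_eye_open": [(0, 1.0), (10, 0.75), (20, 0.5)],
    "smile_right_upper": [(0, 0.0), (20, 0.3)],
}
pose = {name: evaluate_channel(keys, 15) for name, keys in rig.items()}
```

The finer the regions each dial governs, the more of the natural asymmetry and timing subtleties mentioned above the animator can reproduce.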
For many years now, animators have come up against a barrier known as the "uncanny valley", which refers to how, as a computer-generated face approaches human likeness, it begins to take on a corpse-like appearance similar to that in some horror films. As a result, computer game animators have purposely simplified their creations so that the players realise immediately that the figures are not real.
High Resolution Face Scanning for "Digital Emily"
A collaboration between Image Metrics and the USC Institute for Creative Technologies Graphics Lab | Excerpt:
Over the last few years our lab has been developing a new high-resolution realistic face scanning process using our light stage systems, which we first published at the 2007 Eurographics Symposium on Rendering.
In early 2008 we were approached by Image Metrics about collaborating with them to create a realistic animated digital actor as a demo for their booth at the approaching SIGGRAPH 2008 conference.
Since we'd gotten pretty good at scanning actors in different facial poses and Image Metrics has some really neat facial animation technology, this seemed like a promising project to work on.
Image Metrics chose actress Emily O'Brien to be the star of the project. She plays Ms. Jana Hawkes on "The Young and the Restless" and was nominated for a 2008 daytime Emmy award. Emily came by our institute to get scanned in our Light Stage 5 device on the afternoon of March 24, 2008. The image to the left shows Emily in the light stage during a scan, with all 156 of its white LED lights turned on.
Our most recent process requires only about fifteen photographs of the face under different lighting conditions as seen to the right to capture the geometry and reflectance of a face.
The photos are taken from a stereo pair of off-the-shelf digital still cameras, and since only a small number of images is required, everything can be captured quickly in "burst mode" in under three seconds, before the images even need to be written to the CompactFlash cards.
Most of the images are shot with essentially every light in the light stage turned on, but with different gradations of brightness. All of the light stage lights have linear polarizer film placed on top of them.
The top two rows show Emily's face under four spherical gradient illumination conditions and then a point-light condition, and all of these top images are cross-polarized to eliminate the shine from the surface of her skin (her specular component). What's left is the skin-colored "subsurface" reflection, often called the "diffuse" component: this is light which scatters within the skin enough to become depolarized before re-emerging. The right image is lit by a frontal point light, also cross-polarized to eliminate the specular reflection.
Separating Subsurface and Specular Reflection:
Here is a closeup of the "diffuse-all" image of Emily. Every light in the light stage is turned on to equal intensity, and the polarizer on the camera is oriented to block the specular reflection from every single one of the polarized LED light sources. Even the highlights of the lights in Emily's eyes are eliminated.
This is about as flat-lit an image of a person's face as you could possibly photograph. And it's almost the perfect image to use as the diffuse texture map for the face if you're building a virtual character.
The one problem is that it's polluted to some extent by self-shadowing and interreflections, making the concavities around the eyes, under the nose, and between the lips somewhat darker and slightly more color-saturated than they should be. Depending on how you're doing your renderings, this is either a bug or a feature.
For real-time rendering, it can actually add to the realism if this effect of "ambient occlusion" is effectively already "baked in". But if new lighting is being simulated on the face using a global illumination technique, then it doesn't make sense to calculate new self-shadowing on top of a texture map that already has self-shadowing present.
In this case, you can use the actor's 3D geometry to compute an approximation to the effects of self-shadowing and/or interreflections, and then divide these effects out of the texture image. This image also shows the makeup dots we put on Emily's face which help us to align the images in the event there is any drift in her position or expression over the fifteen images; they are relatively easy to remove digitally. Emily was extremely good at staying still for the three-second scans and many of her datasets required no motion compensation at all.
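Dividing the baked-in occlusion out of the texture, as described above, might look roughly like this. A minimal sketch only: it assumes an ambient-occlusion map has already been computed from the 3D geometry, and the function name, the per-pixel division, and the floor value guarding against division by zero are all illustrative choices.

```python
import numpy as np

def remove_baked_occlusion(diffuse_tex, ao_map, floor=0.05):
    """Divide an ambient-occlusion estimate out of the flat-lit diffuse texture.

    diffuse_tex: (H, W, 3) float texture with self-shadowing baked in.
    ao_map: (H, W) float occlusion estimate in (0, 1], computed from the
    actor's 3D geometry (1 = fully unoccluded).
    floor: illustrative clamp to keep deep concavities from blowing up.
    """
    ao = np.clip(ao_map, floor, 1.0)[..., None]  # broadcast over RGB
    # Concavities (eye sockets, under the nose, between the lips) have
    # low AO values, so dividing brightens exactly those darkened areas.
    return np.clip(diffuse_tex / ao, 0.0, 1.0)
```

The corrected texture can then be re-shadowed consistently by whatever global illumination technique lights the final render.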