#136: Martin Breidt on the Uncanny Valley & Facial Tracking within a VR Head-Mounted Display by Oculus Research

Dr. Martin Breidt is a research technician at the Max Planck Institute for Biological Cybernetics. His bio page says that he’s part of the Cognitive Engineering group where they “develop and use systems from Computer Vision, Computer Graphics, Machine Learning with methods from psychophysics in order to investigate fundamental cognitive processes.”

Martin only had time for a very quick 5-minute chat, but this was enough time for him to give me some pointers to his research about the uncanny valley effect as well as to some work that is being done to capture facial animations while wearing a VR HMD. This led me to learn a lot more about the research that Oculus is doing to capture human expressions while wearing a VR HMD.

Martin named Hao Li as doing some very important work in being able to predict facial expressions with partial information based upon statistical models. Hao is an assistant professor of Computer Science at the University of Southern California, and he has a paper titled “Unconstrained Realtime Facial Performance Capture” at an upcoming Conference on Computer Vision and Pattern Recognition. Here’s the abstract.

We introduce a realtime facial tracking system specifically designed for performance capture in unconstrained settings using a consumer-level RGB-D sensor. Our framework provides uninterrupted 3D facial tracking, even in the presence of extreme occlusions such as those caused by hair, hand-to-face gestures, and wearable accessories. Anyone’s face can be instantly tracked and the users can be switched without an extra calibration step. During tracking, we explicitly segment face regions from any occluding parts by detecting outliers in the shape and appearance input using an exponentially smoothed and user-adaptive tracking model as prior. Our face segmentation combines depth and RGB input data and is also robust against illumination changes. To enable continuous and reliable facial feature tracking in the color channels, we synthesize plausible face textures in the occluded regions. Our tracking model is personalized on-the-fly by progressively refining the user’s identity, expressions, and texture with reliable samples and temporal filtering. We demonstrate robust and high-fidelity facial tracking on a wide range of subjects with highly incomplete and largely occluded data. Our system works in everyday environments and is fully unobtrusive to the user, impacting consumer AR applications and surveillance.

Here’s a video that goes along with the “Unconstrained Realtime Facial Performance Capture” paper for CVPR 2015:

Hao Li is also the lead author of an upcoming SIGGRAPH 2015 paper describing a system that is able to capture human expressions even while the user is wearing a VR HMD:

Facial Performance Sensing Head-Mounted Display
Hao Li, Laura Trutoiu, Pei-Lun Hsieh, Tristan Trutna, Lingyu Wei, Kyle Olszewski, Chongyang Ma, Aaron Nicholls
ACM Transactions on Graphics, Proceedings of the 42nd ACM SIGGRAPH Conference and Exhibition 2015, 08/2015

Three of the paper’s co-authors work at Oculus Research: Laura Trutoiu, Tristan Trutna, and Aaron Nicholls. Laura was supposed to present at the IEEE VR panel on “Social Interactions in Virtual Reality: Challenges and Potential,” but she was unable to make the trip to southern France. She was going to talk about faces in VR, and provided the following description of her talk:

Faces provide a rich source of information and compelling social interactions will require avatar faces to be expressive and emotive. Tracking the face within the constraints of the HMD and accurately animating facial expressions and speech raise hardware and software challenges. Real-time animation further imposes an extra constraint. We will discuss early research in making facial animation within the HMD constraints a reality. Facial analysis suitable for VR systems could not only provide important non-verbal cues about the human intent to the system, but could also be the basis for sophisticated facial animation in VR. While believable facial synthesis is already very demanding, we believe that facial motion analysis under the constraints of an immersive real-time VR system is the main challenge that needs to be solved.

Being able to capture human expressions within VR is going to have huge implications for social and telepresence experiences. It’s pretty clear that Facebook and Oculus have a lot of interest in solving this difficult problem, and it looks like we’ll start to see some of the breakthroughs that have been made when they are presented at SIGGRAPH in August 2015, if not sooner.

As a sneak peek, one of Hao Li’s students, Chongyang Ma, had the following photo on his website that shows an Oculus Rift HMD fitted with a camera rig for facial capture.


Okay. Back to this very brief interview that I did with Martin at IEEE VR. Here’s the description of Martin’s presentation at the IEEE VR panel on social interactions in VR:

Self-Avatars: Body Scans to Stylized Characters
In VR, avatars are arguably the most natural paradigm for social interaction between humans. Immediately, the question of what such avatars really should look like arises. Although 3D scanning system have become more widespread, such a semi-realistic reproduction of the physical appearance of a human might not be the most effective choice; we argue that a certain amount of carefully controlled stylization of an avatar’s appearance might not only help coping with the inherent limitations of immersive real-time VR systems, but also be more effective at achieving task-specific goals with such avatars.

Martin mentions a paper titled “Face Reality: Investigating the Uncanny Valley for Virtual Faces” that he wrote with Rachel McDonnell for SIGGRAPH 2010.

Here’s the introduction to that paper:

The Uncanny Valley (UV) has become a standard term for the theory that near-photorealistic virtual humans often appear unintentionally eerie or creepy. This UV theory was first hypothesized by robotics professor Masahiro Mori in the 1970’s [Mori 1970] but is still taken seriously today by movie and game developers as it can stop audiences feeling emotionally engaged in their stories or games. It has been speculated that this is due to audiences feeling a lack of empathy towards the characters. With the increase in popularity of interactive drama video games (such as L.A. Noire or Heavy Rain), delivering realistic conversing virtual characters has now become very important in the real-time domain. Video game rendering techniques have advanced to a very high quality; however, most games still use linear blend skinning due to the speed of computation. This causes a mismatch between the realism of the appearance and animation, which can result in an uncanny character. Many game developers opt for a stylised rendering (such as celshading) to avoid the uncanny effect [Thompson 2004]. In this preliminary work, we begin to study the complex interaction between rendering style and perceived trust, in order to provide guidelines for developers for creating plausible virtual characters.

It has been shown that certain psychological responses, including emotional arousal, are commonly generated by deceptive situations [DePaulo et al. 2003]. Therefore, we used deception as a basis for our experiments to investigate the UV theory. We hypothesised that deception ratings would correspond to empathy, and that highly realistic characters would be rated as more deceptive than stylised ones.

He mentions the famous graph by Masahiro Mori, the robotics researcher who first proposed the concept back in 1970 in the journal Energy. That article was originally written in Japanese, but I found this translation of it:

I have noticed that, as robots appear more humanlike, our sense of their familiarity increases until we come to a valley. I call this relation the “uncanny valley.”

Martin isn’t completely convinced that the conceptualization of the uncanny valley that Mori envisioned back in 1970 is necessarily the correct one. He’s interested in continuing to research and empirically measure the uncanny valley effect through experiments, and hopes to eventually come up with a data-driven model of what works when stylizing virtual humans within VR environments, so that they match our expectations and feel comfortable to interact with. At the moment, this job is being done through the artistic intuitions of directors and artists within game development studios, but Martin says that this isn’t scalable for everyone. So he intends to keep studying the uncanny valley effect in order to understand it better.