Hao Li is an assistant professor at USC who has been collaborating with Oculus Research on facial tracking while wearing a virtual reality head-mounted display. They presented their initial prototype and research paper "Facial Performance Sensing Head-Mounted Display" at SIGGRAPH 2015. Hao says this prototype proved that it's possible to extrapolate occluded facial expressions with a combination of strain sensors and machine learning algorithms. They are now moving forward on the next iteration of prototypes, which should be more consumer-ready. I previously covered Hao's research in my write-up on my interview with Martin Breidt. Hao says that eye gaze is really crucial to having a successful social interaction in VR, and so it's very probable that Oculus is working on integrating eye tracking in future consumer headsets. Hao talks about some of the next steps in his facial tracking research, and he's really optimistic about the metaverse given how his research is helping facilitate the future of telepresence and social VR applications.
Support the Voices of VR Podcast by becoming a patron on Patreon.
Theme music: “Fatality” by Tigoolio
Subscribe to the Voices of VR podcast.
Rough Transcript
[00:00:05.412] Kent Bye: The Voices of VR Podcast.
[00:00:11.988] Hao Li: My name is Hao Li. I'm a professor at USC in computer science, working in the field of computer graphics and computer vision. The research that we've been doing for the past two years, and even before, is basically everything that has to do with finding easy ways to create digital avatars and to bring motion into the digital world. So basically capturing performances of faces, human bodies, and all these kinds of things using really robust algorithms, so that they can be deployed someday and not only work in studios.
[00:00:42.130] Kent Bye: So yeah, one of the problems with bringing, sort of, emotions and expression into VR is being able to do facial tracking while you have an HMD on your face. So talk a bit about, you know, some of the research you've done and the approaches that you've taken to solve that really tricky problem.
[00:00:56.015] Hao Li: Right. So first of all, I was really delighted that Michael Abrash was actually showing a couple of shots of some of our latest SIGGRAPH papers on that. It's research that we started working on, I would say, a little bit more than a year ago. Well, actually a year ago we started working on it and pushed it really quickly, basically to have the ability to enable, potentially someday, face-to-face interactions in a virtual world. So it starts with having the ability to integrate facial tracking capabilities on an HMD. The biggest challenge there is basically how you can do all this when the face is occluded. If you look at what they have out there in visual effects, in the gaming world, or in consumer applications, the best possible solution is to use optical sensors, basically a camera or a 3D camera, and that basically relies on seeing your entire face. It's like the idea of having video chats. The difficulty is, if you're wearing those HMDs, then you can't really see what is actually happening underneath, especially when you have contact between the HMD and your face. So you have a coupled problem here: not only is your face occluded, but the HMD is also pressing on your face, so your facial expression changes a little. So one thing that we did was we found a way to integrate contact sensors that are basically just strain gauges. It's a technology that people were using back in the day for cyber gloves, basically, and we integrate those things onto the foam pads, and they basically collect different signal changes while your facial expression is changing. So the tricky thing there is basically how do you map these sparse signals onto a high-fidelity facial expression.
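To make the sparse-signal mapping Hao describes a bit more concrete, here is a minimal sketch of one way such a mapping could work: a ridge-regularized linear map from a handful of strain-gauge channels to blendshape weights. This is only an illustration of the idea, not the paper's actual pipeline (which also uses a depth camera for the lower face, as discussed below); the function names, the bias-column trick, and the choice of a purely linear model are assumptions for illustration.

```python
# Minimal illustration (not the paper's actual pipeline): learn a linear map
# from sparse strain-gauge readings to blendshape weights with ridge regression.
import numpy as np

def fit_strain_to_blendshapes(S, B, reg=1e-3):
    """S: (n_frames, n_gauges) strain readings from the foam-pad sensors.
    B: (n_frames, n_blendshapes) reference blendshape weights captured
       without the HMD (e.g. with a depth camera during training).
    Returns W such that B ~= [S, 1] @ W (regularized least squares)."""
    X = np.hstack([S, np.ones((S.shape[0], 1))])   # append a bias column
    A = X.T @ X + reg * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ B)

def predict_expression(strain_frame, W):
    """Map one frame of strain readings to blendshape weights clamped to [0, 1]."""
    x = np.append(strain_frame, 1.0)
    return np.clip(x @ W, 0.0, 1.0)
```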
[00:02:33.769] Kent Bye: As part of that SIGGRAPH paper, you also have this kind of elephant arm coming off of the HMD with a 3D depth-sensing camera, so that no matter where you're turning your head, you're getting a consistent view of that lower face. So is that something that would be, you think, required for a solution like this to do full facial tracking while wearing a virtual reality headset?
[00:02:55.911] Hao Li: Well, definitely not. I mean, the form factor is actually a no-go for that one. Basically, in this research project, the main goal was just to show a proof of concept that it's actually possible, that using cheap materials and low-cost hardware we can actually achieve the goals that we wanted, basically just to get believable facial expressions there. Now that we know that it's possible, we're already working on the next version, also in collaboration with Oculus, to basically have, you know, more ergonomic form factors and different types of sensors, which I won't mention yet.
[00:03:27.795] Kent Bye: Okay, great. And is this something that would be potentially within the consumer version of Oculus, or is it sort of so far out that you're still working on it?
[00:03:35.535] Hao Li: Well, from my standpoint, I mean, I'm not working for Oculus, so I can't speak for what they are planning to do. From my standpoint, there are still significant research challenges ahead to get these things to consumers. But on the other hand, from my experience, things change very quickly, right? And if there is a demand, people would certainly want these kinds of capabilities. There are a lot of promising startups like AltspaceVR, and High Fidelity is doing some stuff in that space. Basically, we just saw the talk from John Carmack, right? He was mentioning that you want to have, like, a cinema experience with a social experience in there, you know, being able to see your friends, you know, as a floating head or something. Without the expressions, it's not going to work.
[00:04:19.955] Kent Bye: So it sounds like you're doing a little bit of regression analysis or some sort of statistical probabilistic look at, given one facial expression, this is what is most likely following it. Maybe you could talk a bit about that data-driven approach to be able to extrapolate what the most likely facial expression might be.
[00:04:36.774] Hao Li: Yeah, so the approach that we had is basically that we need to find a mapping between these signals, which, I mean, are different between different people, because once you wear the HMD, the measurements are at different locations. So we need a training process to basically find a mapping between what signals we have and what expression a person has. What's very difficult is that to enable this you have to take off the HMD and do the measurement, and then put the HMD back on. So once you put the HMD back on, the entire weight changes. So we have to do another calibration there, basically to match the weight of your HMD. I can be very open about the limitations. Right now the system, you know, creates believable expressions in a very controlled environment. So if, for example, you move your head around, the weight still changes, and we're not yet compensating for these kinds of drifts. So what you're saying is that when you move your head around, the HMD is actually changing your facial expressions even if you're not changing them, because if you move your head up, then the HMD, because of its weight, is actually pressing down. It's affecting basically the strain sensors that are between your face and the foam pad, and that causes some noise in the facial expression. So it's not something that's very useful for now, but hopefully we'll find a better solution for that.
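The per-wearing recalibration Hao mentions could be illustrated with a small sketch: each time the headset is re-donned, a fresh neutral-face baseline is captured and subtracted from the raw readings before the learned mapping is applied. Again, this is a hedged illustration; the class, method names, and the simple baseline-subtraction scheme are assumptions, not the actual calibration procedure from the paper.

```python
# Hypothetical sketch of per-wearing drift compensation: re-estimate a
# neutral-pose baseline each time the HMD is put back on, then subtract it
# before applying a mapping W that was fit on baseline-subtracted signals.
import numpy as np

class StrainCalibrator:
    def __init__(self, W):
        self.W = W            # mapping learned offline from paired training data
        self.baseline = None  # neutral-face offset for the current wearing

    def recalibrate(self, neutral_frames):
        """Call while the user holds a neutral expression right after
        donning the headset; averaging several frames reduces sensor noise."""
        self.baseline = np.mean(neutral_frames, axis=0)

    def predict(self, raw_frame):
        """Baseline-subtract one frame of readings and map to blendshape weights."""
        if self.baseline is None:
            raise RuntimeError("recalibrate() after putting the HMD on")
        x = np.append(raw_frame - self.baseline, 1.0)
        return np.clip(x @ self.W, 0.0, 1.0)
```

Note that this only corrects a static offset; as Hao says, the prototype does not yet compensate for the dynamic weight shifts that occur while the head is moving.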
[00:05:52.101] Kent Bye: And another thing that I think is pretty crucial to doing facial tracking is actually tracking the eyes and where people are actually looking. And so is that something that you would try to approximate, or is this something that you're also trying to do with actual sensors inside, to be able to track the eyes?
[00:06:08.689] Hao Li: So you're talking about eye gaze, right? Yeah. So eye gaze, I don't think you want to approximate. You really want to measure where the eye gaze is pointing. There are a couple of solutions out there that have integrated cameras looking at where your eyes are looking. Possibly Oculus is working on something. There's another startup company called FOVE. They already have a commercial solution, or a developer solution, with cameras integrated into the headset that track your eyes.
[00:06:33.316] Kent Bye: So you presented some research at SIGGRAPH this year, so maybe talk about, you know, SIGGRAPH and how it fits into, sort of, the research side of virtual reality.
[00:06:42.473] Hao Li: Yeah, I mean, SIGGRAPH is the premier conference for computer graphics. Computer graphics went through a lot of waves of innovation and industry changes. It started off with, you know, design, CAD, computer-aided design. Later on it became really, really popular as the VFX industry started playing a significant role, then computer games, and later on more computer vision-related aspects and 3D printing. But I think since last year, virtual reality has become a dominant topic at SIGGRAPH. I think it's important because it's also driving the industry in graphics. There's going to be a lot of real-time graphics-related work, display technologies are going to be hot research topics, as well as anything that has to do with computer vision, tracking cameras, tracking people, and also building digital avatars automatically. So we have a lot of recent research on creating realistic avatar faces, including hair and all these kinds of techniques.
[00:07:40.373] Kent Bye: So what's next for you in your research, and where do you go from here?
[00:07:43.295] Hao Li: Yeah, so I think research-wise, we're definitely pushing more toward the ability to create really realistic avatars and also to find better ways to do performance capture. Algorithmically, the direction that we're going is basically using a lot of insights from machine learning. We have the ability to generate a lot of data using traditional computer graphics techniques. So those are the fields of study that we're looking into. We're working closely with industry, basically to deploy our techniques as fast as possible. And yeah, who knows? I think there's going to be a lot of interesting things, not only in VR, but also in AR and anything that has to do with mobile technologies.
[00:08:23.174] Kent Bye: Just to follow up on that point about realistic avatars, does that mean, you know, in terms of the Uncanny Valley, are you trying to go towards photorealism, or are you trying to stylize it? Or maybe a few comments about how you deal with the Uncanny Valley.
[00:08:34.667] Hao Li: Right, so I think realistic avatars might take a little while, but I don't think it's completely science fiction; it's going to happen. Obviously, the lower-hanging fruit is stylized avatars, but we're talking more about functional 3D models. Non-functional 3D models are, for example, when you're just scanning a surface. You've probably seen some of the really new research that we're doing at Oculus Research, where they have the ability to scan deformable people. I think one of the next challenges is how do you get real clothing models, real hair, eyes, and things that are actually functional, so not just a 3D surface of the person.
[00:09:10.720] Kent Bye: Great. And finally, what do you see as the ultimate potential of virtual reality and what it might be able to enable?
[00:09:16.541] Hao Li: Well, an alternate reality. No, I hope it's going to be something like the internet, right? Where people would... I don't know if people would spend all their days inside, but they could definitely experience something new. And I think it will probably go beyond gaming and movies; those are like the ultimate low-hanging fruits. But I would really like to see how people interact with these kinds of social worlds and change the way we do communication. I think, you know, it's something that researchers have been dreaming about for 20 years, and finally we're looking at something where maybe now is the time to actually do something about it.
[00:09:51.258] Kent Bye: Great. Well, thank you. Yeah, thanks a lot. And thank you for listening. If you'd like to support the Voices of VR Podcast, then please consider becoming a patron at patreon.com slash voicesofvr.