#448: Mobile AR & VR Tracking with uSens

uSens is working on gestural input and inside-out tracking for both AR and VR. I had a chance to catch up with uSens' vice president of product development, Yiwen Rong, at GDC in March to talk about some of their current tracking research and the larger goals driving their work. In June of 2016, uSens secured a $20 million Series A funding round led by Fosun Kinzon Capital to continue advancing their mobile AR and VR tracking, and they recently released more information about their Fingo sensors in August.

LISTEN TO THE VOICES OF VR PODCAST

Here's an earlier version of the demo that I saw at GDC that seamlessly goes from AR to VR. The primary object grows in size, which created a bit of a change-blindness effect in that I didn't even notice the transition from AR to VR:

Subscribe on iTunes
Donate to the Patreon & Summer Trip

Rough Transcript

[00:00:05.452] Kent Bye: The Voices of VR Podcast. My name is Kent Bye, and welcome to The Voices of VR Podcast. So, when I was at GDC this spring, I had an opportunity to check out a demo that really blew my mind. It was this demo by uSens, where I had this virtual reality head-mounted display that had some pass-through cameras. It started off just seeing a normal pass-through camera view of a scene. It was essentially this statue that was sitting on a table. And then all of a sudden, the statue came to life. It went from a single color to fully colored and just felt like it was a whole vibrant scene that was unfolding. And the thing that really tripped out my mind was that as I was focusing on this object, they had then cut out all the other real-time footage of the scene, and they completely placed me into a VR scene. It was the first time that I had seamlessly gone from an AR to a VR experience. And it was able to trick my mind so much that I hardly even noticed until they pointed it out to me. And then it just was like, oh my god, I have to see that again, because I didn't even notice that transition. And so uSens has been working on a lot of these tracking technologies to be able to do inside-out tracking and positional tracking from within the headset, which they had working. It was on a tethered device, and they've been working on trying to bring down the power consumption of that. And they also had a number of different hand gestures that they were working on as well. And a few months after this interview, on June 1st, uSens actually raised a round of $20 million. And so they've been continuing to develop this technology, but I had a chance to talk to the vice president of product development, Yiwen Rong, at GDC. And so that's what we'll be covering on today's episode of the Voices of VR podcast. But first, a quick word from our sponsor. This is a paid sponsored ad by the Intel Core i7 processor. You might be asking, what does the CPU have to do with VR? Well, it processes all the game logic and multiplayer data, physics simulation and spatialized audio. It also calculates the positional tracking, which is only going to increase as more and more objects are tracked. It also runs all of your other PC apps that you may be running when you're within a virtualized desktop environment. And there's probably a lot of other things that it'll do in VR that we don't even know about yet. So Intel asked me to share my process, which is that I decided to future-proof my PC by selecting the Intel Core i7 processor. So this interview with Yiwen happened at GDC, which was happening in San Francisco from March 14th to 18th. So with that, let's go ahead and dive right in.

[00:02:53.238] Yiwen Rong: So my name is Yiwen. I'm the VP of Product Development at uSens. So, we at uSens focus ourselves on human-computer interaction for VR. We focus on solving problems of 3D hand tracking, positional tracking, and also 3D object recognition, technologies that can help push VR to be a mainstream product.

[00:03:15.243] Kent Bye: Great. And, you know, there's a demo that I just saw where you're in augmented reality, so you have like a VR HMD with a pass-through camera, and you're looking at a statue. And then there's this transition going from just pure reality into a mixed reality and then into a virtual reality. And that was pretty fascinating for me, to transition from an AR to a VR experience. It was sort of like, oh, wow, all of a sudden, I'm in a VR experience. And I almost didn't even see it because I was just focusing on this object. And so maybe you could talk a bit about what you're able to do in terms of using your camera technology to be able to seamlessly go from an augmented reality experience to a VR experience.

[00:03:53.275] Yiwen Rong: Yes, so basically our vision behind it is really, we think the holy grail of mixed reality or augmented reality is really seamlessly mixing the digital world with the real world. Most of the people in this industry align with us. The way we're doing it right now is using cameras to do video see-through to an internal screen, to be able to project the augmented reality information to you. We call this super reality for ourselves. One thing I want to point out is that if you want to overlay digital information perfectly onto the real world, you really have to do the tracking in the right way. When I say tracking, I mean you have to detect your environment correctly, you have to do your positional tracking correctly, and if you want to interact with all the information that is displayed, you have to do 3D hand tracking properly.

[00:04:49.138] Kent Bye: How are you doing the positional tracking? Because it was a tethered HMD that I just looked at, and I was able to look around and have positional tracking. So are you doing something like inside-out positional tracking through the cameras? Or do you have a camera that's external to the computer that's giving you that position?

[00:05:06.307] Yiwen Rong: So we're doing inside-out positional tracking. We use two RGB cameras to do the positional tracking as well as doing the AR integration for the user.
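
As a rough illustration of what inside-out positional tracking from the headset's own cameras involves, here is a minimal Python/OpenCV sketch. This is not uSens' actual pipeline; the intrinsics matrix `K` and all parameters are assumed for the example, and the idea is simply to match features between consecutive frames and recover the head's relative motion from them:

```python
# Minimal sketch of camera-based inside-out tracking: match features between
# consecutive frames and recover the relative head pose. Illustrative only.
import cv2
import numpy as np

# Assumed pinhole intrinsics for the example (focal length and principal point).
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])

orb = cv2.ORB_create(1000)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def relative_pose(prev_gray, curr_gray):
    """Estimate rotation R and translation direction t of the head between two frames."""
    kp1, des1 = orb.detectAndCompute(prev_gray, None)
    kp2, des2 = orb.detectAndCompute(curr_gray, None)
    matches = matcher.match(des1, des2)
    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    # With a single camera, t is only known up to scale; the second camera of a
    # stereo rig (or an IMU) is what lets a real system recover metric translation.
    return R, t
```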

[00:05:15.330] Kent Bye: OK, so it sounds like you could potentially take this technology and put it onto a mobile HMD and be able to get positional tracking. Is that right?

[00:05:21.732] Yiwen Rong: That's correct. That's actually what we're working very hard toward. So I just want to bring up the reason, the rationale behind why our founders founded this company. So back in the 2013, 2014 time frame, when we started to see the Oculus DK1 and DK2, our first reaction was, hey, this is really cool. This can be a game-changing device. Our second reaction was, hey, where are our hands? We want to be able to see our hands, and we want to be able to track all the movements a human body can make in order to really display everything correctly. And if we want to do that and let a lot of people use it, we have to do it on a mobile platform. So from day one, we based our algorithm development on mobile platforms. We try to save energy, we try to save computing power, and we try to work really hard on latency. And that's our goal: put everything on mobile.

[00:06:17.112] Kent Bye: And so what were some of the problems that you were trying to solve in terms of computer vision that you feel like you were able to make some innovations with at uSens?

[00:06:25.155] Yiwen Rong: There are two things. I think one thing is really how to do an effective computer vision algorithm in a very power-constrained and computing-constrained environment. That is very important, because most PCs are already powerful enough to give you perfect hand tracking with a typical device. That's one thing. And the second problem is really associated with the use case. When you put your HMD on, for example, the effective hand tracking range is anywhere between 10 to 70 centimeters. And if you look at most of the devices out there today that give you hand tracking, like Intel RealSense or Microsoft Kinect, they can work very well beyond 50 cm. But within 50 cm is where you need to work when you use VR. Then you have to do a lot of design changes and innovations in how to let your system recognize your hands correctly at short range.

[00:07:26.405] Kent Bye: And another thing that you had mentioned is that in order to improve the signal-to-noise ratio to actually detect your hands, you have some LEDs. Maybe you could talk a bit about what you're doing in terms of shining additional light or illumination to be able to detect the hands better.

[00:07:41.494] Yiwen Rong: Oh yeah, so we use infrared LEDs to shine infrared light on your hands. This is exactly the kind of place where we see that a new type of design is important. When your hand is close and you shine light on your hands at that proximity, then it's very easy to get a very clear view of the objects that you want to recognize.
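
To make the benefit concrete: because light intensity falls off quickly with distance, a hand lit by an IR LED right next to the camera shows up far brighter than the background, so even a simple brightness threshold can pull it out. Here is a minimal sketch of that idea, illustrative only and not the uSens pipeline, assuming an 8-bit single-channel IR frame:

```python
# Minimal sketch: segment a nearby, IR-lit hand by brightness.
import cv2
import numpy as np

def segment_hand(ir_frame: np.ndarray, brightness_thresh: int = 180) -> np.ndarray:
    """Return a binary mask of the brightest (closest, IR-illuminated) region."""
    blurred = cv2.GaussianBlur(ir_frame, (5, 5), 0)
    _, mask = cv2.threshold(blurred, brightness_thresh, 255, cv2.THRESH_BINARY)
    # Keep only the largest bright blob, which at arm's length is usually the hand.
    num_labels, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
    if num_labels <= 1:
        return mask
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    return np.where(labels == largest, 255, 0).astype(np.uint8)
```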

[00:08:00.890] Kent Bye: And one of the things that I noticed just in terms of the ergonomics of one of these HMDs is that it felt like it was heavy enough that I had to actually use my other hand to support it. Otherwise, it was kind of pushing down on my face. And so in terms of weight considerations, what are some of the challenges that you see there in terms of making this so that it doesn't weigh down the HMD significantly?

[00:08:22.370] Yiwen Rong: Yeah, that's a good question. So just for the record, all the demos you tried are 3D printed, so they're very heavy to start with. And if you look at the evolution of Gear VR, they did two things. Number one, they use lighter materials, and they use less material. That's one thing. Another thing is the ergonomics of the head straps. Whether you feel comfortable or not is not 100% correlated with the weight. It's really more correlated with where you feel the weight. If you put the weight on the back of your head, a human being tends to have a very strong tolerance for the weight. A good example is the HTC Vive. It's much heavier, even heavier than our current version of the demo. But because it's designed very intelligently to put most of the weight on the back of your head, people almost don't feel it. That's also a very important thing. And for us, since we're really a computer vision-based company, we do solutions for gesture-based control and positional tracking. We are working with all the major OEMs to provide gesture-based solutions for them. And as long as they're making progress on the ergonomics of their HMDs, the experience of using our module will improve as well.

[00:09:41.807] Kent Bye: And do you foresee actually producing hardware, or is it more that the software you're producing may be going into some of these larger mass-produced HMDs that are already out there?

[00:09:52.283] Yiwen Rong: Yes, that's a good question. So to start with, when the company got started, we really wanted to be able to integrate our technology into the existing camera systems that a cell phone or a standard HMD maker would have. But surprisingly, most of the HMD makers back then, like two years ago, didn't realize what the configuration of your cameras has to be in order to do gesture-based control or positional tracking in an effective way. So that's a very interesting fact. And since we weren't able to find an existing module that could be used for our purposes, in order to demo our technology we had to design our own computer vision modules ourselves. So that's been the status quo in the past few years and even this year. But one thing we observed by talking to a lot of folks in the industry is this trend of people starting to realize that, number one, gesture-based control is important; number two, positional tracking is important; and number three, in order to achieve those features, what the right configuration for a computer vision module is. I think it will still take two to three years to make the big guys really move and be able to come up with their own designs. I was almost joking when we talked to one of the major headset makers and said, hey, here's our design. Please take it and manufacture it on your own. So I think that's going to eventually happen in the next two to three years. And then we can really leverage the strong supply chain of the big companies and be able to apply our software on a much broader base.

[00:11:27.758] Kent Bye: Yeah, and I imagine, you know, one of the big open problems with the Gear VR is that it has no positional tracking. And, you know, coming out later this year, Google has its Project Tango phone that's going to be, you know, the first consumer phone that has these 3D depth-sensing cameras. Is that something that you've looked at, trying to integrate your technology and your computer vision software and algorithms into a Project Tango phone that may be tethered to a mobile HMD?

[00:11:52.220] Yiwen Rong: I think Google Tango also uses computer vision technology plus IMU fusion to do positional tracking. We think the technology is awesome. The problem with the last generation is that it's still too computing-heavy. That means even though they use a mobile platform, it's a Tegra K1. It's a very powerful platform that consumes a lot of power. We actually take a slightly different approach compared to them. We first try to cut or shave the power consumption and then improve the performance, and what they're doing is they first make an awesome product and then try to see where they can save power. But eventually, yes, I think technologies like Google Tango or like our positional tracking will eventually be a very good solution for all the HMD makers to have good positional tracking. We just bought an HTC Vive a few days ago. It took us almost four hours just to set up the positional tracking Lighthouse, and that has to change. We believe with our technology you don't have to do that. You just put on the goggles and everything works. No calibration, no nothing.
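
The "computer vision plus IMU fusion" approach Yiwen mentions is commonly done by letting the IMU integrate motion at a high rate while slower camera-based estimates correct its drift. Here is a toy sketch of that idea, with a hypothetical class and made-up blending constants; it is not Tango's or uSens' implementation:

```python
# Toy visual-inertial fusion: dead-reckon with the IMU, correct with the camera.
import numpy as np

GRAVITY = np.array([0.0, -9.81, 0.0])  # assumed world-frame gravity, m/s^2

class HeadTracker:
    def __init__(self):
        self.pos = np.zeros(3)  # metres
        self.vel = np.zeros(3)  # metres/second

    def imu_update(self, accel_world: np.ndarray, dt: float):
        # High-rate step: integrate gravity-compensated acceleration.
        # On its own this drifts within seconds.
        self.vel += (accel_world - GRAVITY) * dt
        self.pos += self.vel * dt

    def camera_update(self, pos_from_camera: np.ndarray, alpha: float = 0.2):
        # Low-rate step: pull the drifting estimate toward the camera's absolute fix.
        self.pos = (1.0 - alpha) * self.pos + alpha * pos_from_camera
```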

[00:13:00.013] Kent Bye: And in terms of gesture controls, what type of gestures do you have integrated into your system?

[00:13:05.718] Yiwen Rong: Right now, we have very simple gestures, like point, click, push, and turn, and all the basic stuff. And the good thing about our gesture control is there are two things. One thing is that developers can really define the gestures themselves. Once they get our SDK, they can use any gesture as an input in the app development process. Another thing is we have physical collision detection. That means you can have the most natural interaction with the whole virtual environment and virtual systems. For example, you can really pick up a virtual object with your hand. We will be able to detect the position of your hand and the position of the object. And that is really the holy grail and the second step of gesture-based control. So I think the first step is already giving millions of possibilities to our developers for them to integrate any gesture they want.
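
The "physical collision detection" idea can be pictured as: the tracker gives you 3D fingertip positions, and the app checks whether the fingers are pinched together inside a virtual object's bounds. Here is a minimal sketch with hypothetical data structures; this is not the uSens Fingo SDK API:

```python
# Minimal sketch of grab detection from tracked fingertip positions.
from dataclasses import dataclass
import numpy as np

@dataclass
class VirtualObject:
    center: np.ndarray  # (x, y, z) in metres, headset coordinates
    radius: float       # simple spherical bounds

def is_grabbing(thumb_tip: np.ndarray, index_tip: np.ndarray,
                obj: VirtualObject, pinch_thresh: float = 0.03) -> bool:
    """Grab = fingertips pinched together and near the object's surface."""
    pinched = np.linalg.norm(thumb_tip - index_tip) < pinch_thresh
    midpoint = (thumb_tip + index_tip) / 2.0
    touching = np.linalg.norm(midpoint - obj.center) < obj.radius + 0.01
    return pinched and touching

# Example: a 5 cm sphere 30 cm in front of the headset, with fingers pinched on it.
ball = VirtualObject(center=np.array([0.0, 0.0, 0.3]), radius=0.05)
print(is_grabbing(np.array([0.0, 0.01, 0.26]), np.array([0.0, -0.01, 0.26]), ball))  # True
```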

[00:13:59.125] Kent Bye: What are some of the biggest applications that you see people using this for? What type of problems do you think this could solve?

[00:14:05.691] Yiwen Rong: The first thing we see is video playback. Right now, if you use Gear VR, what you have to do is use a touchpad, and the touchpad plane does not match your viewing plane, so it's very hard to navigate. With us, if you're watching any content, any movies, you can use your hand to manipulate it directly. And that's the situation on Gear VR. The case for gesture on Google Cardboard is even stronger. For Google Cardboard, most of the video playback apps use the IMU to control it. You have to stare at the playback button for 10 seconds and wait until it can start playing. And if you use gesture, you just point and click, and you can navigate, fast-forward, or rewind anywhere you like, just like you use a remote control on a TV. That's one thing. The second big category we see is education. Education, and almost anything else that requires low-frequency, high-precision control and interaction, is what we can do best.

[00:15:10.315] Kent Bye: And finally, what do you see as kind of the ultimate potential of virtual reality and mixed reality and what it might be able to enable?

[00:15:18.062] Yiwen Rong: Yes, the reason we're in this industry is because we believe it's the next computing platform. When we talk to everybody else, we think what's actually going to happen is that in the next two years, the development of the industry will be slower than most people expected. But after year five, we will have solved the three major problems of VR. Number one is the display system. Number two is the computing power of the device. Number three is the tracking technology. In the next five years, I think the whole industry is going to really focus on solving these three problems. Once we solve these three problems, we will be having awesome VR experiences. And that's the real next computing platform we're talking about.

[00:15:56.525] Kent Bye: Okay, great. Well, thank you so much. Thank you. So that was Yiwen Rong. He's the vice president of product development for uSens. So I have a number of different takeaways about this interview. First of all, what they're doing with the inside-out tracking is really interesting technology. At the point that I tried it back in March, the HMD did have a lot of extra weight, and it felt like they needed to bring the weight down. And of course, the power consumption is a huge issue when talking about doing something in a mobile context. So that was something that they were also looking at. And so I think that this is technology that's likely going to get acquired and picked up at some point. I know that there are a lot of other companies that are looking at very similar things internally. And I think coming up here with Oculus Connect 3 over the next couple of weeks, and especially with Daydream launching on October 4th, there are other approaches as well, like Project Tango in the long term, which I know has had its own solution for doing this inside-out tracking. So what Yiwen was saying is that something like Tango was trying to, at first, focus on proving what is even possible with the technology, and then focus on trying to bring the power and the weight down. I think this is something where we're going to see a convergence at some point in the future. And with what I was able to see within this specific demo, where I was wearing what essentially looked like a VR HMD, I was able to have both an AR and a VR experience. So I think that the technology is eventually going to converge so that you can do both at the same time. But actually, the form factor of each of them is different enough that I do think there's going to be a use case for AR glasses that is going to be different from VR. One is going to be used in public, which is going to be augmented reality, and the other is going to be used in private, because you're occluding your face. I just don't think people are going to be walking around with a full VR HMD out in public. Even though there are some ideas that people may want to do that, I think the reality is that it doesn't look cool, and it just feels like you may be putting yourself at risk. So there are just going to be different use cases and contexts for each of these technologies in the short term. In the long term, I think maybe they'll all just kind of look like glasses that you put on that can do a full VR experience or an AR experience, and that's where I think the trajectory is going, up until we actually have something like contact lenses, where maybe it's even less perceptible that you have something that's augmenting your experience. But right now, I think uSens has got some solid technology, they've got some money that they've raised, and I think it's likely something that could be acquired. I'm not quite sure if they'll have the resources to do a full hardware production on their own and be able to compete with the other big players, but they have a lot of really smart people working on it, and they're pushing the limits on the kind of AR and VR blending that I haven't really seen anywhere else. Also, the work on hand gestures and the other computer vision problems is just a really hot topic in the whole field right now. And so they've made a lot of progress so far.
And since this interview, like I said, it was about three months later that they raised $20 million. That should last them for a bit, and we'll see what happens to them. So keep an eye out for uSens. So that's all that I have for today. If you'd like to support the Voices of VR podcast, then you can spread the word by telling your friends and become a donor at patreon.com/voicesofvr.
