#544: Google Tango’s Engineering Director on AR Capabilities Enabled by Depth Sensors

Augmented reality has played a huge role at the developer conferences for Microsoft, Apple, Facebook, and Google, which is a great sign that the industry is moving towards spatially-aware computing. Microsoft is the only company to start with head-mounted AR with the HoloLens, while the other three companies are starting with phone-based AR. They are using machine learning with the phone camera to do six degree-of-freedom tracking, but Google’s Project Tango is the only phone solution that’s starting with a depth-sensor camera. This allows Tango to do more sophisticated depth-sensor compositing and area learning, where virtual objects can be placed within a spatial memory context that is persistent across sessions. They also have a sophisticated visual positioning system (VPS) that will help customers locate products within a store, which is going through early testing with Lowe’s.

I had a chance to talk with Tango Engineering Director Johnny Lee at Google I/O about the unique capabilities of the Tango phones, including tracking, depth-sensing, and area learning. We cover the underlying technology in the phone, world-locking & latency comparisons to HoloLens, the visual positioning system, privacy, future features of occlusions, object segmentation, & mapping drift tolerance, and the future of spatially-aware computing. I also compare and contrast the recent AR announcements from Apple, Google, Microsoft, and Facebook in my wrap-up.


The Asus ZenFone AR coming out in July will also be one of the first Tango & Daydream-enabled phones.

A video of one of the Tango Demos at Google I/O

Demo video of Tango’s Visual Positioning System

Video of the “Into the Wild” 10,000-square-foot Tango AR installation at the ArtScience Museum at Marina Bay Sands

Here’s a Twitter thread discussing the different AR SDKs from the major tech companies

Here’s the What’s New on Tango presentation from Google I/O 2017

Music: Fatality & Summer Trip

Rough Transcript

[00:00:05.452] Kent Bye: The Voices of VR Podcast. My name is Kent Bye, and welcome to the Voices of VR Podcast. So the season of developer conferences for the major tech companies is coming to a close for the year 2017, and a big focus for all of them, including Facebook, Microsoft, Google, and Apple, was augmented reality. So, phone-based AR is what everybody but Microsoft is doing. Microsoft, of course, has the HoloLens, but everybody else is looking at how they can use mobile phones as a stepping stone into more and more sophisticated augmented reality technologies. So on today's episode, I'm going to be talking to Johnny Lee of Google, and we're going to be doing a deep dive into Google Tango, which has a depth sensor camera and is able to do some of the most sophisticated phone-based AR of anybody that's out there. So we'll be talking about some of the features they're going to be able to do with their special hardware that these other companies are not. So, that's what we'll be covering on today's episode of the Voices of VR podcast. But first, a quick word from our sponsor. Today's episode is brought to you by the Voices of VR Patreon campaign. The Voices of VR is a gift to you and the rest of the VR community. It's part of my superpower to go to all of these different events, to have all the different experiences and talk to all the different people, to capture the latest and greatest innovations that are happening in the VR community and to share them with you so that you can be inspired to build the future that we all want to have with these new immersive technologies. So you can support me on this journey of capturing and sharing all this knowledge by providing your own gift. You can donate today at patreon.com slash Voices of VR. So this interview with Johnny happened at the Google I/O conference that was happening at the Shoreline Amphitheatre in Mountain View, California on Friday, May 19th, 2017. So with that, let's go ahead and dive right in.

[00:02:03.299] Johnny Lee: My name is Johnny Lee. I'm the engineering director for Tango, which is a division inside of Google's Daydream group, focusing mostly on technologies for augmented reality, building sensors and hardware and software for mobile devices to track their position in 3D and map the environment.

[00:02:21.332] Kent Bye: Great. So maybe you could talk a bit about some of the hardware needs for the phone in order to run a Tango. What is the camera and the sensors that need to be there in order for you to actually do the magic that you're able to do with the Tango?

[00:02:33.944] Johnny Lee: Sure. So most phones have one camera in the back. Usually it's a color camera that people are used to using for taking photos. But what we want to do with Tango is be able to track the physical motion of the device, be able to recognize where the device is in the room or in a building, or even be able to create geometry of the floors and the walls and the tables. So when we create AR experiences, the characters actually know all the different surfaces in the room just like you do, or even can be behind objects in the room, like behind the couch. And the standard camera that's built in most phones just gets a color image of the scene, but we really want to be able to see a very large part of the room with a fisheye camera. So a very wide angle camera that gives us a 150 degree field of view that allows us to see the room from whatever angle we're in. And if you think about what happens when you look through binoculars or look through, like, paper towel tubes, when you have just a small view of the room, it's easy to get lost. And the wide field of view camera helps us recognize where we are. The other sensor that we have in a current Tango device is a depth sensor. And a depth sensor is an infrared camera that sees an infrared pattern, usually from an LED or some pattern generator. And we can essentially get a point cloud or 3D measurements of all the surfaces. So that's what allows us to detect the floors, walls, and tables.

[00:03:53.355] Kent Bye: Now, are you doing some combination of sensor fusion when it comes to, like, doing computer vision from the camera, in combination with the depth sensor, in combination with IMUs, in order to do this world-locking phenomenon that you're able to do in Tango?

[00:04:07.202] Johnny Lee: Yeah, the really challenging algorithmic part of the work, and I'm not the strongest algorithmic person on the team by far, is doing what's called state estimation. And that's basically saying that, you know, I have information from the camera, I have information from the gyroscope, I have information from the accelerometer, and there's one position and movement of the phone that best explains all of those measurements. And that's sort of philosophically how the algorithm works. It tries to say, well, all sensors are noisy, all sensors are imperfect. So how do we actually combine our guess of where we are and how we're moving with the measurements that we're seeing? So it's a tight fusion of gyroscope, accelerometer, and camera tracking information at the same time.
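The idea of finding "one position that best explains all of those measurements" can be illustrated with a toy example. The sketch below is not Tango's algorithm; it assumes each sensor reports a single number (say, distance moved along one axis) together with a noise variance, and combines them with inverse-variance weighting, which is the least-squares answer for independent Gaussian noise. The function name `fuse` and the sample numbers are made up for illustration.

```python
def fuse(measurements):
    """Combine noisy scalar measurements into one best estimate.

    measurements: list of (value, variance) pairs, one per sensor.
    Returns (estimate, variance_of_estimate). Noisier sensors get less weight.
    """
    weights = [1.0 / variance for _, variance in measurements]
    total_weight = sum(weights)
    estimate = sum(w * value for w, (value, _) in zip(weights, measurements)) / total_weight
    return estimate, 1.0 / total_weight


# Hypothetical readings: camera tracking says 1.0 m (variance 0.04),
# integrated accelerometer says 1.2 m (variance 0.16).
estimate, variance = fuse([(1.0, 0.04), (1.2, 0.16)])
print(estimate, variance)
```

The fused estimate lands closer to the less noisy sensor, and its variance is smaller than either input, which is the basic payoff of fusing imperfect sensors rather than trusting any one of them.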

[00:04:51.139] Kent Bye: So I just came from the Microsoft Build Conference to be able to see the HoloLens. And so when I try to compare the world-locking capabilities between something like the HoloLens and something like the Tango, the HoloLens is optical see-through. And when I was talking to people on Twitter, they were saying, well, with Tango, you can start to do some camera stabilization on top of that and then know where the image is and do the world-locking on top of that. And so in some ways, it's a little bit easier to do it with a tablet and phone-based rather than optical see-through. And I'm just curious if that's correct, if you're already doing some camera stabilization, and then because you know where the world is, you're able to have that really solid world-locking.

[00:05:30.277] Johnny Lee: So there is an app that actually does specifically camera stabilization, but it's actually a beneficial feature of the fact we're built into the camera and built into the phone, which is you can actually recover the full 3D trajectory of someone holding the phone while shooting a video. And as a result, you can then reconstruct a nice smooth path and actually make a nice stabilized video. When we do the AR experiences, we actually just show the camera frame. There's no additional stabilization that happens. It's true that they're different for sure. One of the strengths of HoloLens is that they are able to do really low latency tracking with some of the special hardware that they developed. And kudos to the hardware team and the software team there for the work that they've done. It's a very nice piece of equipment. On Tango phones, we don't have the same latency requirements because we're compositing on top of the camera. But that introduces other challenges simply because, for example, to run a Tango phone, we have to run all three cameras at the same time, the color camera, the fisheye camera, the depth sensor, and do image processing all at the same time. We have a lot of time stamping requirements because we want to align all the data from all these sensors at the same time, even when there's high system load. When there's a heavy application running, sometimes the cameras will drop frames or they'll become out of sync. And Android's not a real-time system underneath. And so dealing with that architecture and improving the camera performance to actually maintain a good, consistent behavior, even under heavy system load, introduces its own challenges, which are separate from whether or not it's see-through.
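To make the time stamping requirement concrete, here is a minimal sketch of one common way to align sensor streams: for each camera frame, look up the nearest inertial sample by timestamp and reject the match if the gap is too large (for example, after dropped frames). This is an illustration only, not Tango's implementation; the function name `nearest_sample`, the sample rate, and the tolerance are assumptions.

```python
from bisect import bisect_left


def nearest_sample(timestamps, t, max_gap):
    """Return the index of the sample closest in time to t, or None.

    timestamps: sorted list of sample times in seconds.
    max_gap: largest allowed time difference; beyond it the streams are
             treated as out of sync and no match is returned.
    """
    if not timestamps:
        return None
    i = bisect_left(timestamps, t)
    # Only the neighbors on either side of the insertion point can be nearest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
    best = min(candidates, key=lambda j: abs(timestamps[j] - t))
    return best if abs(timestamps[best] - t) <= max_gap else None


# Hypothetical gyroscope samples every 5 ms and two camera frame times.
gyro_times = [0.000, 0.005, 0.010, 0.015]
print(nearest_sample(gyro_times, 0.0061, max_gap=0.002))
print(nearest_sample(gyro_times, 0.0300, max_gap=0.002))
```

The second lookup finds no sample within the tolerance, which is the kind of case a system under heavy load has to detect and handle rather than silently pairing stale data with a new frame.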

[00:07:00.124] Kent Bye: Yeah, and I know that phone-based AR has been around for a while, but the difference, I guess, with the Tango is that you're adding the depth dimension, where you're kind of doing this inside-out tracking that's keeping track of where you're at in physical space, which allows you to walk around objects. And when I had the experience of doing that here at Google I/O, being able to walk around the planets, I had this sense of presence that I have never been able to have with phone-based AR because of that. And so I'm just curious to hear your thoughts on the capabilities of this inside-out tracking and what this extra dimension of depth is giving you with this phone-based AR.

[00:07:33.894] Johnny Lee: Yeah, I would actually say that the core components of Tango are tracking, depth, and area learning. These three functions. And the tracking capability itself doesn't require the depth sensor, actually. It's just using the fisheye camera to allow us to move through the space. And the main difference between that and previous systems that use markers is the fact, well, there's no more marker. And we just treat the entire room or the entire space as a marker itself. And that means you don't have to be careful, you don't have to print something out and point the phone at it and be careful about keeping that marker in frame. And I think a lot of those constraints just added a lot of cumbersome friction to the experience, despite the fact that we've seen education apps, we've seen some games all using marker-based things, but the need to have a physical marker and being constrained on where you look, I think, always made it difficult to reap the benefits of those experiences. So what we've been doing with the tracking capabilities of Tango is as soon as you turn on the camera you can start walking around arbitrarily. You can walk around a small space, you can walk around a whole house, or you can walk around the entire building, and it'll continue providing really good tracking. And I think that simplicity of "it just works" is what's needed to take advantage of these applications. And then when we combine that with the depth sensor, we get more plausible composition and reaction to the environment. So the astronaut stands on the actual floor because it knows what the floor is. We're starting to see games where characters can run behind couches or jump on the chairs.
That actually paints a really exciting future where just more scene understanding, to know that a chair is a chair and a table is a table and maybe the characters actually respond to that, paints actually a pretty long roadmap ahead of what's potentially possible with smart annotations. The third element that we call area learning is a little bit harder to understand, but it's essentially spatial memory. So people can remember what their living room looks like or what their office looks like. If someone showed you a picture of a place that you've been before, you'd roughly know where the camera was when that picture was taken. And so area learning is basically the spatial memory, and that allows a device to do a few very interesting experiences. One, you can leave content anchored in the space, so when we work with museums or we work with retailers like Lowe's, we can actually leave assets physically anchored in the world. And probably more importantly, multiple people can go to the same space or have the same experience, but they see the content from their own perspective. And one of the really powerful things that we're seeing come from AR isn't so much that you can composite things onto the camera, but the fact that if I give two people the same object, and they can see it from their own perspective, they can actually make their own observations. And it actually enhances the social interaction between the people. And that's one of the things we're seeing with Expeditions AR, which is using Tango phones in the classroom to teach lesson plans. And teachers can put objects in the middle of the room, and all the students can talk about it and see the same thing.

[00:10:28.601] Kent Bye: And so is the visual positioning system, the VPS, which is kind of like an internal GPS, but it's inside of buildings, is that the area learning? And with that, where is that information stored? If I were to scan my home, would it be stored on my phone, or is that pushed up to the cloud? And if someone close, is that something that also is stored locally, or sort of a collective aggregation of all the people that are doing this VPS tracking?

[00:10:56.712] Johnny Lee: Yeah, so I was using two different terms there. Area learning is the ability to create a small experience in your home or in a game that allows you to lock objects to the world. But all of that information is stored on device and it's just like part of the game file. And so none of that is shared with Google. And so area learning is what most developers are using today to develop the experiences, as well as Expeditions. When we talk about VPS, this is when we're working with a customer like Lowe's or a partner that has given us permission to go to their space, create these descriptions of the environment, and given us permission to upload those and process those in the cloud. Usually these are very large venues because the amount of data is too large to keep on device. So we provide a processing service where we'll host the data and then serve it down kind of like YouTube. So if you use video as an analogy, you can capture a video on your phone, you can keep it on your phone and share it locally. But if you want to make it public and you want to share it with lots of people and you give permissions, you can upload it and Google will help distribute and stream that down to the people to use it.

[00:12:04.367] Kent Bye: When you were giving your talk yesterday, you mentioned that there is a moment where you're granting permissions to your phone to allow this to happen. So there's permissions on a couple ends. One is for Lowe's or whatever building you're in to grant you permission to create this more public VPS system that has maybe a collective aggregation of lots of people feeding into the database. But then there's on your phone level. So what are the phone level permissions that you would permit them in order to enable this VPS?

[00:12:30.803] Johnny Lee: Yeah, so we want to make sure things like the camera and things like location aren't being accessed without the user's knowledge. So just like giving access to the camera or GPS pops up a permission, the user has to explicitly say OK to use these in order to use the Tango features, even when it's local, regardless of whether it's connecting to the cloud. In the venues like Lowe's, we actually have an agreement with Lowe's that allows us to go in the stores. And it's also important to note that right now what we're doing is we're sending trained operators to walk around the store. So it's not people in the store using their phones that's uploading data. It's specifically professionals that we've hired to help us create good maps of the environment.

[00:13:09.723] Kent Bye: Now, as I was watching the Lowe's point cloud as you're navigating through these stores and seeing all these dots, they're basically like these mile markers that you've located, and you had like three different types of points that you're tracking. But how sensitive is that to change? Like if they were to change the items that are in that aisle, then is it kind of using contextual information of where you're coming from, and then it kind of knows that it can maybe extrapolate that you're in this store? I'm just trying to think about, if you were trying to break this, to what extent would you have to go to change the store for it to get lost?

[00:13:43.774] Johnny Lee: Yeah, there's of course always a breaking point to this stuff. So if someone came to your house and rearranged all the furniture and painted the room differently, you probably wouldn't recognize your own house. That's true with Lowe's too. If someone overnight came in and restocked all the shelves, the map would not work as well as it did before. Now the nice thing about it is that we don't actually require every single step to be perfectly mapped. We can probably go several seconds to minutes between recognizing where we are. So you could probably go through the entire store and only need to localize to the map maybe four or five times to make sure we've got the right space. But that's where we're working with Lowe's as a pilot to help us understand exactly, you know, how frequently do we have to update things? Do we work with all those store managers so that, you know, in the morning when they're doing restocking, part of their run is they just walk around the store a little bit with their associates to help us update the map? So that's one of the reasons we haven't announced like global scale coverage for all of this stuff, because we want to make sure we're testing it well and we're rolling it out appropriately to help improve the system and be able to figure out what is necessary to make sure there's a good experience.

[00:14:50.760] Kent Bye: And one of the things that I noticed in the demos that you're showing here is that I was kind of putting my hand in front of an object. And it should have been in front of the object, but it was showing up behind the object, so being able to do occlusions. And I noticed that that was something that's on the roadmap to come in the future, as well as object segmentation. Maybe you could talk about some of the features that, by the time that these phones come out, are those features that you think are going to be implemented, or is that going to be the next phase in terms of occlusions and object segmentation?

[00:15:20.046] Johnny Lee: Well, the hardware is actually all already there in the phone. So it has the tracking camera and it has a depth sensor to be able to do that type of occlusion. It really depends on the app developer to implement those features and to render the graphics in a way that makes sense. And a lot of it is, we haven't seen as much of it because there's actually a lot of math to do it correctly. And so we're starting to improve our SDK sample code so it becomes easier and easier for developers to incorporate those features. So we recently added a Unity prefab. Unity is a very popular game development engine. But some prefabs and examples that automatically show you how to do good occlusion, do tracking, how to do hit point detection on all the surfaces. And as our samples essentially become more and more thorough, we hope to see more and more apps start to pick up those capabilities. So some of the demos you saw yesterday were just earlier, and they just haven't had a chance to incorporate that code into it yet.
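The core of the occlusion math described here is a per-pixel depth comparison: a virtual pixel is drawn only where it is closer to the camera than the real surface measured by the depth sensor. The sketch below is a simplified illustration using plain Python lists rather than the Tango SDK or the Unity prefab; the function name `composite` and the image representation are assumptions, and it further assumes the depth map is already aligned to the color image and that every virtual pixel comes with a depth value.

```python
def composite(camera, real_depth, virtual, virtual_depth):
    """Overlay virtual pixels on a camera image with depth-based occlusion.

    All arguments are equally sized 2D lists (rows of pixels).
    real_depth entries are meters, or None where the sensor has no reading.
    virtual entries are None where no virtual object covers the pixel.
    """
    output = []
    for cam_row, real_row, virt_row, vdepth_row in zip(
        camera, real_depth, virtual, virtual_depth
    ):
        out_row = []
        for cam, real, virt, vdepth in zip(cam_row, real_row, virt_row, vdepth_row):
            # Draw the virtual pixel only if it is nearer than the real surface.
            # With no depth reading, fall back to drawing the virtual pixel.
            visible = virt is not None and (real is None or vdepth < real)
            out_row.append(virt if visible else cam)
        output.append(out_row)
    return output


# Hypothetical one-row image: a wall at 2.0 m, a hand at 0.5 m, and a pixel
# with no virtual content. The virtual object sits 1.0 m from the camera.
frame = composite(
    camera=[["wall", "hand", "wall"]],
    real_depth=[[2.0, 0.5, None]],
    virtual=[["planet", "planet", None]],
    virtual_depth=[[1.0, 1.0, None]],
)
print(frame)
```

In the hand example from the question, the middle pixel keeps the camera image because the hand at 0.5 m is nearer than the virtual object at 1.0 m, so the hand correctly appears in front.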

[00:16:13.914] Kent Bye: One of the things that I thought was really super impressive was putting the Tango in the back of a car and driving around the city and being able to map the city to distance estimation to what you claim to be less than a percentage point of drift, like 0.17% drift, after driving all over the city. And so it seems like you're doing computer vision combined with these depth sensor cameras, and you're able to get a sense of space in a way that you're able to do pretty accurate measurements. I'm just curious like what level of accuracy that you're able to get in that.

[00:16:48.089] Johnny Lee: Yeah, that was a quick test some of the engineers did coming up to I/O. We were just looking for things that could show off the capabilities of it. And the drift accuracy we tend to optimize for, or our minimum bar of quality, is essentially 1%. So if you walk 100 feet, we've drifted about one foot. That's open loop. So once we add in the area learning and recognition, we can do something called loop closure, where we recognize where we are and we can correct drift. But even without that, we aim for less than 1% of drift. It just turns out that when you're doing this tight sensor fusion between the cameras and the gyro and the accelerometer, and you do it in the way we described of trying to minimize the error from the sensor measurements, we can get very, very good performance if we calibrate the sensors properly. And so that's probably one of the things that's unusual about a Tango phone from most phones, is that we do a fairly rigorous in-factory calibration because we want to know all of the camera parameters, we want to know the gyroscope parameters and accelerometer parameters, and be able to model the behavior of those sensors really, really well. And that's what gives us our tracking performance.
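The 1% figure is simply end-point error divided by distance traveled. The sketch below works that arithmetic through on a hypothetical 100-foot walk that should return to its starting point, and then shows a deliberately simple form of loop closure: once the device recognizes it is back at the start, the accumulated error is spread back along the path in proportion to distance traveled. Real systems solve a joint optimization over the whole trajectory; the linear correction, the function names, and the sample path here are all assumptions for illustration. The code needs Python 3.8 or later for `math.dist`.

```python
import math


def path_length(points):
    """Total distance walked along a list of (x, y) positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def drift_percent(points, true_end):
    """Open-loop end-point error as a percentage of distance traveled."""
    return 100.0 * math.dist(points[-1], true_end) / path_length(points)


def close_loop(points, true_end):
    """Spread the end-point error back along the path by distance traveled."""
    total = path_length(points)
    err_x = true_end[0] - points[-1][0]
    err_y = true_end[1] - points[-1][1]
    corrected, traveled = [], 0.0
    for i, (x, y) in enumerate(points):
        if i > 0:
            traveled += math.dist(points[i - 1], (x, y))
        share = traveled / total  # 0 at the start, 1 at the end of the walk
        corrected.append((x + share * err_x, y + share * err_y))
    return corrected


# Hypothetical estimated path, in feet: 100 feet walked, ending 1 foot
# away from the true end point at the origin.
walk = [(0.0, 0.0), (25.5, 0.0), (25.5, 25.0), (0.0, 25.0), (0.0, 1.0)]
print(drift_percent(walk, (0.0, 0.0)))
print(close_loop(walk, (0.0, 0.0))[-1])
```

Applied to the 0.17% figure from the car test, the same ratio means roughly 1.7 units of error for every 1,000 units driven.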

[00:17:52.889] Kent Bye: And I just had a chance to talk to Brandon Jones yesterday, going in-depth into the future of WebVR and WebAR. And the thing that really struck me in seeing some of the demos and examples that you're showing off here at Google I/O is that a lot of the stuff I'm seeing from HoloLens is HMD-based, and there's maybe three gestures that Google has. But yet, with a tablet-based or phone-based AR, you're able to use basically the touch control interfaces and have a lot more sophisticated user interactions than you would on what is existing now in the HMD-based. I mean, I think eventually we're going to see a fusion and have an evolution of user interface design, but we have, I guess, the benefits and insights of being able to pull in a lot of both phone-based and tablet-based touchscreen user interface interactions on top of the phone, such that you can start to blend in realities and pull in stuff from WebAR into the application. Just curious to hear from your perspective of some of those combinations that you see that are really interesting.

[00:18:53.636] Johnny Lee: Yeah, when people talk about AR, especially HMD AR, everyone gets really caught up in the display technology and field of view and latency. But input's a really big problem too. Buttons are really good. And there's a lot of places where not having a button makes it very difficult to interact with things. So that's one of the reasons we decided to focus on phones currently, because we have a very rich set of APIs and interfaces and interaction models with 2D screens, and simply leveraging all of those assets allows us to explore these experiences without getting encumbered by the maturity of input and recognition technologies. In the long run, I think what we'd like to see is things that are essentially multimodal interaction, where you take advantage of not just gestures but gaze and speech and scene understanding, as well as where you are, to all provide context to the tool or the computer to help give it a sense of what you want to do. And I come actually from an interface technology background, and because I see the limitations there, I'm actually really excited to see people out there or students interested in getting into this space to think about multimodal interaction for the future, because I think you want that context awareness when you're starting to use devices that understand space. And so a single click gesture or using a touch screen will not even be enough. And eventually you want to say, well, tell me more about this. And that means you need to know what "this" is, and you need to be able to realize that that's a query from speech, and understanding "this" is probably a difficult pronoun to resolve unless you know exactly where they are in a space and what they perhaps have been doing before.

[00:20:31.682] Kent Bye: And it sounds like some of the Lenovo phones that are coming out later this year are going to be able to be both Tango and Daydream enabled. So we're going to have a phone that's able to do both AR and VR. And so what kind of applications do you think that would enable to be able to solve one problem, but be able to do both an AR and a VR experience of the same problem space, but with both technologies?

[00:20:53.702] Johnny Lee: Yeah, the Asus phone will be the first phone that's Tango enabled and Daydream ready. And we already have one or two applications that are starting to make that bridge very nicely. One of them is Constructor, our 3D developer tool for scanning environments. And so using the tracking camera and the depth sensor, you can create a reasonably high fidelity 3D scan of the room that you're standing in or the apartment that you happen to be in and create a 3D model. You can view the 3D model on the screen just by tapping around on the touch screen. But then there's a VR mode where you push that, then you can put it in a Daydream View headset, put it on your head and click around with the Daydream controller and actually walk around the apartment that you just scanned. So that's a nice blend of using the 3D sensing capabilities to capture content and then using Daydream View and Daydream to view the content. And we hope to see more and more applications that start to bridge whether or not you want to look at a 2D view of it in your hand or you want to put it on a headset and then get immersed into the experience.

[00:21:54.628] Kent Bye: One of the demos that you showed in the upcoming and future of Tango was being able to identify people as you're changing the orientation of the phone and walking around. The thing that made me think of is, is this going to be able to do facial recognition? Because I think there's an issue looking at the Google Glass of some of the potential privacy concerns of being able to start recording people at any moment in time. And so when you have these devices out in public, you know, I think there's a sociological element there of, you know, is this recording, capturing, identifying me? And so when I saw that, I was just wondering about the facial recognition possibilities of Tango.

[00:22:30.503] Johnny Lee: So we actually haven't been looking at facial recognition at all. Most of our work at the moment has been in environmental detection and tracking. So one of the reasons for that work is actually so we can ignore the people, so that we don't map them and we don't look at those features. Because, first of all, people move around, they're not stable. People are actually considered noise to our tracking system. So, the purpose of that code is actually to avoid looking at people, actually. But you can imagine where there are some applications like we have in the Tango Sandbox where we have characters like the lion, and the scarecrow, and a tin man. And these playful characters actually would be nice if they could actually realize that there's another person in the scene, and perhaps say a wave to them, or at least step out of their way if they happen to be in front of the camera. It's kind of like the next step past occlusion, where it's not just compositing behind you, but the character actually moves out of the way. So at the moment, that's the scope of the people detection and person detection capabilities that we're trying to build in the system.

[00:23:30.041] Kent Bye: One of the other capabilities that I saw that was really amazing was to be able to actually take out objects from a room. And so you have your bed and whatnot and you're just taking it out. And I know that looking at Lowe's and Kyle Nel and some of the things that he was showing for what Lowe's is doing with the Tango is that he was saying that there's a little bit of like an optimal cognitive load whenever you're able to be in your space and be able to basically look at what would this look like if you were to do this remodeling job with this specific equipment and be able to take that out and then put it in to see it, rather than going into a store and kind of being overwhelmed and not being able to bridge the gap of what that might look like in your home, but actually be in your home and to see it. But it seems like this being able to take out real objects and put in virtual objects is one of the big major use cases for the Tango.

[00:24:16.873] Johnny Lee: Yeah, when we worked with Pottery Barn or Wayfair and other retailers doing furniture, that's often a request. It's like, I'm interested in a new couch versus I'm trying to populate a completely empty room. So the work you saw at the talk was some research work out of University of Washington, which is really, really phenomenal graphics work, where they take a Tango device, they walk around a room, but then they have some prior assumptions or machine learning models that understand, well, what are the major walls and floors and what's furniture? So that kind of scene understanding is already helpful of, say, room or not room. And that allows you to classify the objects and then say, well, these are the parts of the room that are static, and then they can estimate the lighting and then estimate room completion. And so that's a direction that I think is really interesting because that semantic level of understanding provides more utility to the user in the end who wants to do something as simple as, well, I want to replace my existing couch rather than populate an empty room, as I said.

[00:25:17.500] Kent Bye: And so what do you want to experience in augmented reality then?

[00:25:21.385] Johnny Lee: Mostly, I'm interested in this idea of what I would call spatially-aware computing, which is just devices that understand the physical world the same way that you and I do. When you have it mixed with a display, it's an immersive experience like VR. But AR devices, being able to realize that we're both sitting here in this park area at Google I/O, we have some water bottles in front of us, there's a table here, yet your phone and my phone have no idea about that. And if we want to be able to bring up information about what's around us or talk about a project that we're doing and I can bring a visualization right in front of us and we can both see it in front of us, I think that's particularly exciting.

[00:25:58.438] Kent Bye: Great. And finally, what do you think is kind of the ultimate potential of augmented reality and what it might be able to enable?

[00:26:07.139] Johnny Lee: We'll change the world. It's hard to really be very precise and articulate about what's going to happen. I think everyone's realized that it's hard to be right over a long course of time. I'm just excited about devices that will help us interact with the environment more like we do. And that could be AR headsets, it could be VR headsets, it could be phones, it could be robotics. It could be wearable devices that just help us not get lost more. And in some cases, we may find the most value in very humble places. And the greater aggregation of all of that over time, I'm enthusiastic about.

[00:26:43.061] Kent Bye: OK, great. Well, thank you. Thanks. So that was Johnny Lee of the Tango project of Google. So I have a number of different takeaways about this interview. First of all, the tracking capabilities of the Tango phone that I saw, the demos that were being shown at Google I/O, are some of the best phone-based AR demos that I've seen so far. I started to feel this sense of presence that I don't think I was able to feel in any other phone-based AR that I've seen so far. So, you know, you are able to create some level of immersion that really wasn't possible before. So the three things that Johnny said the Google Tango phone is going to be able to have are tracking, depth, and area learning. So being able to learn a space and then kind of shut it off and open it up, and then the phone will be able to detect where you're at in that space, is something that's going to be unique to the type of system that Google is releasing. So I just want to take a step back and kind of look at the overall ecosystem and what's happening with these other companies. Apple just announced that they have ARKit coming out, and they're doing something that is a lot simpler. They're pushing that out into all of their existing iOS devices and they're just using a camera. So there's no ability to be able to do depth-sensing tracking, and so they can't do any kind of sophisticated SLAM. They're doing everything based in visual odometry, which doesn't have the level of area persistence that the Google Tango has. Apple tends to be pretty closed in their participation in artificial intelligence conferences. You don't see them in a lot of AI conferences. So I expect that the AI from both Google as well as Facebook is just going to have the ability to recruit some of the best talent that's out there.
So I think that ARKit from Apple, even though they're making the claim that they're going to be the most widely distributed AR kit that's out there, I think their system is going to be the least capable. In some senses, you could think of it as almost like the Google Cardboard of phone-based AR. It's going to be, I think, doing what is the bare minimum of what's possible with phone-based AR, with the cutting edge of technology. Facebook, they don't have any hardware. They are basically doing a software layer on top of the other phones, either iOS or Android. And so they're stuck with dealing with the limitations of the existing operating systems for both Google as well as the iPhone, in the sense that they can't really do any hardware modifications until they start to ship their own hardware, which they haven't. So they're doing something that's been a pure software play. So with that, I imagine that whatever Facebook is able to do is going to be able to work equally as well on the iPhone as on Android, which I'm not actually sure if either Apple or Google will be able to make that same claim. I think that Google is going to be optimizing for Android and Apple is going to be optimizing for iOS. Now, I think that overall Facebook is doing a lot more social features, so they're bringing in a lot of Snapchat-like features, and they're going to be relying upon your social graph and more emotion-based identity expression with their augmented reality applications. A lot of the demos that Google was showing were much more information-based or education-based, so very much a mental presence, being able to be in an environment and pull down information to learn more about your environment, whereas Facebook is much more focused on emotional connections to other people.
Although, you know, I think both of them are showing the capability for games, but at this point, it's going to be a number of years before, like, every single phone has, like, these depth sensor cameras. And Google is making the hard push of having the technology that has the best sensors, but are they going to be able to see the amount of adoption that's going to create the compelling applications? And Apple, they're starting very slow. I think, you know, they're just now getting into the virtual and augmented reality game. Now they've been pretty much silent completely. I mean, you couldn't even run a VR experience on any Mac hardware that's out there today. And they still haven't announced any explicit VR headset. And Tim Cook has been very vocal in terms of not really believing in the power of virtual reality. They did demonstrate a lot of Vive technologies on Apple hardware. And they're making it available for developers starting today, Apple at least, to be able to have access to an eGPU to be able to do virtual reality development. But the Google Tango to me is the most impressive when it comes to phone-based AR. When it comes to overall AR, I think Microsoft is actually in the lead in terms of what the HoloLens is able to do technologically. That was a point that I asked Johnny Lee, which was the differences between optical see-through and what they're doing with the Tango. And the optical see-through has higher requirements for latency because you don't have the ability to change the real world at the basically like latency that you have with your eyes.
When you're mediating it through the technology such that you're taking the image from the camera, then you can control that camera image and then composite and paint the augmented reality picture on top of that, such that it gives you the impression that the world locking is as solid and as good as the HoloLens, but they don't have as strict requirements for latency and they're doing a different process for that. One thing that was interesting for me was to hear Johnny Lee say that, you know, Android wasn't optimized to be a real-time system, such that it kind of sounds like, you know, they have to take these inputs from three different cameras and to a certain extent, maybe fight against the way that Android is architected in order to have the timestamps and do this type of sensor fusion. So it'll be interesting to see if at some point we start to see this fork off into maybe a different branch of Android that is completely optimized for doing real-time interactions and optimized for augmented reality applications. Maybe that's something that they're working on in terms of their self-contained system, which I had a chance to actually try out. And it was, it was good. I mean, it was kind of a year or year and a half old technology in terms of the OLED, but you know, the inside-out tracking that Google is able to do on these systems is just super spot on. The thing that worries me is what they're going to do with the input controls overall with these immersive technologies. So, you know, when you look at the three things that the Tango is doing in terms of the tracking, the depth sensing and the area learning, all the other phone-based AR systems are doing some type of similar tracking where you're essentially just tracking yourself through space. The depth sensing is something that is unique to the Google system so far. So that's going to allow them to do more plausible composition as well as being able to do more reactions within the environment.
And also the area learning in doing the spatial memory: as people were talking about with Apple's ARKit, as soon as you stop and restart an experience within the Apple system, it's going to basically have to relearn everything about this environment. But yet this area learning will allow you to capture your environment and kind of come back into that same context and perhaps revisit some of the virtual objects that you placed into that environment. So I think that multimodal interaction is something that is going to be a little bit of a difference between what Microsoft is doing versus what Facebook, Apple, and Google are doing with phone-based AR. Because you're starting with the phone-based AR, you're going to have a little bit more sophisticated tablet interactions that you're not able to necessarily do within the HoloLens. In terms of the primary user interfaces for a lot of these phone-based applications, the screen's going to be right there, and you're going to be mostly interacting with that touchscreen. But eventually all this is going to be going into this combination of gesture control, voice control, as well as doing tablet interfaces, and potentially even keyboard interactions if you're working in an environment where you don't want to be speaking out loud. Or if you're typing computer code, that's something that is actually faster to type on a keyboard than it is to actually speak it out loud. I have seen some indications of the potential of this rise of the chorded keyboard where you're actually like a chord stenographer pushing multiple keys at the same time, but this is a singular object that you're holding in your hand and it's called a Twiddler keyboard. It's one of the models where it's like you're playing a guitar chord, where you're pushing combinations of keys in order to type text into an experience.
So something like a chorded keyboard, it's still yet to be seen whether or not that's going to take off and be a thing within augmented reality as you're moving around, and maybe you're in public and you want to write an email or something and you don't want to speak it out to everybody that's around you. So I expect that that's going to be something that's going to be becoming more popular as an input device at some point. Overall, you know, the thing that Johnny said is that we're moving into this new paradigm of spatially aware computing, where your phone is able to identify and interact with your environment much more like a human. It seems to be the trajectory that all the major companies are moving towards. And the fact that Google is launching some of the depth sensor cameras integrated within the phone-based AR first means that they're really on the leading edge of pushing the technology forward. To me, I wonder whether or not it's going to have the same level of adoption as the Apple approach, which is essentially them pushing out to everybody who has an iOS device. So is it going to be the low-end experiences that drive adoption, or is it going to be something that is actually using the full capabilities of the technology that's really going to drive adoption of these technologies? The spatial awareness in terms of remodeling, in terms of being able to take out existing objects in your place and to put in the scale models, I think what Lowe's is saying is they actually found that it was reducing cognitive load. When you go into a store at Lowe's and you see a model of what it might look like, you still have to imagine what it's going to look like, and it takes a lot of brain power. What they found is that when people use the AR and they're just in the context of their environment and they're able to actually see it to scale, then they can make a buying decision right there.
And so for Lowe's, it's a technology that they're already adopting and kind of going all in with in a lot of ways, in terms of just allowing this virtual positioning system, which is like this internal GPS, to be used within their stores. For a huge store like Lowe's, you can imagine that's one of their biggest pain points: you walk in there with a list of things you need to get, you have no idea where they're at, it takes a long time to explore around, and you end up trying to flag down a person to ask where something is at. So being able to actually do that on your phone and have it help you navigate and even do the optimal route, I think it's gonna be a huge application for a company like Lowe's. So the spatially aware computing that's aware of our environment in different ways, I think it's just something that I see happening at all the different major technology companies. And it's worth looking at the Google Tango phone because eventually, within the next couple of years, all the phones are going to have these features. So some of the leading innovation, I think, is going to be happening on the Android and Google Tango platform. So that's all that I have for today. I just wanted to thank you for joining me today on the Voices of VR podcast. And if you enjoy the podcast, then you can help me out in a couple of ways. You can just spread the word and tell your friends, share one of your favorite episodes with someone who might like it. And you can also become a contributor to my Patreon to help ensure that I can continue this podcast. So you can donate today at patreon.com slash Voices of VR. Thanks for listening.
