#623: Training AI & Robots in VR with NVIDIA’s Project Holodeck

At SIGGRAPH 2017, NVIDIA was showing off its Isaac robot, which had been trained to play dominoes within a virtual world environment built on NVIDIA's Project Holodeck. They're using Unreal Engine to simulate interactions with people in VR in order to train a robot how to play dominoes. They can use a unified code base of AI algorithms for deep reinforcement learning within VR, and then apply that same code base to drive a physical Baxter robot. This creates a safe context to train and debug the robot's behavior within a virtual environment, and also to experiment with cultivating interactions with the robot that are friendly, exciting, and entertaining. It allows humans to build trust with robots in a virtual environment so that they are more comfortable and familiar interacting with physical robots in the real world.

I talked with Omer Shapira, NVIDIA's senior VR designer on this project, at the SIGGRAPH conference in August, where we discuss using Unreal Engine and Project Holodeck to train AI, the variety of AI frameworks that can use VR as a reality simulator, stress testing for edge cases and anomalous behaviors in a safe environment, and how they're cultivating social awareness and robot behaviors that improve human-computer interactions.


Here’s NVIDIA CEO Jensen Huang talking about using VR to train robots & AI:

If you’re interested in learning more about AI, then be sure to check out the Voices of AI podcast which just released the first five episodes.

This is a listener-supported podcast through the Voices of VR Patreon.



Support Voices of VR

Music: Fatality & Summer Trip

Rough Transcript

[00:00:05.412] Kent Bye: The Voices of VR Podcast. My name is Kent Bye, and welcome to the Voices of VR podcast. So at SIGGRAPH this year, there was one exhibit of virtual reality that was really quite mind-blowing. It was in the NVIDIA booth, and what they had was this robot that was working with these different dominoes. Now, the really fascinating thing was that they were able to train this robot within virtual reality. And so they had a robot that was actually manipulating these dominoes out front. And then in the side room, they had this HTC Vive running Unreal Engine 4 with a model of this Baxter robot with the dominoes. And you were able to go in and do different actions where you were actually training the artificial intelligence through your interactions within a virtual environment, using the exact same code and algorithms both in the virtual world as well as in the actual final robot. So you have this situation where game engines are now to the point that they're able to simulate reality with such fidelity that you could have this seamless blending of training AI algorithms within these virtual worlds. And this is something that I've seen covering the Voices of AI podcast, going to the International Joint Conference on AI, where they're starting to use video games to do training with deep reinforcement learning algorithms, as well as 3D worlds like Minecraft. And so these 3D worlds are able to train some of the most cutting-edge AI that's out there today. So I'll be talking with Omer Shapira about the cross-section of AI and virtual reality and what NVIDIA is doing with their Holodeck platform on today's episode of the Voices of VR podcast. So this interview with Omer happened on Tuesday, August 1st, 2017 at the SIGGRAPH conference in Los Angeles, California. So with that, let's go ahead and dive right in.

[00:01:56.074] Omer Shapira: My name is Omer Shapira. I'm a senior VR designer at NVIDIA, which means I deal with engineering, because everyone at NVIDIA who's doing VR has to be an engineer in some capacity, but I'm also designing a lot of the human aspects of VR interaction.

[00:02:11.103] Kent Bye: Great. So you're showing a demo here at SIGGRAPH that has a robot. So maybe you could talk a bit about the interaction between what you're doing with robots as well as with virtual reality.

[00:02:20.185] Omer Shapira: So at NVIDIA, we have a sort of approach to AI that, because everyone is going to be doing AI very soon, like we're going to be interacting with intelligent systems all the time, we want to make sure that as intelligent systems are shipped to be interacting with real humans, they're proven the right way. And so we want to make it easy to test interactive systems and intelligent systems, and we want to be able to introspect them, right? In order to do that, we build a bunch of simulators in which we train those virtual agents that we have. In this particular case, what we did is we wanted to show that a real robot can play dominoes with a human right now, do it in a safe way, and also be rather entertaining. In order to do that, robots are actually a challenge, right? Because a robot has enough force that it can harm things around it, like windows or humans. And this will be reality. The more we interact with robots, even like cars, the more we have to deal with the fact that these are machines that, if they're not programmed well, might not do the right thing. So we want to make sure that it does the right thing. So we built a large system that simulates an environment for a reinforcement learning algorithm to run on top of. In this case, we built dominoes, right? So we started off by building a really good simulator for dominoes for humans to play. So we have an environment called Holodeck that's been developed at NVIDIA internally. It gives you some hands in VR and, like, very expressive emotes that you can do, and also some good dynamic control of objects in the world. You can actually use your controllers as hands. They look like hands. They feel like hands. And you can grab stuff with them. So we built dominoes that you can play with. Two humans can play inside it, but we replaced one of the humans with an engine that just knows where to place dominoes. It was a really simple gripper, basically like a levitating thing in a VR experience. After we did that, we had one of the humans be replaced with a domino engine, something that knows the rules. So what you have is a machine that doesn't know anything about dominoes playing against a machine that knows everything about dominoes, and one machine is learning from the other. It's getting rewards for doing the right thing, and it's getting a little bit of, like, correction for doing the wrong thing. This is called reinforcement learning. After a while, tens of thousands of games, hundreds of thousands of games, it gets it more and more right, until eventually it consistently does the right move. So you can think about it the way a lot of psychologists treat it: it's like training a child, right? So the robot that we have inside the environment acts like a five-year-old. But in order to get it there, after we had a simulator inside the environment just playing dominoes, just telling dominoes where to go, we actually replaced it with a real rigged robot. We used the same model of the robot that we would eventually have interacting with humans. It's called the Baxter robot. It's a research robot. It's fairly standard. And we just modeled the same robot that we were actually going to use playing dominoes with the real person.
But we wanted to make sure it's playing it safely, it's being nice, it's being entertaining, because it has to look at humans and wave at them and so on. And we tested this in VR. We did this very quickly. We had a team of artists model the real thing. The artists knew that we were going to play with a Baxter robot eventually; it's a fairly common robot used for research, and a lot of universities use it. They just modeled a virtual model of Baxter with the same masses, with the same scales, a slightly different face to make it friendlier, but basically it was the same robot functionally, and we could use it for simulating and seeing how it feels to interact with it. We want this technology to be friendly. We want humans to understand it for what it is, not think about it as a scary thing. So we had to have its demeanor match its AI. And we did that for a bit. We programmed some animations for the real robot to play. And the thing is, the real robot and the virtual robot are running basically the same code. Up to the point of inverse kinematics, which is what takes a coordinate and translates it to an actual motion of the robot, everything from inserting the image into the robot up to getting the coordinates out is done with the same code for both the real robot and the virtual robot.
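
To make the training setup he describes a little more concrete, here is a drastically simplified sketch of that reward-and-correction loop: an agent that starts out knowing nothing about dominoes gets a reward from a rules engine for legal placements and a small correction for illegal ones. This is a toy, single-step tabular example for illustration only; it is not NVIDIA's Isaac or Holodeck code, and the real system uses deep reinforcement learning over full game states.

```python
# Toy illustration of the loop described above, not NVIDIA's actual code.
# An agent with no knowledge of dominoes is rewarded by a rules engine for
# legal placements and lightly "corrected" for illegal ones; over tens of
# thousands of games its greedy choice becomes almost always legal.
import random
from collections import defaultdict

TILES = [(i, j) for i in range(7) for j in range(i, 7)]  # all 28 dominoes

def legal(tile, open_end):
    """Rules engine: a tile is playable if either pip count matches the open end."""
    return open_end in tile

def train(episodes=50_000, alpha=0.1, epsilon=0.2):
    q = defaultdict(float)                       # Q[(open_end, tile)] -> estimated value
    for _ in range(episodes):
        open_end = random.randint(0, 6)          # drastically simplified board state
        hand = random.sample(TILES, 7)
        if random.random() < epsilon:            # explore
            tile = random.choice(hand)
        else:                                    # exploit current knowledge
            tile = max(hand, key=lambda t: q[(open_end, t)])
        reward = 1.0 if legal(tile, open_end) else -0.1   # reward vs. correction
        q[(open_end, tile)] += alpha * (reward - q[(open_end, tile)])
    return q

if __name__ == "__main__":
    q = train()
    hits = 0
    for _ in range(1_000):                       # evaluate the greedy policy
        open_end = random.randint(0, 6)
        hand = random.sample(TILES, 7)
        hits += legal(max(hand, key=lambda t: q[(open_end, t)]), open_end)
    print(f"legal greedy moves: {hits}/1000")
```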

[00:06:27.247] Kent Bye: So you're literally able to train robots in virtual reality by using what sounds like essentially a virtual camera within this robot, and doing the same image recognition, so you're able to kind of project into a virtual world and then use those same computer vision algorithms as well? Or is that different?

[00:06:44.214] Omer Shapira: It's exactly the same. We're actually using the exact same recognition code. So what's going on here is we have a product called Isaac, which is an environment for training artificial intelligence inside world simulators. Game engines just happen to be really, really good world simulators at this point. We have a lot of tech based on trying to get game engines to look real. So in this case, we're actually taking the game engine camera and sending images to it the same way that we would send real-world images to it. Not only that, we actually trained the real-world cameras to recognize dominoes using a game engine. We just placed dominoes there, and the training system learned that way. It had the semantics of which dominoes it's looking at, and it slowly learned to recognize those semantics. We didn't add anything apart from placing dominoes on a board lots of times and adding semantics. That's all there is to it. And not only that, the component of training the AI to interact in a game engine is actually queryable, right? And this is one of our key points. If we want robots to do the right thing and to avoid doing the wrong thing, we want to be able to test that in a reliable way. World simulators like game engines are not only good for training them, they're also good for testing them. And what we're doing inside the virtual experience right now is we're getting robots to actually prove that they're able to interact with humans in a good way. And after we've done this for a while and we've basically quality-assured it, we can put it on a real robot, which is what we've done.
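
The key point here is that the engine knows exactly which dominoes it placed, so every rendered frame arrives with a free ground-truth label. Below is a minimal sketch of that synthetic-data training idea in PyTorch; the render_domino() function is a stand-in for an Unreal render of a labeled tile, not a real part of Isaac or Holodeck, and the tiny network is just for illustration.

```python
# Sketch of training a recognizer purely on engine-generated, auto-labeled images.
# render_domino() stands in for a game-engine render; in the real pipeline the
# frames would come from UE4, with the label known because the engine placed the tile.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 28  # the 28 distinct domino tiles (0-0 .. 6-6)

def render_domino(label: int) -> torch.Tensor:
    """Placeholder 'render' of tile `label` as a 64x64 grayscale image."""
    g = torch.Generator().manual_seed(label)
    base = torch.rand(1, 64, 64, generator=g)       # fixed "appearance" per tile
    return base + 0.1 * torch.randn(1, 64, 64)      # noise ~ lighting/pose variation

class TinyDominoNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=5, stride=2)
        self.fc = nn.Linear(8 * 30 * 30, NUM_CLASSES)

    def forward(self, x):
        return self.fc(F.relu(self.conv(x)).flatten(1))

model = TinyDominoNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):
    labels = torch.randint(0, NUM_CLASSES, (16,))
    images = torch.stack([render_domino(int(l)) for l in labels])
    loss = F.cross_entropy(model(images), labels)   # label comes from the engine, not a human
    opt.zero_grad()
    loss.backward()
    opt.step()
```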

[00:08:09.227] Kent Bye: That's amazing that you're able to start to use an engine like Unreal Engine and start to plug in these neural network architectures. Specifically, are you using something like TensorFlow and like a convolutional neural network in order to train that vision? And maybe talk a bit about these cloud services and these neural network architectures that you're able to plug into the game engine in order to start to train neural networks that could be trained in virtual worlds but be used in the actual real world.

[00:08:33.678] Omer Shapira: So, we do have a lot of different components of AI, from classic machine learning to reinforcement learning, and obviously a lot of convolutional neural networks along the way to do all sorts of recognition here. Because what we have is a machine vision model inside that is attempting to recognize dominoes and send that from the cameras into a sort of higher level of semantics that a reinforcement learning environment can discern something out of. We also have very heavy processing inside the engine itself, because the engine itself needs to be able to do simulations such as a tower of dominoes, like the person behind me is doing right now. He's actually doing a four-story tower of dominoes that's using PhysX, and I'm not sure I could do that in real life, but it's very, very stable, which is something rare for game engines. We did change the reinforcement learning model quite a lot, so I'm not sure if we used TensorFlow at the end for that component or Theano. We had a bunch of researchers, and they all have different preferences for things. Some algorithms have papers that are using one framework, so it's very easy to query and see the performance of that paper if you write your own implementation with the same framework. Some of our researchers prefer to do other things. So we have a stack of different machine learning products running the back end for this thing.
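
Structurally, what he is describing is a layered stack: camera frames (real or rendered) go through a recognition model, its output is converted into a symbolic board state, and only that higher-level state is what the reinforcement learning policy reasons over. The sketch below shows that boundary with stand-in classes; none of these names are part of Isaac, TensorFlow, or Theano.

```python
# Illustrative pipeline boundary: pixels -> recognizer -> symbolic semantics -> policy.
# Every function here is a hypothetical stand-in for the real (learned) components.
from dataclasses import dataclass
from typing import List, Tuple

Tile = Tuple[int, int]

@dataclass
class BoardState:
    open_end: int          # pip count currently playable
    hand: List[Tile]       # tiles the robot holds

def recognize_open_end(frame: List[List[float]]) -> int:
    """Stand-in for the CNN: maps a camera/engine frame to the board's open end."""
    return int(sum(sum(row) for row in frame)) % 7   # placeholder, not real vision

def policy(state: BoardState) -> Tile:
    """Stand-in for the learned policy: pick any tile matching the open end."""
    legal = [t for t in state.hand if state.open_end in t]
    return legal[0] if legal else state.hand[0]

# The same call path runs whether `frame` came from UE4 or from a real camera.
frame = [[0.2, 0.5], [0.1, 0.3]]
state = BoardState(open_end=recognize_open_end(frame), hand=[(2, 5), (0, 3)])
print("play tile:", policy(state))
```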

[00:09:48.346] Kent Bye: So I'm really curious to hear your thoughts on the implications of what you can do in a game engine. Because it seems like you're bounded by physical reality and the physics of real reality when you're training things in the real world. But imagine that you could potentially maybe accelerate the physics or go at faster speeds or maybe do things in parallel. What are the opportunities here for being able to train AI using virtual worlds?

[00:10:10.865] Omer Shapira: So that's a really interesting question, because if you're a tool maker, whenever you build a tool, you'll find that if people aren't abusing your tool, then you haven't built a good enough tool, right? So that's what we find with our physics engines, but that's what we find with our AI tools as well. Like, you will always find a surprising use of AI, and if you haven't, you just don't have a robust enough system. And the thing we're seeing in this case is that a lot of people are trying to break it and discovering weird behaviors in the robots. And this is our point. We want to be able to discover these weird behaviors in the robots in ways that are safe. Because if you're training something that is mission critical or is interacting with a human that is vulnerable, you want to be able to detect its bugs as soon as possible. So you want to attack it as much as you can with different scenarios. This carries a lot of implications for any industry that has humans interacting with machines. If you are thinking about a robot that should handle, say, a dog daycare, then you can test it because you have real-world data and you can observe what's going on. So you can stop and you can try multiple things and you can look at the data. But if you have humans interacting with a robot in your target simulation, then you can actually have those same humans interacting inside VR. And that's our point, right? We haven't made the game engines redundant in this case. If we make a better simulator, it won't make the game engine pointless. The game engine is also meant for humans to interact with a real robot in it. And game engines are... it's funny that we even call them game engines anymore, right? They're somewhere between a media framework and an operating system. Like, it's a simulator for doing a lot of the things that you just wish to see in the real world. And if we're training computers to interact with the real world, we have to be using something that is as close as possible to the real world. The only way to get there is using really, really good simulators.
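
The "attack it with different scenarios" idea can be pictured as scenario fuzzing: run the candidate policy against many randomized simulated situations and log every one that violates a safety predicate, so the weird behaviors show up in the simulator rather than next to a person. The sketch below is hypothetical; the scenario fields, the safety rule, and the policy interface are all illustrative rather than anything from Isaac.

```python
# Hypothetical scenario-fuzzing harness: find unsafe edge cases in simulation.
import random

def random_scenario(rng):
    """One randomized test case: board state, how close a human hand is, timing jitter."""
    return {
        "open_end": rng.randint(0, 6),
        "human_hand_dist_m": rng.uniform(0.05, 1.0),
        "latency_ms": rng.uniform(0, 250),
    }

def violates_safety(scenario, action):
    """Example predicate: never move the arm while a hand is within 15 cm of it."""
    return action["move"] and scenario["human_hand_dist_m"] < 0.15

def fuzz_policy(policy, trials=10_000, seed=0):
    rng = random.Random(seed)
    failures = []
    for _ in range(trials):
        scenario = random_scenario(rng)
        action = policy(scenario)
        if violates_safety(scenario, action):
            failures.append((scenario, action))   # each one can be replayed in the sim
    return failures

# A naive policy that ignores the human entirely gets caught by the harness.
naive_policy = lambda s: {"move": True, "tile": (s["open_end"], 6)}
print(len(fuzz_policy(naive_policy)), "unsafe scenarios found out of 10,000")
```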

[00:11:58.183] Kent Bye: It's like a reality simulator at the end of the day.

[00:12:01.305] Omer Shapira: I really wish we got there. We're close in some respects. We're actually better than reality in a lot of other respects, optics included, or some areas of optics included. But yeah, I would be very, very happy if my job ended up being a very good, plausible reality simulator.

[00:12:18.508] Kent Bye: Well, here in this demo room you also have a webcam where you're recording people that are in the room. You're doing some AI-based skeletal tracking to be able to see what the gestures and whatnot are. So I guess the idea is that there may be one person, in this specific case, that's in a Vive that has full six-degree-of-freedom tracking and has the interactions with the robot. But you're also looking at other things, where people may not be immersed within a VR experience, but also finding ways for those people to interact with the robot. So maybe you could talk a bit about that system and what the intention behind that is.

[00:12:50.702] Omer Shapira: Yeah, so our intention was never to just build a virtual robot, right? We were aiming for the real thing. We're aiming to make robots interactive and friendly and exciting and entertaining: replacing a lot of the interactions that you miss in your daily life with other humans, having a robot help you complete the trust in computers and reinforce some of your daily interactions. In order to do that, we needed to find ways of understanding how humans interact with robots, but also how friendly robots seem to the humans. And what we have in this case is a system we built along with our partners at wrnch, which tracks the humans in the room. We're not recording them, by the way; we're just passing the information through from the camera, and it's not being saved. And it's detecting their skeletons, which is an achievement by itself, since we're only doing this with an RGB camera. It's detecting a lot of skeletons, and it can tell which humans are paying attention to the robot. We have a small AI system running there that knows where humans are looking and how their body pose compares to the robot's, and we're essentially trying to figure out who the robot should look at. And this has been a challenge for me, because I need to understand that the robot is being some kind of entertainer here. A lot of people are watching this robot play dominoes right now. It needs to be friendly, right? So we made it look at a bunch of people, and if they wave at him while he's looking at them, he will wave back. And we can add a lot more of these emotes that will make the robot seem on the outside like what it really is on the inside, which is a level of training that you would typically find in a toddler, right? We're training it to do certain things, it's learning to get better at it, and based on a lot of evidence, we would like the demeanor of the robot to be kind of the same. It's a very curious thing. So we built a model for human interaction with a virtual robot. The robot looks around the room. It has small saccades because it's looking at a group of people. If it sees that the group of people isn't too interested or too excited, then it tries to look for another group of people, or it will leave them alone, you know, not be creepy. And it will make a sort of large travel with its eyes somewhere else and try to find a different group of people that it can be entertaining to. The idea is it's supposed to be friendly, supposed to be entertaining, holding attention or, like, honing attention. And it can do this while playing the game. So we have the robot taking a break from a move, saying hi to people, and continuing with the move. The idea is the more we can do of this, the more people will trust being around a system that we pretty much can already confirm does the right thing. We just haven't shown it to the humans.
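
As a rough picture of the kind of attention logic he describes, here is a hypothetical scoring heuristic: rank each detected person by how engaged they seem (facing the robot, nearby, waving) and pick a gaze target, or look elsewhere if nobody seems interested. The Person fields are only a guess at what an RGB skeletal tracker might report; this is not the wrnch or Isaac API.

```python
# Minimal sketch of a gaze-target heuristic for a social robot; all thresholds
# and fields are illustrative assumptions, not the actual demo's logic.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Person:
    facing_robot: float      # 0..1, how directly the head/torso faces the robot
    distance_m: float
    waving: bool

def engagement(p: Person) -> float:
    score = p.facing_robot / max(p.distance_m, 0.5)   # closer and facing = more engaged
    if p.waving:
        score += 1.0
    return score

def choose_gaze_target(people: List[Person], threshold: float = 0.4) -> Optional[int]:
    if not people:
        return None
    best = max(range(len(people)), key=lambda i: engagement(people[i]))
    if engagement(people[best]) < threshold:
        return None          # nobody interested: look elsewhere, don't be creepy
    return best

people = [Person(0.9, 1.2, True), Person(0.2, 3.0, False)]
target = choose_gaze_target(people)
if target is not None and people[target].waving:
    print(f"look at person {target} and wave back")
```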

[00:15:20.505] Kent Bye: And so if I look at what you're able to do with this robot, you're basically modeling it in a real-time game engine running at 90 frames per second to be in virtual reality. I imagine that for a lot of the use cases where people want to train similar situations, it may be quite a learning curve to translate the simulated environment into a game engine that is being modeled at that rate. But there's also the thing where this is a little bit of a greenfield project, where you're able to start from scratch and have shared code between the two of them. There may be legacy systems, or people who have already built stuff, who would then have to model their legacy system in the same way within these virtual environments. I'm just curious to hear your thoughts on some of the big applications where you see virtual worlds being used to train AI: whether it's in these greenfield environments where you could start to do both, or whether you also have to have a certain amount of gaming experience to be able to build these real-time simulations.

[00:16:13.402] Omer Shapira: Well, I mean, it's always true, you know, like the adage of computing: garbage in, garbage out. Or rather, your training is at most as good as the data that you put into it. And this is no exception, right? We build a simulator, and the simulator has to be very, very good for the simulated environment to be very, very good. Some things are surprising. Like, there are papers about domain randomization, which essentially allow us to create very immune systems, immune in the sense of immune to error, based on non-realistic images. You're capable of reinforcing a good distinction, like features on a certain object that you're looking at. There are a lot of different techniques that are not necessarily just going through the photorealistic path. Regardless, it's all about data. You really need to have good, high-quality data. And the thing about game engines in particular is that, like, if you're a game developer, the worst thing that you will have is being limited by a game engine. So the end goal of a game engine is to be robust enough where necessary to do human interaction. So, like, input has to be fast, rendering has to be fast, but it also has to be flexible enough to allow different kinds of interaction, right? You're trying to build a new genre of game, you're trying to build a new type of interaction, so you have to allow the game engine flexibility. And UE4 is a combination of both. It's sufficiently flexible to have artists manipulating it. Our robot was completely built by artists. But it's also strong enough to run very, very fast on our hardware. So we're very lucky to have a system that allows us to do a game of dominoes very well. But I won't deny that not all things are equally possible with all pieces of software, and it's inevitable that we will have to write some of our own code to do some interactions that are not possible with the current generation of game engines. In general, interaction design for hands, for small scales, has never been dealt with in game engines, right? When we're playing games, we're always playing at scales that are very large, you know, like action is exaggerated and so on. So the current generation of game engines has to go through a transition of becoming gentler, softer, more precise, and more delicate, like being able to break small things, in order to be somewhat reminiscent of the real world. And this is really important for VR, because you're manipulating things with very accurate manipulators called your hands, which you are used to being accurate and responsive. And if they're not in VR, you get frustrated.
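
Domain randomization, which he references here, is the idea of randomizing nuisance factors (lighting, textures, pose, clutter) on every synthetic sample so the model is forced to latch onto the features that actually identify the object rather than onto one rendering style. Below is a hypothetical sketch of such a data-generation loop; the parameter names and the injected render() callable are illustrative, not a specific Isaac or UE4 API.

```python
# Hypothetical domain-randomization data generator: labels are free because the
# engine placed the object; everything else about the scene is randomized.
import random

def randomized_scene_params(rng: random.Random) -> dict:
    return {
        "light_intensity": rng.uniform(0.2, 3.0),
        "light_color": [rng.uniform(0.5, 1.0) for _ in range(3)],
        "table_texture": rng.choice(["wood", "marble", "checker", "noise"]),
        "camera_jitter_deg": rng.uniform(-15.0, 15.0),
        "tile_pose": (rng.uniform(-0.3, 0.3), rng.uniform(-0.3, 0.3), rng.uniform(0, 360)),
        "distractor_count": rng.randint(0, 5),
    }

def generate_dataset(render, tiles, samples_per_tile=1000, seed=0):
    """`render(tile, params)` is assumed to return an image from the engine."""
    rng = random.Random(seed)
    dataset = []
    for tile in tiles:
        for _ in range(samples_per_tile):
            params = randomized_scene_params(rng)
            dataset.append((render(tile, params), tile))   # ground truth by construction
    return dataset

# Usage with a dummy renderer, just to show the shape of the output.
dummy_render = lambda tile, params: {"tile_drawn": tile, **params}
data = generate_dataset(dummy_render, tiles=[(6, 6), (2, 3)], samples_per_tile=2)
print(len(data), "labeled samples")
```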

[00:18:35.965] Kent Bye: Yeah, right now the current generation of six-degree-of-freedom controllers are basically like a big wand that you're swinging around, but you don't have the dexterity to really track the fingers. So something like Leap Motion, or other tracking that's able to get the more fine-grained movement of the fingers, seems like it would be needed from a game design perspective, because there's a lack of haptics. I think the interaction design for those isn't as compelling as maybe some gesture controls and user interaction. In terms of training a robot, it sounds like that level of fidelity is what you would start to need to have a human go into VR, maybe do an action, and then train a robot after many iterations for it to go in and do the exact same interaction.

[00:19:17.143] Omer Shapira: Yeah, and my approach to this, like, you know, being a person who's all about interactions, is that you have to think like a cave dweller. You weren't born with a VR controller in your hand. You were born with hands. You're not swiping on an iPad. It happens to feel very good, but it's not something that you had to learn, right? With controllers, it's kind of surprising, because you get a lot of things that are frustrating to do, but you also get a lot of things that have power multipliers to them. Grabbing dominoes and manipulating them happens to be a really easy thing to do with the Vive controller. So we built a small system, it's a rather large system actually, we built a rather large system called Holodeck to enable us to do a lot of multiple-human interactions in virtual reality. We're using Holodeck as the basis of human interaction in this particular case: manipulators that look like your real hands. You can do gestures with them. Also, as you touch a surface, it won't allow you to go through the surface and create an illegal state in the simulation, because it's meant to do the right thing. And it basically attempts to mimic the real world in the very small use case of things you can control this way. If we were to choose a different set of paradigms to help humans interact with robots, we would need to not only design code for this exact controller, we would need to find different controllers to do this. I don't necessarily think that a camera tracking your fingers is the ultimate one true way to go. In particular, because the most important thing about sensing or manipulating something with your hands is not looking at your hands while doing it, it's your proprioception, which is the tension in your muscle spindles going directly into your brain. It's a very, very fast feedback loop. You've been trained on this since you were a kid. You're able to touch your nose with your eyes closed. You're able to grip an apple with your eyes closed. You're able to do a lot of things behind your back. These are really important things that are not being addressed by cameras. So we really have to think about a plethora of tools, or rather an array of tools, to solve different problems. And in human interaction in VR, I think the most important ones are tools that address your proprioception. And they're not necessarily cameras.
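
One small but concrete piece of what he describes, the virtual hands refusing to pass through surfaces, can be sketched as a constraint on where the hand is drawn versus where the controller actually is. The toy example below clamps the rendered hand against a single hypothetical table plane; the real Holodeck presumably resolves this against full physics geometry, so treat this purely as an illustration of the idea.

```python
# Toy sketch: the rendered hand follows the tracked controller except that it is
# clamped to stay outside a collider, so the simulation never reaches an illegal
# interpenetrating state. A single table plane stands in for the whole scene.
from dataclasses import dataclass

@dataclass
class Vec3:
    x: float
    y: float
    z: float

TABLE_HEIGHT = 0.75   # metres; hypothetical collider height

def constrained_hand_position(tracked: Vec3, skin: float = 0.01) -> Vec3:
    """Return where to draw the hand: the tracked pose, pushed up out of the table."""
    if tracked.y < TABLE_HEIGHT + skin:
        return Vec3(tracked.x, TABLE_HEIGHT + skin, tracked.z)
    return tracked

# Controller dips 5 cm into the table; the visible hand stays resting on top of it.
print(constrained_hand_position(Vec3(0.1, 0.70, -0.2)))
```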

[00:21:18.301] Kent Bye: So where do you see the domains where this is going to be really applied of training robots in virtual worlds? What kind of applications do you see this really taking off in? What industry verticals?

[00:21:29.457] Omer Shapira: I mean, a really cool thing... I can say what I wish to see in the world. I wish to see in the world a situation where robots become helpers in trivial tasks that humans are incapable of doing, either because they weren't built to do them or because they're impaired in some kind of way. You know, I have some disabled people that I'm close to. I want them to have the same affordances as other humans. So it's a very important thing for me that robots will be able to help this way. And I think we're not that far off in that case. But there are a lot of cases where robots can help us cope with living alone. They can help us have fun, right? Play games, right? You can play basketball with a robot. You can do a lot of things that people thought, well, it's either too expensive or too unsafe to do. You know, I wish I could have a robot fetch stuff in the real world while I'm in VR and blind to it. There are a lot of applications here that you can imagine robots doing as tasks before becoming fully intelligent agents. We don't have to look as far as robots being the next Jarvis to be able to imagine robots augmenting our lives and creating a power multiplier or creating an equalizer for us.

[00:22:41.717] Kent Bye: Great. And finally, what do you think is kind of the ultimate potential of virtual reality and artificial intelligence and robotics and what it might be able to enable?

[00:22:51.529] Omer Shapira: Oh man, I'm conflicted here, because I make art in virtual reality, and the thing that grabs me about virtual reality is that you can put people inside things that you really couldn't anywhere else. I used to be a filmmaker. Film is limited, right? If you're an editor and you edit fast, you treat everything like a ride and you put people through a one-way, non-interactive emotional tunnel. In VR, you can do so much more, because you can surround a person, but the person also has a certain ability to react to the world, and the world needs to react appropriately to them. I'm not even talking about interaction, we're talking about just reaction, right? Like, it needs to react properly, like a physics engine, not like a semantics engine. So I think there's a lot of potential just there. We have to see amazing art in this field in order to compel the humans in it. I think that on the practical side, we're already seeing applications that are equal to, if not better than, what we can do with humans. If we have good manipulators with our hands, if we can use systems to quickly solve a problem in three dimensions that would be hard to solve in three dimensions on a screen, we gain a lot of advantage. I talked about proprioception. Proprioception is my obsession, because you have muscle memory and you have spatial recognition all based on your body pose. So the same way that you learn a route to your school when you're a kid, you can learn an action that you do on a purely digital component. You can learn it via physical actions, and that is so powerful, because you as a human can become a better performer inside virtual reality without even involving all of the intelligent components that interact with you. Just by moving your hands, you can be smarter. And that is a huge potential that is not fully utilized right now in virtual reality.

[00:24:33.890] Kent Bye: Awesome. Well, thank you so much.

[00:24:35.371] Omer Shapira: Thank you, Kent. It's always a pleasure.

[00:24:37.392] Kent Bye: So that was Omer Shapira. He's a senior VR designer at NVIDIA who is using virtual reality technologies to train AI algorithms. So I have a number of different takeaways from this interview. First of all, this was one of the most mind-blowing applications that I saw at SIGGRAPH, and I had been waiting to talk about it until I really got the Voices of AI podcast up and launched. That is now launched: I have the first five episodes of the Voices of AI that you can go binge listen to and really dive into. But as I'm listening to this interview, I'm realizing that there were a lot of really technical and wonky things about artificial intelligence, and so if you want to learn more, go check out the Voices of AI. I'm going to be doing deep dives into much more of these cross-sections between virtual worlds and artificial intelligence. And I think the thing that Omer really showed was that there are so many opportunities here to train AI algorithms in a way that's very controlled, as well as to really scale it out. Game engines, as Omer said, are turning into this media framework combined with an operating system. So you're able to basically do this spatial computing and simulation where, as Omer said, there are a lot of things that are good enough, some things that are actually better in some aspects, like the optics for example, and some stuff that's not quite there yet. But when people go into these virtual environments, they're able to get a pretty good experience and get to the essence of the archetypal experience that they need in order to actually train the robots. And so I think the challenge is to figure out what types of representations the AI needs to be trained on, and then how these virtual worlds can be used to train it. And the experience that I saw a lot of people having within this specific demo was actually trying to break the physics engine. So the task that you were given when you went into this demo was to play dominoes with the robot, but people would get distracted and then start to push the physics engine to its edge. And in that spirit, I think what Omer is saying is that that's kind of what they want to do, but in the context of trying to find these edge cases with the robot. So you're interacting with a robot and trying to break it, but within the context of a virtual world where it's a lot safer to do that. That's the thing that Philip Rosedale had mentioned to me, saying that, you know, in the future we may be having these interactions within VR and with AI technologies not only so that we feel like we're safe from them, but perhaps at some point so that they feel like they're safe from us. And so there's kind of this mutual context of safety when you have these virtual worlds. And so you're able to go into the virtual world and have this experience, but if the robot acts really crazy and does something that would potentially harm you, then it's in a virtual world. There are no collisions there that are actually going to harm you in any way. And it was really fascinating to hear Omer talk about proprioception as being this key concept that he's been really fascinated with lately.
And I think what he was saying is that there's the principle of embodied cognition, which I've talked about a lot on the Voices of VR podcast, and it's just this idea that we don't just think within our brains, we think with our entire bodies. And what he's saying is that the process of our muscle memory and our spatial recognition, and the fact that we're moving our bodies through these different spaces, is allowing us to change the way that we actually think about different things. And as we start to capture those embodied movements within virtual reality, then we can start to teach robots how to move, which is a lot of the thrust of what Omer is talking about here: how can you create a safe environment to train robots, and to find out all the different things that you need to do to make robots friendly and interactive with human beings? So they're trying to make robots more interactive, more friendly, more exciting, more entertaining. And at the end of the day, they're trying to build trust with these robots, so that you have maybe a virtual experience of interacting with the robot, and then you gain some levels of trust and familiarity, so that when you are in the real world interacting with that same robot, you're a little bit more at ease. And they can also just figure out things like: what does it mean to be friendly? They can start to do these experiments within these virtual worlds, interacting with a virtual robot and seeing what kind of actual surprise and delight, wonder and awe, and emotional connections you can get with a robot, and what the different sort of minimum viable emotional expressions are that a robot needs to have in order to make it friendly. Because I think there is an emotional component to how we connect to other humans, and so if we see the robot starting to emulate that, then it can make us feel like we're actually building a connection; we have this perception of it being friendly. And the other thing that came up as I was listening to this was that, you know, I just came back from Sundance a few weeks ago, where I used a haptics glove, which did a much more sophisticated job of collision detection with your fingers. A lot of the solutions that were out at the time of this interview, which I did back in August at SIGGRAPH, were camera-based technologies like the Leap Motion, which really aren't that great when it comes to occlusion or flipping your hands around very quickly, and so it's very easy to lose the tracking. They have a lot of really sophisticated and great AI algorithms to make that a lot better, but when you're talking about needing to capture the hands with such fidelity that there's no occlusion or anything else, if you're trying to teach a robot how to use its hand and pick something up, you really want every single nuance of that movement of your hands. And I think that's part of what the haptics glove enables: not only does it give you some of that haptic feedback, but you're able to potentially start to mimic different movements of your hands in picking up different objects, such that you're able to then train the robots to do those same types of behaviors. So the mantra that Omer said is that in AI, it's garbage in, garbage out. So it's all about the data. It's all about the fidelity of the data.
And it's all about the quality of that data being able to somehow archetypally represent these different behaviors that you're trying to train the AI to learn. So it seems like game engines have evolved to the point where they're pretty good reality simulators, good enough to make this seamless transition of having a single code base, taking an AI and training it within a virtual world, and then plugging it into the actual Baxter robot to do these different demos. So to me, this is probably one of the most significant and important movements connecting these two worlds: the importance of virtual worlds and what you can do with them to train AI and to do all sorts of things that you wouldn't be able to do in the real world. So you have the capability to, like, take a single simulation and do many different iterations, perhaps do some transfer learning amongst all the different scenarios and situations, and explode out reality into all these different virtual worlds so that you're not constrained by space. You're able to go out into the virtual world and really start to cultivate and train these AI systems. And like I said, I did just launch the first five episodes of the Voices of AI podcast. It's a lot of rich information. The first one is with the president of the AAAI, where you're able to get an overview of the landscape of artificial intelligence. And then episodes number two, four, and five are all about interactive storytelling, looking at the future of how AI is going to be able to simulate either personalities or social dynamics and social situations, as well as to kind of control the fate within an overall simulation, so that you can start to find new ways of expressing agency within these interactive environments. And then in episode number three, I talk about one of the next-generation Turing tests, which is about what the benchmark tests are for AI to reach a certain level. And this is about the Winograd schema challenge, as well as pronoun disambiguation and knowledge representation. And so there's a deep dive in there. But I'm going to be starting with some of these higher-level interactive storytelling discussions and then mixing in more academic discussions and deep dives into artificial intelligence. So I'm really curious to see how the evolution of both VR and AI continues along this trajectory of collision, especially within training AI, but also within these interactive story contexts and so many other different ways when it comes to augmented reality, because so much of AR is not going to even be possible until you have enough training done with the computer vision algorithms within AR. I think a lot of that training can also start to be done within VR as well. So that's all that I have for today, and I just wanted to thank you for listening to the Voices of VR podcast. And if you enjoy the podcast, then please do spread the word, tell your friends, and consider becoming a member of the Patreon. This is a listener-supported podcast, and so I do rely upon your donations in order to continue to bring you this coverage. So you can become a member today at patreon.com slash Voices of VR. Thanks for listening.
