Philip Rosedale is the creator of Second Life and, more recently, High Fidelity. He talks about many of the things he's doing differently in creating a virtual world the second time around, including a focus on 3D audio, low latency, and the speed and texture of the experience, as well as using a standard scripting language, JavaScript, rather than rolling their own.
He talks about virtual body language, and how 100ms of latency is the threshold for a compelling telepresence experience that is indistinguishable from face-to-face interaction.
Philip talks about how High Fidelity wants to create a set of open standards and protocols so that people can host their own virtual worlds on their own servers. He also talks about their approach to distributed computing to help offload the computing power needed to run a complex and nuanced virtual world, and how mining a cryptocurrency could be a part of that process.
Finally, he talks about his vision for the future of the Metaverse, and how these virtual worlds will provide opportunities for participants to be more thoughtful, more open, and more creative than they can be in the real world. He doesn't see these worlds as necessarily escapist, since they can be as sophisticated, complex, navigable, and challenging as the real world. His experience with Second Life was that you have to be just as capable, smart, and entrepreneurial to succeed in virtual world environments.
Reddit discussion here.
Be sure to check out this blog post on High Fidelity’s system-level architecture for more details.
TOPICS
- 0:00 – Intro – High Fidelity. New virtual world taking advantage of changes in technology
- 0:32 – Motion sensors and the Oculus Rift are driving changes in virtual worlds: the drive is to interact naturally in 3D virtual spaces, and the requirement to have the learned skill of using a mouse and keyboard is going to end soon.
- 1:33 – What types of interactions have you had within High Fidelity with these new tools? Body language, and seeing someone blink. Nodding head is important. Moving hands is remarkable. Got PrioVR working for full upper body animation. Group interactions and face-to-face interactions.
- 2:47 – Facial capture with either a 3D camera or a webcam with Faceshift, and reconstruct it via 50 floating point numbers. Aiming to get less than 100ms latency to mimic 1-to-1 interactions
- 3:48 – Using a VR HMD and facial capture at the same time. Can only get one at a time. Oculus is thinking about doing facial capture. Can use a 3D TV and adjust the view as an intermediary between a full VR HMD and a computer screen.
- 4:54 – Using High Fidelity as a telepresence tool. Use it with their distributed team, and cool to see others.
- 5:35 – Good enough for enterprise use? Proof point of recording everyone telling the same story with the same avatar, and being able to identify people even without the sound.
- 6:20 – Distributed computation at High Fidelity. Limited by centralized hosting. Distributing lots of small computers quickly, and using computers at home to offload some of the processing.
- 7:30 – Dynamic multicasting with audio. Mixing it in 3D. Dynamically assembling multicast repeaters so a concert can be performed in real time with less latency than in the real world.
- 8:47 – What is a voxel, and how are you using it? A way to organize space virtually. Voxels represent what things look like at a distance, enabling you to see to an infinite distance. See full mesh topology up close.
- 10:06 – Hierarchical nesting of voxels for the decomposition of space with a "sparse voxel octree," and then distributed computing with those. Can create an infinitely complex city.
- 10:59 – Other things done differently from Second Life: audio processing, low latency, the speed and texture of the experience, and using a standard scripting language, JavaScript, rather than rolling their own. People want to run their own servers; it's a protocol and open-source standard rather than a world unto itself.
- 11:59 – Cryptocurrency and paying people for helping run the virtual world.
- 12:56 – How is identity different in High Fidelity? By default you're anonymous, with OAuth and SSL used for authorization on certain secure servers, but there are also a lot of open worlds. Having a name floating over your head is not a great solution, because sharing your name is a choice and a form of greeting.
- 14:23 – Future of the Metaverse. Create a set of interconnected virtual worlds, where they’re living adjacent to each other. Instead of hypertext links, there will likely be doors. Virtual worlds of the future will be a set of interconnected spaces like the real world. There will be hidden servers that you can’t get to, just as there are private intranets.
- 15:34 – What inspires you with what you want to see? How people are changed by virtual worlds for the better, more thoughtful, more open, more creative. Virtual worlds are our future. They will become a real added space, and it’ll be a profound expansion of the real world.
- 16:35 – Are virtual worlds escapist? Technology is giving us the ability to create worlds that are just as sophisticated, complex, navigable, and challenging as the real world. They're only escapist if you're escaping from other people, or simplifying the world too much in a way that isn't in our best interest. To be successful in Second Life you have to be capable, smart, and entrepreneurial.
Theme music: “Fatality” by Tigoolio
Here’s a recent talk that Philip Rosedale gave about High Fidelity. Note that this is NOT within High Fidelity, but a government virtual world called MOSES, which is the “Military Open Simulator Enterprise Strategy.”
Rough Transcript
[00:00:05.452] Kent Bye: The Voices of VR Podcast.
[00:00:12.014] Philip Rosedale: I'm Philip Rosedale and I'm the founder of Second Life and now of High Fidelity. And what we are doing is building a new virtual world that takes advantage of a couple of the big changes that are happening in the industry right now that we think will make it possible to get from a million people, which is about the number of people using virtual worlds today, to a billion people.
[00:00:32.019] Kent Bye: I see, and so what are some of those key changes in the technology that you see?
[00:00:36.003] Philip Rosedale: The first one is all these inexpensive sensors that are letting us capture body motion, facial expression, and then things like the Oculus Rift that are letting us, you know, give you sensory feedback, visual immersion in a way that we just couldn't do before. I think the most important changes beyond the Oculus as a visual device are various things that detect the motion of your hands and your fingers. There's a number of different products that are doing that. We're building a platform that just anticipates that one way or another the problem of knowing where your body and your hands are is about to be solved. Because what that means is that we can interact naturally in these 3D spaces and this is the thing that has made adoption of 3D virtual worlds so difficult because you have been forced historically to use the mouse and the keyboard to move and manipulate objects in 3D and that is just something that is extremely difficult and so it's a learned skill and we hope that's about to end.
[00:01:33.319] Kent Bye: And so what type of interactions have you seen happening in the High Fidelity environments with these new 3D sensors, and being able to have more expressive body gestures and body language?
[00:01:44.784] Philip Rosedale: Well, High Fidelity right now is just in an early alpha stage, so we've probably had 50 or 100 people that are in there actively messing around. Of course, we as the company have been in there from the very beginning. The things that are remarkable are first body language. Being able to just see somebody blink, for example, when you're talking to them with very little delay is a very powerful signaling mechanism. Nodding the head, like you're doing right now, is another very powerful communication thing. It means I'm listening, it means I'm somewhat engaged, or I'm not very engaged, or whatever. Moving the hands, obviously, as you begin to be able to capture that, it's remarkable. We just got a new full-motion bodysuit that's a Kickstarter consumer-targeted device called the PrioVR. We just got that working and did a big demo here with it. And just being able to see the full range of motion of somebody's even upper body is just stunning when you see that happening. We're watching things like that, group interactions, face-to-face interactions. We did hugging on stage today for the first time. We had one of our guys putting his hands behind one of our other people and hugging her.
[00:02:47.215] Kent Bye: And so maybe talk a bit about some of the work that you've been doing in terms of facial recognition and then rendering that in 3D.
[00:02:53.740] Philip Rosedale: Well, the way we're doing facial capture is we're watching your face with either a 3D camera, a PrimeSense camera, or a 2D camera, the one that's built into your laptop. The 3D cameras still work a lot better; they can capture a lot more detail. We then turn that into, using an SDK from a partner of ours called FaceShift, about 50 numbers, floating point numbers, and stream those over the network, and then reconstruct that on the face of the avatar at the other end. A big part of what we've been working on is getting the delay, from when you move or when you speak to when the other person sees or hears it, down to about 100 milliseconds or less. That is a critical neurological breakover point: if you can get to a hundred milliseconds or less, people can't tell they're not face-to-face with you. If you're much more than that, one-on-one interaction becomes very difficult, as we all know today from our cell phones, which are at about 500 milliseconds, which is why we don't use them anymore.
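To make that pipeline concrete, here is a minimal JavaScript sketch of streaming roughly 50 facial coefficients with a latency check at the receiver. The frame layout, field names, and clock-sync shortcut are illustrative assumptions, not High Fidelity's actual wire format.

```javascript
// Minimal sketch (not High Fidelity's wire format): pack ~50 facial
// blendshape coefficients with a send timestamp, then reconstruct
// them at the far end and check the delay against the ~100ms target.
const NUM_COEFFICIENTS = 50;

function packFaceFrame(coefficients) {
  // 8 bytes of send time (ms since epoch) followed by 50 float32 values.
  const buffer = new ArrayBuffer(8 + NUM_COEFFICIENTS * 4);
  const view = new DataView(buffer);
  view.setFloat64(0, Date.now());
  coefficients.forEach((c, i) => view.setFloat32(8 + i * 4, c));
  return buffer;
}

function unpackFaceFrame(buffer) {
  const view = new DataView(buffer);
  const coefficients = [];
  for (let i = 0; i < NUM_COEFFICIENTS; i++) {
    coefficients.push(view.getFloat32(8 + i * 4));
  }
  // One-way delay estimate; assumes roughly synchronized clocks.
  const latencyMs = Date.now() - view.getFloat64(0);
  return { coefficients, latencyMs };
}

// Round-trip locally: latency should be ~0 ms on one machine.
const frame = packFaceFrame(new Array(NUM_COEFFICIENTS).fill(0.5));
console.log(unpackFaceFrame(frame).latencyMs);
```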
[00:03:48.242] Kent Bye: I see. And so maybe you could talk about this tension and dynamic between using head-mounted displays and doing facial recognition. It doesn't sound like you can necessarily do those two things at the same time.
[00:03:58.167] Philip Rosedale: Right now, High Fidelity supports the Oculus and these 3D cameras, but you're right, you can only do one at a time. I think this will get resolved, though, because I think the folks at Oculus and lots of other people are thinking about how to do facial capture using a head-mounted device. I don't think a head-mounted display is what you're going to always use. You can sit in front of a big 3D TV, for example, with our stuff, and have a pretty amazing interaction. For example, we can do things like shift your field of view slightly as you move your head in front of the TV because we can actually use a camera to see where your head is and adjust the view for that. And that's something that's kind of remarkable and in between the fully immersive head-mounted display and the sort of just sitting in front of your computer screen experience. Virtual reality in general, I think, is going to have a lot of different ways people get into it beyond the fully immersive head-mounted display and a lot of other systems.
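The camera-driven view shift Philip describes is sometimes called fish-tank VR: a webcam estimates where your head is relative to the screen, and the virtual camera moves with it so the scene appears to sit behind the glass. Here's a minimal sketch of the idea; the function and field names are assumptions, not High Fidelity's API.

```javascript
// Hypothetical sketch of head-coupled parallax ("fish-tank VR").
function headCoupledCamera(headOffsetMeters, baseCamera) {
  return {
    position: {
      x: baseCamera.position.x + headOffsetMeters.x, // lean right, see around the left
      y: baseCamera.position.y + headOffsetMeters.y, // rise up, look down into the scene
      z: baseCamera.position.z,
    },
    lookAt: baseCamera.lookAt, // keep the same fixation point in the world
  };
}

// Example: head detected 0.1 m right and 0.05 m above screen center.
const camera = headCoupledCamera(
  { x: 0.1, y: 0.05 },
  { position: { x: 0, y: 1.6, z: -2 }, lookAt: { x: 0, y: 1.6, z: 0 } },
);
console.log(camera.position);
```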
[00:04:54.178] Kent Bye: Can you talk a bit about how your team is using High Fidelity as a telepresence tool in terms of collaboration with distributed employees?
[00:05:02.303] Philip Rosedale: We're 12 people right now. We've got one guy in Costa Rica and one guy in Seattle. The rest of us in San Francisco. But, you know, sometimes traveling. And we are very effectively able to use, admittedly with all this hardware, you know, we got to plug in all this gear right now, but we use that system every Friday at lunchtime to sit and look at each other and talk about what's going on. And it's pretty cool. I mean, it's a pretty amazing experience to be able to look around somebody to see somebody else or make eye contact or watch their body language as you talk to them.
[00:05:34.158] Kent Bye: And even though it's low-res, do you feel like it's a compelling enough telepresence tool to make it into the enterprise?
[00:05:41.318] Philip Rosedale: Yeah, we did a test of that. I mean, the short answer is yes. We've done some testing with enterprise, more corporate executive type users, and they love it. The simple test we did that's a really great proof point of that is we recorded ourselves all telling a short story about our life, all using the same avatar. But its face, of course, is moving to mirror ours. And then we turned the sound off. And within five or six seconds, you can tell who everybody is just by watching the avatar move and the body. So I think that's a good example of where if the technology is sufficiently good that you can pretty quickly know who you're talking to, it's got to be useful for real communication.
[00:06:19.856] Kent Bye: In this second iteration of building a virtual world, it seems like you maybe have new perspectives in terms of doing things different. One of those things seems to be distributed computation. Maybe you could talk a bit about what you're doing there.
[00:06:31.993] Philip Rosedale: Yeah, the other big part of what we're doing at High Fidelity is building an infrastructure that allows us to use more than just server machines. It allows us to use everybody's machines as part of the computing system that is the simulation of the virtual world. We think that's really important because we know that if all your friends show up at a party in your virtual world, you don't have enough bandwidth from your home to serve that experience to them; you're naturally limited. Virtual worlds of the past, like Second Life and game systems, have also had the same problem, where you just can't have too many people in one place. So we've attacked the problem of distributing a lot of small computers very quickly to solve problems like these, and also being able to use people's computers at home to do part of the computation of the virtual world. So, for example, a pet or a flock of birds or something like that in the virtual world can be running on somebody's computer while they're sleeping and usefully contributing to the experience for everybody else.
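As a rough illustration of that idea, here's a minimal JavaScript sketch of handing small simulation jobs, like that flock of birds, to volunteered home machines. Every name here is a hypothetical stand-in for whatever assignment protocol the real system uses.

```javascript
// Hypothetical job board for contributed compute: all names are
// illustrative stand-ins, not High Fidelity's actual protocol.
const jobs = [
  { id: "flock-42", kind: "flock-of-birds", region: "backyard-7" },
  { id: "pet-13", kind: "pet", region: "plaza-1" },
];
const volunteers = []; // home machines that offered spare cycles

function assignJobs() {
  for (const job of jobs.filter((j) => !j.assignedTo)) {
    const machine = volunteers.find((m) => m.idle);
    if (!machine) return; // no spare capacity right now
    job.assignedTo = machine.id;
    machine.idle = false;
    // The volunteer simulates the job and streams state updates back
    // to whoever is near that region of the world.
    machine.run(job);
  }
}

function registerVolunteer(machine) {
  volunteers.push(machine);
  assignJobs();
}

// A home PC volunteering cycles while its owner sleeps:
registerVolunteer({
  id: "home-pc-1",
  idle: true,
  run: (job) => console.log(`simulating ${job.kind} in ${job.region}`),
});
```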
[00:07:29.624] Kent Bye: And it also seems like you're doing some special things with audio in terms of dynamic multicasting. Maybe you could talk about some of the innovations that you're doing there.
[00:07:37.921] Philip Rosedale: Yeah, the general structure that we're building for most data types is really effective with audio. The first thing we're doing is we're mixing all the audio in 3D, so you hear everybody around you in the right way, you hear the reverb in the room, and when you turn your head, you hear the sound change, just like you do in real life. And then we're building a system that allows us to dynamically assemble a set of multicast repeaters that lets us get that out to a larger audience, so you can stand on stage and perform for 10,000 people that are waving their arms around and stuff. And that can all happen as close to real time as it needs to be. And in fact, it turns out that if you build the network the right way, you can, strangely enough, actually broadcast something like a concert experience to a larger number of people with less latency than you can in the real world. Because in the real world, the speed of sound in air is about one foot per millisecond. So once you're 200 feet away at a stadium, you're about two-tenths of a second behind the performers, which is an experience we've all had, and it's not a very good one. We can potentially broadcast to a million people, you know, with an extra 100 milliseconds of delay or something, which is probably better than you can do in the real world.
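The arithmetic there is worth making explicit: at roughly one foot per millisecond, physical distance converts directly into acoustic delay, while a well-built repeater tree adds a roughly constant overhead regardless of audience size. A tiny JavaScript sketch of the comparison, using the numbers quoted above (the fixed 100ms network figure is Philip's ballpark, not a measurement):

```javascript
// Sound in air travels roughly one foot per millisecond, so seat
// distance converts directly into delay at a physical concert.
const FEET_PER_MS = 1;

function stadiumDelayMs(distanceFeet) {
  return distanceFeet / FEET_PER_MS;
}

// A multicast repeater tree adds a roughly fixed overhead instead,
// no matter how many listeners hang off the tree.
const NETWORK_OVERHEAD_MS = 100; // Philip's ballpark figure

console.log(stadiumDelayMs(200));  // 200 ms: ~0.2 s behind the stage at 200 feet
console.log(NETWORK_OVERHEAD_MS);  // ~100 ms to a million virtual listeners
```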
[00:08:47.263] Kent Bye: And can you talk about the importance of using voxels and what a voxel is?
[00:08:52.025] Philip Rosedale: Right, so what's a voxel? A voxel is a way of organizing space. It's basically a way of saying what's in a certain region of space. Some games, like Minecraft, color that whole voxel a solid color and create this kind of cool look that we've all, I think, in some ways grown enamored of in the last few years. We're using voxels a little bit differently. They actually represent what things look like at a distance, and they can be subdivided or added together as you get farther away. And what that enables is for us to have, say, a mountain ridge in the distance with a small city on it, with little blinking going on as people are changing things down in that city. And all that processing over there is happening on a bunch of machines that don't have to talk to you, because they're just updating these voxels, and the voxels are kind of like pixels that just change their colors a little bit to tell you there's something going on over there. So what voxels do is enable us to see an infinite distance in the virtual world. When you get up close to things, at least today, we're still going to give you the option of using full mesh topology, meaning anything you could import from a 3D warehouse around the web, you're going to be able to drop into High Fidelity and see it in its original glory, but only when you're up close to it. When you're farther away, it turns into boxes.
[00:10:06.025] Kent Bye: I see, so it sounds like you're doing some sort of hierarchical nesting of voxels within voxels, so instead of rendering out every single component of a complex scene, you're also able to distribute that, it sounds like?
[00:10:18.314] Philip Rosedale: Right, so that's a great observation. We are hierarchically nesting the decomposition of space. This is called a sparse voxel octree. And then we are deploying those trees on nested servers, so that as you put more content in an area, at some point we'll just create another server that's actually storing the gigabyte or so of data in that space. And by doing that dynamically, and by borrowing machines from everybody else, we can create an infinitely complex, say, city that uses this voxel partitioning to store everything, and then also servers become voxels themselves, so they represent everything that's within a smaller space.
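For readers who want to see the data structure, here's a minimal JavaScript sketch of a sparse voxel octree: space splits recursively into eight children, only occupied regions are materialized, and a distant viewer can stop descending early and use a node's coarse color, which is the "turns into boxes" behavior described above. The class shape and the last-write-wins coloring are simplifying assumptions.

```javascript
// Minimal sparse voxel octree sketch: children are allocated only
// where content exists, so empty space costs nothing.
class OctreeNode {
  constructor(center, halfSize) {
    this.center = center;     // {x, y, z} center of this cube
    this.halfSize = halfSize; // half the cube's edge length
    this.color = null;        // coarse color for distant rendering
    this.children = null;     // 8 children, allocated on demand
  }

  insert(point, color, minSize) {
    this.color = color; // coarse approximation: last write wins at this level
    if (this.halfSize <= minSize) return; // leaf resolution reached
    if (!this.children) this.children = new Array(8).fill(null);
    // Pick the octant containing the point (one bit per axis).
    const i = (point.x > this.center.x ? 1 : 0)
            | (point.y > this.center.y ? 2 : 0)
            | (point.z > this.center.z ? 4 : 0);
    if (!this.children[i]) {
      const h = this.halfSize / 2;
      this.children[i] = new OctreeNode({
        x: this.center.x + (point.x > this.center.x ? h : -h),
        y: this.center.y + (point.y > this.center.y ? h : -h),
        z: this.center.z + (point.z > this.center.z ? h : -h),
      }, h);
    }
    this.children[i].insert(point, color, minSize);
  }
}

// Build a 1024-unit-wide world and drop in one distant rooftop light.
const root = new OctreeNode({ x: 0, y: 0, z: 0 }, 512);
root.insert({ x: 100, y: 30, z: -200 }, "#7a8c5e", 1);
```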
[00:10:59.226] Kent Bye: And so coming from founding Second Life, which is like one of the first really big popular virtual worlds, and then moving into High Fidelity, what are some of the other things that you decided to really do differently than the first time around?
[00:11:11.157] Philip Rosedale: Well, really focusing on audio and latency, really focusing on the speed, the sort of texture of the experience at the level of bouncing a ball off a wall or something like that. I think we've kind of tried to focus more on the things that we didn't get a chance to do yet in Second Life first, like audio processing and scripting. We chose to use a scripting language, JavaScript, that's a worldwide language standard, rather than writing our own, which we did in Second Life. We've taken a new look at lots of things. You know, I think people want to run their own servers and interconnect them, and that's fundamentally a part of High Fidelity's strategy. It's fundamentally a protocol and an open source standard, and then a set of services we provide, rather than being a world unto itself.
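As a flavor of what scripting world behavior in a standard language might look like, here's a hedged JavaScript sketch of the ball-bouncing example Philip mentions. The worldAPI object and its spawn/onTick methods are hypothetical stand-ins, not High Fidelity's actual scripting interface; the point is simply that content scripts are ordinary JavaScript rather than a bespoke language.

```javascript
// Hypothetical scripting sketch: worldAPI names are stand-ins,
// but the script itself is plain JavaScript.
function bounceBall(worldAPI) {
  const ball = worldAPI.spawn("ball", { x: 0, y: 5, z: 0 });
  worldAPI.onTick((dtSeconds) => {
    ball.velocity.y -= 9.8 * dtSeconds;             // gravity
    ball.position.y += ball.velocity.y * dtSeconds; // integrate position
    if (ball.position.y <= 0) {                     // hit the floor
      ball.position.y = 0;
      ball.velocity.y = -ball.velocity.y * 0.8;     // bounce, losing energy
    }
  });
}

// Tiny stand-in runtime so the sketch runs outside any real engine:
const worldAPI = {
  spawn: (kind, position) => ({ kind, position, velocity: { x: 0, y: 0, z: 0 } }),
  onTick: (fn) => setInterval(() => fn(1 / 60), 1000 / 60),
};
bounceBall(worldAPI);
```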
[00:11:54.335] Kent Bye: And so in Second Life you had Linden dollars, and I've seen that you're going to be having some sort of cryptocurrencies within, you know, have you decided on a currency yet?
[00:12:03.115] Philip Rosedale: We don't know what we'll call the cryptocurrency or if we'll somehow meaningfully link to other cryptocurrencies. We are pretty sure that the computation economy, the ecosystem in which people are sharing computers, will be one in which they are paid with a cryptocurrency, which they probably are then going to be using to buy and sell things from each other. And that'll form the basis of the economy. I don't know what the name of that currency is yet. We're still thinking about it.
[00:12:25.551] Kent Bye: I see, so not only are they distributing their compute power to help actually run the world, but they're potentially having sub-processes of that to actually mine coins, it sounds like.
[00:12:35.734] Philip Rosedale: Right, although we're focusing on making the mining a very small part of the computational load so that most of the time what you're computing is somebody else's, you know, trees blowing in the wind in their backyard, which is a far more interesting use of computation resources than just doing hash solutions, which is what you do with something like Bitcoin.
[00:12:56.662] Kent Bye: And what are some of the questions in terms of identity? How is identity different in Second Life versus High Fidelity?
[00:13:03.724] Philip Rosedale: I think identity in High Fidelity has to be more open and more like the web, in the sense that by default you don't really identify yourself. That is, you make the decision around what types of identity you wish to disclose, either to a server, that is, to an area you go into, or to another person that you meet. So we're building a system where we are useful as a kind of a federating authorizing agent. We're using OAuth and SSL for that: SSL at the server side, where when you come into somebody's server, they may challenge you and say, hey, I'm not going to let you in unless you have the following level of security. But there will be many, many public servers, no doubt, for which there won't be any requirement. We also think that the sort of name floating over your head that you see in so many games and virtual worlds is not the right solution. It doesn't work in a truly heterogeneous, planetary-scale environment where you're walking around. Nobody in the real world would want to go to a new city or a new neighborhood and have their name floating over their head, because deciding to tell your name to somebody else is a choice that you make as a form of greeting, and not something that's imposed on you by having something over your head. So, you know, I think there are a lot of rich questions around identity that we're tackling, and I think we've got some good solutions for it.
[00:14:23.535] Kent Bye: And so when it comes to the metaverse going with Second Life, I'm just curious in terms of how you see this issue of creating a metaverse or many different metaverses and where you see that going forward.
[00:14:36.300] Philip Rosedale: Well, I think that what we can and will do is create a very large set of interconnected virtual worlds. That said, I think those virtual worlds can actually be adjacent to each other in a larger kind of metaverse space. So where the internet is linked together by hyperlinks, one text link on a page jumping to another page, I think we can have a door opening onto somebody else's server, or even your backyard bordering on, you know, looking out into the sky and seeing Google's planet floating in the distance. Those are all things that we're going to want to do, and so there are things we're going to enable in the hardware layer. So I think the virtual worlds of the future will probably feel like a fairly well-connected set of spaces, simply because that's a navigational paradigm that we all understand. Now, there will be hidden servers that you can't get to, in the same way there are intranet websites that you can't get to today unless you're, say, working for that company or whatever.
[00:15:32.609] Kent Bye: And finally, what is it about working in this space of virtual worlds that really inspires you in terms of what you want to see happen with all of this?
[00:15:40.825] Philip Rosedale: I think there are many things, but one would be how people are changed by virtual worlds. Having gotten the blessing of being able to see that so much with Second Life, people are changed for the better. It makes them more thoughtful and more open and more creative and many times more functional or productive as a member of the human society for the time they spend in virtual worlds. That's very inspiring. The second one is I think that virtual worlds are our future in some sense. I think that we are going to build and then go into a series of worlds like these, what we're trying to do with High Fidelity, that are going to take over a lot of our time and become a real added space that, you know, the real world is an example of and this virtual world is a profound extension of. And I think being part of that revolution is something that will always keep me working on this project.
[00:16:31.403] Kent Bye: Just one little add-on on that. When you say that, the first thing that I hear the skeptics say is, well, isn't that just sort of escapist in terms of going into a virtual world like that?
[00:16:41.813] Philip Rosedale: Right. Well, it's only escapist if you're going into a place with a lesser set of capabilities. Moving to New York isn't escapist. I mean, I think that what's happening is technology in this wonderful, relentless set of changes is giving us the ability to create worlds inside the computer which are every bit as sophisticated, complex, navigable, challenging as the real world. And who wouldn't want to go and explore those places? So I think they're only escapist if we're escaping from other people or simplifying things in a way that we want but maybe isn't best for us. And that's just not true with virtual worlds. Second Life is a great example where to be really successful in Second Life, you have to be very smart and very entrepreneurial and capable. So is that escapist? I don't know. Great.
[00:17:31.039] Kent Bye: Well, thank you so much, Philip.
[00:17:32.280] Philip Rosedale: It's great. Great. Thank you. Thanks for having me.