#81: Josh Carpenter on WebVR & upcoming VR experiments from Mozilla including collaborative web browsing with TogetherJS

Josh Carpenter is a VR researcher at Mozilla looking to see how to combine the best of the web with what the VR communications medium can offer. Mozilla led the effort to release a WebVR API in June and it’s supported in Firefox as well as Chrome.

Mozilla is trying to answer what makes a great web experience in VR by doing a number of research experiments that they will be releasing sometime in late October or early November.

They’re trying to answer the question of “What’s the strength of VR on the web?” They want to move beyond just adding more screen real estate and look to the “Jobs-To-Be-Done theory” for insights about what people are actually trying to get from the Web. They want to learn, socialize and be connected, and so they’re thinking about what it means to connect to a friend or look up a piece of information in VR. Some of their experiments are more transient VR experiences via the web, some are more integrated social aspects, and they’re also looking at mash-ups of the web and VR.

Josh talks about building out collaborative browsing experiences with tools like WebRTC real-time communication and TogetherJS for collaborative browsing and chat. They’ve also been working with the directors of ROME: “3 Dreams of Black” on a new VR experience.

Josh is seeing a lot of interesting things happen in the JanusVR community, which suggests that if you give people easy syntax and tools with a low barrier to entry, they will create a variety of VR experiences. He’s also interested in tracking the progress of mixing the web in social spaces with AltspaceVR and some of the video experiments by the eleVR team.

Finally, he talks about the support for WebVR for Safari and Internet Explorer, and what he sees as some of the educational possibilities and ultimate potential of creating the metaverse from the foundational components of URLs, View Source and APIs.

Theme music: “Fatality” by Tigoolio

Rough Transcript

[00:00:05.412] Kent Bye: The Voices of VR Podcast.

[00:00:11.975] Josh Carpenter: My name is Josh Carpenter. I work at Mozilla in virtual reality research. I really come at it from a user experience angle. So what I'm trying to do, along with our team, is to figure out what does a really great web experience look like in virtual reality? And then, when you're experiencing that web experience, what is the user agent? How do you type in a URL? How do you grant permission? How do you check your history? I mean, how do you do the most basic interactions in virtual reality for the web?

[00:00:38.295] Kent Bye: I see. Yeah. I mean, so you think of a browser, you have a history and a URL web. And how do you take these concepts of the web and translate it into working in 3D environments?

[00:00:48.244] Josh Carpenter: Yeah, you got it. You're exactly right. There's some just neat challenges, like you say, typing in a URL. But even the more interesting thing in the long term is we've been creating APIs that let anyone who's a web developer, and there's millions of web developers out there, to take the skills they already have in WebGL or in HTML and CSS, and then just say, I'm going to make a website. But instead of having a bunch of flat planes like a magazine, I'm going to tilt those in a 3D plane, and I'm going to make a box out of them, or I'm going to make a world out of them. And then I'm going to use a really simple API call, like just make my site full screen, and I'm going to make it into a VR website. So right now, today, you can go and get a VR-enabled version of Firefox. There's one for Chrome, too. We're both using the exact same APIs. And you can publish a website at a URL you can send to anybody. And then you can just hit full screen. And it'll read your Oculus Rift's head-mounted display tracking data. The browser will take care of all the matrix distortion using the information from the SDK. So you, as a developer with a couple lines of JavaScript, can basically build a VR-enabled website. And so what we're doing is we've made these APIs, but we're not totally sure yet what exactly we're going to use the web and VR for. What's the special alchemy that happens when we combine the unique strengths of this new medium with what the web is uniquely good at? So we're doing a ton of experiments in that, experimenting with some are more evocative, some are more utilitarian, some are more about social experiences. And we're just trying to figure out what this is good for, while also testing out the platform capabilities. Like, where does it break? Where do we need to get to more performance? Latency is one big issue we're having. Like, how do we get latency down? 
So we're just basically doing kind of what you would think from Mozilla Research, which is doing a bunch of experiments in a bunch of different areas.
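For readers wondering what "reading head-tracking data" amounts to once the browser hands it to a page, here is a minimal sketch of the underlying math. It deliberately does not use the WebVR API, whose exact calls are not described in this interview; the `Quaternion` and `Vec3` types and the function names are assumptions for illustration. It rotates a "looking straight ahead" vector by a head-orientation quaternion, the kind of value a head tracker typically reports.

```typescript
// Illustrative types only; not the WebVR API.
interface Quaternion { x: number; y: number; z: number; w: number; }
interface Vec3 { x: number; y: number; z: number; }

// Cross product of two 3D vectors.
function cross(a: Vec3, b: Vec3): Vec3 {
  return {
    x: a.y * b.z - a.z * b.y,
    y: a.z * b.x - a.x * b.z,
    z: a.x * b.y - a.y * b.x,
  };
}

// Rotate v by a unit quaternion q using
//   t = 2 * (q_vec x v),  v' = v + w * t + q_vec x t
function rotateByQuaternion(v: Vec3, q: Quaternion): Vec3 {
  const qVec: Vec3 = { x: q.x, y: q.y, z: q.z };
  const c = cross(qVec, v);
  const t: Vec3 = { x: 2 * c.x, y: 2 * c.y, z: 2 * c.z };
  const u = cross(qVec, t);
  return {
    x: v.x + q.w * t.x + u.x,
    y: v.y + q.w * t.y + u.y,
    z: v.z + q.w * t.z + u.z,
  };
}

// Assumed convention: right-handed, +Y up, the viewer looks down -Z.
const forward: Vec3 = { x: 0, y: 0, z: -1 };

// Head turned 90 degrees to the left (a rotation about the +Y axis).
const halfAngle = Math.PI / 4;
const headOrientation: Quaternion = {
  x: 0,
  y: Math.sin(halfAngle),
  z: 0,
  w: Math.cos(halfAngle),
};

// Direction the viewer is now facing: approximately (-1, 0, 0).
const gaze = rotateByQuaternion(forward, headOrientation);
```

A renderer would apply the same rotation to its camera every frame, which is why the latency mentioned above matters so much: any delay between the head moving and this value updating is directly visible.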

[00:02:21.602] Kent Bye: Yeah, it seems like since SVVRCon that the web VR has had a lot happen over the last number of months, since like March. And so what would you say has happened in the last half year in terms of the web and the VR and what type of growth and evolution that you've seen?

[00:02:38.295] Josh Carpenter: Yeah, that's a great question. I think the big thing for us was in July, Vlad, one of the guys I'm working with, he's a co-creator of WebGL. And so Vlad put forward a proposal to do this VR API that, like the one I was mentioning, takes head tracking data and through a JavaScript call, actually pulls in information. He published that, and then so did Brandon Jones and the Chrome team. So now we've got VR-enabled browsers out there right now, and that's really exciting. I think the other really big thing that's happened is you're seeing more and more cool stuff happen around the JanusVR community, which is web-like. It's not exactly like classic web syntax, but you'll be able to write one HTML file, describe a scene, and have it work, and even import other web elements, like YouTube elements, images, meshes, etc. Look at JanusVR and the vibrance of that community and it really tells me that it's kind of like a market signal that people, if you give them an easy syntax and give them tools, that they will create and share and publish VR sites. At a much lower barrier to entry than let's say making an Unreal or a Unity game. I love what I'm seeing with Unity and Unreal games. There's a whole class of person who wants to make stuff but doesn't have the time to let's say be a Unity or Unreal creator. I think that's where the web's always been very strong, like low barrier to entry, no need to publish an application, quote unquote. You just make a site, share it with a friend through URL. So I think the new APIs and then the continued vibrancy of some early players like Janus are really, really exciting.

[00:03:56.794] Kent Bye: Yeah, I know Brandon had ported over Quake 2 or Quake 3 and have that in the browser. That seems to be sort of a direct port of a game that's converted into WebGL to be able to play. I'm curious of your thoughts on that and other sort of exciting, flashy things that you've seen for people to try out.

[00:04:13.003] Josh Carpenter: Yeah, I mean, I... I was a lead designer in Firefox OS previously. This was a whole operating system, a web-based operating system. What we were trying to think about was, what does the term app even mean? Most applications have a fundamental web component. Think of Facebook. A whole stack of Facebook is built on the web, right? But it's not in a browser. So this continued blurring between what's an application and what's a site is really intriguing to me. So if you look at something like Quake 3, well, you know, Quake 3, you might want to install that to a home screen. That's something you go back to a lot. That might be more of a quote-unquote application. Maybe what the VR web is good for is not so much apps, because then we're competing with the operating system and that's not really a good position to try and be in. Maybe the place for the VR web is kind of like transient bite-sized experiences or experiences that lend themselves to really kind of social, collaborative components or mashups. You know, the web is really good at mashups. Everything being open and everything being standards-based means that we have all this content we can draw on and mash together in really cool ways. So we're working on, for example, weather visualizations, things like that. So we've got some really cool ideas. Our timeline is basically late October-ish, early November, we're going to be sharing some of our early ideas. And because we're Mozilla, we're kind of first people into the jungle right now. And the undergrowth is very thick, and everything's very hard to even do basic stuff. But we're actually pushing our way in. What we want to do is lay a road down behind us. So we're taking all the code that we generate, we're going to throw it up on GitHub, and then anyone can just fork it and have fun. Our only end state, our only goal as Mozilla is really to make the web as strong as it can be. 
So the more people experimenting with the stuff that we build and giving us feedback and then working with us to do cool stuff as a web community, the better.

[00:05:51.800] Kent Bye: Yeah, you sort of have this potential of the walled garden situations with a lot of the iPhone and Android apps and sort of appification of things. And so yeah, if there is a way to do a simplified or at least streamlined workflow to be able to get things onto the web, I know that with Unity 5 coming out, they're going to have a direct WebGL export, which may help. But yeah, like you were saying, it's still more of an app rather than what the web may be best for, which is social experiences.

[00:06:20.778] Josh Carpenter: Yeah, I think that's exactly right. I mean, I'm actually an Apple guy, so I'm an Apple guy and a Mozilla employee, so I kind of straddle the boundary. I love the App Store and I love my beautifully designed Apple iOS operating system, but I also really love the fact that with the browser, and the magic of a URL bar, you can go anywhere from anywhere. And that's incredibly powerful. And to be honest, I think people tend to think it's got to be one or the other. And I really don't think that. I think that you want to live in a world where you got both. And frankly, when DVDs came out, radio didn't go away. And new technologies don't tend to kill old technologies. They just kind of join them in the pantheon of technology. So I think that VR, apps, browsers are all going to kind of continue to, this sorting will kind of continue to happen. And frankly, it's just, I know it's such a cliche, everyone keeps saying it, but it's just really fun to be here at the beginning of all this and to kind of feel like you're a pioneer in a new medium. That's the really fun part.

[00:07:18.127] Kent Bye: Yeah, one of the most obvious sort of applications that I could think of right off the bat is something like a Twitter stream or Facebook feeds where you're able to even be a little bit more immersed into all of the streams of information to be able to very easily and fluidly move between those and be completely immersed by these streams rather than seeing them on a 2D screen. As part of your experiments, have you looked at some of the social streams in that way?

[00:07:43.293] Josh Carpenter: Yeah, absolutely. I mean, I think the way we talk about it, I'm a big fan of jobs-to-be-done theory, which is kind of an analytical framework for thinking about innovation. When I chat with people like, ah, I'm trying to imagine the web in VR, I guess you'd have more windows visible at all times. That might indeed be part of it, you know? But I think that that might be a failure of imagination if I were to embrace virtual reality. I mean, I think ultimately, what you need to do is kind of step back and think about, you don't open up the web browser to look at flat panes of content. That's just the presentation format. You open up the web browser to connect with a friend or to just do a quick Google search for a piece of information or to go to Wikipedia and browse that information, you know, to learn, to socialize, to be connected. Those are kind of the fundamental jobs to be done that we hire the web to do. So when I think about those fundamental abstracted jobs to be done, again, separated from the presentation formats of the current web, and then I think about VR, that's where I get really excited. So we're, a lot of really neat concepts we're working on, where, you know, we're taking, even in some cases, technologies that we innovated at Mozilla, or that have come out of Google, and other companies that didn't really work so hot in 2D, like collaborative browsing, for example. Collaborative browsing in the web browser doesn't really make a ton of sense, and it's kind of a weird thing, but, you know, collaborative browsing in VR, if you imagine what the Airbnb or the CNN.com of VR could be, and then you imagine a social component to it, maybe even with an avatar, like a standardized avatar system, it gets pretty gnarly. I mean, that feels a lot more like the web I know and love, married with what's really cool about VR. So that's kind of where we see things going. 
And in the next couple of months, we're just going to get some code out there that let people kind of create these experiments. And we're just going to keep making the platform as good as it can be, and then let devs together figure out what the cool stuff is and what's going to work. And honestly, next year at Oculus Connect, I really hope there's some really rad devs up there on a web VR panel talking about like, ah, here's how you make a web VR website. And here's so-and-so, and he made this amazing game. Or here's so-and-so, and he made a social Twitter education mashup. I'm pretty confident that's going to happen.

[00:09:45.142] Kent Bye: Yeah, we were just talking to Cymatic Bruce of AltspaceVR, and they're sort of creating a Unity base, like a room where you can do collaborative web browsing and trying to also bring in some of the more web VR components. And a thing that comes to mind also when you're talking about the social aspect is that in JanusVR you have the ghosts that are kind of the representing of the previous actions of people walking through these rooms looking at kind of like gallery pictures of web pages and you get this sort of like Sleep No More type of immersive theater effect where you see like the attention that's been drawn and you sort of the wisdom of the crowd starts to emerge based upon previous actions and it sort of reminds me of this like what if you could do eye tracking for everybody and capture that and then store that and share that in some way and so you're experiencing the web through this sort of like collective wisdom of a lot of people looking at specific things. And so you tend to look at what they're looking at. So, I don't know, that's sort of like where my mind went with that. And so I don't know what you see as the ultimate potential for, you know, collaborative web browsing and how do you have a social aspect around something that is so linear and people skim and read at such different paces, how do you have a collaborative web browsing that goes beyond just watching events together, but is into actually digesting content?

[00:11:09.750] Josh Carpenter: I mean, I think we could probably riff together and generate about dozens of these ideas. That's what's so fun about VR is there's so many low-hanging, really gnarly ideas. But one is that, imagine what a teacher does with her classroom. She takes the kids on kind of a voyage to a place, right? And you show up at classroom at the appointed hour and then she says, today we're going to learn about ancient Rome or ancient Greece or the Battle of the Bulge, you know. And then you're transported there through her words and the visuals she puts in the projector and the things she writes on the chalkboard. You know, the same fundamental mechanic applied to like, let's say, a browser and to VR could be pretty amazing. I mean, we're getting into kind of Rainbow's End territory here. Rainbow's End was an amazing examination of AR and VR and education, but it could look something like that. Show up at the appointed time for guided tours of the web with so-and-so, and so-and-so almost becomes like a celebrity. And that person can even trigger events in the scene. So it was at this point that Napoleon strode into the clearing, and then she hits a button, and she cues the trigger of the event that runs a function that makes the Napoleon sequence play, Napoleon walks up and we can kind of examine him. He gets off his horse and speaks to the class. It's not even that science fiction. We have most of the technology to do this. It just doesn't look like it because it was all designed for 2D world. But when you look at, I'm going to get into web techie terms, but when you look at WebRTC, which is real-time communications over the web, no plugins, when you look at stuff we're doing with TogetherJS, which is collaborative browsing, you see other person's cursor on the page at the same time, and it's a one-to-one connection between you and that person. You can chat and send images back and forth. 
You see all the pieces you're going to need to be able to build out this VR collaborative browsing experience. Right now, we're only constrained by time and resources. We can't possibly do it fast enough. We've got a lot of really cool stuff to work with because it's the web. That is the inherent strength of the web. There's so much technology and content for us to work with. So yeah, we just got to build it.
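The guided-tour idea above (a guide cues an event, and a function plays the sequence for everyone) can be sketched roughly as follows. This is only an illustration under stated assumptions: the message shapes, the `Participant` and `Session` classes, and the sequence name are hypothetical, and the in-memory `Session` merely stands in for whatever a WebRTC or TogetherJS connection would actually carry.

```typescript
// Hypothetical message shapes for a shared session (not TogetherJS's wire format).
type SessionMessage =
  | { kind: "cursor"; from: string; x: number; y: number }
  | { kind: "cue"; from: string; sequence: string };

class Participant {
  readonly name: string;
  // Last known cursor position of every other participant.
  readonly cursors = new Map<string, { x: number; y: number }>();
  // Scene sequences this participant knows how to play, keyed by name.
  private readonly sequences = new Map<string, () => void>();

  constructor(name: string) {
    this.name = name;
  }

  registerSequence(id: string, play: () => void): void {
    this.sequences.set(id, play);
  }

  receive(msg: SessionMessage): void {
    if (msg.kind === "cursor") {
      this.cursors.set(msg.from, { x: msg.x, y: msg.y });
    } else {
      const play = this.sequences.get(msg.sequence);
      if (play) play(); // unknown cues are ignored
    }
  }
}

// Stand-in for the network: delivers a message to everyone except the sender.
class Session {
  private readonly participants: Participant[] = [];

  join(p: Participant): void {
    this.participants.push(p);
  }

  broadcast(msg: SessionMessage): void {
    for (const p of this.participants) {
      if (p.name !== msg.from) p.receive(msg);
    }
  }
}

const session = new Session();
const teacher = new Participant("teacher");
const student = new Participant("student");
session.join(teacher);
session.join(student);

const played: string[] = [];
student.registerSequence("napoleon-arrives", () => {
  played.push("napoleon-arrives");
});

// The guide "hits a button": every other participant plays the sequence locally.
session.broadcast({ kind: "cue", from: "teacher", sequence: "napoleon-arrives" });
session.broadcast({ kind: "cursor", from: "teacher", x: 10, y: 20 });
```

Sending only a small cue and letting each client run its own copy of the sequence keeps the shared channel lightweight, which is one plausible way to build on a one-to-one data connection.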

[00:13:01.506] Kent Bye: And can you speak a little bit more and extrapolate some of these small experiments that you've done with the web and VR and some of the main takeaways that you've gotten from that?

[00:13:09.673] Josh Carpenter: Yeah, yeah, totally. So as folks may be familiar with something that Chris Milk and Ricardo Cabello of three.js and a really rad other team did a couple years ago was ROME: 3 Dreams of Black. It's like a music video done in WebGL, it was a Chrome experiment. And it's like a music video you can look around inside as you kind of move forward on a track. And it's a combination of live footage, cel-drawn, hand-drawn animation, and then 3D animation. And it's just a stunning, interactive music video, totally done in WebGL, but it wasn't done for VR. So we're actually working with some of the same people who made that, and we're doing some visualizations, kind of like fly-through visualizations of WebGL world, showing that, hey, if you're a WebGL developer, and you know three.js, maybe you're a creative coder, you can now make a WebGL-based VR site, and it runs really nicely. That's one example. And then on the other end of the spectrum of web technology, you've got the DOM, which is declarative HTML and CSS. It's kind of what we traditionally think of as being the web. We're also making websites built out of the DOM. So let's say, like, take Wikipedia. Wikipedia is totally a DOM-based site. Well, maybe in Wikipedia, you can take those elements and you can kind of twist them in 3D and build them into shapes and such. Maybe, for example, if you go to cnn.com, there's like a 360 video playing in the background streamed from Jaunt VR or Condition One or NextVR, these amazing camera companies. And that's playing in the background. In the foreground, you've got kind of informational overlays. And the reporter, the notion of what a reporter is, is like the person who's kind of guiding you through what you're seeing in this conflict zone, or maybe at the foot of this volcano that might be erupting in Iceland. So you've got a combination, in that case, of WebGL providing the video backdrop. 
And then you've got traditional DOM elements facing you as a user, saying where you are, the date and time, other related stories, pulling in information from CNN's existing properties, which, because they're all built in web tech, we can access. They're all there. It's HTML and CSS. So we're trying to work on even responsive design, which is a big thing in web design. It used to be you'd have to make two totally separate sites, one for mobile, one for desktop. But now by using smart CSS authoring, you can actually have a little bit of code that says, oh, I'm on mobile, I'm on desktop, make it look like this or this. And 90% of the content is untouched. It doesn't change between the two. So we want to extrapolate that and say, all right, maybe one website can go from a VR site to a mobile site to a desktop site. So we're actually trying to work that out as well. Like, what are the things a site has to look for or the user agent has to look for to know that it's a VR site? So it's always like fundamental, difficult web questions we're working on. But if we get them, then there's millions and millions of web devs. And ideally, hopefully soon, there'll be millions and millions of headsets out there. And then we can really start to build some cool stuff.
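The responsive-design idea above, one site whose content stays the same while the presentation switches between mobile, desktop, and VR, might be sketched like this. Everything here is an assumption made for illustration: the `ViewingContext` shape, the `hmdPresent` flag, and the 768-pixel breakpoint are hypothetical, and no browser exposes a "this is a VR site" signal in this form.

```typescript
type Layout = "mobile" | "desktop" | "vr";

// Hypothetical description of what the page is being viewed on.
interface ViewingContext {
  viewportWidthPx: number;
  hmdPresent: boolean; // assumed flag: a head-mounted display is available
}

// Pick a presentation; the content itself is shared by all three layouts.
function pickLayout(ctx: ViewingContext): Layout {
  if (ctx.hmdPresent) return "vr";
  // 768px is an arbitrary example breakpoint, as in a CSS media query.
  return ctx.viewportWidthPx < 768 ? "mobile" : "desktop";
}

// Only the presentation rules differ per layout.
const presentation: Record<Layout, string> = {
  mobile: "single column",
  desktop: "multi-column",
  vr: "panels arranged around the viewer",
};

const phone = pickLayout({ viewportWidthPx: 375, hmdPresent: false });
const laptop = pickLayout({ viewportWidthPx: 1440, hmdPresent: false });
const headset = pickLayout({ viewportWidthPx: 1920, hmdPresent: true });
```

The open question raised in the interview is exactly what the real equivalent of `hmdPresent` should be and who supplies it, the site or the user agent.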

[00:15:37.317] Kent Bye: And do you see the fabric of the communication between these different websites as being something like XML or JSON? Or how are these sort of mashups going to be facilitated?

[00:15:47.562] Josh Carpenter: Yeah, I mean, not to get too out of my depth, but APIs are obviously pretty rad. When a site publishes, like when Twitter publishes an API and lets people draw on that API, when Wikipedia has an API, it lets me as a content creator or as an application creator go and grab that data and pull it out and run a query against it. Give me all the Oculus Connect tweets, for example. Let me display them in a really rad mesh that I fly through, for example. So APIs are one. And then just URLs. The URL is kind of like the original. Actually, I was going to say a URL is the original API, but that sounds way too authoritative for someone who just pulled out of his ass. There's people at Mozilla who really know the theory of this stuff. But URLs are kind of amazing, because I can target an image somewhere. An image has a unique URL. So I can go and say, hey, grab that URL and pull it in. Or grab that YouTube video and pull it in. So that's one of the things that makes the web so great for mashup purposes, where as you think of the application model, what's inside an application, by and large, is really a black box. And applications kind of have to really work to opt into a bit of transparency using APIs that are ever evolving. But with the web, you get that transparency by default. Even the fact that you can view source on the web by default is huge. So if I go to a three.js built, three.js is a framework for WebGL, really awesome stuff. It was what was used to make the ROME: 3 Dreams of Black video. I can go and view source and see how Ricardo Cabello and his team actually built that. Like, that's really huge, right? You know, APIs, URLs, and just view source are the three pillars of what makes the web so rad. And I think they will enable this kind of mashup experiences.
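As a small illustration of "an API is something you can run a query against, and it is all addressed by URL", here is a sketch that builds a search URL for Wikipedia's MediaWiki Action API. The helper name `buildQueryUrl` is made up for this example; the endpoint and the `action=query`, `list=search`, `srsearch`, and `format` parameters are part of the MediaWiki API. The sketch only constructs the URL and does not fetch anything.

```typescript
// Hypothetical helper: join an endpoint and key/value pairs into a query URL.
function buildQueryUrl(endpoint: string, params: Array<[string, string]>): string {
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return `${endpoint}?${query}`;
}

// A full-text search for pages mentioning "Oculus Connect", returned as JSON.
const searchUrl = buildQueryUrl("https://en.wikipedia.org/w/api.php", [
  ["action", "query"],
  ["list", "search"],
  ["srsearch", "Oculus Connect"],
  ["format", "json"],
]);
```

Because the result is just a URL, it can be shared, bookmarked, or handed to any other site, which is the mashup property described above.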

[00:17:08.296] Kent Bye: What's some of the most exciting things that you've seen come out of this VR on the web?

[00:17:12.837] Josh Carpenter: Yeah, I mean, I would say it again, JanusVR. I think what they've done is really inspiring. I mean, really hats off to them. They're really showing that there's a thirst there to do this kind of stuff. AltspaceVR, I'm really intrigued. I haven't checked out their stuff but I saw it a couple months ago and it's pretty cool what they're doing too. And then there's kind of experiments percolating here and there. eleVR is a team out of San Francisco, a research team, a bunch of really, really smart people from MIT. And they're doing like a WebGL video player. So they're taking side-by-side 3D, 360 video footage and making it playable in the browser just using WebGL, using Firefox and Chrome VR-enabled builds. So there's just three off the top of my head. And then another guy, Tony Parisi, we're working with, he's doing Second Wave GL, taking three.js, which is still pretty programming intensive, and he's wrapping it in declarative tags. So instead of you writing, like, 40 lines of code to say, OK, give me a sphere with these properties, you just say, in HTML, just say sphere. And you give an ID, like, sphere beach ball. And then in CSS, you say, yeah, Beachball has this kind of shininess to it. It has this texture map, and it's this big. So you're programming 3D websites, but in a way that's very familiar if you're a traditional web developer. So you're seeing all these people, and this is where the web is so strong, is all these people just build and release cool open source tools. And then the web dev community just rapidly, rapidly gets better. I mean, the web dev community has never, ever been more strong than it is right now. If you're a web developer, you know that actually the pain we experience now is having to learn a new framework every week, because something amazing comes out every week or every month. That's what it feels like. The tools are just getting better and better and better rapidly. 
So I want to take that same phenomenon and turn it on for VR, because I know web devs will respond, and we'll have some really cool tools.
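The declarative approach described above (say "sphere", give it an ID, then style that ID with size, shininess, and a texture) can be sketched as a tiny resolver. This is not Tony Parisi's library or three.js; the `SceneElement`, `StyleProps`, and `SceneObject` shapes, the default values, and the texture file name are all hypothetical and exist only to show how an element plus ID-keyed style rules could collapse into one renderable description.

```typescript
// Hypothetical declarative element: a tag name plus an id, as in markup.
interface SceneElement {
  tag: "sphere" | "cube";
  id: string;
}

// Hypothetical CSS-like properties, all optional.
interface StyleProps {
  radius?: number;
  shininess?: number;
  texture?: string;
}

// What a renderer would finally consume.
interface SceneObject {
  kind: string;
  id: string;
  radius: number;
  shininess: number;
  texture: string | null;
}

// Assumed defaults, used when a style rule omits a property.
const DEFAULT_RADIUS = 1;
const DEFAULT_SHININESS = 30;

// Look up the style rule for the element's id and fill in defaults.
function resolveElement(el: SceneElement, styles: Record<string, StyleProps>): SceneObject {
  const style: StyleProps = styles[el.id] ?? {};
  return {
    kind: el.tag,
    id: el.id,
    radius: style.radius ?? DEFAULT_RADIUS,
    shininess: style.shininess ?? DEFAULT_SHININESS,
    texture: style.texture ?? null,
  };
}

const beachball = resolveElement(
  { tag: "sphere", id: "beachball" },
  { beachball: { radius: 2, texture: "stripes.png" } },
);
```

The author writes two short declarations, and the imperative setup code lives once inside the library instead of in every site.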

[00:18:47.655] Kent Bye: And so we have Firefox, we have Chrome. What's going on with Safari and Internet Explorer? Those are the two other major web browsers when it comes to getting buy-in to implementing the web VR APIs for it to really take hold.

[00:18:59.660] Josh Carpenter: That's a really good question. So Safari, mobile Safari and iOS 8 can now play WebGL. So I went to the Apple store the other day, being a big Apple fanboy, and I held the beautiful, beautiful iPhone 6 in my hand. I went to threejs.org, first thing I did, and I kicked the tires on some of the demos that Ricardo and contributors have actually made. They went really nicely on the iPhone 6. So now you've got WebGL, which was innovated like seven years ago by Vlad and a team of creators, co-creators now, the Khronos Group, totally standardized API for accessing WebGL through JavaScript. It's now pretty much everywhere. It kind of won the battle. So you can now make 3D beautiful experiences, and they'll work everywhere. And then with IE, my understanding is that in the case of Internet Explorer, they've also expressed interest in VR recently. I don't know where that's at, but obviously they've got a huge... I mean, I think they're still the largest browser out there. So if they get involved, that's really a good thing. I can't help but think that, hey, they just acquired Minecraft to make... You know, I'd love to play Minecraft in VR through the browser. That would be pretty rad. I think the best thing that can happen is for Oculus and the devs who are here today just to kick ass, do really great stuff, sell a lot of head-mounted displays, and make the addressable market as big as possible. And then it makes it really easy for guys like me to go to our leadership and get them excited about this stuff. I mean, they're already excited. They're all fans. In the traditional browser market, I think in most traditional software companies, we need the market to be as big as possible so we can make the case to our bosses, basically.

[00:20:23.085] Kent Bye: I guess one really tricky thing in terms of the user agent is having something like Gear VR showing up as a mobile agent, but yet it may actually be a virtual reality headset. So how does Gear VR, just entering the picture within the last couple of months, start to change how you're going to know whether or not this is a VR HMD or a mobile cell phone?

[00:20:46.593] Josh Carpenter: Yeah, that's a really good question. That just kind of gets back into the, so I'm a content creator, I publish a website, and I want to know a little bit about what I'm running on, right, what the user's viewing me on. And so right now I've got media queries, which basically let me say like, oh, okay, well the screen is this big, so I can assume it's a mobile phone versus a desktop. I'm not an expert on this stuff. I know there's other tags out there you can say, like, you can kind of declare how you would like to be handled, but they're all pretty primitive. There's also stuff like app manifests, which is a format where you can publish a little bit of JSON that says like, hey, I'm an app. Here's the permissions I require. Here's my icon. That's if you want to create a web experience that's more supposed to be presented as an application, maybe even installed. So I think that we have some prior art and some precedent here for sites understanding what they're being run on and how they want to be presented. And now we just need to kind of extend those and figure out how they work in VR. And that's some of what we're working on right now. Also, there's nothing really unique about the Gear VR from that perspective. It's running Android. We have an Android browser, for example, which is really awesome. It's Firefox for Android. So in theory, our code base could run on that platform that they've released, which is something else that I would love to start to work on, is like the Gear VR, a Gear VR web experience.
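To picture what "a little bit of JSON that says, hey, I'm an app, here's my icon, here's the permissions I require" could look like once extended for VR, here is a sketch. The `SiteManifest` shape is hypothetical and does not follow any particular manifest specification; in particular the `presentation` field is an invented extension used only to show how a site might declare that it can be shown in a headset.

```typescript
// Hypothetical manifest shape; the field names are illustrative, not from a spec.
interface SiteManifest {
  name: string;
  icon: string;
  permissions: string[];
  presentation: Array<"flat" | "vr">; // assumed extension: how the site can be shown
}

const manifest: SiteManifest = {
  name: "Volcano Report",
  icon: "/icons/volcano-128.png",
  permissions: ["geolocation"],
  presentation: ["flat", "vr"],
};

// A user agent could combine the site's declaration with what the device offers.
function choosePresentation(m: SiteManifest, hmdAvailable: boolean): "flat" | "vr" {
  const siteSupportsVr = m.presentation.indexOf("vr") !== -1;
  return hmdAvailable && siteSupportsVr ? "vr" : "flat";
}

// The manifest is plain data, so it serializes straight to the JSON a site would publish.
const published = JSON.stringify(manifest);
```

Under this sketch, a phone sitting inside a headset and the same phone held in the hand would load the same manifest and differ only in the `hmdAvailable` value the user agent supplies.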

[00:21:53.992] Kent Bye: Great. And finally, what do you see as the ultimate potential for where you see what virtual reality and the web could go?

[00:22:02.114] Josh Carpenter: I read a dystopian future novel called Shovel Ready last week. And then I put it down and I picked up Ready Player One to finally get around to reading that. And I'm like, man, I need to stop reading dystopian VR futures where everyone's poor and they all escape into VR and to some extent or another. That's a terrible opening to that response. But when you think about what the people who created the web had in mind when they started building it, they had in mind something like the metaverse. They had in mind anyone being able to connect with each other anywhere in the world in a really distributed way. I think VR is really the capstone to that. I think when you spoke with Vlad at SVVR a couple months ago, and Vlad says something to the effect, I love this line, that we all want to build a metaverse. And the good news is we have got 90% of what we need. We actually did the hard part. All the networking infrastructure, all the sites, even getting people's head to wrap around the notion of there being a browser so my mom can use it. All that stuff has been done. And now we just need to make a couple APIs and make it sing in 3D at high performance to actually enable maybe, I know Oculus is calling it the final platform, but like the web that you can step into, where you do all the same things you've always done on the web, but now you're doing them in an immersive realm with all the capabilities that it has. And then hopefully, you know, us being Mozilla, we really hope that the way that that shapes up is, it is, that there's some openness to it, there's transparency to it, users aren't being taken advantage of, they have some sovereignty over their data and over their privacy. These are things that we really believe in. I think if, you know, if we do those things, maybe we avoid that, the dystopian components of this technology. 
Like any technology, you need to balance what's cool about it with some kind of careful management of user need and values.

[00:23:35.734] Kent Bye: Great. And is there anything else that's left unsaid that you'd like to say?

[00:23:39.597] Josh Carpenter: No, man, I thank you for doing all you're doing to kind of foster the community. We've talked in a couple different podcasts, a couple different groups, and it's really awesome. I actually kind of envy you because you get to interview a ton of really interesting people, like me least of all. But there's some really cool people out there. And yeah, man, thank you for doing what you're doing. Great. Well, thank you. Cool, man. Thank you.
