#1475: A-Frame’s Diego Marcos on WebXR Momentum and Integrations with Open Source AI

I interviewed Diego Marcos, founder of A-Frame, who worked on Firefox and was one of the original creators of what would eventually become the WebXR specification, at Meta Connect 2024. See more context in the rough transcript below.

This is a listener-supported podcast through the Voices of VR Patreon.

Music: Fatality

Rough Transcript

[00:00:05.458] Kent Bye: The Voices of VR Podcast. Hello, my name is Kent Bye, and welcome to the Voices of VR Podcast. It's a podcast that looks at the future of spatial computing. You can support the podcast at patreon.com slash voicesofvr. So continuing my coverage of Meta Connect 2024, today's episode is with Diego Marcos, who's the founder of A-Frame, has worked on WebXR, and is one of the original creators of what would eventually become the WebXR specification. So Diego and I had a little bit of a friendly bet: he didn't think that Apple was going to ship WebXR in any version of Safari on any of their platforms before 2025. So I took him up on that bet back in 2022, and just a couple of weeks ago, back on September 16th, 2024, Vision OS 2.0 was released, and in that version, Safari is shipping with WebXR by default, without any flags. So finally, we have both Chrome and Safari shipping with WebXR, and it's kind of like a green light go for the entire WebXR community. So Diego has continued to work on A-Frame, and I wanted to just catch up with him and hear some of his latest thoughts. It sounds like he's also integrating lots of different generative AI features and is really excited about all the different open-source AI models that he's been able to start to play around with, just in terms of thinking about content creation for 3D content and how, overall, some of these different AI systems can start to be integrated within the context of WebXR. So that's what we're covering on today's episode of the Voices of VR podcast. So this interview with Diego happened on Wednesday, September 25, 2024. So with that, let's go ahead and dive right in.

[00:01:42.488] Diego Marcos: So I'm Diego Marcos. I've been involved in VR since the very early days before the Oculus DK1 released. And my path was working at Mozilla at the very beginning in the Firefox browser. I'm part of the original team that started what is called today the WebXR standard. And also there, as a side project, I started with some friends there, A-Frame, which is a framework to develop web-based 3D content that is accessible to anyone, regardless of technical expertise. That's our goal. We want to make everyone capable of creating compelling 3D experiences. Yeah.

[00:02:22.017] Kent Bye: Great. Maybe you could give a bit more context as to your background and your journey into the space.

[00:02:26.580] Diego Marcos: So my background is from the very early days. So I went through college, and I specialized in 3D computer graphics. But then I didn't actually start my professional career in 3D computer graphics. I was working in academia for a while, doing data visualization for scientists. So I kind of always had this desire or need to go back. And it didn't happen until I actually got access to one of the very early DK1s that released before the Kickstarter of Oculus. And then I got obsessed with it, and I started to tinker on the side. At the time, I was working on Firefox OS at Mozilla. And on the side, I kept tinkering with VR. And one day, all of a sudden, it became my full-time job. A bunch of us at Mozilla started to work on VR on the side. We managed to persuade some leadership that this was going to be important for the company and for the web. And we released the first WebVR spec, which Google found interesting as well. They decided to implement it in Google Chrome. And yeah, it became my full-time gig. And it's been a decade since then. And yeah, I'm very happy with the progress.

[00:03:43.447] Kent Bye: Awesome. Well, a quick question, a follow-up on the Firefox. Because I know they implemented the WebVR standard. Have they implemented WebXR yet?

[00:03:50.432] Diego Marcos: Yeah. So for people that don't know, WebVR was the precursor of WebXR. So at the beginning, there were only VR headsets, so we called the standard WebVR. And then people started to talk about XR, MR, AR; the HoloLens released, Magic Leap released. And then the term WebVR kind of fell short. So then we redesigned the API and the spec to accommodate other use cases, and what was known as WebVR became WebXR. And now, like nine days ago, Safari for Apple Vision Pro released WebXR. And today, mission accomplished. All major browsers are shipping WebXR today, enabled for all users. So we won.

[00:04:41.141] Kent Bye: Yeah, we had a friendly little bet, I think it was in 2022. You said that you didn't think that Apple was going to actually ship WebXR before 2025, and I said, I'll take that bet. And then nine days ago, they shipped Vision OS 2.0, and now Safari, at least on Vision OS, is shipping with WebXR. So very happy to win that bet. I feel like the whole industry wins. Thanks for paying up the $50. But more than anything, I'm just happy that my optimism proved correct, that they would eventually come around. But it was a little bit down to the wire. It was literally like the last time they could have released anything. If they hadn't released it for Vision OS, I would have lost the bet, and it would have been really sad, and I would have been upset on multiple levels. But I'm really happy that it feels like now it's a green light go. Now we can actually start shipping different WebXR experiences online across all these different platforms and have different ways for people to go beyond the App Store model. I mean, Meta is already changing things up. But at least you can write it once on the web and then have it on all these different platforms.

[00:05:43.604] Diego Marcos: So I think we should take credit for that, that bet, because people pay attention, right? And I want to believe, this is all speculation, but I want to believe that the bet we made actually put pressure on Apple to ship it, what should I say? Because they saw that people care, that it's something that people wanted. And I want to believe that we actually had an influence, and without that bet, it wouldn't have happened, or it would have taken at least another year or two. So yeah, I'm super happy. Today it's a reality. You can implement your WebXR application once and really deploy on every single device. There are some differences. For example, the Apple Vision Pro only supports hand tracking, and other headsets don't have hand tracking, or they have both hands and controllers. But overall, the differences are minimal, and we can say that, yeah, you implement once, deploy everywhere.
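For context on the "implement once, deploy everywhere" point, here is a minimal sketch of how a WebXR page can request a session and adapt to whatever input a headset exposes. It assumes WebXR type definitions (for example @types/webxr) are available in a TypeScript project; the session and feature names are standard WebXR, but the logging and fallback behavior are illustrative only.

```typescript
// Minimal sketch: request an immersive session and adapt to the input the device exposes.
// Note: requestSession must be called from a user gesture (e.g. a button click handler).
async function startImmersiveSession(): Promise<void> {
  if (!navigator.xr || !(await navigator.xr.isSessionSupported("immersive-vr"))) {
    console.log("No WebXR immersive-vr support; fall back to a flat 3D view.");
    return;
  }

  // Ask for hand tracking as an *optional* feature so the same code runs on
  // controller-only headsets and on hand-tracking-only devices like Vision Pro.
  const session = await navigator.xr.requestSession("immersive-vr", {
    optionalFeatures: ["hand-tracking"],
  });

  session.addEventListener("inputsourceschange", () => {
    for (const source of session.inputSources) {
      if (source.hand) {
        console.log("Hand input:", source.handedness);
      } else if (source.gamepad) {
        console.log("Controller input:", source.handedness);
      }
    }
  });
}
```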

[00:06:34.537] Kent Bye: Well, I know that you've been working on the A-Frame open source project for a long, long time. And I think for a while, you were maybe doing it as a startup or your full-time gig. So what's been happening with A-Frame, with all the time that you've put into it and working on it full time? Where are you at with it now?

[00:06:51.411] Diego Marcos: So November 2024 is going to be nine years I've been doing A-Frame, getting close to one decade. Yeah, and the community is amazing. It's super active, super creative, and it's been a journey. It's always been my side project. It's kind of my baby, and I can't let it go. So for people who don't know, sometimes it's very hard to have a long-term commitment with an open source project that doesn't have a business model. So people get tired, and sometimes creating something new and shipping it is very exciting, but when it comes to maintaining something that already works, there's a lot of dirty work that is not fun to do, right? And it's not easy. But I don't know, it's this community that kept me going, people making amazing things. And I always want to support and empower these people to create stuff. That's what kept me going. If the community hadn't responded, I probably would have moved on. And it's been nine years, and I'm very happy to do it. And yeah, it's amazing.

[00:07:49.976] Kent Bye: So it sounds like it's still a bit of a hobby project or a side project, but not your full-time employment. But what have you been working on as your full-time gig?

[00:07:58.803] Diego Marcos: So I have a startup I've been running since 2018 called Supermedium. So the justification is like, A-Frame is the audience, right? We've been growing for a decade, and it's like, okay, A-Frame is going to be open source and free forever, but what else can we offer on top of A-Frame that people will be willing to pay for, that makes the whole adventure sustainable, right? I don't know if you've been following Supermedium. We released first a browser for VR headsets in 2018, when we didn't even have standalone headsets. We released on Steam for HTC Vive and Oculus Rift. By the way, the browser was implemented in A-Frame. Later on, we released Supercraft, which was a casual content creation tool for VR headsets, also implemented in A-Frame. We wanted to prove to the world, because one of the skepticisms people had at the time was, is the web, or WebVR or WebXR, ready to ship content that people want to use? It's like, OK, we're going to prove it. And for that, we took, what's the content that people love today? And dancing and rhythm games were one of the most popular genres, with Beat Saber at the time. It's like, what about if we create a web-based version of those games that people love? And if we are able to implement it well enough for people to actually use it, we prove the point that the tech is ready to reach users. And we made what is called Moonrider, which is an open source version of Beat Saber, let's say. It has a saber mode and a punch mode that people actually love more than the saber mode. And it's using the songs that the Beat Saber modding community has been creating for years. And it turned out that people actually loved it. It was completely playable, people really enjoyed the gameplay, the performance was great, and we grew up to 300,000 monthly active users. So today, to my knowledge, it is the most popular WebXR or WebVR content ever. And we proved that point. It's like, yes, there's no roadblock. The tech is ready for you to implement compelling content that people want to use. And that's not only the browser and WebXR itself, but also A-Frame as a content creation tool. So that's out there, that data point. The goal was not creating a business, but making a point. It's like, hey, I'm sending a signal to all developers: if you want, you can actually build an audience with WebXR content. That's the other thing we did. We also created a comic book reader for VR headsets, also using A-Frame. And today, it's like, oh, I'm wondering, what's the next step for A-Frame, right? The mission of A-Frame is making 3D content creation accessible for anyone, regardless of technical expertise. And it's, what would be the next step? And Gen AI came into our world all of a sudden. And it's like, oh, man, I'm seeing what Gen AI is doing for images and video. People that never contemplated before being creative with images and video are now doing amazing things. I want the same thing to happen for interactive 3D content. And it aligns completely with the A-Frame vision. So right now, I'm trying to think, what would be cool and fun workflows on top of A-Frame that take advantage of Gen AI? So today, if you go to supermedium.com, you can already see the first attempt at that. We have a text-to-3D API that integrates with A-Frame. You can just have an entity, pass a prompt, and all of a sudden a 3D model shows up in your scene.
And I'm exploring a ton of different directions. I would like to eventually completely bypass the coding part and, the same as we have for images and video, go from text straight to an interactive 3D scene. That's the dream. And I don't know yet how this is going to happen, but stay tuned.
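As an illustration of the entity-plus-prompt workflow Diego describes, here is a hypothetical A-Frame component sketched in TypeScript. The component name text-to-3d, the endpoint URL, and the response shape are placeholders for illustration, not Supermedium's actual API; it assumes the A-Frame script is already loaded on the page.

```typescript
// Hypothetical sketch: an A-Frame component that turns a text prompt into a 3D model.
// Assumes A-Frame is loaded globally; the endpoint and response shape are made up.
declare const AFRAME: any;

AFRAME.registerComponent("text-to-3d", {
  schema: {
    prompt: { type: "string" },
  },

  async init() {
    // Placeholder endpoint that returns a URL to a generated glTF/GLB asset.
    const response = await fetch("https://example.com/api/text-to-3d", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ prompt: this.data.prompt }),
    });
    const { modelUrl } = await response.json();

    // Attach the generated model to this entity via A-Frame's built-in gltf-model component.
    this.el.setAttribute("gltf-model", `url(${modelUrl})`);
  },
});

// Usage in markup would then look something like:
//   <a-entity text-to-3d="prompt: a weathered wooden rowboat"></a-entity>
```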

[00:11:52.887] Kent Bye: Yeah, a few days before Connect started, you had put out a post which was essentially like, in VR you do white boxing, where it's very sparse hallways and pathways, and you had done a style transfer where you had given a prompt and it had put a bunch of textures on top of it, and you were just kind of walking through the scenes. It was more like you were producing a 2D video of this 3D model and then inputting that video into other Gen AI that was then putting another layer of all the textures on top. And it actually catalyzed a lot of discussion, and people had a lot of strong opinions around that. I think there's this larger fear around the type of creativity that humans can make versus when you start to rely upon these models, which can have a bit of a look and feel and a style that can sometimes be homogenous, or doesn't have the diversity of perspectives that you get from human creativity, in a way that can be difficult to control or steer. My fear with that type of generative AI is that it robs the human creativity aspect of it, in that it just feels like this monoculture thing that doesn't have any personality, character, or soul. That's my fear and perception with the AI, versus the types of handcrafted, bespoke, passion-driven projects by people that have that spirit and soul and that human creativity that the AI stuff sometimes doesn't have as much. I think there was something around that post that was tapping into a lot of these different debates, but I'd love to hear some of your thoughts, since you were going back and forth with lots of folks.

[00:13:18.167] Diego Marcos: Yeah, I can understand the concerns a little bit, especially since most of the models and workflows and tools that we have today don't give you a lot of control. You pass a prompt and you get something out, and that something can be very similar to what someone else has created. But we are seeing more and more degrees of control that allow people to express their taste or vision in a much more concrete and defined way. And for me, Gen AI is just another tool. I'm old enough to remember people expressing similar kinds of concerns when Photoshop was becoming popular. Or electronic music, like people using samples to create music. People were saying, oh, you are stealing, or that's not real creativity. You don't know how to play an instrument, right? You're just using a keyboard and a sampler, and that's not real music, right? You don't know how to read music notation. You don't know how to play an instrument. In time, electronic music came to be accepted as a form of creative endeavor. And I think we are in that transition where people look down on or dismiss Gen AI, but I think it's going to change. The process is threatening for some people because of the experience they've accumulated in certain tools. It's the same as when we had the first 3D movies that were made with computers: people were doing animation drawn by hand, and that skill set kind of became obsolete, right? Nobody except, say, Studio Ghibli still does animation by hand. For the most part, people use computers and they don't draw by hand anymore, right? But at the time, people that had that skill in their hands felt empowered and respected, and all of a sudden this computer was disruptive in their world. So I think you need a little bit of transition and time for people to accept the new status quo, and to retrain themselves and to feel comfortable. And I think the same is going to happen with Gen AI. So I understand the concerns, but I think we will address them over time. Yeah, everything normalizes. Yeah.

[00:15:33.416] Kent Bye: Well, given your career path within the context of Firefox and then working on this open source project now for over nine years with A-Frame, there were a lot of announcements today from Meta around the Llama models, a lot of open sourcing of what they're doing. So there's this interesting thing where these big companies are open sourcing a lot of these technologies and then driving innovation at the edges from people that are adopting them and kind of pushing forward what's even possible. And so I'd love to hear, since you're in this open source space, if you're looking at things like OpenAI and their more closed models, or if you're more interested in looking at things like Stable Diffusion or even the Llama models, or if there are any other open source projects that you are interested in potentially integrating, either on your own production side, or if this is something that could be small enough to be part of the websites, or if the models are way too big right now to really be functional, or if it's something more that people are downloading on their own computer to then use to generate stuff that may be smaller to distribute.

[00:16:32.250] Diego Marcos: That's what I like about Gen AI these days. It feels so similar to the early days of the web. I'm talking about all the open source, like the Llama and Stability models. It's crazy. The creativity, it's people doing all the crazy stuff. There are no constraints. People do all sorts of random things out there. It feels edgy in a way that is very similar to the early days of the web. So yeah, I'm basically benefiting from it. The text-to-3D API that we released is using TripoSR, which is the open-weights model that Stability AI released. And yeah, I'm always looking at and following what's going on in the open source ecosystem and seeing what we can reuse. And hopefully one day I can also contribute back. I don't know, I'm not planning to train a model, but eventually, yeah, one day. And if I train a model, I would like this model to be open source as well. And yeah, talking about the web is very interesting too, because if you realize, the feeling in the last decade has been that the web is always behind native, right? And with AI, it kind of reversed. If you look at the space, ChatGPT has become like a tab for many people in their browser, right? RunwayML to generate video, Midjourney, they are also like a tab in the browser. Everything, all the innovation, is happening on the web. And for me, with the advent of WebGPU, what we are going to see is not only that the front ends are going to be web-based, but also we will be able to do local inference in the browser, right? You could have hybrid models where part of the inference runs in the browser locally with smaller models, and you have something more heavyweight that can run on the cloud. So yeah, there are tons of opportunities opening up. And I think open source is going to be key for this space to flourish.
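A rough sketch of that hybrid local/cloud idea follows. Detecting WebGPU via navigator.gpu is standard, but runLocalModel, the cloud endpoint, and the response shape are placeholders rather than a real inference library; think of it as the shape of the pattern, not an implementation.

```typescript
// Sketch of hybrid inference: small model on-device when WebGPU exists, cloud otherwise.
async function generate(prompt: string): Promise<string> {
  if ("gpu" in navigator) {
    // WebGPU is available (navigator.gpu); a small open-weights model could run
    // entirely in the browser here through a WebGPU-backed inference runtime.
    return runLocalModel(prompt);
  }

  // Fallback: heavier model on a server (placeholder endpoint).
  const res = await fetch("https://example.com/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  const { output } = await res.json();
  return output;
}

// Placeholder for on-device inference; a real version would load model weights
// and run them through a WebGPU runtime.
async function runLocalModel(prompt: string): Promise<string> {
  return `local result for: ${prompt}`;
}
```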

[00:18:17.762] Kent Bye: What do you make of some of the announcements that were more web-based? There was actually a lot of stuff that was, I guess, more in the developer keynote that happened after the main keynote today at Meta Connect. But, you know, just talking to some folks here, there's excitement around how, one year ago, there was a lot of discussion around, hey, we need to have some way to have payments that are happening in the browser, and now, a year later, they're starting to implement some of these different methods for people to, in the browser, start to exchange money and have payments that are happening. But I'd love to hear some of your thoughts on what's happening specifically here at Meta Connect and some of the announcements that were made today that were very specifically web-based.

[00:18:52.557] Diego Marcos: Yeah, so on the announcements, we are recording this a few hours after the keynote. So I was at the keynote, but I haven't had the time to look at the specifics of each of the features. But we have progressive web apps, which were announced as a key part of the web strategy, and also the web payments integration. So what I tell people is that for a couple of years now, WebXR has been good enough to create content, and all those things add value on top. It's like, yeah, having a low-friction way for WebXR applications to integrate payments is amazing. But the good thing about the web is we are not forced to use those platform-specific APIs. You're not forced to use Meta payments. It's optional, and it can be very convenient. But if you don't want it and you have different ideas, you can just integrate Stripe and roll out your own. That's the beauty of the web. Yeah, those things add value and convenience, and it's great to have them. But what I tell people is the web and WebXR are good enough to create content. And the good thing about the web is that you don't have to wait for the platform to provide you those pieces. You can roll your own. Yeah?
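The "optional platform convenience, with a web fallback" pattern Diego describes can be sketched generically. The example below uses the standard Payment Request API when the browser exposes it and otherwise sends the user to a self-hosted checkout page; it is not Meta's specific payments integration, whose details were not covered in this conversation, and the payment method identifier and URLs are placeholders.

```typescript
// Generic sketch: prefer a browser/platform payment sheet, fall back to your own checkout.
async function purchase(priceUsd: string): Promise<void> {
  const ownCheckoutUrl = "https://example.com/checkout"; // placeholder self-hosted flow

  if ("PaymentRequest" in window) {
    const request = new PaymentRequest(
      [{ supportedMethods: "https://example.com/pay" }], // placeholder payment method id
      {
        total: {
          label: "Premium content",
          amount: { currency: "USD", value: priceUsd },
        },
      }
    );

    if (await request.canMakePayment()) {
      const response = await request.show();
      // Verify the payment server-side before unlocking anything, then close the sheet.
      await response.complete("success");
      return;
    }
  }

  // Roll your own: redirect to a Stripe-hosted checkout or any other processor.
  window.location.href = ownCheckoutUrl;
}
```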

[00:20:01.749] Kent Bye: Well, one of the things that I think originally came with progressive web apps is that the ideal would be that you're on a web page, you click a button, and then all of a sudden there's an app icon on your home screen. Android has that, but it's been difficult to have that in iOS. And certainly with Meta, in order to have a progressive web app, you still have to go through all of the store distribution channels rather than just having it. So I'd love to hear some of your thoughts on whether it's a good first step for at least having some of these more 2D progressive web apps. It feels like, generally, after Apple Vision Pro launched, Meta has had a lot more concern around trying to create these spatial computing applications, a lot of which happen to be web-based. So having more and more of these progressive web apps seems like a great advancement, but it also feels like something like Wordle has to have a whole app that you have to download rather than going to the Wordle page and clicking a button and then having it automatically show up as an application. I'd love to hear some of your thoughts on whether you feel like this is a good first step, or if there is more concern around Meta still trying to have everything go through their own application store process, rather than having the true spirit of progressive web apps, where you don't have to have any interface with any app store at all.

[00:21:07.303] Diego Marcos: Yes, as you said, I have my own opinion and taste. There is obviously demand from people and developers that want to use web technologies to develop native or native-like applications. And for those people, progressive web apps are amazing. For me, it's what you said. It's not the culture of the web. The culture of the web is to have a link, or just type a domain, or search on Google, and click, and you are in. You don't have to install anything. It's not an icon. So I personally am not planning to use it. But I acknowledge that there is a need and a demand for it, and many developers will benefit. So that's my take on it. I'm glad that it exists. But for me, the magic of the web is not an icon like a native app. It's like, yeah.

[00:21:54.577] Kent Bye: Well, yeah, I guess as we start to wrap up, I'd love to hear what you think the ultimate potential of XR in the open web might be and what it might be able to enable.

[00:22:03.383] Diego Marcos: Yeah, in WebXR, XR, or everything, we've seen grand visions for VR and XR for the last decade. And I'm coming back to something more grounded: for me, VR and AR are an amazing toy, an amazing creative tool, right? And at this time, I don't have grander visions than that. It's like a fun toy and a new way to express yourself and create new experiences that were not possible before. Whether those things are going to be mass market or a small niche, I don't know, but it's fun. And that's enough for me and for many people in the community. And yeah, we'll see what we get. Yeah.

[00:22:47.529] Kent Bye: Awesome. Do you have anything else that's left unsaid or any final thoughts you have to share to the immersive community?

[00:22:53.137] Diego Marcos: No, thanks to everyone. And I think I'm very happy with the work that the Meta and Google and Apple teams did in the last decade. We shipped it. We did it. Super grateful. Yeah, good job. I only have good things to say. The only thing is, I would have liked it to be a bit quicker. But we're already here. We're already here. Good job. And I'm also happy for all the A-Frame community that stuck around for almost a decade. Yeah, I wouldn't be here if it weren't for the community and the people that appreciate what I do. So yeah, thank you.

[00:23:28.196] Kent Bye: Awesome. Well, Diego, thank you so much for joining me today on the podcast, and for your many years, over a decade, of working on WebXR as a spec and as a technology, and over nine years on A-Frame. And yeah, I also just want to report back to the community that you honored the bet and you paid up the $50. And yeah, I think it's a celebration that we're finally at an inflection point, and now we can start to see where we go from here with all the different web technologies. And honestly, this pulls in a whole other realm of developers from Three.js and React Three Fiber. It feels like spatial computing and these different devices open up to an entire community of web developers that can start to innovate and push forward what's even possible. There's lots of stuff that is web-native, like you said, with a lot of stuff even with AI, and more and more data visualizations and other types of stuff that haven't been game engine-y with Unreal Engine or Unity. So I'm super excited to see what types of new applications of XR we start to see because of these web-native tools and all the different libraries that are out there. So I'm super excited to see where it all goes. And thanks again for all your work over many years helping make it all happen.

[00:24:37.273] Diego Marcos: Yeah, thank you very much. And I'm glad that I lost that bet. Yeah, I'm very happy. Yeah, me too.

[00:24:46.746] Kent Bye: Thanks again for listening to the Voices of VR podcast. And I would like to invite you to join me on my Patreon. I've been doing the Voices of VR for over 10 years, and it's always been a little bit more of a weird art project. I think of myself as a knowledge artist, so I'm much more of an artist than a business person. But at the end of the day, I need to make this more of a sustainable venture. Just $5 or $10 a month would make a really big difference. I'm trying to reach $2,000 or $3,000 a month; right now I'm at $1,000 a month, which means that's my primary income. And I just need to get it to a sustainable level just to even continue this oral history art project that I've been doing for the last decade. And if you find value in it, then please do consider joining me on the Patreon at patreon.com slash voicesofvr. Thanks for listening.
