#1088: Croquet’s Browser-Based Operating System for the Metaverse: An architectural & philosophical deep dive into real-time collaboration

Croquet is an operating system that runs within the context of a browser, treating the browser as a virtual machine in order to run bit-identical simulations across multiple computers, synchronized via a dedicated network infrastructure of global reflectors. Croquet officially launched on Tuesday, May 17th, and I had a chance to do a demo and then speak with Croquet founder David A. Smith on May 13th, where we do an architectural deep dive into this operating system for the Metaverse built on open web technologies.

Croquet’s approach bakes real-time collaboration into the core of the operating system, and aims to simplify & streamline the multi-user networking architecture for the open web, targeting sub-15ms latency over the Internet and as low as 5-10ms over 5G networks. Using a global reflector network means that a single server per session receives simulation events from any of the users, adds a timestamp, and then redistributes those time-stamped messages to all of the computers so that each one advances its local simulation identically. It also takes and distributes snapshots as an optimization to ensure that every computer has a bit-identical representation of what’s happening within a shared virtual world. Adding this type of simulation synchronization as core functionality at the operating system level, supported by external hardware, is part of what has the potential to help web-based applications realize some of the dreams of an open Metaverse.
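To make the reflector idea concrete, here is a minimal sketch in plain JavaScript (purely illustrative, not Croquet's production code): a stateless relay that never inspects message contents, only stamps each incoming message with the session's virtual time and fans it out to every connected client.

```js
// Hypothetical sketch of a "reflector": a stateless relay that timestamps
// messages and broadcasts them to every participant in a session.
class ToyReflector {
  constructor() {
    this.clients = new Set();        // connected participants
    this.startTime = Date.now();     // session start; basis for virtual time
  }

  join(client) {
    this.clients.add(client);
  }

  // A client sends an (opaque, possibly encrypted) payload; the reflector
  // never looks inside it -- it only adds a timestamp and fans it out.
  send(payload) {
    const stamped = {
      time: Date.now() - this.startTime,  // virtual time: ms since session start
      payload,
    };
    for (const client of this.clients) {
      client.receive(stamped);            // every replica gets the same message
    }
  }
}

// Each client advances its local simulation to the stamped time, then applies
// the event -- so all replicas stay bit-identical without sharing world state.
const reflector = new ToyReflector();
reflector.join({ receive: (m) => console.log("client A", m) });
reflector.join({ receive: (m) => console.log("client B", m) });
reflector.send({ event: "tap", object: "ball-1" });
```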

Most of the applications that we use on our PCs, mobile phones, or tablets are sandboxed within a native app that’s functionally like a virtual machine where you have a 2D window portal into the app. With the future of spatial computing, that sandboxed 2D frame disappears, and applications become objects with behaviors that are interactively combined within a shared virtual space. I first had this conceptual breakthrough in talking with the team at PlutoVR about the spatially nested, multi-app ecosystem that they’re developing. Thinking about how real-time, multi-user spatial environments could be considered core functionality at an operating system level starts to open up some philosophical clarity as to what exactly is new and different about whatever the Metaverse may evolve into.

Another key feature of Croquet is the emphasis on being able to change and develop the simulation you’re embodied within in real time. A part of the demo that was really striking was seeing how it’s possible to change some code and then have it instantly and automatically deployed to the server as well as to the virtual world copies of everyone who’s co-present within an experience. Being able to reduce iteration cycles to real-time, distributed pair programming has the potential to catalyze a lot of experimentation and innovation in how to integrate the open source libraries that the open web is built upon. Croquet also intends to cultivate their own database of objects and behaviors to more fully leverage the “View Source” collaborative spirit of the web as applied to XR applications. It’s worth noting that Croquet had not implemented WebXR or voice chat at the time of my demo and interview, but both are certainly on their road map.

In my conversation with Smith, he took me on a historical journey through the evolution of Human-Computer Interaction and the early dreams of computing, starting with Doug Engelbart’s October 1962 report Augmenting Human Intellect: A Conceptual Framework, Engelbart’s “Mother of All Demos” at the Computer Society’s Fall Joint Computer Conference in San Francisco in December 1968, David Reed’s June 1976 thesis “Processor Multiplexing in a Layered Operating System”, and the contributions of his Turing Award-winning collaborator Alan Kay that have helped to inform and inspire the Croquet operating system.

Croquet plans on open sourcing portions of their project, but since their operating system requires a supporting hardware infrastructure, they plan on charging based upon how many messages are sent beyond a certain threshold. There are free tiers to get started in experimenting with what’s possible. Rather than worrying about how to layer multi-user networking on top of whatever web-based application you’d like to develop, and then paying for the associated server costs, Croquet intends to help streamline all of that. It’s still in its early phases, but hopefully it picks up some developer momentum and starts bringing more of the millions of JavaScript & web developers into building web-native Metaverse applications.

LISTEN TO THIS EPISODE OF THE VOICES OF VR PODCAST

This is a listener-supported podcast through the Voices of VR Patreon.

Music: Fatality

Rough Transcript

[00:00:05.452] Kent Bye: The Voices of VR Podcast. Hello, my name is Kent Bye, and welcome to The Voices of VR Podcast. So in today's episode, we're going to be doing an architectural and philosophical deep dive into the metaverse, specifically through the lens of looking at the Croquet operating system, which is designed from the ground up to be a collaborative, real-time, browser-based operating system to be able to facilitate all sorts of new things within an open metaverse. So Croquet was just launched on Tuesday, May 17, 2022. And it's an operating system that is running within a browser. So they're treating a browser like a virtual machine. And then from there, they are able to have a synchronized simulation between different machines. So you get sent to a URL. You go into that URL, and you have multiple people within the same location, and they're all seeing the same thing. So what they're doing is taking the simulation as a model and pinging a reflector, and that reflector is putting a timestamp on any changes and then reflecting it back to all the individual users. And from that, they're able to achieve this kind of real-time collaborative environment. So a lot of the operating systems aren't designed like this from scratch to have these collaborative environments that are running in real time. And so they had to kind of reimagine some of these core capabilities that would be happening at the operating system level. And then given that, that's going to enable all sorts of new multiplayer collaborative environments that are happening specifically within the browser. So it's going to be pulling in lots of different web developer communities and starting to leverage a lot of the real-time REST APIs and data that you get from within the web ecosystem. So, thanks for coming on today's episode of the Voices of VR podcast. So, this interview with David happened on Friday, May 13th, 2022. So, with that, let's go ahead and dive right in.

[00:01:49.657] David Smith: I'm David Smith. I've been doing what we call VR metaverse since the 80s. Actually, I built my first head mount around 85, and I actually used that to do telepresence. We were using a data glove to control a Puma 560 robot that had a pneumoelastic hand on it. So you can actually, as you move your hand, the robot would follow that exactly. And then the pneumoelastic hand that was on the robot is very compliant. And so when you close your hand on something like a can or something, it would wrap around it and you'd be able to manipulate it without crushing it. But then we had a head mount so we could actually see it in stereo from a distance and we'd actually be able to get eye-hand coordination, which was particularly cool. Then around that time, I also started working on what's really probably the first adventure shooter game called The Colony. That came out in 1987 and was basically the first. I was fascinated by this idea of being able to create a virtual environment that I could explore, wander around, and in this case, where an adventure could happen. The Colony did pretty well for that time. It was Mac-only when it first came out, and it was certainly one of the top-selling games on the Mac. Not a huge market, but it was an important one. A couple things happened out of that. The first was Jim Cameron, the director, was working on the movie The Abyss at the time. He saw a pirated version of the game and asked me to help visualize his set, the deep core underwater drilling platform that's in the movie. And so I did that, and so I actually could walk around on the set before it got built and see what the camera angles were going to look like. They realized a whole section of set was not even going to show up on film, so it saved them a couple million dollars since they didn't have to build that. The other thing that happened was Tom Clancy, the author, got very engrossed in my game. He spent about a month, and he was calling me every other day. He was never calling me to ask for hints. He never did that. He was just saying just what an asshole I was for making this game very, very hard. But he finished it and he was like so excited. He said, I got to work with you in some way. And so he became my, I was starting a company at the time, Virtus Corporation, creating a product called Virtus WalkThrough, the first real-time 3D design tool for PCs. He was my outside investor and first board member. And he and I were talking about what would be next after that because he wanted to do a project together. He introduced me to the FBI hostage rescue team at Quantico, Virginia. They took me on one of their training missions. And it was amazing. These black helicopters fly over this concrete town. Ninjas come out, all dressed in black, and they start blowing things up. And it was so much better than any movie I'd ever seen. So I told Tom, we've got to do a game on this. And so that's where Rainbow Six was born. And in fact, my partner at Croquet today, Brian Upton, was the game designer for that, and later joined Sony PlayStation as one of their chief game designers. But the other thing that happened around the same time, this was a very busy time, Alan Kay saw Virtus WalkThrough. In fact, he was introduced to it by his wife, Bonnie MacBird. And that's actually kind of interesting because Bonnie wrote the initial screenplay for Tron. And she met Alan when she was doing the research for that. She basically asked, who is the best computer scientist in the world? 
And everybody pointed her to Alan. So Bonnie introduced us, and we started working together, and we've been pretty much working together ever since. The discussion was about what's next. I mean, Alan, as you're probably aware, invented object-oriented programming, not necessarily the way we think of it today, but the way it was intended: as objects and messages. But he led the team at Xerox PARC that created the Alto, which is really the father of almost every device you think of today. When you think of the Macintosh, Steve Jobs visited Xerox PARC and Dan Ingalls gave him a demonstration of the future. And Steve said that within 10 minutes, he knew every computer would work that way. And in fact, they do today. But one of the things that happened was that when they created what we think of as the PC today, it had communication. They had designed Ethernet there, they created Ethernet. And so you could do emails, you could text, you could share files, but they couldn't do live collaboration. And the reason that really mattered, you have to go back a little bit to Doug Engelbart. Doug is known as the person who invented the mouse and certainly did that. But more importantly, he was after a new way of engaging humans, what he called augmenting human intellect. And he didn't think of it as just a human and a computer. He looked at it as the humans working together mediated by a computer. And he did this demo in 1968 that just transformed reality, completely changed what people thought of as what a computer really is. They understood it finally as a true medium of engagement, of sharing ideas, simulations, that sort of thing. And what happened after that was a bit sad because his team wound up leaving him and going to Xerox PARC and they started a project there to do kind of the next generation of this amazing system that Doug had demonstrated. But then the Alto shows up, this amazing device, took all the air out of the room and all the collaboration stuff that they were working on was abandoned. So coming back to where Alan and I were talking, it was like, what was the next big step in computing? It was pretty clear to both of us, it was picking up the threads of what Engelbart had done and making them work in a modern world. You know, the idea of having a true collaborative system, you know, using the computer as a communication platform, we saw as a very important thing. So after we started Red Storm and the Rainbow Six game, Alan and I later joined forces to work on this full-time, and we brought in David Reed. David was the architect of the UDP protocol and was also a co-architect of TCP/IP, which we're using right now. And his thesis was the idea of replicated computation. And the idea is you have two computers and what you're doing is maintaining their bit-identical state, running a simulation by sending them the exact state transform. So you say, here's a black box, I send an event, a message to that black box and it's going to transform. But if I send the same message to all the black boxes that are in an identical state, they'll also be transformed in exactly the same way. So the problem is he'd never implemented that. So we spent actually a few years trying to figure out how to make that work, working with HP Labs. And we actually did, we succeeded with that. The problem was it was all done in Smalltalk, not a big market, but it was an incredibly important proof of concept. A little later, I wound up going to Lockheed Martin. 
I was a senior fellow there running AR and VR work for their training and simulation group. And we were building new generations of head mounts, which I think actually are even better than a lot of, most of what we see today. But the other thing we did is a version of the system that I'd done with Alan for the defense department called the Virtual Framework. And that's actually still being used today. But running forward a little bit, when I left Lockheed, Alan said, we have to build the real thing. So I told Alan, well, I want to do this as an operating system. And Alan said, well, you know, it takes five years to build an operating system. That was five years ago. So the thing we built is, I think, quite remarkable and quite interesting. I got very lucky. I inherited Alan Kay's team. I've been working with them for many years already, so I kind of knew them. They were part of Y Combinator Research. They were a peer group to OpenAI, and when OpenAI spun out, they wound up joining me. And so we started Croquet Corporation to build this thing. And what it is, is there's no other way to put it, but it's a true operating system that is designed around collaboration and enabling people to create their part of the metaverse.

[00:10:32.405] Kent Bye: Nice. That's a really good background and context and quite a journey, I have to say, with interacting with a lot of luminaries in the history of computing, maybe one or two Turing Award winners there amongst your collaborators and advisors and people that are thinking about this. And so I guess maybe a good place to start is, you're talking about an operating system, but this is happening within the context of a browser. And so I have a browser that's already maybe running on top of say Linux or Android or Mac, but you have, within that context, almost like a virtual machine, where you have something that's sort of sandboxed within the browser. Maybe you could talk about how you're potentially using new aspects of say new web standards, WebAssembly, other aspects to kind of create an operating system.

[00:11:18.089] David Smith: Exactly. It's not like a virtual machine. It is a virtual machine. The way this thing works, I talked a little about David Reed's thesis design, but what we're doing is all the participants in this, say, shared world, are running a bit identical virtual machine. What I see is exactly what you see, maybe from a different angle. I'm running a simulation and that simulation is evolving exactly the same way lockstep on my system as it is on yours. That means, by the way, once you have that deterministic computation ability, you don't really need to update the state for everybody. In other words, I don't have to send you any information about what's going on with it unless I interact with that state. So imagine a shared physics engine, which we can do. If you interact with that physics engine, then you want to make sure that interaction is shared. But if all you're doing is watching it, then there's no need to update the system because everybody's running that simulation in lockstep. So the way this works then is a shared virtual machine. Everybody has that exact same state, and it's responsive to the user. And the way that works is when I interact with it, we send a message to a kind of server that we call Reflector. Reflector is basically a stateless server. All it does is when it receives that message from one of the users, it takes that and adds a timestamp down. It can't even look at it. The messages are encrypted end to end. So the reflector gets that message, puts a timestamp, redistributes it to everybody, and then that virtual machine will take that event, that external event, and then it'll compute all the simulation up to and including that new timestamp and whatever that event that was sent. And everybody has that same action. So for example, I'm looking at a physical object and I tap it, And of course, I'm giving it an impulse, and it's going to jump, right? Well, when I do that, instead of sending that message directly to the local version of that object, it's sent indirectly via the reflector, and then it comes back to everybody, including me, and we get exactly the same end result. So that object will jump in the air, or whatever it does, and it'll spin around, do everything, but it'll be exactly the same on everybody's system. That means that we can share even very complex simulations between each other without much of a bandwidth load. In fact, very, very little because we don't have to communicate the state transitions. We just have to communicate when a user is engaging with that. But it opens up a huge play space for creating new kinds of applications that would have been very, very hard or even impossible before. And so that's one of the key elements is you cannot build a collaborative system from the top down. You can't add it to existing systems. It really has to be baked in at the bottom. And once you do that, however, and basically your operating system is built on top of that, then every action is multi-user. So for example, you can even do live coding inside that world. So like I showed you, you could do collaborative text editing, right? Well, that could be, in fact, I showed you this, where you could actually modify the code. Everybody who's in that world sees that code being modified, and then you compile it. That's a replicated event. And all of a sudden, that new function or that new behavior is now on everybody's system. 
And the object that you've attached that to is now running that behavior without having to reboot, without having to restart. It's basically you're developing in a live system, but you're developing a live system with other people.
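As a toy illustration of the replicated-computation idea described above (again illustrative, not Croquet's code), the sketch below runs two copies of the same deterministic model, feeds them the identical timestamped event stream a reflector would deliver, and ends up with identical state on both replicas without ever synchronizing world state.

```js
// Two replicas of a deterministic simulation stay bit-identical as long as
// they receive the same events, in the same order, with the same timestamps.
class BallModel {
  constructor() { this.x = 0; this.vx = 0; this.now = 0; }

  // Advance the simulation deterministically to the given virtual time.
  advanceTo(time) {
    const dt = time - this.now;
    this.x += this.vx * dt;        // simple integration; no randomness, no wall clock
    this.now = time;
  }

  // Apply an external (user-generated) event that arrived via the reflector.
  handle(event) {
    this.advanceTo(event.time);
    if (event.type === "kick") this.vx += event.impulse;
  }
}

const replicaA = new BallModel();
const replicaB = new BallModel();

// The same reflector-stamped event stream, delivered to both replicas.
const events = [
  { time: 10, type: "kick", impulse: 2 },
  { time: 25, type: "kick", impulse: -1 },
];
for (const e of events) { replicaA.handle(e); replicaB.handle(e); }

replicaA.advanceTo(100);
replicaB.advanceTo(100);
console.log(replicaA.x === replicaB.x);  // true: identical state, nothing synced
```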

[00:15:09.930] Kent Bye: So yeah, maybe you could take me back to the moment when you told Alan Kay you needed to design an operating system, and he told you it would take five years. It did take five years, but what was really the catalyst that made you want to build this effort of Croquet? Because there are already existing proprietary systems like Windows, but so much is built on Linux these days. Why not just build another flavor or version of Linux? What was it about what Linux does that prevented you from doing what you wanted to do?

[00:15:40.752] David Smith: There were a couple of really important reasons. The first was that, as I mentioned earlier, you can't layer collaboration on top of a system. It has to be at the foundation. It has to be kind of assumed that that's there for everything else that happens. The second thing was really key was that it had to be a live programming environment. This is going back to the original ideas of Smalltalk. Smalltalk was a system where you literally create the system, modify the system from within itself. And I think one of the reasons that Smalltalk system was so extraordinarily influential, and you know, there's like almost everything you take for granted within a computer screen today was invented there. Alan invented overlapping windows. The idea of having to invent overlapping windows is kind of funny to us today, but somebody had to do that. Dan Ingalls invented pop-up menus. And think about it, where he invents this idea and then he's able to code it right in the system he's running. And he doesn't have to quit, doesn't have to reboot, doesn't have to do anything. He's right there. And now he's running that system. So one of the things that was clear to us is that if you think of this system as a communication platform, then you have to enable collaboration at the very bottom. And the other thing is that has to be a live environment because I'm going to, the communication I'm going to do is not just me talking like I am right now, communication I'm going to do is I'm going to generate a simulation that you and I are going to share that both of us are going to be able to interact with and modify. as part of that conversation. I call it actually an augmented conversation where the computer is a full participant. And when you and I are talking, I say something and the computer is listening and generates a simulation of that. And now you can touch it and interact with that. But then say a third person shows up. And what you want is that third person to be able to instantly access that object that we just created and be able to start engaging with it the same way we are. So that required a very, very different approach. The idea of an operating system is an interesting one because they're all a little bit different. We kind of went down the one path where we took the kind of hard-coded OSs like original Mac, Windows, various versions of Unix and Linux. They were designed for something very, very different. They weren't designed for real-time interaction, live engagement. We had to rethink it, and it really required a very different approach to even the kind of language you use. In our case, we wound up going with JavaScript. And there's a really good reason for that. JavaScript is a very, very dynamic language. It's an interpretive language, so you can actually modify that code while the system is running, just like you could in Smalltalk. And even more important is the fact that the platform that you deliver JavaScript in was on every single person's computer and every single person's phone and every single person's head mount. So building an OS on top of a browser made so much sense because all of a sudden we get rid of friction. As you saw, I just go to a web page and you're inside of a shared virtual world where you see me and we're both able to interact. That was a crucial part of that. But the other one that, from a business perspective, there's 16 and a half million JavaScript developers out there. That's more than all the other languages put together. 
So if you build a system and you want traction, you want people to engage with it, the web is just an extraordinarily good place to do that. The other thing that's really important is we've all played with the web and saw it evolve over the last few decades. It's astonishing how powerful it has become, how fast it is. Looking at the JavaScript engines like V8 are just exceptional. But the other side is we're going to see some even better boosts in performance. So for example, WebAssembly is a game changer. As a good example, we use the Rapier physics engine. We helped fund the development. That's an open source Rust-based engine, but it runs in WebAssembly. And what we did was we made sure it runs perfectly replicated, multi-user inside of Croquet. So now we've got a WebAssembly Rust engine that we can use for other kinds of applications as well. And that's running at near native speed. We're also seeing a big punch up on the graphics side with WebGPU over the next, probably over the next year or two years, where we're going to see, in some cases, a 10x increase in performance. So the way I think of it, I think of the traditional operating systems as sort of like the mainframes of the world. This is a Clayton Christensen, Innovator's Dilemma kind of thing. The traditional OSs are the mainframes. And then the web-based OS, which were one of the first, I think, is really the PC. It's not as fast, it's not as powerful in some ways, but it can do stuff that the mainframes could not do. And it could do it in a much more affordable way, a much more accessible way. And it's not only removing barriers, but if you look at how it's going to evolve over the next five years, it's pretty clear that The metaverse is going to be defined by the web far more than any other platform, especially when you look at captive systems like Meta or Roblox, which are very niche products. Let's think about that way. They're designed to do a certain thing. What we're after is a general purpose platform. That's why you create operating systems. You basically provide a layer of functionality and capability that unleashed developer creativity. Remember when the Macintosh showed up and how it completely transformed the nature of what we thought a program should be and how it should work and how it should interoperate with other programs. I got my Mac February 11, 1984. I remember that because it was my birthday. But it completely changed the way I viewed the world. And that's what an operating system can and should do. You need to be able to provide a capability that is new, unique, and valuable, and not recreating the existing infrastructure. It's like the traditional OSs weren't really designed for humans, really. I mean, you look at Unix, it's like, well, it was originally a command line system. All these things had windowing systems patched on top of them. But yeah, they weren't deep. They were a veneer. I think of it as the veneer of the Xerox Park Alto. So they weren't really fully thought out. What we were after was something, and I think we succeeded, was a system that is from ground up, multi-user, collaborative, and modifiable by the developers and even the end users. And there's a key element to this, is that the metaverse is going to be primarily a communication medium. It's not just, I'm going to go and look at cool stuff and play with stuff. It's about me and you engaging with information and ideas. 
This is going really right back to Doug Engelbart's idea of knowledge workers and collaborating on solving hard problems. And so our intent, and I think we're building a system to enable that, is for people to create new classes of applications that allow them to explore complex problems and ideas, along with, by the way, making it really easy to build multiplayer games. And in those games, by the way, you can do shared physics that are actually interactive with the world. They're not just sugar on the experience. They're fully interactive. You can kick things around inside that world with very, very low latency. By the way, I also think of it this way, is that every phone in the world can call any other phone on the planet, basically. That's the way phones work, right? But the metaverse, if you think of it as a communication platform, that means every metaverse device needs to be able to talk to every other metaverse device. Five years from now, we'll be wearing some kind of AR glasses and we're going to be talking. I'm going to have maybe an Apple-branded one. You'll have maybe a Meta version. But we're going to be able to share this conversation and share the idea space because we're both going to be running what I call plain old telephone service, right? Which was POTS. We're going to be sharing this microverse between us and any device can talk to any other device because we're running via the browser. And so really that's the vision of the system. And that's why it had to be an independent OS. The functionality, the focus of what we needed to make true was very, very different from what you're going to get from the existing infrastructure. In fact, it's laughable to think about taking something like Android and patching it to become a metaverse OS. That's not what it was ever designed for. And it's such an orthogonal problem that what you're going to have is a bad OS that doesn't have a lot of functionality, not more functionality than what you get on your phone. That's not the right answer.

[00:25:13.455] Kent Bye: Yeah. And so I guess one of the things as I reflect on what you're saying is that when you think about the big players in this space right now, you have unity and unreal engine. Those are game engines. Whereas when you think about how information is transmitted through the web, it's mostly through the web browser, which is a document object model, the Dom, which is more static, like showing information. That's not a shared space. It's. more of a broadcast model. It's very static. It's an object oriented. So it's not like a dynamic virtual world space. And so it sounds like that in some ways croquet is a bit like trying to treat the operating system more like a game engine that kind of runs in the browser, but it's also doing all the other operating system stuff that maybe goes above and beyond what say unity or unreal engine is doing because it's more web native that is interfacing with all these other protocols. Is that accurate to say?

[00:26:05.067] David Smith: Yeah, I think that's pretty close. Yeah. I mean, obviously we're not going to be able to do what Unreal and Unity can do today on a browser. I mean, those native platforms are very well designed and they're pretty amazing, but we do a whole host of things that they can't do. And this goes back again to this idea of comparing the PC to the mainframe. Yeah. They're mainframes. They're big, powerful platforms. However, the PC enabled a whole class of capabilities that the mainframes couldn't touch, and that's true here. This ability to do this kind of sharing at this level, at this detail, and by the way, at this kind of latency, our focus is to be sub-15 millisecond latency anywhere, anytime. And it certainly looks like we'll be able to achieve that. And so the use case is very, very different. It doesn't mean that they can't get to some of this, but we started with the idea of making everything multi-user, shared, bit-identical simulations. Very hard, like I said, to layer that back on top of an existing system. You really need to have that as the foundational kernel of the platform to achieve what you want. And so it's kind of a completely different use case. Although, over time, like I said, the browser is going to get better and better and better to become, I think, ultimately, just like you don't see mainframes today anymore. I don't think you're going to see native apps anymore. The browser is going to be sufficiently powerful and friction-free. This is a key element, the ability to jump from one world to another seamlessly without any kind of downloads. One of the things we did was live portals in Croquet. And those portals are completely sandboxed. They're basically running a kind of iframe so that you're able to see one world from another world and even walk through that, but there's no worry about malware pollution. It's a secure and safe thing. And you can literally see from one world to the other. I'm really proud we've figured out how to do that. But when you start being able to do that, those are the kinds of things that you're not gonna be able to do with these other platforms. For that matter, this tech I just described for doing portals, at some point you'll be able to share anything, any kind of app, won't just be Croquet worlds sharing to Croquet worlds. We'll make it available to anybody so that if you have a web-based world to start, then you'll be able to create a portal from a Croquet space to be able to walk from Croquet to your space or vice versa. And I think portals are going to be an essential aspect of the metaverse experience as well. I call it meta-surfing. You're going from one world to the next, to the next, to the next, just like we used to do web surfing many years ago. So yeah, it's a different kind of animal, a different kind of operating system, but very much an operating system.
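The browser primitive underneath the portal idea described here is a sandboxed iframe. A rough sketch of that mechanism follows; the URL, sandbox flags, and styling are placeholders, not Croquet's actual portal configuration.

```js
// Hypothetical sketch: show another web-based world through a "portal" that is
// really a sandboxed iframe, so its code cannot reach into the host page.
const portal = document.createElement("iframe");
portal.src = "https://example.com/another-world";       // placeholder world URL
portal.setAttribute("sandbox", "allow-scripts");        // scripts allowed, no same-origin access
portal.setAttribute("allow", "xr-spatial-tracking");    // opt-in to specific capabilities only
portal.style.cssText = "width: 640px; height: 360px; border: 0;";
document.body.appendChild(portal);
```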

[00:29:03.333] Kent Bye: And as I'm looking at the tech stack here, I think it might be worth digging into the different layers here, because in this picture you have, at the baseline, you have the user device, which is encompassing both the hardware, it's a computer or XR device, AR, VR device, and then presumably some level of native operating system that is on those devices. And then you have the browser, which we've talked about as this virtual machine. And then the Croquet OS, and then the core systems card manager, and then the metaverse of shared experience. And then Croquet OS, there's things that are coming off of that. From one side, you have like a development server for what's this Croquet shared experience development environment. And then another aspect is this reflector network, the Croquet global reflector network. So there seems to be little bits of metadata that have timestamps that are being sent out to coordinate different states within one of these experiences. And so can you break down a little bit about, is it referring to, say, an object or in-world geographical coordinates? And then what else is all included in that, that is being sent back and forth in order to coordinate the synchronization between different instances that have this goal of having 15 milliseconds of latency anywhere in the world.

[00:30:15.757] David Smith: Right. So I guess I should talk about reflectors a little bit. Reflectors, as I mentioned, are stateless servers. And we designed them so you spend almost no time in them. They do their job. You send a message from one of the users to that reflector, the reflector puts a timestamp on it and redistributes it to all the participants. That's all it does, in a sense. It's more sophisticated than that, obviously, because you have to find the right one. But that's kind of one of the first pieces of magic. And by the way, it's very much an edge-based system. We're talking to almost all the telcos right now about putting reflectors onto their 5G MEC, basically the data center near the radios, with the goal of having like a 5 to 10 millisecond latency for users of the system when they're connected that way. But like I said, when you're interacting, you're running a shared simulation and you interact with that, we don't send the messages directly to that simulation. We send it indirectly via the reflector and the reflector then forwards that message to all the participants so they get an identical virtual time that's integrated into the simulation. So the simulation is like I kick something or I move something and you see that simulation evolve exactly the same way on everybody's system. And so it's essential that the reflector be tiny and super, super fast for that to work. And it really had to feel as if, you know, that there is no delay and you played with it enough to see it's pretty close to that. It's super, super fast. So on that side, the way I think about it is the Croquet OS is really this shared virtual machine plus the reflector, those two together. So this OS isn't just on your computer. First of all, it's shared among other computers, and it's using that reflector as an integral part of it. Another thing that happens, and you saw this, is when you join the session with another user, you notice that you're perfectly synchronized so that you and I could be working inside, moving things around, and a third person comes in. And they join really, really quickly. But everything is in its exact right state, basically, when they load in that world that we're sharing. And they've slipstreamed that simulation. They are running now the exact same simulation. So how does that work? When you and I are using the system, we actually track compute time, and every five seconds of compute time we take a snapshot of the current virtual machine and we save that near the reflector. So when a new user joins, they grab a copy of that snapshot. And then they basically add water, and now they've got that version of the simulation, that virtual machine running on their system. And then all I have to do is play back all the messages that they missed, and they're caught up. So they're perfectly caught up. For example, if you're running this world and you open a QR code inside the world that you saw and scan that with your phone, you are able to have that world jump onto your phone and now you're sharing that world on your phone with what you see on the PC and it's perfectly synchronized. You're able to interact on one and see the results on the other instantly. So that part of the OS, those two pieces work very, very closely together. And then on the other side, we set it up so that you can either code within the system, do collaborative coding within the system. So you can modify, do pair programming, if you like. But you can also integrate with your own existing IDE. 
So we use Visual Studio Code, and we have a watch server. So what happens is if you are editing, say, one of the behaviors of the system, and then you save it, the watch server will grab that saved version and inject it into the world live, and it's updated. So you don't have to reboot the system, you don't have to restart it, you just basically do live modifications in place, which does wonders for your productivity. It's just such a smooth experience to be able to do that. And that goes back to one of the other reasons we decided that we had to build an operating system for this, because there was a huge advantage that they had with Smalltalk in that they built it on top of what we consider the modern GC, but they didn't know what the interface was supposed to be when they started. They were able to evolve it using Smalltalk because it could modify that system, modify the interface dynamically. And so they invented the interface based upon the constraints and the capabilities of the platform they were working on. Today, when people think about building user interfaces for AR or VR, they think of it like building a ship in a bottle. It's like, well, this might be good, that might be good. And so they build out the system and like a ship in a bottle, you throw it into the ocean and it's probably gonna sink. It's not gonna be very good, because they don't have that instantaneous feedback. They're not solving the problems of user engagement from within the system. Instead, what you really need to do is create a raft that really floats, throw that into the ocean, and if you have a big enough toolbox, then you can build out the user interface that's necessary dynamically while you're floating along. That's the way it has to be. I don't think it's possible to design user interfaces in the abstract. You have to be fully experiential and you have to have enough of a system there to be able to bootstrap. In other words, build enough of the system so you can build a better version of the system, which is exactly what Doug Engelbart did. It's exactly what they did with Smalltalk. And that's an extraordinarily powerful way of creating and engaging. So there were a lot of deep reasons to build the system the way we did. And I think it's going to pay off very nicely for everybody because it's a very clean architecture. Obviously, you've seen it, it's pretty powerful. In particular, I think it really is going to enable this idea of the augmented conversation, which I think is going to be an essential part of computing in the future. And it's really, I think, you know, we have some very weak forms of that. You know, Zoom is what we're using right now, for example. That is not a true communication platform. We're talking, maybe we'll see a little bit of each other's emotional content, but we can't do much more than that. And in today's world, the fact that we have such incredibly limited tools to be able to communicate and collaborate is pretty sad. So the design of the OS was really dictated by the design of the problem that we were trying to solve.
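The watch-server workflow described above can be pictured with a small Node.js sketch (purely illustrative, not Croquet's tooling; the file name "behavior.js" and the evaluation approach are assumptions): watch a behavior file on disk, and when it is saved, re-evaluate it and swap the new function onto a live object without restarting anything. In the real system, that updated behavior would also be sent through the reflector so every participant's replica picks it up.

```js
// Hypothetical Node.js sketch of a "watch server" style hot swap:
// when behavior.js is saved, re-read it and attach the new behavior
// to a live object -- no reboot, no restart.
const fs = require("fs");

const card = {
  behavior: () => console.log("original behavior"),
};

function loadBehavior(path) {
  const source = fs.readFileSync(path, "utf8");
  // The file is assumed to contain a single function expression,
  // e.g.  (card) => { /* new behavior */ }
  return new Function(`return (${source})`)();
}

fs.watch("behavior.js", () => {
  try {
    card.behavior = loadBehavior("behavior.js");  // swap in place, live
    console.log("behavior.js reloaded");
    // In a replicated system, this change would go through the reflector
    // so every user's copy of the card runs the new behavior.
  } catch (err) {
    console.error("reload failed:", err.message);
  }
});

setInterval(() => card.behavior(card), 1000);     // keep exercising the live object
```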

[00:37:10.572] Kent Bye: Yeah, as you're talking about all this, it reminds me of a problem that comes up in VRChat worlds that's called the late joiner problem, where if people start a game together, everybody has to start it at the same time, because there's like a game state that gets tracked that you have to be there from the beginning. Otherwise, people who come in late may not get updated with all the different puzzles that have been solved and whatnot. So the late joiners are not always synchronized. So it sounds like by creating this Croquet global reflector network, you're able to not only send out incremental updates, but take snapshots of that world. So when people actually come in, they get the latest snapshot and then all the latest messages to quickly get up to speed in terms of whatever the state of that virtual world is, based upon all the different collaborative interactions of people that have been interacting with that space.

[00:37:58.025] David Smith: That's exactly right. And it's wonderful. You know, it's like you can join a session anytime and people do. I mean, it's like, and for that matter, so you're working in the world and you've done a lot of work there and you leave. Well, you come back, you pick up exactly where you left off. we should lose nothing. And the system works that way. We basically are continuously taking snapshots. In fact, one of the cool things, we haven't put this in yet, but we will be able to later, is the world is actually defined by its initial state when it was first created, and every message that has ever occurred in time. So you could actually go back in time to the origin, and if all you do is play back messages, all the messages that all the users had added to that, you'll actually reconstruct the entire world to the state it's in now. In a sense, the snapshots are an optimization, so you don't have to go back in time and play all those messages. But in fact, we're adding a rollback capability so that if you get a bug and you crash the system, which I'm sure is going to happen, you can actually go back to a previous version and play forward to see where the bug occurred. to fix the system. But yeah, again, this is a feature of this kind of operating system, you need to be able to join at any time. And you need to, you know, like I said, it has to be a friction free conversation where a third person comes up to us, and we hand them this object that you and I've been working on, well, he should be instantly able to get that and start interacting with it just like we do, without having to like, well, make a new one for me or something, it's got to be that seamless.
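A sketch of how snapshots and message replay fit together (again illustrative, not Croquet's code): the world is fully determined by its initial state plus every message ever sent, snapshots are just an optimization, and a late joiner (or a rollback) is "load the nearest snapshot, then replay the messages after it."

```js
// Hypothetical sketch: event-sourced world state with periodic snapshots.
// A late joiner loads the latest snapshot and replays only the messages
// recorded after it; a rollback replays up to an earlier point in time.
class World {
  constructor(state = { count: 0 }) { this.state = state; }
  apply(msg) {                       // deterministic state transition
    if (msg.type === "increment") this.state.count += msg.amount;
  }
  snapshot() { return JSON.parse(JSON.stringify(this.state)); }
}

const log = [];                      // every message ever sent, in order
let latestSnapshot = { afterIndex: -1, state: new World().snapshot() };

// Rebuild the world as of message index `upTo` (inclusive).
function rebuild(upTo) {
  const useSnap = latestSnapshot.afterIndex <= upTo;
  const w = useSnap
    ? new World(JSON.parse(JSON.stringify(latestSnapshot.state)))
    : new World();                   // snapshot is too new: replay from the origin
  const start = useSnap ? latestSnapshot.afterIndex + 1 : 0;
  for (let i = start; i <= upTo; i++) w.apply(log[i]);
  return w;
}

function record(msg) {
  log.push(msg);
  if (log.length % 5 === 0) {        // e.g. snapshot every 5 messages
    const w = rebuild(log.length - 1);
    latestSnapshot = { afterIndex: log.length - 1, state: w.snapshot() };
  }
}

for (let i = 0; i < 12; i++) record({ type: "increment", amount: 1 });
console.log(rebuild(log.length - 1).state);  // { count: 12 }  late joiner is fully caught up
console.log(rebuild(6).state);               // { count: 7 }   rollback to just after message 6
```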

[00:39:40.928] Kent Bye: Yeah, you're essentially creating a time machine in some ways.

[00:39:44.010] David Smith: Yeah. Well, time is at the center of this. We used to refer to this kernel as a TeaTime architecture. And like I said, it's kind of a missing protocol of the internet. It's also an end-to-end protocol like TCP and others. So, in a sense, I think if this protocol had existed when it should have, which would have been 30 years ago, we'd have a very different world today.

[00:40:09.818] Kent Bye: Well, I had a chance to speak to you before just to get an overview and actually try out some of the demos and to see this in action and go around these different virtual spaces, changing things, and see how dynamic it is. And it's really cool to be in that kind of live coding environment, but in a virtual space together with other people, and to see the portals and to see like a window into these other worlds. And I wanted to ask about this architecture that we've been talking about. And I think you had mentioned something like Alan Kay is sort of like a philosopher of thinking about a lot of these things, but also, I don't know if it was David Reed's thesis that you're building upon with this Croquet global reflector network, but were there some architectural designs behind this type of architecture that you implemented, and does it have a history that goes back to some of the advisors that you're working with?

[00:40:56.110] David Smith: Yeah, actually. So I mentioned Smalltalk and all of us were deeply involved with that. The first version of the Croquet system was done on that platform and really taught us the value of that instantaneous change. I should back up a little bit. It seems like every five years I've done an updated version of this system. The first one was 1994. And that version, we didn't have the concept of replicated computation. You have all these objects in this virtual world, and each individual object, you had to figure out how to make it multi-user. Every object was bespoke and had to be designed, okay, when you do this, this happens, when you do that. So you're like wiring these things up. By the way, this is how people do it today. It's just a mess where you're just like, well, I do this, what happens there? And then you have the issues of race conditions. When I do this, that happens over there. And then this happens, it comes back to me and we have this balance. And also, you know, who has priority and how do you ensure we are looking at exactly the same state? Turned out it was impossible to do it that way. That was a huge lesson because it showed that a traditional approach for doing deep collaboration is absolutely going to fail. You might get away with some simple things, but you'll never build a real powerful application that way. So that's when Alan and I said, let's try to make the right thing. And that's where we brought David into the mix. And his thesis was indeed this replicated computation idea. Originally, he was thinking in terms of real-time banking transactions, but it was exactly the right thing. It just turned out he had never implemented it, so we had to figure that out. But that version we did. In fact, we got to demo it. Alan won the Turing Award during that time, and we actually demoed it as part of his Turing Award speech, which was very, very cool. But that version had a problem itself, which I didn't realize until a little later, which was the entire world was replicated. And I realized that that was a mistake. The right way to do it, there's a programming model called model view, a model view architecture, where the model is kind of that abstract simulation layer and the view is what you see and what you interact with. The next version I did, which is a version I did at Lockheed Martin, was a model-view architecture where the model was really that virtual machine. And by doing that separation, that meant, first of all, that the model was the thing you took the snapshot of, which was much, much smaller, much faster. And it also meant that the quality of the system was much higher, much easier to understand, much easier to work with, and forced a certain level of programming discipline on the development process. So that version was the first iteration, but even that wasn't quite right. So when we did this version, which is the fifth iteration, we learned a huge amount about how to design these things to be super fast, super seamless, very, very clean code. One of the lessons we had, Alan, who invented object-oriented programming, said, we need one kind of object, and every object should have enough functionality that you can layer additional capabilities onto it to add the value, but every object should have a core capability. And we designed it that way, and it was magic. A lot of the idea of what we did with this particular approach came from HyperCard, if you remember that. 
HyperCard was this wonderful platform that Bill Atkinson built at Apple that allowed fairly novice developers to create fairly sophisticated applications. It was a card-based thing. You could take a card, you drop pictures on it, you could create a Rolodex, you could drop buttons on it, that button could be instrumented to load something when you click on it. It was pretty magic. In fact, if you remember Myst by Cyan, that was built in HyperCard originally and allowed for just some magical transportation. We saw that and said, you know, that's an incredibly powerful way to build out a platform. And so we embrace that. So everything inside the Croquet Microverse platform is, we're using the word card, but it's loosely a card. We'll probably come up with another word for it. But imagine everything's the same kind of thing, and you can put these cards together to make more complex cards. And each card can have a behavior that you can drop onto it to allow it to be expressive. And I'm sorry to talk very technical, but the reality is it's so simple, and I even dare say it's fun. It's a fun thing to do, to actually build out the worlds of these things. You drag and drop objects into the world and then you add scripts to it and you're live and everything's multi-user from the word go, even the development process, which is really just a blast and very, very compelling. It's a very different kind of experience than what you get when you're trying to build these worlds and you're by yourself. It's a much, much richer kind of experience. And the end results are far more compelling because they're far, far more interactive than what you'd get with other approaches.
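The card-and-behavior model described above can be sketched roughly like this (a toy, not the Microverse API): everything is the same kind of object, cards compose into more complex cards, and behaviors are small scripts you attach to a card at runtime to give it capabilities.

```js
// Hypothetical sketch of a "card" object model: one kind of object,
// with behaviors attached (and detached) at runtime.
class Card {
  constructor(name) {
    this.name = name;
    this.children = [];             // cards compose into more complex cards
    this.behaviors = new Map();     // name -> behavior function
  }
  addCard(child) { this.children.push(child); return this; }
  addBehavior(name, behavior) { this.behaviors.set(name, behavior); return this; }
  update(dt) {
    for (const b of this.behaviors.values()) b(this, dt);   // run attached scripts
    for (const c of this.children) c.update(dt);
  }
}

// A behavior is just a function over (card, dt); here, a spin script.
const spin = (card, dt) => { card.rotation = (card.rotation ?? 0) + dt * 0.5; };

const world = new Card("world");
const cube = new Card("cube").addBehavior("spin", spin);
world.addCard(cube);

world.update(16);                   // one frame: every card runs its behaviors
console.log(cube.rotation);         // 8
```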

[00:46:18.365] Kent Bye: Yeah, I have a question about time and the synchronization of time across servers. I spent some time at Puppet Labs, which does IT automation. I was working there as a content marketer, but I was in conversation with lots of system administrators. And one of the things that came up again and again was that, you know, one of the first things you do on a server is ensure that it has an NTP server, the Network Time Protocol, because the basis of all computer security is having the right time. And it seems like so much of this Croquet global reflector network is based upon time. Does that mean that you need to kind of have a synchronization of all the times, and does it start to break if there's like a time mismatch between...

[00:47:02.388] David Smith: Actually, that was the magic that David Reed came up with. Time is virtual. In other words, when we send a message to the reflector, the reflector generates a timestamp. That timestamp is literally the time that that session has been alive. But all that matters is in a sense that that timestamp is shared with all the participants, and then they can in turn compute all their internal simulations up to and including that time. It doesn't matter that that timestamp isn't exactly the same as kind of the official time on the computer, which by the way, those things drift like crazy, as you know. I mean, sometimes you have to update NTP once a minute. So that's a very bad source of managing time. So doing it virtually as a virtual time step gets you the bit-identical computation without all the overhead of trying to maintain that time offset. On the other hand, we do use a variation of that for some of the applications. We actually have a thing called Mediaverse, where we run synchronized sound and video, perfect within a few milliseconds. And the way we do that, we're using this Croquet reflector backbone, but we also are looking at the offsets. We compute the offsets in the same way you compute NTP offsets. We compute the time offsets from the reflector to the clients. And so when you get a message from the reflector, we actually know what the actual synchronized time is for everybody. So in a sense, we do that, but we don't have to do that for the regular system. It's only when we want to be able to synchronize two people side by side with their iPhones, and the sound is perfectly synchronized, which is really kind of magic.
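The offset computation mentioned here for media synchronization is essentially the classic NTP-style estimate: record when a ping leaves the client, when the server stamps it, and when the reply arrives, and assume the path is roughly symmetric. A sketch of that arithmetic (illustrative, not Croquet's code):

```js
// Hypothetical sketch of an NTP-style clock-offset estimate between a client
// and a reflector, used only for media sync -- the replicated simulation
// itself needs only the reflector's virtual timestamps.
//
//   t0: client sends ping        t1: reflector receives it
//   t2: reflector sends reply    t3: client receives the reply
//
// offset    ≈ ((t1 - t0) + (t2 - t3)) / 2   (assumes symmetric latency)
// roundTrip =  (t3 - t0) - (t2 - t1)

function estimateOffset(t0, t1, t2, t3) {
  return {
    offset: ((t1 - t0) + (t2 - t3)) / 2,
    roundTrip: (t3 - t0) - (t2 - t1),
  };
}

// Example: reflector clock ~40 ms ahead of the client, ~30 ms each way.
const { offset, roundTrip } = estimateOffset(1000, 1070, 1071, 1061);
console.log(offset, roundTrip);   // 40, 60
```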

[00:48:51.222] Kent Bye: Yeah, I'm sure there's a lot of interesting applications for live music, having that synchronization there as well. But I'm curious because you're talking about time offsets. Does that mean that for any session it's using one reflector? Because if you have multiple reflectors, that would mean that you would have to coordinate a time between those reflectors. And so is it for every session, is it one reflector?

[00:49:12.018] David Smith: That's right. When someone joins, whoever is the first person to join, we actually do a temporal ping. We ping to find the closest reflector, and you connect to that reflector. And then when a new user joins, if the session's already operational, then we vector you to that same reflector. So we're all on the same reflector. Now, what that can mean is if I'm in LA and you're in Bangalore, which we did last week, what will happen is you'll have in Bangalore about a 150 millisecond latency, and I'm going to have a 10 millisecond latency. The latencies will be different, which means it'll be a little bit mushy in Bangalore, but for me, it'll be perfect. One of the things we're working on is dynamic latency balancing, which means that we'll just be watching who's doing most of the work, because one of the things we can do is throw the entire session over to another reflector dynamically. So we can actually throw one onto a reflector in Bangalore and all of a sudden that guy now has a 10 to 15 millisecond latency because he's the guy doing all the work, and then it can jump back. But yeah, we all have to share the same reflector because we have to have that same timestamp.
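The "ping the closest reflector" step could look roughly like this (the URLs are placeholders and the discovery protocol is assumed, not Croquet's actual one): measure a round trip to each candidate reflector, have the first joiner pick the lowest-latency one, and route later joiners to that same reflector for the session.

```js
// Hypothetical sketch: pick the lowest-latency reflector for a new session.
async function pickReflector(candidates) {
  const results = await Promise.all(candidates.map(async (url) => {
    const start = performance.now();
    try {
      await fetch(url, { method: "HEAD", cache: "no-store" });
      return { url, latency: performance.now() - start };
    } catch {
      return { url, latency: Infinity };      // unreachable reflector
    }
  }));
  results.sort((a, b) => a.latency - b.latency);
  return results[0];                           // everyone in the session uses this one
}

pickReflector([
  "https://reflector-us-west.example.com/ping",
  "https://reflector-eu.example.com/ping",
  "https://reflector-in.example.com/ping",
]).then((best) => console.log("session reflector:", best));
```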

[00:50:26.038] Kent Bye: Now, are you using anything for voice chat, like, say, WebRTC? Because I know that comes up with, say, audio. So is your reflector network somehow helping to either improve or synchronize as people are talking to each other, or are you just using WebRTC as a protocol, which is essentially completely different than what you're doing with your reflector?

[00:50:45.833] David Smith: Yeah. For certain things like any kind of large bandwidth sort of thing, it's not really great to put over the reflector network. You could. But you're not going to get a win you want out of it. And the fact is, too, that audio and video are encoded and compressed before they're sent. And so you already have a pretty big latency built in before you even get to our reflectors, for example. So the best thing to do is just use those platforms as is. And we're actually going to be adding voice chat to the system probably the next week or two. But yeah, we use the croquet system to do the synchronization. We know who you are. We kind of know, for example, I'll know where you are spatially using the croquet system. And that way I can spatialize the sound properly. But the sound is still going to be sent via like an RTC channel or something.
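What is described here, voice over a normal WebRTC channel but spatialized using positions the replicated simulation already tracks, maps onto the standard Web Audio API. A rough sketch follows; the `remoteStream` and position values are placeholders, and this is not Croquet's audio code.

```js
// Hypothetical sketch: spatialize a remote user's WebRTC audio with Web Audio,
// driving the panner from the avatar position the shared simulation tracks.
const audioCtx = new AudioContext();

function spatializeRemoteVoice(remoteStream /* MediaStream from a WebRTC connection */) {
  const source = audioCtx.createMediaStreamSource(remoteStream);
  const panner = audioCtx.createPanner();
  panner.panningModel = "HRTF";
  panner.distanceModel = "inverse";
  source.connect(panner).connect(audioCtx.destination);
  return panner;
}

// Called whenever the replicated simulation updates that user's avatar position.
function updateVoicePosition(panner, { x, y, z }) {
  panner.positionX.value = x;
  panner.positionY.value = y;
  panner.positionZ.value = z;
}
```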

[00:51:41.933] Kent Bye: And so because this is an operating system, and there are other free and open operating systems like Linux, but when you talk about Croquet, because you have this whole infrastructure of the reflector network, and there's a certain amount of hardware infrastructure that would be required to have the next iteration of the operating system, which has this metaverse collaboration built in from the ground up, you need hardware infrastructure. And because of that, it doesn't seem like you're just able to give this away for free, but that it's actually a business to start. Maybe you could elaborate on your plan for how to get this out into the world for people to use it, but also, at the same time, if it's reaching a certain scale, how you're able to actually fund the development of this work.

[00:52:25.015] David Smith: Yeah, our business, as you can probably tell, is selling events, selling messages. What happens is anytime you touch the reflector and that gets redistributed, that's really how we would make money. And that's a reasonable thing because we're building out a reflector network. Worldwide, we're already on most continents in the world. We're targeting every large city to start. But over time, especially with 5G deployments, we're going to be hopefully just about anywhere. Our goal is to have a sub-15 millisecond latency anywhere you go, which, you know, that's expensive to build out. It's expensive to manage. But at the same time, It's incredibly inexpensive for the developers because it's all built there. First of all, your cost of creating a multi-user application drops seriously because you don't have to figure out how do you do that crazy network and multi-user stuff. That's a pain in the ass. But the other side is, you know, we've done a very, very good job of optimization. So the cost of using the system is very, very low and it's everywhere. So it's just the quality of service is extremely good. So from that perspective, we sort of feel like, yeah, that's a reasonable thing to charge people for. We're giving our Microverse platform away for free for people just use all the core systems, much of it's open sourced already. So people are going to be able to take that and run with it and start building out applications, multi-user applications immediately. without a lot of overhead and cost. And once you go above a certain threshold, you'd be paying us. But theoretically, you'd have a pretty successful business built on top of that infrastructure. So it's likely you could afford it. And it's not that expensive. Anyway, it's a pretty cheap, it's an inexpensive capability that provides a huge amount of power for you. So yeah, I'm pretty proud of that. And I think also, like I said, no matter what, you pay for hosting, you pay for bandwidth using Google Cloud, you're paying that same sort of thing. So we're probably a cheaper and much better kind of solution for that sort of thing. It's very similar to Twilio, if you're familiar with that.

[00:54:31.938] Kent Bye: I haven't seen that, but as you're saying that, I'm just thinking about how, as you open up any instance in, say, Rec Room or VRChat, on the backend there are different cloud virtual servers that are getting fired up that have to kind of manage all that. But instead of paying for those cloud services, it sounds like this would be a more lightweight service. Still, if there are peer-to-peer protocols, like say WebRTC, as they're talking to each other, is that going over the reflectors?

[00:55:01.182] David Smith: It isn't an actual peer-to-peer, or maybe it's a virtual peer-to-peer, because we have to go through the reflectors and we need that timestamp to make this thing run. But aside from that, as I said, the reflector, the server side, maintains no state aside from the snapshots. And so, you know, there really isn't a big load on the server at all. And you don't want that. You want all the compute to be local. I know people are going back and forth on this; they think, well, we can do all the rendering in a big data center at the edge. But the problem is that not only do you have to distribute all that end-result rendered content, but when you are wearing a head mount and you're moving your head, you need to update that at 120 Hertz. That ain't gonna work. You're not going to be able to keep that thing fed. You have to assume that the local computation is going to be managing that and updating that 120 Hertz image. Maybe it's a slight difference in velocity, but I think it's really important to note that the hardware is certainly getting to the point where it needs to be to achieve that. And I point to Apple's work on the M1 chip as the perfect example. I've been using a Mac for years, but Apple did not make that chip for the Mac. That wasn't what they wanted. This is an incredibly low-power, incredibly powerful device. What do you think that's for? It's obviously for a wearable device, and you're going to have an extraordinary experience because you're going to have the performance local, where it should be. And of course, Intel is now talking about this kind of moonshot of doing a 1000x improvement in performance, which makes sense. One of their chief architects of those systems was saying that a 10x improvement in their CPUs and GPUs today only buys about a four or five year time window. It's not big enough. They have to rethink the problem all the way to a thousand to get the real answer. And that means a completely different approach to hardware architectures. And again, why are they doing that? Why do they have to do that? It's because that's where the action is gonna be. The action is gonna be instantaneous, always updated: truly user-centric systems that are running graphics, AI, personal AIs. By the way, it's not a Meta AI that's watching you, it's a personal AI that's watching Meta. So yeah, we have to have that kind of hardware, and we're going to have it. It's clear. Like I said, Apple drew a line in the sand, and everybody else has to step up, and they will.
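The reflector role Smith keeps returning to can be illustrated with a purely conceptual sketch; Croquet's real reflectors do considerably more (sessions, snapshots, heartbeats), so treat this only as the shape of the idea: a stateless relay that stamps each incoming event with an authoritative time and fans it out to every client, so every client feeds identically ordered, identically timed events into its local simulation. It assumes the `ws` WebSocket package and invents the message format.

```typescript
// Conceptual reflector sketch (not Croquet's implementation): no simulation
// state lives here; the server only orders, timestamps, and rebroadcasts.
import { WebSocketServer, WebSocket } from "ws";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (socket: WebSocket) => {
  socket.on("message", (data) => {
    const stamped = JSON.stringify({
      time: Date.now(),           // authoritative timestamp for deterministic replay
      payload: data.toString(),   // the client's event, passed through untouched
    });
    // Fan the stamped event out to every connected client, including the sender.
    for (const client of wss.clients) {
      if (client.readyState === WebSocket.OPEN) client.send(stamped);
    }
  });
});
```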

[00:57:47.007] Kent Bye: And because you're kind of taking a new architectural approach, I'm wondering if you could explain a little bit about this snapshot, because I know there are existing open standards for scene graphs. There's the USD open standard, and there's glTF, which could be expanded into kind of containing all the different objects. It's essentially what a Unity scene or Unreal scene is, in that it would contain both the object model and also where things are located, and maybe some level of the logic that's engaged there. So is there a format that you have? And what's included in that snapshot? Because sometimes that could be fairly big, in terms of what it includes and what you have to download.

[00:58:24.666] David Smith: That's a really deep question. And the answer is this: you have to separate the simulation from the content. The worst thing you can do is what we did in the very first version of Croquet, where we did exactly that. It was all combined into one big hairball. And so when you save something, you have to save the transitional stuff, the simulation state, and you have to save the 3D models that are static. That's dumb. Why do that? Separate the two. That's why I was talking about the model and the view. The model is that simulation state, and all that is is whatever state you need to drive a simulation. Take a physics engine, for example. It'll have a list of all the objects and kind of where they are and maybe what their shapes are, but it's going to be a very, very small amount of information. And it's going to be associated with the actual 3D objects, the cubes, textures, and all that sort of stuff. But you certainly don't need to put all that hard, static information, like the world you're in or the boxes you're moving around, into a snapshot. You can grab those from the original website, because they're there. And then all the transforms they go through are part of the snapshot. So I moved this from here to there? Well, that's in the snapshot. But the actual object shouldn't be there. By separating them, you get a huge amount of flexibility, and the snapshots are really quite tiny. By the way, we use JSON for the snapshots: we basically traverse that model and save it out that way. And we also encrypt it, so it can't be read unless you have the credentials to load the world. So yeah, I think that's a really key distinction. And of course, when people designed game engines in the past, they didn't think about this sort of thing, because they didn't have to. When you get a game, it's delivered as an app on your phone or your PC. All the stuff is there. The multi-user part of it is usually pretty restricted. I think of it as a vocabulary of multi-user. In the case of shooters, and like I said, I did the very first one, the vocabulary is move, shoot, die, or kill. I mean, that's basically all you do. But for what you really want, when you talk about what the true metaverse is going to be, the vocabulary has to be extraordinarily rich. It has to be the same kind of vocabulary that you have when you're engaging with your personal computer: I'm picking this up, moving this here, dragging this over there, modifying this 3D model, changing it. The vocabulary has to be very rich. And it has to be shared, so everything I do, you have to see me do it as I do it, and vice versa. So part of what we were after was to have a vocabulary that is as rich and dynamic as human imagination.
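Here is a minimal sketch of that model/view separation against the public Croquet JavaScript SDK (@croquet/croquet) as I understand it; the class names, event names, and session credentials are invented placeholders, and details may differ from the current release. The model holds only the small replicated state that would land in a snapshot, while the view owns the heavy 3D assets it loads from the web.

```typescript
import * as Croquet from "@croquet/croquet";

// Replicated simulation state: tiny, deterministic, snapshot-able.
class BoxModel extends Croquet.Model {
  pos!: { x: number; y: number; z: number };

  init(): void {
    this.pos = { x: 0, y: 0, z: 0 };
    // Every user's "move" event is routed through the reflector, so every
    // copy of this model applies it in the same order at the same time.
    this.subscribe("box", "move", this.handleMove);
  }

  handleMove(delta: { x: number; y: number; z: number }): void {
    this.pos = { x: this.pos.x + delta.x, y: this.pos.y + delta.y, z: this.pos.z + delta.z };
    this.publish("box", "moved", this.pos);
  }
}
BoxModel.register("BoxModel");

// Local, non-replicated presentation: loads the static glTF/textures itself.
class BoxView extends Croquet.View {
  constructor(model: BoxModel) {
    super(model);
    this.subscribe("box", "moved", this.onMoved);
  }
  onMoved(pos: { x: number; y: number; z: number }): void {
    // Update the local renderer's object transform here (e.g. Three.js);
    // none of the static assets the view loaded ever enter the snapshot.
  }
}

// Placeholder credentials, not real ones.
Croquet.Session.join({
  apiKey: "YOUR_API_KEY",
  appId: "com.example.boxdemo",
  name: "box-demo",
  password: "secret",
  model: BoxModel,
  view: BoxView,
});
```

Because every copy of the model processes the same reflector-ordered events, each client converges on the same state without the server holding any of it, which is why the snapshots can stay so small.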

[01:01:19.078] Kent Bye: Yeah, it makes sense that you would need to make it really slim. When I think of the scene graph, I think of open standards like USD, but it sounds like you've created a snapshot that's equivalent to some aspects of the scene graph, and you could potentially have files stored out on IPFS, the InterPlanetary File System, or on a website. So is the JSON file kind of the equivalent of that USD? Is it going out and looking for where all those objects are and constructing the scene, or how does that work?

[01:01:50.851] David Smith: Yeah, in a sense. "It depends" is, I guess, the answer, but what happens is when you pull that snapshot in, it will include references to all the objects in the scene and where to get them. They may have been preloaded onto our servers in a sense, although the way it works, by the way, is that you create a world and you can host it on your server. It's just a webpage; it just happens to have this extra library which boots up the virtual machine, but it's going to be on your xyz.com webpage when they load it. And then when the thing launches, it finds the reflector and makes a connection there. And once it's run for a while, it'll save the snapshot up to it. But yeah, when you join, you actually go to the same website, so you might load that same data, but then the snapshot is gonna say, hey, these are all the changes that have occurred since you got it there. Some things might've been deleted, some things might've been moved around, and all of that's in the snapshot. But the actual data, the 3D models, the glTFs and that sort of thing, are not in there; they shouldn't be. But anything that you did to it, the description of how it got modified, would certainly be in there: you moved it, you scaled it, whatever you do is part of the model side. And the view side is just what you see, and those models exist there. So that means it loads really, really fast. It means that you control the data, how you want to present it, how you want to display it. And it makes dynamic synchronization between users very, very easy.
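Purely as an illustration of what that snapshot contains, and emphatically not Croquet's actual serialization format, the shape Smith describes amounts to asset references plus the modifications applied to them, something along these lines:

```typescript
// Hypothetical snapshot shape for illustration only; NOT Croquet's real format.
interface WorldSnapshot {
  sessionTime: number;                 // simulation time the snapshot was taken at
  objects: Array<{
    id: string;
    assetUrl: string;                  // where to fetch the glTF/texture (your own site, IPFS, etc.)
    transform: {
      position: [number, number, number];
      rotation: [number, number, number, number];  // quaternion
      scale: number;
    };
    behaviors: string[];               // names of attached behaviors; the code lives elsewhere
    deleted?: boolean;                 // tombstone for objects removed since world creation
  }>;
}
```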

[01:03:32.840] Kent Bye: Yeah, I've had a number of different conversations and aha moments over the years of talking to different people who are making the same argument that you are, in the sense that we're moving away from the native app model and more towards this kind of web model. Matt Miesnieks has talked about how, in augmented reality, the world is the operating system, and so it's more contextual. I've talked to Brandon Jones about how, as you have more of a web model, you could have contextual information based upon GPS. And I've talked to Pluto VR, who talk about how, when you're in a virtual space, the applications are no longer separate windows, which you can think of as virtual machines that are not interacting with each other; as you get into virtual environments, you end up having shared physics and shared lighting and everything. So rather than windows that are stacked on top of each other, they're more like objects that are interacting with each other in a world, a multi-app environment fusing all these things together. So I think a lot of the stuff that you're talking about here is, again, reinforcing this idea that we're moving away from the native app model that has everything self-contained, and towards more dynamically pulling in and fusing things together. And like you talk about with portals, you could also have entire applications that are called in and become objects engaging with other things. So you'd be able to interact with different things within a virtual space, but the virtual space is the collective space that starts to fuse together information from all sorts of different places. I don't know exactly what that's going to look like, but I can imagine, say, a Spotify app that, rather than having a window, is an actual radio that is in-world, that has speakers, that you can carry around. So it's kind of like a virtual boombox that you have, rather than a separate application that lives in a whole other world; it's all fused together. So it seems like a lot of what you're building with Croquet is moving into this world, trying to create a context where you can pull in all these different things.

[01:05:25.160] David Smith: Yeah, actually, that's a really good analogy. One of the demos I showed you before was the flight tracker. You're inside this 3D environment and you see this globe, and on the globe is almost every commercial flight, and it's updated every 15 seconds. So you actually look at this thing and you see where every single airplane is. It's really quite stunning, actually, because you can spin the globe around and zoom in, go to eastern Europe, and you'll see that between Moscow and Poland there are no airplanes, because that's where Ukraine is. So being able to feed live data into these worlds is part of this conversation, and you described Spotify in exactly the same way. That's the way you have to think about this. And with that globe, by the way, one of the things you can do with Croquet is save it as what we're calling a verse file right now, like microverse, and then you can take it and drag it and drop it into another world. So I can actually share that globe with other people. If you remember VRML, Tony Parisi's creation, it was more than 3D; it actually had behaviors as well. We're looking at how we make that work. And by the way, it's also a JSON file, so you can open it up and edit that object you just created. The behaviors are in there, which is the code. So, yeah, I think the idea of applications really changes. It is a kind of app, but it's embedded inside this virtual world, and they have the ability to communicate with each other. They basically have a publish/subscribe, so this object can send a message to that object. One of our developers, Yoshiki Oshima, took another version of the globe, had it tied to a car that was driving around, and used the globe as a steering wheel to drive the car. And all it was doing was sending messages between the two. So that's exactly right. And I think the magic thing here is that there's no limit. There's nobody saying you can't do this. It's really, where does your imagination end? And the other thing is that there's nobody telling you that you can't do it, or, for that matter, that if you do it, you have to pay us. I mean, you have to pay us a little bit if you want this thing to work, but that's very, very different from paying 45% of your revenue to somebody, and for that matter, very different from somebody saying you can't even do it at all. That model's got to go away. And yeah, I think the browser is clearly the path that is going to make this all work. I actually believe, and it won't be in the next two or three years, but I bet you in about four or five for sure, that that browser-like capability is going to be the OS. That is the right way to do this. That is the right way to engage. I thought the Chromebook as an exemplar was wonderful. It showed that, yeah, the browser itself is a kind of OS. And people run virtual machines all the time, right? Docker running on a virtual machine. In our case, we're running inside the browser, because we're taking advantage of all that wonderful infrastructure that the browser provides: JavaScript, WebGL, WebGPU coming, WebSockets, WebAssembly. It's such a rich infrastructure for us that's already done and has been proven across billions and billions of pages. So our task is to make it all work together to enable you to have a powerful and productive shared experience.
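The globe-as-steering-wheel demo boils down to publish/subscribe between two replicated objects. Here is a hedged sketch of that wiring against the Croquet SDK as I understand it; the class names, event scopes, and payloads are invented for illustration.

```typescript
import * as Croquet from "@croquet/croquet";

class GlobeModel extends Croquet.Model {
  heading!: number;
  init(): void {
    this.heading = 0;
    this.subscribe("globe", "spin", this.onSpin);
  }
  onSpin(deltaDegrees: number): void {
    this.heading = (this.heading + deltaDegrees) % 360;
    // Any other replicated object in the world can listen for this event.
    this.publish("vehicle", "steer", this.heading);
  }
}
GlobeModel.register("GlobeModel");

class CarModel extends Croquet.Model {
  heading!: number;
  init(): void {
    this.heading = 0;
    // The car neither knows nor cares that its steering input comes from a globe.
    this.subscribe("vehicle", "steer", this.onSteer);
  }
  onSteer(heading: number): void {
    this.heading = heading;
  }
}
CarModel.register("CarModel");
```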

[01:09:12.382] Kent Bye: Right. Well, as we start to wrap up, I'd love to hear a little bit about the launch of Croquet that's happening on May 17th, and what your aspirations are as you go into this next phase of announcing it to the world and inviting people in. Do you expect a lot of enterprise users? What are the next steps for you as you launch this project that you've been working on for a number of years, with a whole lifetime evolution of your work leading up to this point? What's next, and where do you hope it all goes?

[01:09:41.474] David Smith: Well, there are three buckets of users that we've identified. The first is, as you mentioned, enterprise and industrial. Because we have this nice RESTful interface and the ability to pull in real-time data, being able to do a digital twin of a factory floor, for example, or some sort of large energy facility, is really, really easy to do inside this platform. So we think enterprise and industrial is going to be a pretty big business in the short term. The second area of interest has been on the 5G network side. I think the metaverse and 5G were made for each other. They exist to support each other. I don't think everybody understands it that way, but I sure do, because 5G provides some really, really key elements. Obviously, bandwidth is one, but more important is the latency that you're going to get. When I'm having a conversation with you in that simulation, its responsiveness is directly tied to the latency I'm getting, so being able to provide 5 to 10 milliseconds of latency, and to share that, is going to be crucial. The other side, of course, is that when we're working with the carriers, we're giving them another revenue stream when we host our reflectors on their 5G infrastructure. And the last one is basically real people. There are 16.5 million JavaScript developers out there, and they can all take this technology, this system that we've built, and build out a whole new class of applications very quickly and very easily. The thing I have found is that when you build a rich enough and powerful enough tool, you'd be foolish to try to predict what people are going to do with it. They always exceed your expectations, because the level of creativity and capability out there is absolutely astonishing. So you give them enough of an infrastructure and take away the crappy problems, like, how do I do multi-user? That has to go away. You need to just focus on: what is it that I can do for the user or the users of the system? What is the thing that is going to improve their lives, or make them happier, or make them have fun, or have them solve this class of hard problems? That's where developers need to be spending their time, not trying to figure out how to make this stuff multi-user. So you need an infrastructure that takes care of all of that, takes care of the synchronization, while you own it, you control what's in there, you define what's in there. Nobody's telling you no, nobody's telling you you have to do this or that. That's essential. So I think that what we've got is, well, Tony Parisi actually said something very nice. He said, this is the first true MVP of the metaverse. And I think that's true. We're addressing the way the metaverse should be. And I think we've been doing this long enough to know this is pretty close. But I also have to mention that we're humble enough to know that this is a starting point. This is nowhere near the end. This is enough for the developer community to be able to start building interesting things and, for that matter, to build on the Croquet platform itself and extend it, so that other people can take advantage of the value that's created. The web is the largest open source project in the history of mankind. Every single web page, you can look at the source in some form, unless you've obfuscated it too much. And we see the same thing here. When we create these behaviors that we apply to the objects, we expect most of those to be shared.
So you create something that has this new effect: I can move this object this way, I can create this new kind of object. Well, you can share that behavior, and another person can take it and apply it to their problem. They can even open it up and say, oh, you know, this is almost what I want; I can modify this. So one of the things we're going to be doing is creating databases of open, free behaviors. We're building our own right now, but we expect third parties are going to be building many, many of those, and they're going to be shared, which is exactly what should happen. You know, we're so early in this whole process that we're going to be educating each other, but I think we have enough of a critical mass to be able to do very, very interesting things now. And the goal was, as Alan said, it has to be magic. It's pretty magic, and I'm very proud of that.

[01:14:20.731] Kent Bye: And, uh, and finally, what do you think the ultimate potential of virtual and augmented reality and the metaverse might be and what it might be able to enable?

[01:14:31.595] David Smith: Yeah, so I'm going back to this augmented conversation concept, and I can't emphasize enough how important that is. You are defined more by how you communicate than anything. You know, when you learn mathematics, mathematics is a language, and you're thinking in terms of that language, that symbolic manipulation of ideas. You think in terms of English when you think about ideas. And some people think very abstractly and think about computer languages and how they work. But we're always thinking in terms of a sort of conversational idiom that allows us to explain things, sometimes to ourselves. So in a sense, we're defined more by how we communicate than anything. What that means, then, is that if what we're doing is expanding the scope of how humans communicate, redefining how they communicate, you're redefining what it means to be human. I do think that in some ways there's a bit of a race between the AIs getting really, really smart and humans getting even smarter. And I think it's essential that humans develop a next-generation way of thinking and communicating and reinventing ourselves. And this, I think, is the great hope to me: this augmented conversation concept that originated with Doug Engelbart. I'm not taking any credit for this; he's the one who said this is the way the world should work. I think we're on the threshold of achieving that, and the metaverse is going to be the vehicle that brings it about. So 10 years from now, I'm not sure how we'll be engaging with each other, but I know we'll be engaging very, very deeply. We're going to be sharing ideas that we can't even imagine today as easily as we talk about the weather. And that's going to be marvelous.

[01:16:23.873] Kent Bye: Great. Is there anything else that's left unsaid that you'd like to say to the broader Immersive community?

[01:16:29.377] David Smith: We'd love to work with you. We'd love your help. As I said, we're putting a thing out there, and we're pretty excited about what it can do. I think you'll be surprised at what it can do, but we're also very, very deeply aware that it's not defined by us. It's defined by the developers who take it and make it theirs. It's defined by the users who provide the ideas, that virtuous cycle. So all we're asking is: give this a try. I think you'll be impressed and maybe even amazed, but more importantly, we'd love to collaborate with you as we build out this new world.

[01:17:08.044] Kent Bye: Awesome. Well, David, thanks so much for joining me here on the podcast. I'm really excited to see where this goes, in terms of an operating system that lives in a virtual machine of the browser and some of the real-time applications that are going to be able to be developed here. You know, I have a lot of my own biases towards the open web, and this is a proprietary product that hopefully people will be able to build on top of, and maybe at some point there will be some protocols. Or, when we talk about the infrastructure of a lot of this stuff in terms of 5G, these are private companies, so there's kind of always a blend of public and private. But I'd love to eventually, let's say 50 years from now, have all of this type of infrastructure fleshed out in terms of the protocols, and maybe even actually...

[01:17:51.190] David Smith: That's right. I actually totally agree. We're making as much open source as we possibly can, and I think you'll be surprised at how much we are open sourcing. We have to make sure that we protect the kernel of the system, because we want to make sure we can continue investing in this platform to make it as good as we can. But our goal is to unleash this and get other people participating at almost every level. I don't want to draw a line around it and say, this is where you're going to work. It has to be much broader than that. So making a lot of the core systems that we're building open source is essential to enable that kind of engagement. So yeah, I couldn't agree with you more. I think that these ideas have to be out there and the implementations have to be out there. People need to get their hands dirty exploring what's possible and, in a lot of ways, improving what's possible. I mean, we took a really good shot at it. We've done this a little while, but we know for sure there are people out there who are much smarter than us who can take this to another level, and we want to work with them.

[01:18:58.788] Kent Bye: Yeah, and as I look at the evolution of a lot of what's happening in the virtual reality community, there are a lot of insights coming from the game design world, from Unity and Unreal Engine and the whole 3D pipeline. There's a whole aspect of immersive storytelling coming from the more cinematic storytelling side. There's architecture, theater, and dance feeding into world design and world creation and building. And I think the web community is probably the most under-leveraged at this point, in terms of pulling more and more web developers into this ecosystem. As long as Croquet is able to provide a baseline experience that's significantly better than, say, what we have with 2D applications, then, as there's more remote work, folks who want these virtual spaces or collaborative environments can build on top of this low-latency collaborative network that's built from scratch. With that open source spirit, you start to get a rapid series of innovations, and then compelling experiences that go above and beyond what people are able to experience somewhere else, beyond the limitations of Unity and Unreal Engine, which frankly weren't built for the web, towards a more web-native platform that leverages the power of the web. And I feel like that's going to be a key part as we continue to evolve the metaverse: pulling in these communities that you're talking about here, especially the JavaScript communities and, like you said, the millions of web developers, having them dip their toes in, creating a baseline platform that helps them bootstrap some of these initial experiences, and then using the power of the web to see where it goes from there.

[01:20:29.072] David Smith: Yeah, I mean, the web has been the center of the creative universe for the last 20 years, and all we're doing is amplifying that. The web's not going to go away. If anything, it's going to blossom into something very, very new, very different, very wonderful. I was around before there was a web, right? And it's been a remarkable journey to see how this modest little platform has evolved into something as extraordinarily capable as it is today. So yeah, I couldn't agree more. I think there's perhaps a new renaissance that we're on the threshold of, and I couldn't be more excited to at least contribute a little bit to what's about to happen. But in a lot of ways, all we're doing, as I said, is taking the ideas of the people who came before us, Doug Engelbart, Alan Kay, Dave Reed, and packaging them in a way that's accessible. These guys started with a blank piece of paper and envisioned the world we live in, and this is a pale reflection of what they had in mind. You know, we can do way better than we have.

[01:21:37.095] Kent Bye: Awesome. Well, thanks again, David. It was a great tour through not only your journey, but also a little bit of the history of all this human-computer interaction since the sixties and the Mother of All Demos. It's been a great recap.

[01:21:50.385] David Smith: Yeah, well, thank you so much. It's always fun to talk to you, and I look forward to the next time.

[01:21:55.489] Kent Bye: So that was David Smith. He's the founder of Croquet, which is a new operating system that was just launched on Tuesday, May 17th, 2022. So a few of my main takeaways from this interview are that, first of all, there's quite a story of digging into the history and evolution of computing, going all the way back to October of 1962 with Douglas Engelbart's report called Augmenting Human Intellect: A Conceptual Framework, which was prepared for the Director of Information Sciences at the Air Force Office of Scientific Research. It lays out how computing could improve things like more rapid comprehension, better comprehension, the possibility of gaining a useful degree of comprehension in a situation that previously was too complex, speedier solutions, better solutions, and the possibility of finding solutions to problems that before seemed insoluble. So there's this process of creating different simulations and being able to have a real-time collaborative environment. And then eventually there's the Mother of All Demos, which took place on December 9th, 1968 at the Computer Society's Fall Joint Computer Conference in San Francisco. It's a really famous demo; you can watch it on YouTube. It demos everything from windows, hypertext, graphics, efficient navigation and command input, video conferencing, the computer mouse, word processing, dynamic file linking, and revision control to a collaborative real-time editor. All of those things are a summary from the Wikipedia page, but you can actually go and watch the video, and it's really quite remarkable that it was from December of 1968 and was basically a roadmap for the next 50 to 70 years of computing. They're also drawing on the master's thesis from David Reed from June 1976, "Processor Multiplexing in a Layered Operating System." They're piecing together a lot of these ideas and coming back to them. The existing operating system is static, it's single-user, it's built on windowed frames that don't really integrate with each other. But if you have a single environment that's a virtual space, with all these different objects interacting with each other, on top of people at other computers also being synchronized to it, then I think that synchrony between those different worlds is the new thing, probably the fundamental difference in the future of the metaverse: a collaborative shared space that has the same shared state. Architecturally, what they're doing is a model/view separation, where you have the simulation and you're able to run it forward to a certain time. When you open up a new instance, it pings the reflector network and sets the start time, and from there, anybody that joins is either going to come in from that original page or is going to pick up a snapshot, which gets taken every five seconds. You download the snapshot and all the different messages, and hopefully you're up to speed very quickly on what everyone else is seeing in that world. There are certainly other worlds that have been able to achieve that to different levels, you know, within the context of, say, Rec Room. VRChat, like I was talking about, sometimes has the late-joiner problem, although I think there have probably been new architectural changes that have been able to shift and change that.
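As a conceptual sketch of that late-joiner catch-up flow (my own illustration, not Croquet's implementation): the joining client loads the most recent snapshot and then deterministically replays the timestamped messages that arrived after it until it reaches the live simulation time. All of the types below are hypothetical.

```typescript
// Hypothetical types; the point is snapshot + ordered replay = identical state.
interface StampedMessage { time: number; payload: unknown }

interface Simulation {
  time: number;
  loadSnapshot(state: unknown): void;
  apply(msg: StampedMessage): void;   // must be deterministic on every client
}

function catchUp(
  sim: Simulation,
  snapshot: { time: number; state: unknown },
  backlog: StampedMessage[],          // messages the reflector sent after the snapshot
  liveTime: number
): void {
  sim.loadSnapshot(snapshot.state);
  sim.time = snapshot.time;
  // Replay everything newer than the snapshot, in timestamp order, so this
  // client converges on the same state as everyone already in the session.
  const pending = backlog
    .filter(m => m.time > snapshot.time)
    .sort((a, b) => a.time - b.time);
  for (const msg of pending) {
    sim.apply(msg);
    sim.time = msg.time;
  }
  sim.time = liveTime;
}
```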
A lot of these existing multiplayer social worlds, again, host everything within the context of a server on the back end, whereas this is trying to use a reflector network with stateless servers, so it's just messages going back and forth. A lot of it is you getting the initial simulation state, or a snapshot, and then having the messages run it up and get you up to speed. So it's a bit more of a lightweight architecture than having to host everything and figure out what the multiplayer networking is going to be, and it simplifies that by offloading at least what's happening in the environment onto Croquet's reflector network architecture. Voice and video chat is a separate issue that is still going to need peer-to-peer or other protocols to be able to communicate with folks. They said they're going to be adding audio, but it hasn't been integrated just yet, so no more specific details, but I assume there are peer-to-peer protocols like WebRTC that have been used in, say, Mozilla Hubs and whatnot. It's a system that seems to take a somewhat more lightweight approach, but it's also trying to create, at the foundational layer of the operating system, a hardware infrastructure that's able to support specific things that would be difficult for people to do on their own. We've already been seeing all sorts of different multiplayer applications within Unity or Unreal Engine and these native apps, and so this is trying to take experiences that you're able to have in those worlds and make it easier for web developers to come in and pull in all sorts of other aspects from the open web. A lot of those platforms, like Unity or Unreal, are not web native, and so there are all sorts of things that aren't necessarily pulling in lots of data visualization or existing open source libraries. In a lot of ways, they're trying to create an ecosystem with a shared database of objects and behaviors that can be shared back and forth. There's a card system that we talked about in terms of object-oriented programming, trying to come up with the simplest one thing that you build and nest on top of each other; they use the metaphor of the cards, and with that card system you're able to share them back and forth. The other thing I think is worth mentioning is the real-time editing of an environment: someone makes a change, it gets deployed, and it goes out to everybody on that server seeing the same thing. So it's an environment that allows more pliability for people to shape it as it goes forward. And if there is a bug, then you can capture a snapshot, run it back up to that point, and, ideally, be able to reproduce that bug. So, like Tony Parisi said, he sees it as kind of like the first MVP of the metaverse, and I think that's probably on the right track. It's a company that is going out there and putting something out for the web community, and so we'll see if there's adoption and if folks start to build stuff on top of it and do things that would otherwise be really complicated or impossible to do up to this point. Mozilla Hubs is probably an example of one of the first web-based social platforms that were able to do a lot of the things that are more on par with what you see within the proprietary world.
But again, with Mozilla Hubs you have to stand up your own web servers on the back end, and so if this is just a service that you pay for to facilitate a lot of those collaborative environments, then I'm excited to see what kinds of new applications get made that, up to this point, may have been too difficult because of all the server architecture needed for those multiplayer dimensions. That's all I have for today, and I just wanted to thank you for listening to the Voices of VR podcast. If you enjoy the podcast, then please do spread the word, tell your friends, and consider becoming a member of the Patreon. This is a listener-supported podcast, and so I do rely upon donations from people like yourself in order to continue bringing you this coverage. So you can become a member and donate today at patreon.com slash voicesofvr. Thanks for listening.
