The Spatial XR creative collaboration platform has been featured in the past two Microsoft presentations: at the Microsoft Build keynote on Monday, May 6th, as well as in the HoloLens 2 announcement at Mobile World Congress on Sunday, February 24, 2019. Spatial is using the spatial computing affordances of the HoloLens 2 to help knowledge workers collaborate on the design of spatial objects. They’re working with Mattel to streamline the design process of toys like Barbie or Hot Wheels using their Spatial XR software.
I had a chance to catch up with Spatial co-founder and CEO Anand Agarawala to talk about his vision of what spatial computing is going to make possible, how they’re trying to facilitate flow states of collective intelligence through more intuitive and human-centered interfaces with technology, and some of the deeper ethical considerations that should be taken into account when designing an immersive computing platform.
Here’s the Spatial demo at the HoloLens 2 announcement
Here’s the Spatial demo during the Microsoft Build 2019 keynote
This is a listener-supported podcast through the Voices of VR Patreon.
[00:00:05.452] Kent Bye: The Voices of VR Podcast. Hello, my name is Kent Bye, and welcome to the Voices of VR Podcast. So virtual augmented reality represents this major revolutionary shift into spatial computing. And anybody who's listening to this podcast can get some sense of what this future holds, especially if you've had some sort of embodied experience with it. And there's a lot of people who look at spatial computing and they may not get it yet, especially if they've not had these embodied experiences of what is afforded once you are embedded within a spatial computing environment. Especially when you're able to tap into these deep flow states, you're tapping into deeper aspects of your intuition. And there's just something that's qualitatively different than interfacing with computing technologies when you're trying to think in a much more linear way. Spatial computing is all about trying to tap into our more natural, intuitive ways of embodied cognition and to reduce the friction between what we're thinking and what we're imagining and how we're expressing ourselves within the computing technologies that we have. We're starting to see very early indications for what spatial computing means. And it's starting with gaming just because I think there's so many different use cases where people want to be completely immersed within entertainment. But I feel like there's something about productivity and the way that we collaborate and communicate and work together that is going to be completely shifted with all these immersive technologies. So I think that the enterprise is at the frontier of trying to figure out what is the future of work, what is the future of collaboration. 
And this is where Spatial comes in, an augmented reality startup that's based out of New York City. For the last two HoloLens keynotes, one that was on February 24th in Barcelona, where the HoloLens 2 was announced, as well as the one at Microsoft Build this past week, they were able to, in each case, give a brief demo of what they see the future of collaboration with spatial computing is going to be like. And so there weren't very many other HoloLens demos that were at Microsoft Build this year. I mean, there was the HoloLens 2 booth where you could get kind of a stock demo. And then there was Spatial. They were the only other company on the floor that had a HoloLens 2 demo. So I had a chance to watch the demo during the keynote, and then see a brief little interaction with the demo, and then have like a more full demo to test out some of the different software. And then had a chance to sit down with the CEO, Anand Agarawala, talking about his vision for what he thinks the future of spatial computing is going to afford and how they're already starting to build out some of these collaboration tools that are really trying to tap into these deeper, more intuitive, creative aspects of collaborating. And in the demo, they're showing something that would be working in the context of something like Mattel, where they do these spatial designs of toys like Barbie or Hot Wheels. And when people are building these spatial objects, then they need to have like very specific tools to be able to collaborate and to brainstorm and to do this type of creative knowledge work. So we're covering all that and more on today's episode of the Voices of VR podcast. So this interview with Anand happened on Monday, May 6th, 2019 at the Microsoft Build Conference in Seattle, Washington. So with that, let's go ahead and dive right in.
[00:03:17.043] Anand Agarawala: My name is Anand Agarawala, I'm co-founder, CEO of Spatial. We're an augmented reality holographic collaboration platform. We're trying to, I mean, broadly speaking, we're trying to reinvent how people communicate, collaborate with computers. We think phones and laptops are going to kind of go away and that augmented reality headsets or something like it is going to kind of replace it. It'll be lightweight eyewear when you get there, but we want to reinvent the computer as a kind of more fluid, creative tool, collaborative creative tool, and we think that there's really high potential with augmented reality to do that because it's the first collective display, really, where we all get to be part of a reality together, but that's linked to our actual reality very cleanly.
[00:03:59.753] Kent Bye: So why don't you give me a bit more context as to your background and how you got started into this whole realm of augmented reality?
[00:04:05.849] Anand Agarawala: Sure, yeah. So, grew up kind of a computer graphics, computer gaming kid, you know. I did an undergrad in human-computer interaction, art minor kind of thing. Did my grad school in human-computer interaction. My thesis was actually a thing called BumpTop, which was a 3D desktop user interface. Showed it at TED back in '07. And the idea was it was using a physics engine, a gaming-grade physics engine, to power the desktop metaphor. Just all the physicality and tactileness of the real desktop is actually lost in the so-called desktop metaphor, and how can we make that a reality? This is '07, so pre-iPhone, and when people are thinking about multi-touch displays and things, and so the idea was how can we kind of blow that out and kind of do that. So I was doing 3D interfaces on a 2D display 10 years ago, and then when I tried the HoloLens for the first time a couple years ago, maybe three years ago now, it's like, holy shit, this thing is the future. It's like, this is when VR, Oculus, there was a lot of hype around that, and you know, it was exciting. I've always loved 3D user interfaces, you know, since the 3D user interface background, but I thought AR was much further in the future than it was. When I saw the HoloLens, it was almost like a peephole into the future, literally, you know, with a field of view, I guess. But you could see the path, right? Tech's really good at making stuff lighter, bigger, you know, faster and stuff. And so it was easy to see the path of the way to it. So I was like, hey, we got to start this company. Because I'd seen, you know, also the stuff out there was not really that compelling. I thought that, hey, I think me and the folks, some of the friends I knew, like Jinha, my co-founder, could really do a pretty cool job here, and there's a ton of opportunity.
I mean, I'll say that one of the things that really draws me to spatial computing is just the opportunity to redefine computing in more humanistic terms. I think, like, technology's kind of turned us into scrolling zombies. You know, if you look around, even here, like, people are just scrolling their lives away on their iPhone, various apps that they'll just kind of scroll away. And I'm guilty too, man. I got to set my screen time limits. Just you get into lizard brain and you're just like, you know, give me the reward, double click to like. So for me, it's like, hey, I noticed this happening and I also noticed that like our devices are turning us more into consuming machines rather than creative machines. And I want computers to be like a paintbrush or like even a pencil, like something that just lets you scribble out ideas and thoughts. And, you know, I think computing is just, we're so used to it now, but it's just the way we think in it, our minds are fluid. The way we think, we think tangentially, we think collaboratively, we think in this kind of almost jazz style that isn't really... Computers are so linear. Computers want, you know, now with AI, things are changing. Computers are so linear in how they want you to interact with them and force you into that, that it, I think, really breaks kind of where our creativity comes from. And so I'm really excited about how we can kind of open things up with the advent of this new display technology.
[00:06:52.598] Kent Bye: Yeah, so it sounds like you have quite a strong background in human-computer interactions, which, you know, I think of the mother of all demos back in 1968, Douglas Engelbart demoing the mouse and, you know, this chorded keyboard and teleconferencing, you know, so much in that demo back in the late 60s that really set the course of computing for the next 50 years. And now here we are, like, 51 years later, and now we have kind of the next iteration with all the spatial computing. And so it feels like now we're entering into a new phase of being able to have a lot more natural and intuitive interactions. I mean, there is a certain amount of moving on a natural way to move a mouse, but there's a certain amount of translation that happens by moving around in a 2D plane, and it's being projected into a 2D plane, which was completely fine, but once you start to try to translate a 2D plane into a 3D spatial reality, I mean, people spend a long time figuring out how to do that. And it feels like with spatial computing, anybody that's actually working in the 3D space, there's something about having a much more natural and intuitive interface, looking at 3D objects to look at the spatial nature of them. And so maybe you could talk a bit about your early collaborators and partners of where you see a system like this, where you're trying to really create a much more intuitive interface for spatial computing, where you're seeing some of the early traction in terms of where this makes sense for some of these enterprise companies.
[00:08:16.267] Anand Agarawala: Yeah, sure. I mean, so you said a lot of interesting things there. I mean, I think one thing that clicked in for me when you were talking is that, like, I mean, first of all, Engelbart is the god. I mean, that demo, you know, the mother of all demos is just, like, the blueprint, right, that we've been kind of executing towards. But with the mouse and things, there's always a level of indirection, and that was one of the impetuses behind BumpTop is, if I throw a ball at you in the air, lob a ball at you, you're gonna catch it real quick and easy, and you've just done 3D parabolic math to catch that, and a kid can do it, right? And our brains are really good at calculating that, but now try to get that same kid to click a mouse to navigate to a target on a 2D screen, it's gonna be a lot more tricky, right? And a lot less universal, so I think this new display paradigm, and that's what excites me so much about the HoloLens 2 is because we've seen all these awesome Leap Motion 2 demos, I love that Leap Motion's awesome, but man, now it's integrated into the headset, and so now it's an all-in-one form factor, no tethers, eye tracking's built in too, so now you have intent of where the person's looking, so you can figure out what they actually want to do, which makes input even more precise. Like, it is such an awesome package for developers to work with. HoloLens 1 was, I think, accelerated the future. I think it accelerated human progress by a couple of years. I think HoloLens 2 is now such a sweet device that it breaks that loop of indirection, which really is problematic for people, and also opens up possibilities of, basically, we can craft a reality. I mean, as we were doing in our demo, hold up two fingers, say what you want, and it visualizes before your eyes with the power of the internet, right? Where are we seeing early traction, to answer your question? We don't want to be a company that's going after one niche.
There's people who really nail it in architecture, construction kind of stuff, or even just 3D modeling. And we don't want to be a company that's just letting you look at 3D models from multiple headsets, right? I think that that's been done. I mean, we really want an application or something that can be used to creatively express yourself for hundreds of millions of users, even billions, right? And I think part of the interest we've seen, we've had 140 different Fortune 1000 companies reach out to us. The thing that I really am excited about, it's across every vertical. Consumer product goods, Mattel's our first customer, they're designing 3D toys. But it's not even just 3D content. Financial data, I can't talk about all the customers yet, because we haven't launched yet, but soon. But people in the financial industry want to analyze 3D data sets across multiple things. Petroleum, oil, and gas, they want to figure out where to drill and analyze this 3D data. Healthcare: they convene expert panels of doctors, just like in the Kingsman or the Jedi Council, right? Where they've got 12 fancy doctors around the world, and getting their schedules coordinated to have that meeting is quite difficult. And so now they can do that with a push of a button. Some people join from their phone, some people join from a headset, but they get that level of immersion. Oh, I mean, NASA and defense, when you're planning missions that are months and months and years long, you need visual repositories where you can kind of put this stuff up. And sometimes people are accessing it from the field, you know, in the defense case where you're in a tent and you don't really have the luxury of that war room you had back at home. So getting all that content up visually, virtually is quite interesting. So I just gave you a bunch of examples. And the thing that's really exciting is it's everywhere.
The good news is everyone needs to collaborate and they're not really well served by existing tools, like I was saying. Video conferencing tools are great, but they break down when things get increasingly distributed. When I have more than three or four sites on a Zoom, it's like, who's talking, who's, you know, like, people are not talking because they're not the loud one or whatever, you know. In Spatial, your location's kind of obsolete. Ten people, ten different locations. We do it all the time. We've done it on a snow day, for example. And they're all sitting around the same virtual table, which is really cool. And then, also, the future of work, we think, is really going to be more visual and more about knowledge work. AI is automating a lot of the more basic stuff. And so the way I think about it is tab browsing on your computer gives you more conceptual real estate to think through a problem, to go on tangents, let's check this thing out, let's check that thing. The web is almost built on this hub and spoke model where you do a search for something and you break out onto many side pages and see where those threads go, right? Wikipedia. We've all been down that Wikipedia hole where now I'm on different threads. Linear web browsing doesn't really respect that, whereas what we're trying to create with Spatial is really something that broadly lets you visualize your thoughts and your thought process as fluid and natural as it might be, and with the capabilities we have with 3D interfaces.
[00:12:26.355] Kent Bye: Yeah, and one of the other striking things about going through this demo is just the overlaying of a virtual avatar over you as an individual. Now, there's a different embodiment, a different avatar, so it's a little bit of paying attention to what to look at, but also just noticing in doing the HoloLens 2 demo both here and as well as taking a look at some of the Shell demos that were part of the HoloLens 2.0 release here at Microsoft Build, is that there was an amount of perceptible latency when I'm interacting with different objects. And even as you're moving around, there's kind of a delay between when the virtual representation of you is catching up and being able to embody you as well. Now, if it's fully remote, then you don't see that disconnect of that latency. There seems to be a little bit of a perceptible latency there. And just in talking to Jesse McCullough, he was just saying that, you know, these are early, they're not the final hardware, there's still a lot of optimization, it's not the final delivered product, but to me, just to be able to have, like, a sort of a photorealistic, photogrammetry type of translation, you said it was like taking a 2D photo and doing machine learning to translate that into a 3D avatar. It does a pretty convincing job of both translating that into an avatar that has some level of eye tracking. Sometimes it's a little tricky to see and there's a little bit of an uncanny valley gaze that people have that as you're looking at it, you know, it doesn't feel like it's like a real human just yet.
I think there's still a lot to happen in terms of like the modeling of emotion and other facial expressions that in order to get it to the place where I don't feel like it's kind of like a zombie type of character, but still being able to have at least somebody's virtual representation as well as a general sense of where they're looking seem to take the amount of collaboration to another level more than I've seen in other VR collaborations that don't have eye tracking.
[00:14:10.875] Anand Agarawala: Yeah, you know, it's funny, because we think about it all the time. We're like, hey, how come it's like we say in the lab, like me and one of our developers, Roman, I'm like, he's like, when I'm looking at you in the visual representation, even though it's an avatar with very little facial expression, I feel like I'm actually looking at you. Because something happens in your brain when you're seeing it. Because it's in 3D, the shape of the face, the spatial audio, the eye contact, like something triggers in your brain that is just very low level. And it's interesting, because we think about it all the time. We can have phone calls without any visual representation, right? And sometimes you get a Zoom or something where it shows a picture of them, and that helps a little bit, and your mind just kind of makes up as if they're talking. But we've been having phone calls for, you know, dozens of years now, and like, it's funny that, like, there's different layers of body language, I think, right? Like, there's body language in the way you speak, and you're doing a podcast, and so they're imagining what I look like and all that stuff, right? So, and then you get an embodiment of them now. Now, we're trying to use as many signals as we can, because there's other representations of other companies out there that try to infer, like, you're smiling or whatever, and so we don't want to make up too much signal about what you're actually expressing. Now, the cool thing is, we don't have to, you know, some of the stuff we're doing and working on more in a research capacity will be out there later on, but, you know, we can mix modes, right? We can actually send a video feed. Like, if I'm, the cool thing that Spatial opens up, there's so many cool possibilities, but there's four of us sitting in a room. We're having a meeting and two people are joining remotely from Tokyo.
The four in the room, if I'm talking, the other three people are generally looking at me, and they all have 3D cameras pointed at me. So I can actually turn on a video feed of me to give you maybe a 3D reconstruction, or a 4D, and by the way, that's better than the video conferencing system you might be using with a camera all the way at the back of the room, where my back might be facing. So you actually might not even see my face. So if you think about what we're comparing this against, I think the possibilities of, even if it's strictly, if we didn't do avatars, we just did auto camera switching, spatialized video, that was just the best camera in the room, I think there's kind of cool possibilities. And we'll continue to increase it. Now, the one thing you mentioned, and so we are definitely on very pre-release, and we showed you a version of the build because we're switching between keynote mode and public booth mode and some things weren't quite worked out there. So you are not supposed to have an avatar overlay over top of you, but we turned it on because we wanted you to see what you look like kind of thing. But anyway, it's quite funny because, and that is worth, if you wanted to get into it, there are some sides that this AR technology opens that are not always rosy, right? And I think as technologists, we don't always think like, hey, what is the downside of this? Do we know we're going to turn into scrolling zombies with all kinds of weird government stuff happening, where things are getting subverted and hackable, where society becomes hackable due to our reliance on this device and stuff like that? Are you happy to get into that? Because I think AR is even more immersive than your phone. Look how immersed we are on a 4-inch screen that we hold, right, and we need screen time apps on it to restrict how much we use it. Think about what happens when that is 360, 3D, and you can't leave, right?
And not to, like, I'm obviously into AR, but I want to make sure we're not leading society down a dark path, which we should, we need to in this industry be constantly asking ourselves, because, you know, sometimes we don't ask that before it's too late.
[00:17:15.724] Kent Bye: Yeah, there was a sci-fi piece called Rose-Colored that really explored what does it mean to start to augment your personal relationship and to have different layers of augmentation and different layers of reality that you're being presented that may be disconnected from what the other person is experiencing. And what if they're doing the same thing? So you're kind of mutually living in your own filter bubbles. But I feel like in this demo, even though it wasn't necessarily the intention of having a physical embodiment of you in a space, and then on top of that, a virtual representation, it started to have this first taste of what it feels like to have a virtual avatar laid on top of you and have a different embodiment, and then just sort of see the subtle shifts of how it's like a ghost tracing you, but not exactly in real time, but just kind of see those different realities. And in fact, you were standing next to me, and you were talking. I could hear you speaking. But I really was trying to listen and tune in to what was coming through the headset. This was earlier when I did the demo. So I'm sitting there watching, hearing the second wave of the sound coming in, and tuning into that. My brain was able to start to parse it. Because the audio that I was getting was more in sync and in tune with the virtual avatar representation that I was seeing. So it was almost like me being able to kind of slightly modulate my perceptual attention to see like, yes, you're staying there and I can pay attention to you, but this is kind of weird and interesting. I'm going to like shift my attention to this virtual representation of you as well as the audio that's coming over the network with this slight delay. So it was kind of a trippy taste of what the future of living in an augmented world looks like. And I have my hesitations into whether or not we are going to be all walking around with AR headsets or VR headsets in public. 
And my bias, at least at this point, is to say, hey, there's going to have very specific use cases at home. But is this the kind of world that we want to create, where everybody's sort of walking around with these headsets, augmenting everything all the time?
[00:19:08.690] Anand Agarawala: I mean, I agree, I mean, that's a very, so here's the way I view it, because I've thought about it a lot, and I would love your perspective. I mean, for me, I think it's inevitable. I think it's inevitable based on the way display technology works, because think about it this way. You carry a laptop around when you travel. Some people now just carry a phone, but imagine you can get, your glasses you're wearing now aren't much bigger than a phone, and it's much smaller than a laptop. If you could get the same resolution you got on your laptop, would you not carry that instead? And if you could have the same input fidelity? And even think about your modern-day office. I mean, you've got a 4K display at best, in front of your face, maybe it's 30 inches. If I could give you a 200-inch display with also 4K, and it costs about the same price, because these devices are now sub-$1,000, if you talk about Nreal where the compute's outside, all the way up to, you know, $3,000, but at a moderate price range. So I don't think it's... It's easy for me to draw a line to, just in terms of just raw display technology, where if I can give you the same number of pixels, with potentially all these benefits, let's not even talk about the benefits of increased immersion, increased collaboration, increased presence, but just on raw display technology, same resolution, way smaller form factor, and way more portability, I think it's going to happen. Now, will we end up living in hyperreality, right? And like, do we want to live in hyperreality? I mean, it is quite interesting. You know, I think we are going to start using Google Maps. We use directions all the time. I mean, I can, like, I'm in a new city, Seattle, right now. I've been using them non-stop. So, guess what? It's pretty easy to see the line if we're wearing these things all the time, where I'm getting directions through this thing all the time.
And Google Maps has already given us a little, oh, you know, there's a Starbucks around the corner. Oh, did you know there's a McDonald's around the corner? Well, they're going to start trickling that into this free service they're offering you as little hints and ads that they're going to start to sell. So, that is the downside of this technology. It's all immersive and, I mean, when you think about it from an advertising perspective, like, we use our computers a lot, you know, we spend somewhere between, you know, eight hours a day on them maybe, six to eight hours a day on a computer or phone. And only 5% of that time is monetized with ads. You know, you're searching for something or maybe browsing Facebook, Instagram, whatever. Think about all the real estate you have now when you can place ads anywhere subtly. You can put a little Coke can on that desk over there in the corner of your eye and someone pays a cent for that or, you know, micro advertising. So that's why I think we've got to be, you know, you draw these lines and it can get a little scary and a little dark, so we, I mean, why I'm doing this is because I want to put our humanistic ideals into this next platform and enhance the creativity, collaboration, connectivity. Now, there are some positive sides, I think, that this platform lets you do that others don't. So, for example, the platform is environment aware, AR, right? It's always sensing the environment. Let me tell you a dark side and a bright side of that. Bright side is that you're at home, your wife's trying to get your attention. Well, she can walk in and disrupt your holograms and get your attention. They can all disappear and vanish around her because we sense that there's a human body there. She's trying to say hello and you make it go away, right? Or I'm on the street, I'm in a new city in Europe somewhere, and I'm not sure where I'm going.
I throw a question mark above my head, asking for directions, kind of thing, and someone says, oh, can I help you? Or you're trying to get somewhere, and, oh sure, yeah, just go this way, or whatever, right? That's a way we can potentially foster human communication. The other thing that is actually, if I may, just something I've been thinking about recently that I don't know how it's going to end up is that cloud computing is getting cheaper and cheaper. Cloud storage is getting cheaper and cheaper. These devices are always sensing the world. So they always have a 3D map of the world. And you can imagine a world where you're living with this device on your face at all times. And imagine cloud storage is literally free. So you could have a complete 3D history of your life. You could go back to your first kiss, the first time you maybe not walked, but like your kid walked, any moment in your life with a 3D, and if there was multiple cameras in the room, multiple devices, you could see it in a 3D stitch. So, yeah, it's fascinating. I mean, I think how we decide to use this technology is, I mean, just like any technology, right? Nuclear technology, whatever. I think there's all these different paths you can take, but massive potential, and also we've got to be careful.
[00:23:27.737] Kent Bye: Yeah, I'm actually giving a talk at Augmented World Expo about the ethical and moral dilemmas of mixed reality, and also presenting on the same topic at SIGGRAPH, talking to Magic Leap and Mozilla and 6D.ai and Venn.Agency, all talking about how they're specifically looking at privacy and ethics within mixed reality. So I've been looking a lot lately at things like Chinese philosophy because they have the yin and the yang and it's all about balance, because I feel like with these ethical and moral dilemmas is that there are these sides between the good and the negative and that the reason why it's a dilemma is because you can never have all of one or the other. It's like, with all these amazing potentials, there's all this dystopic potential futures that we have. Keiichi Matsuda's Hyper-Reality as a speculative design looking at the future of advertising, taking to not even what he considered to be the most extreme, but even just an overwhelming hijacking of attention and gamification of all aspects of your life, and how just like walking into a casino, but that's your life. So, I feel like there's going to be a need for a balance between those things, and I'm actually very hesitant to even using much technology at all when I'm embodied face-to-face at conferences, just because I find that there's a certain flow that I fall onto, and that in order to really cultivate that flow, it's to avoid using the technology that can break the flow. But with the spatial computing technology as a platform, it feels like it's affording us these new ways of interacting, new ways of getting us out of our maybe strict linear flow and to really cultivate this sense of open innovation and collaboration. And it feels like this spatial computing combined with a little bit more natural intuitive ways of interacting and lowering the friction between having an idea and being able to express it in some ways.
It feels like that's the thing that Spatial is really trying to hone in on is what are the minimum viable ways of gesturing or interacting? Can you start to go from idea to then expressing that idea and being able to iterate and have this sense of open collaboration and innovation?
[00:25:30.260] Anand Agarawala: That's right. And I mean, one thing, just a thought that came to mind when you're talking, is I think the fundamental problem, and this will be a problem in AR too, is notifications really, right? I used to work on Android. BumpTop got acquired into Android, and so we were part of that crew early on. And so the fundamental design problem is notification, because there are times when you want to interrupt someone, like a phone call from a loved one, or maybe a text message from someone you don't really want. But because of that little door, everything gets in, right? And so when do you interrupt someone? And so Spatial is designed that, like, I have an idea, I hold up two fingers, I say it, and now it'll be stored, it'll be waiting for me at my desk when I get back. I don't have to... humans aren't good at remembering, but they're good at being serendipitous and creative, and we want to leverage what we're good at and two-finger the rest. We call it Thought Flow, that's the name of the feature. But imagine that now when you're getting notifications all over the place or whatnot, and now in an immersive way where you feel like you can't escape, you know? Yeah, I'm really excited about your talk there. I think, like, just to bring it back to what we think we can do with it is, it's really about expressing your mind, expressing what's on your mind creatively with very little friction, and now being able to do it in a collaborative way. So if I'm jamming on something, I'm scribbling some ideas out, I'm thought-flowing some stuff, I'm exploring the stuff, and I'm like, you know what? I could really use my friend Peter's thought process on this. Boom, he beams up right beside me. Now we can jam on stuff together. We can literally toss ideas back and forth, because everything's very physical and spatial. Not a full physics engine, like my last startup, because it can be a little problematic code-wise.
But definitely that physical, tactile feel, right? It makes it more playful. And that's really what this interface is about.
[00:27:08.884] Kent Bye: Yeah, I'm curious to hear a bit more about how you dogfood your own product in order to design your product.
[00:27:14.112] Anand Agarawala: Yeah, I mean, so yeah, you know, we try to do all our meetings in Spatial, so at least once a week, we have folks in San Francisco and New York, and at least once a week, we meet in Spatial. And, you know, it's really cool because just all these nutty things happen. One of the coolest, for example, was when we had a snow day, and we were in New York, so, you know, lots of snow, and all 10 people had to work from home in New York. And what was cool is we all joined from 10 different locations. Imagine that meeting on Zoom; that would suck. We're all just 10 boxes, Brady Bunch view: who's looking, who's talking, who's looking at what. We were all chilling in my living room, because all 10 of them were hanging out, all in different locations. I think two folks were in Korea, because they were on the road, a couple in SF, some in Brooklyn, some in this. And it just doesn't matter; the location's irrelevant. We're hanging in my living room, we're jamming, we're pulling stuff up, we're playing Where's Waldo, because one of the guys on the team's named Waldo. I held up two fingers and threw up Waldo. We're just goofing around in my living room. And so I love the idea of the social side too. Actually, I've had intimate experiences where my buddy Peter and I are just literally watching TV; he's sitting on my couch virtually as a hologram and we're just hanging out. You can imagine that, like watching the game together. And Roman, who works from home and has kids, he's like, hey, I don't get to see my friends anymore with the pressures of life and stuff, the constraints. And now I can actually holographically chill on my couch and drink beers and watch the game with my buddies, which I don't really get to see anymore, because all that's now just a construct that we can instantly teleport to. So yeah, the dogfooding is really cool, man. 
I mean, we had an instance where we did a meeting with Microsoft actually across, I think, six, seven sites, Australia, UK, a couple sites in Europe, four or five sites in North America. It was in Australia, it was like near 1, 2 a.m., and the guy thought it was so cool, Lawrence Crompton's his name, he thought it was so cool, and he woke up his kid. And then all of a sudden, because it's height normalized, so all of a sudden the headset got really low, and we heard this guy like, hi, what's up? And we told him not to wake up his kid, it was 2 a.m., he's like, no, I'm too excited, I gotta do it. But anyway, that was quite cool.
[00:29:12.772] Kent Bye: Well, I was talking to some Magic Leap developers, and they were saying that just the fact that you're able to have this shared social experience while you're hanging out in your own personal context of your home, then that context puts all the associations of the level of intimacy for inviting people into your home. But they're also in their respective home, and so you're kind of virtually in their home, and they're virtually in your home.
[00:29:37.705] Anand Agarawala: The other thing I wanted to, it's funny you brought that up, because basically how we fuse space, our model, and there's a couple different models you can take, but our model is like, we want everyone to feel like they're in the same space. So if they're in the same shaped room, like often conference rooms in corporate environments are, same size table, exact same furniture, exact same whiteboards, often across various cities, right? So if you're sitting in a couple chairs at one table, people in the other conference room will be sitting in other chairs. Now the question becomes, what if one person's in a coffee shop, one person's in an Uber, and the environments get quite different, right? So the way Spatial's modeled right now is that everyone's virtually in each other's space, but sometimes you could be in someone else's wall, because the room geometries might be different. And so we've played with some fun stuff where it's like, okay, what if we scale the environments to all fit with each other? But sometimes the topologies of the environment are different, too. Like, for example, if you're sitting at your computer table, this happens all the time in dogfooding, right? Because people will be joining, and they're facing a wall, because what we do is we find the largest wall in the room, and we all key off that. Everyone just kind of uses that as their primary. So now you have a shared physical anchor in your environment. You have this shared digital wall that you can toss stuff up to, and it generally makes sense in all your environments. But we're also playing with a table-centric view because sometimes the table's the center of interaction. And, you know, also if you're sitting at a computer desk and you're facing a wall, you probably want that to be remapped onto the table of another person's room. So all these kind of funny geometries kind of take over. 
And then in terms of the intimacy level, yeah, it's funny because, yeah, you might be in a completely different vibe. I'm in an airplane. I'm in a seat of an airplane or whatever. And so we're still thinking about it. I mean, one thing is that right now we don't do a great job of revealing what the other person's environmental conditions necessarily are. They're just a disconnected avatar in our environment. Now, we've had so many requests for, hey, can we see what the other person's environment looks like? Can we kind of even remote assist that other person, you know, in the more enterprise context? But then in the personal context, what if I want to step into your space bubble? Let's go over to your house for a minute and hang out there and actually be in that environment. Or let's jump into my bubble and we can be in my, you know, and like just maybe subtly reveal the environment they're in and allow fluid going back and forth between the two. And imagine dialing up the AR/VR dial there too, right? So as you jump into someone's environment, maybe you want to actually ratchet up the immersiveness. Like maybe start off with something where we're in AR, we're respectful of each other's environments, but hey, let's go to your place. Because in the future, AR and VR will be a superset. And now we're fully in VR, because I now want to be immersed fully in your place.
[00:32:02.553] Kent Bye: Yeah, I was going to ask about why AR versus VR, or if you're planning on doing both. Because I could imagine that, just like we're talking about here, having a sense of shared space, like in virtual reality, that's not so much of an issue. Because when you're enveloped within VR, you are completely shifting your context. And with AR, you're maintaining your existing context, but you're adding additional layers of metadata and information onto your existing context. But if you're in an Uber, you're still pretty much in an Uber. And you can have some level of situational awareness. But what are the trade-offs that you're looking at in terms of thinking about these different collaborations of a meeting? Like, when does it make sense to have an AR meeting? And when does it make sense to have everybody go into the same virtual environment within VR rather than AR?
[00:32:46.299] Anand Agarawala: I mean, I, you know, we kind of have strong beliefs on this. I mean, we actually had a VR version of Spatial and we scrapped it. I'm curious what your take is. I mean, there's obviously the Voices of VR. I always wonder when it'll become XR or... I have the domain. I just haven't flipped over to it yet. Okay. One vote for that, but... Well, just because, I mean, I think VR is dissociative, and I think it's not for everyone. I mean, I find it like... People, when they come out of it, they feel... disoriented, jarred. The real world feels muted because the colors aren't as bright and it's not a sensory overload, and then you're bumping into stuff, you know, when you're in there. And so now I don't want to say there's no place for it, right? And, you know, I'm a fan generally, but I think that with AR, I think it's really important to maintain a basis in the real world. I think you ground everything in the world. Now that said, there are times when you want to immerse. And I think ultimately, the way technology will evolve is, as the field of view gets bigger in AR, it's kind of VR anyway, just without the full enclosure to block out outside light. And as you mentioned, when I had the AR avatar overlay, which was not intended, but when I had the avatar overlay, you kind of start to forget what was going on underneath, because your brain just keys in on this new digital reality. So at larger fields of view, it's very similar, I would say, anyway. And some contexts where VR wins is where you want that level of immersion. And so I'll give you an example in our collaborative meeting scenario. In AR, if you're all in the same room, putting on VR is a downgrade because now I lose eye contact, I lose being next to you, I lose being able to use my laptop or my phone and be able to see my devices. So it's kind of a downgrade in that case. 
And you can't really imagine walking into a meeting with five people in it and they all have VR headsets. The idea is just kind of goofy, right? Whereas all those guys in AR headsets, guys or girls in AR headsets, you can see each other, whatever. Now if you're at home, remote, joining that meeting, then you kind of want to feel like you're in that meeting or in that room potentially, right? And so I think then an increasing level of immersion is actually a good thing. Now I think the way VR is built now, there's such a hard transition back to the real world, because right now we mask as much light as possible just to get the clearest display or whatever, but it's not that bad a thing to be able to peek in and maintain some context of reality, I think. Sometimes I even peek under the little hole of the VR headset to see my hands or get a look at something. So I think it doesn't need to be as hard a distinction. It's going to blend anyway.
[00:35:10.610] Kent Bye: Yeah, I think it's going to be context dependent and depending on what you're looking at. So for example, if you're doing a review of architecture, I would much prefer to do an architectural review within VR because you'd be able to be completely in that space and not have to deal with all the kind of funky overlaying of an entire virtual reality with kind of a small field of view and with a lot of pollution coming from other lights and whatnot. So it's just not quite as immersive an experience where you're not even able to really see the full virtual world in, like, an AR device. But on the other hand, if you're in an AR device, you can start to see a 3D object in front of you. And if you're with other people, you could potentially kind of look other people in the eye. And I'm not sure, I haven't used a HoloLens enough, but I know that being able to see other people's eyes and eye contact is a huge thing. And so if it's a little difficult to see that eye contact in AR, then, you know, you start to lose a little bit of that real affordance of being face-to-face and being able to have that eye contact with each other. But also just in terms of 3D interactions, I found that doing like hologram placement and movement is okay with being able to move your hands. But if you wanted to go and do like 3D modeling, I would never want to use something like the HoloLens because it's just not going to have the amount of fidelity that you would need to have that level of precision. So if you're actually in a program where you're making the content, you know, maybe it makes more sense to have, like, 6DOF controllers that have buttons. Like the Oculus Quest is going to have an amazing ability to have input. So you're going to have so much input control where you're able to make very specific spatial creations. So, like, Gravity Sketch is a good example where you'd be able to actually kind of concept out and paint something. 
You know, Tilt Brush is a popular one that's going to be launching on the Quest, and at some point Gravity Sketch as well. But there's going to be certain things like that. So very specific use cases where you're trying to do spatial design with your body. I think VR is going to be way better, at least for now, up until we get something like a 6DOF controller for the HoloLens, which I expect that, you know, there's going to be a certain amount. You can only do so much poking at holograms for so long until you start to get tired of it. Wouldn't it be nice to have a little bit more of a sophisticated 6DOF controller with some buttons, maybe very similar to what the Oculus Quest has with their controllers, to be able to do a little bit more abstracted agency? I mean, natural intuitive interfaces are great, but they can be very fatiguing. So finding what the blend is between those things. So that's at least how I'm thinking about it and how there's going to be certain use cases and certain things that you're doing where I think in some ways it's going to be a better user experience to kind of flip over into a different technology stack.
[00:37:39.057] Anand Agarawala: Well, I think there's two things there, right? I think, like, there's input, and then there's display technology. Like, I think, for example, like, I, you know, you could give a, I mean, it's not hard to Bluetooth pair a 6DOF controller to a HoloLens, right? I mean, not 6DOF, but you could imagine, like, Magic Leap has a 6DOF controller, and it's paired with an AR headset, so you can kind of mix stuff up. But I think, I would say that, like, I can imagine a beautiful hand-crafting interface for, like, like, I view HoloLens AR hand-tracked input as, like, sketching with a pencil, and like VR, or it doesn't necessarily need to be VR, but like an Oculus-style controller, is more like Maya, right? Or something where it's like a super specialist interface. Like if you watch someone who's good at Maya, they're hot keyed out, they're using the hotbox, they're doing all kinds of stuff, marking menus and everything, and it's like, it does kind of feel like an extension of their body, but it's highly trained, and it's not necessarily like, whereas paper, pencil, is completely accessible. And if you're a specialist, you can actually have a pretty high curve on how precise and fidelity, you know, but there are some, I mean, I think, yeah, I think with, you know, there's trade-offs, as you said, right? Like the Chinese philosophy way of saying it. 
There's always a trade-off, and I think, I mean, I wouldn't say it's an AR versus VR thing there. I think that's actually an input quality thing, and I think HoloLens is making the decision that we don't want you to have to carry anything additional, and I think they're really pushing the envelope there, because, like, hey, hands are pretty damn good. But I think we can also, well, actually, may I add, eye tracking, okay? Because the thing that gives us imprecision is, yes, we're using hand-tracked sensors, which are pretty fuzzy and great for gross gestures, but maybe not great for precise things. But by the way, we were doing air cursive there. That was pretty legible. Now, that was not actually using eye tracking to determine intent. You bring your eyes into the mix, and using your eyes plus hands as an input device has never really been done before, and I think the possibilities are very interesting.
[00:39:29.940] Kent Bye: Yeah, I was just going to say that it seems like strategically Microsoft is focusing on a very small niche frontline worker use case, which is people in factories, construction work, jobs where they need to have their hands free because their hands are already carrying around different stuff. They can't be worrying about carrying around these different digital controllers to even interface with the technology. It needs to be able to work for that frontline worker use case, which I think is good in the sense that it's pushing the technology into a very specific use case. However, when you start to look at the knowledge workers who do have these very high technical interfaces, then I think you are going to eventually want to have a very high degree of precision for those six-degree-of-freedom controllers. And the Magic Leap is using sort of electromagnetic controls. It's not as precise as the Vive, which I'd say is probably the most precise, then the Oculus with external controllers, and then the Quest has an amazing ability for tracking. But I'd imagine Beat Saber might be at least a phenomenological way of testing the different tracking technologies and when it drops tracking. I mean, it's a little difficult to kind of pin down specific numbers on the different ones, but phenomenologically I'd say the Vive is probably the most accurate, and then, you know, the Oculus with external sensors, and then with the internal sensors the Rift S and the Quest are pretty good, but they're not going to be able to reach the same level of submillimeter accuracy in being able to do spatial design as something like an external sensor like a laser. So, yeah, I just feel like because that's been the focus of Microsoft, I'd imagine that they haven't been as focused on being able to create something that's equivalent to a computer vision tracked controller like the Quest is using.
[00:41:10.265] Anand Agarawala: Yeah, I mean, I think I know a couple things. I mean, I think, yes. I mean, by the way, the HoloLens is 6DOF, but it's your finger. So what they're doing in the tracking is harder than the thing. But what I would say is, like, knowledge work, I kind of disagree that knowledge work requires precision. I think, like, you do knowledge work, and you're probably in a browser and email, you know what I mean? And, like, whatever you use to edit your podcast. So I think you definitely need precision for technical, I don't know what you call the category of, like, whoever's using Maya, Cinema 4D, maybe After Effects and whatnot. But I would argue that knowledge workers generally, they need more cognitive real estate, they need tools to be able to, the general flow of knowledge work is explore ideas, triage or analyze those ideas, potentially collaborate, maybe get feedback on those ideas, and then come to a decision, right? I think that's the kind of typical knowledge workflow, so I don't see what part of that has to have precision, 6DOF, you know, input to level of 5. Now, does it need text entry? Yes, right? You probably need to, well, actually, I would even argue to level up, you need some way for the device to determine what you're looking for or what you're trying to express, a.k.a. you're writing an email or something, right? But that doesn't need to be a 6DOF controller. I think text entry in this paradigm of, like, you know, how we're using text fields and stuff like that, you need some way to express. But I would say that to do that loop I talked about, where you explore an idea, various facets of an idea, maybe get feedback on an idea, triage and come to a decision, I don't think you need the level of precision 6DOF you're talking about.
[00:42:37.343] Kent Bye: Well, what I would say is that it's really nice to have a physical button, depending on the task that you're doing. If I'm editing audio files and then I'm cutting out ums and ahs, you know, I couldn't imagine doing the same thing in a spatial interface where it's only 80% accurate, where I'd have to, like, do a lot of extra poking and stuff. So I feel like there's certain things where having the physical button for a specific task, where when you push the button, you know it's going to work. And I feel like with just the hand interfaces, there's a certain amount of more fuzzy, kind of like you poke the button two or three times to get it to work. So there's a certain amount of like, it's nice to have a button sometimes, so that's what I mean.
[00:43:14.356] Anand Agarawala: Yeah, but I think the problem is that we are using mouse-driven UIs for fat fingers, right? And we had the same problem with phones; the iPhone was the first phone to respect the finger as a different type of input device. Big fat targets, big gross gestures like scrolling. And you can, like, you know when you're editing a video in iMovie on your phone, where it'll zoom in, and it's not great, but you know. And I'm just trying to say that, and I think this is typical, in VR and AR, when people go to new media and they use the old interfaces, they don't really quite fit, and they're not tuned for the new input device. With hand tracking, you know, if you were to just take that as an alien and look at that and ask what's the best interface we can design, it wouldn't be the point-and-click thing that we're given now. Yeah, pointing and clicking precisely, you know, and dragging is not great with your fingers. I agree with you 100%. And I also agree with you that clicking something feels very satisfying. And by the way, you can pair a Bluetooth keyboard and mouse to a HoloLens and it feels really damn cool, because I've actually web-browsed with like a 100-foot monitor before, and it's cool because all my tabs are more reflective of how I feel they are organized in my mind, and I'm doing the clicking and dragging, so I don't think it's exclusive.
[00:44:23.750] Kent Bye: Well, I think one of the things, both in talking to you and after doing the HoloLens 2 demo, was really focusing on the eye tracking. So you're looking at a little gem and you say, pop, and it pops. So it's this concept of context and being able to determine the context that you're at, what object you're looking at, and what objects and affordances are made available as you're looking at very specific things. So I feel like there's this whole realm of spatial computing that is about being contextually aware. And now that there's eye-tracking information that's there, you can start to maybe infer and maybe have a little bit more sophisticated conversational interfaces that are made available. And I find that exciting to see. If you're looking at something now, you can say something. And again, it starts to feel like magic of being able to almost have the technology read your mind. So you can give a little bit more ambiguous information, but paired with what you're looking at, start to extrapolate the intent of what you're meaning. And then from that, to be able to determine a deeper context. But generally, just identifying what those contexts are and to identify what affordances are available for each of those contexts feels like one of the things that has to be kind of figured out in this next wave of spatial computing.
[00:45:30.801] Anand Agarawala: And I really hope, I mean, I hope somebody does, like, a hack challenge for developers. Like, the eye tracking is such a dark horse of potential that no one's really talking about. I mean, I hope they do, like, a hackathon and come up with crazy eye tracking stuff, because if I give you a video, or if you're sitting next to someone, they don't say a word, and you just watch where their eyes go, you can tell a lot about what they're thinking and their mind state and what they're thinking about. Like, you know, in body language, eye contact is one of the strongest cues. It's such a strong cue, and it's so close to your brain in terms of the whole loop that it gives you so much intent. And I think with an eye tracking interface, your eye moves way quicker than your fingers could, you know, on a mouse or a hardware device, right? So I think the potential there is super strong, like refining that, you know, to make something designed really for the eye. And of course, the only thing you need is a clicker, if you want to use the mouse cursor model, the 2D pointer model: fast thing, click. I mean, tons of potential there. And yeah, also to infer intent beyond what like an Alexa could do. I mean, imagine instead of saying, Alexa, turn off the lights, like I'm looking at the lights and I just say off, right? Like you cut out so much more.
[00:46:34.977] Kent Bye: Yeah, obviously lots of privacy concerns there in terms of what you can infer from information from eye tracking. But yeah, I had a chance to do a demo of Eyefluence. I think they were bought by Google. But just to see a whole user interface that's based upon eyes and eye tracking, I think that there's certain whole techniques that are out there that have already started to flesh out what that looks like. But having multimodal interfaces, I think combining them is going to be the big next frontier.
[00:46:59.444] Anand Agarawala: Absolutely. 100% agree.
[00:47:01.485] Kent Bye: And I guess for you, what are some of the either biggest open questions you're trying to answer or open problems you're trying to solve?
[00:47:08.367] Anand Agarawala: I mean, I think like, yeah, I guess there's several layers in which we think about that, right? Like long-term, short-term kind of thing. I mean, I think like short-term we're like, hey, can we, well, okay, let's start long-term. That's always more fun. I mean, long-term is, what kind of dent can we make, you know, in the universe, right? Like, I think there's a huge sea change coming, and we've seen what happens with that when mobile happened, and I think there's a real chance to help shape and define how things will look for a long time, and hopefully enable people to be more... If everyone could be more creative and more collaborative, I think, like, how cool of a world would that be, right? And there's, like... Your ideals are embodied in your interfaces, I think. You know, they come through. And I think, like, the way computing evolved, we didn't really design a whole heck of a lot of the stuff. I mean, the iPhone was highly designed, but, like, von Neumann architecture, you know, like, the way, I mean, we just kind of had to work with the machine constraints. And now, with augmented reality, we can create whatever reality you want. So, anyway, I think the longer term, what we're trying to achieve is how much can we, you know, elevate. I mentioned the company mission is to elevate human creativity, productivity, and happiness, and really, how much of that can we do, right? And how much of this bold new world can we explore and leverage? I think medium and short term is like, you know, we're all thinking, hey, when does this AR, you know, if our thesis is AR is going to transform computing, when does that happen, right? And what does that arc look like and when do we make that happen? 
I mean, we are excited that we're one of the few folks that are kind of, you know, AR is kind of a little first-line worker-y right now in how it's targeted, but we are really excited to bring it to the knowledge worker. And, you know, that's us. I mean, we're building this product for ourselves too, right? And, yeah, I think one of the short-term problems is just, and it's really fun, because we want to make sure we're building a broad-based solution that's broadly applicable. And so that's why it's so cool to see all these different sorts of industries and use cases kind of pick it up. We want to be paper. We want to be a piece of paper, just neutral, super expressive. It's your white, you know, blank canvas.
[00:49:08.864] Kent Bye: Yeah. You mentioned that one of your company goals is for creativity. What are your metrics for success to know that you're being successful and helping to foster and generate creativity in different companies?
[00:49:20.810] Anand Agarawala: That's a real good question. I mean, that's tough. I'm not sure, you know, I feel like it may be just, well, we'll know if the users tell us, you know, like, hey, look at all the stuff I was able to make and, like, right now the user feedback's like more like, holy moly, this is amazing, this speaks my language, you know, and I think, like, I think the users will tell us. I think, you know, I think, like, we'll see stuff saying, like, I collaborate with this other musician on the other side of the world, I'm able to get into Spatial, I play my guitar, they play their thing, I feel like I'm in the same room with them, you know what I mean? I think we'll have to, it's very qualitative, you know, there's no real, like, clear metric on, like, creativity per minute or creativity clicks per user or something like that, so I don't really know how we're, I mean, I think, like, it's gonna be probably largely intuition and feedback driven.
[00:50:05.330] Kent Bye: And we're coming up on the end of day one of Microsoft Build, and you had a big opportunity today to demo Spatial in front of the entire audience here at Microsoft Build during the keynote. Maybe you could talk a bit about what that was like for you to be a part of that demo here at Microsoft Build, showing off this Spatial app using the HoloLens.
[00:50:23.423] Anand Agarawala: And it's so cool. I mean, we've been so lucky. Like, in two months, we've been on the stage with Satya now, CEO of Microsoft, twice, you know? So, that's unreal. I mean, we're so excited. I mean, not to shill for Microsoft too much, but the HoloLens is really a bold bet. I mean, no one else in tech really has, and we encourage them all the time, you know? If you're listening, you know, like, the other companies, get in the ring, you know? I think Magic Leap, I'm really excited that there's another viewpoint on AR. I love that they're coming with this creative, psychedelic, you know, it's another creative viewpoint on what this is. And, you know, Microsoft's got theirs, currently a bit more utilitarian, but it's cool to see other viewpoints. I think at the beginning of a birth of an industry, you need those kind of diverse opinions. But in terms of what the experience is like, man, it's so cool because we've been dreaming about the HoloLens 2 for two years, right? And, like, to be part of the moment that it kind of was unveiled to the world, and then we just got one very recently. And it's like, holy crap, there's so much potential. Like we mentioned, we're just scratching the surface, to get back on there and to show some more cool stuff that we just came up with. And the cool thing with Microsoft is actually, as a developer, it's one of the few full stacks that you can actually integrate with, in the sense that we're building on their device, the HoloLens 2, and their whole cloud stack, like Teams and all that stuff. Like, okay, you might use Slack, but Slack doesn't build a hardware device that I can easily, like, no one has that kind of full 365 solution, to use their, you know, marketing parlance. But like, the way we can kind of integrate and provide like an end to end, like, you know, we're going to do something for G Suite and we're going to do something for Slack. 
So you do Slack login, and you get the Slack avatar from the photo, and you get the Slack channels, but you're probably still using Gmail, and you're probably still using Google Docs to some degree, or might be using Dropbox. And so, you know, there's obviously trade-offs with the Microsoft stack, but it is end-to-end, and so it lets us craft a pretty compelling vision of the future, because we get identity, we get content, we get applications, productivity applications, we get hardware devices, and just from end to end can deliver a compelling solution. So for us, it's amazing, man. We're a 12-person startup in New York and San Francisco. And to be on this level of stage and to connect with everybody, get all the feedback, it's just nuts. It's so cool.
[00:52:35.410] Kent Bye: Great. And finally, what do you think the ultimate potential of spatial computing is? And what might it be able to enable?
[00:52:44.680] Anand Agarawala: I always think about when I listen to your podcast, I always think about when it comes to that time, how am I going to answer that question if I ever get asked it? You're like the Terry Gross of VR, you know? It always changes in my mind every interview how I'd answer it, but I think the thing that comes to mind to me is Pacific Rim. You know, when the mechs get so big and complex, you really need two fused human minds to operate them, working kind of in unison. I mean, I think, like, the things that, like, I try to take inspiration from in terms of the type of interaction that Spatial supports is, like, ballet or hockey or basketball. Like, when you have a unit of a team, five people moving up the ice at once, passing to each other, you know, almost, like, not communicating with words, but just moving as kind of one entity, you have this kind of collective intelligence. And I think that how we communicate with computers today is so far from that. And I think the potential with this new collaborative display technology, the fact that we can all kind of see the same reality now and express our ideas in the same ways, I think that is the really exciting thing, that we can just... There's so much friction in expressing, I feel like, your ideas and thoughts into projects and things in the world. You know, I mean, even if you're good at Photoshop and stuff, you'll still be crafting away, or After Effects, you'll still be crafting away forever. I would love to kind of, you know, so I think for me, the ultimate potential is the ultimate collective intelligence, collaboration, communication platform, right? Be able to hopefully elevate our collective mind state by being able to create together at a scale that we've never been able to.
[00:54:15.796] Kent Bye: Great. It's a beautiful vision. And is there anything else that's left unsaid that you'd like to say to the immersive community?
[00:54:24.665] Anand Agarawala: Well, I think, yeah, I think, like, just, you know, I think for me, immersive community, one, think about, it's not just what we can build. Let's fast forward to see what are the downsides of this technology and think through those scenarios. So we have to, I mean, otherwise it's going to be too late, you know. I think it's really important to get ahead of this stuff. And, you know, ask those hard questions, because, like, when I get into AR, I ask myself, like, do I really want to be contributing to more zombification of people, you know, scrolling their lives away on their phones? Think about the downsides and the impacts of what we're doing. I think that's really important. Two, think about new paradigms this enables. Don't just rest on your laurels and use the old 2D mouse interface. That's not appropriate here. We have eye tracking now. We have finger tracking. We're going to have to rethink how we do stuff. And now we have the ability to connect our minds in brand new ways. Even in VR, you'll see a ton of VR games take that first-person shooter they made on a PC, throw it on a VR, and like, is that exciting? No, it's like new stuff that's exciting, where you're like Superhot, or something that you couldn't really do on a, you know, or motion, and everything's part of it. So I think, think big, I think is the other thing, you know, like, there's a brand new world here, and we get to be the first ones painting the brush strokes, so let's kind of figure it out, and then think of the dark sides, make sure we get ahead of them, and are doing the right thing there, and then think big and bold, and don't just copy the old stuff of yesterday, come up with new stuff.
[00:55:49.200] Kent Bye: Awesome. Great. Well, thank you so much for joining me today on the podcast. Thank you.
[00:55:52.742] Anand Agarawala: Thank you, man. It's awesome to be here.
[00:55:54.983] Kent Bye: So that was Anand Agarawala. He's the CEO of Spatial. So I have a number of different takeaways about this interview. First of all, well, there is a part of me that's hesitant to always have an AR device on my face that I'm walking around with all the time because of this vision of like, when do you want notifications? When do you want to be interrupted? You know, I think of Keiichi Matsuda's Hyper-Reality, which is this vision where as you're walking around, it's basically like walking into a casino where you have all these pop-up ads and all this gamification of life. And it's this whole digital layer of reality where the intention seems to be not necessarily benefiting you to make you more connected to your life, but it seems to be other people that are interfacing with your life to be able to control and manipulate you in specific ways. Now, I think that's sort of a more dystopic vision. A more utopic vision is that maybe sometimes you're consenting to wanting to have notifications or to wanting to have more context and information about what's happening around in your environment. And it doesn't necessarily have to be such a polarity binary where we can never want to have any contextual information about where we're at, because sometimes we actually do want to know information. If you're walking around a city you've never been in before and you want to pull in some Yelp reviews or be able to search for a restaurant that's nearby, then maybe you want to just have that up in an interface. So I guess I've just been a little bit more hesitant to that vision of the future, just because sometimes when I'm out in the world, I just want to turn off my phone and be completely embedded and present to whatever is emerging. But I think to what Anand is saying is that we already, when we travel, we may take our laptops, or sometimes we may only just be traveling with our phones.
And now that there's some of these self-contained VR devices that are out there, maybe some people will only start to travel with their Quest device. If they want to, at night, have entertainment or play Beat Saber or be able to hang out with their friends while they're traveling, then maybe that's a self-contained unit that they're going to start to travel with. For a lot of people, having their phone with them has replaced the need to always have to have their computer. They can still have access to the internet and the main services that they need just from their phone. So what if those glasses devices get so sophisticated to the point where you're able to project a whole spatial environment around yourself so that you're able to have access to what is equivalent to a laptop with 10 different 4K monitors that are all around you? So to have that access to a spatial computing device, I think is actually gonna be something that is inevitable. Eventually the technology is gonna get that good where we can have the equivalent of this huge computer monitor in front of us. But the difference is that it's gonna be spatial, it's gonna be contextual, it's gonna have access to our eye tracking data to be able to look at things and to be able to speak affordances that are connected to the thing that we're looking at. And so it's gonna be much more natural and intuitive. And I'm already starting to see that with some of the early prototypes for Spatial. So when I saw Spatial, I think it's still pretty early. I mean, even the hardware of the HoloLens 2 is not production ready. And the demos that I was seeing are still kind of like the demos that you'd be seeing in a booth, very contrived in terms of, you know, very specific tasks that you're doing. You know, during the keynotes, again, they have a very specific demo that they're showing.
You know, I'd really be curious at some point to see once this is production ready, it's deployed, to kind of peek in to see how people are actually using it in different ways. I think in certain use cases, there's going to be different workflows. You know, one of the things that Anand was saying is that they want to treat these spatial computing technologies like pen and paper, like it's just as accessible as a pen and paper. Well, the pen and paper, I'd say, is a tool just like any other computing technology is a tool. And for some people, a pen and paper is the least accessible interface to be able to express your artistic vision because you have no artistic talent to be able to actually draw things out. Maybe your talent is to be able to take a picture in Photoshop and kind of munge things together, or Illustrator. I don't necessarily think that the pen and paper is any easier to be able to express creative ideas than a computer technology for certain people. And so I think it's less about like creating this hierarchical, you know, this is easier or not easier or more democratized. I think depending on what your temperament is and whatever background you have, you're going to be able to have different ways that you're able to express yourself given whatever different tools that are out there. I see pen and paper as much as a tool as a computer or a spatial computing device. But where I see this going is in this really interesting direction of what does it mean to be creative and to have open innovation and to be able to communicate and collaborate with a team of people and to have this vision of what Anand was saying, which is how can you create these cohesive flow states as a team that you might see as a sports team playing basketball or hockey, as we're in the middle of the Stanley Cup and the NBA finals right now.
We're watching a lot of these teams that are reaching these states of peak performance where they're not verbally communicating, but they're using their bodies to be able to communicate with each other. And I think that's the vision of spatial computing is that we have these more nonlinear ways to be able to collaborate with each other and to be able to create things together, to be able to create things that we wouldn't be able to create on our own. And I think that's the exciting thing, is how these things can start to work in these collaborative communication environments. And I think that's why I think it's so interesting to see what Spatial is doing, because they're on the front lines of starting to deploy these new prototype technologies, like the HoloLens 2 with eye tracking and the gesture controls, and starting to work with these different companies to see what type of very specific business problems that this can start to solve. So I do think that there are some very clear use cases when it comes to anybody that's doing any type of spatial design where they have to do some sort of like review of these 3D objects. And I would say that I think there are going to be certain use cases where you want to be completely fully immersed within an environment where you are in an immersive virtual reality environment because the HoloLens, while it's greatly improved with its field of view, it's not going to be as good as doing architectural visualization where you're completely immersed within VR. And so some of the hesitations that Anand has in terms of this dissociative, not feeling fully embodied, like I think that as you start to get more and more of your embodiment within the virtual reality, then it's going to be a little bit less dissociative. 
Like right now you can usually only have your hands in the VR headset, but I think eventually we're going to start to find ways to put more and more of your embodiment into the virtual reality experience, especially with all the different advancements of computer vision and machine learning. As it's starting to track these hand-tracked controllers, it's going to eventually be able to perhaps track more and more of your body to be able to put your body into these virtual representations. And that's some of the demos that Oculus and Facebook were showing at F8, which was to show these people using just these 2D videos and cameras to be able to translate their embodied movements into these fully immersive virtual reality experiences. And that's using a lot of the pose detection AI algorithms that have been having all sorts of huge innovations over the last couple of years. So with any technology, there's going to be various different trade-offs. If you're face-to-face and you're trying to work on a 3D object that you are looking at, then something like having a HoloLens may be the better use case. But if you're remote, then maybe you do want to have a virtual reality experience where you still have that virtual representation, but you feel like you're maybe co-located in the environment, rather than if you're in an Uber, you may want to feel like you're actually there face-to-face with some of these different experiences. So I think there's just going to be different contexts, different use cases, but there's a use case for both VR and AR, and for if people only have access to like a tablet or an iPhone or Android. So the Spatial Anchors service from Azure that is integrated within Spatial, it allows you to see something within the HoloLens device and then have like a tablet or an iPhone or Android and be able to still use ARKit or ARCore to be able to see that 3D object right there.
So if you don't have access to a lot of different HoloLenses, then you can still use just phones or tablets. And that's one of the things that Julien Bourgogneau had said, that competition for Microsoft isn't necessarily like these other VR AR devices. It's more like the tablets can do quite a good job, especially as we have more proliferation of ARKit and ARCore out there. Then sometimes having a window into the world that you hold up with both hands is going to be just as good as having the HoloLens for some people in some use cases. But sometimes just having the hands-free type of experience is something that is way better, especially if you are starting to do all these different gestures and wanting to actually work and interact collaboratively with other people with these 3D spatial objects. And I do think that there is going to eventually need to be a little bit more of a six-degree-of-freedom controller with buttons. And I think that there may have been a specific focus for Microsoft to be really going after those first-line workers, whether it's people who are on like factory floors or construction workers, where they're really embedded into a context where they need to have their hands free. I think in some ways that's really pushing forward the computer vision and these conversational interfaces, and it's really innovating and driving that area, which I think is super important. But I do think for knowledge workers who are doing repetitive tasks over and over again, if someone's doing Maya or someone is working on these computer programs that are creative in some ways, then having a button is extremely helpful because you don't want to be doing like fatiguing interfaces using your hands when it's not even 100% accurate. When you push a button, you know that button's gonna work. And depending on the task that you're doing, then that is just going to make a world of difference.
And so I hope that as the HoloLens gets closer to actually being launched, either that's something that is included into the launch as an option, or that there's more Bluetooth-enabled devices so that you're able to at least have access to a button. But ideally, you'd be able to actually have these six-degree-of-freedom controllers as well that could be just as well tracked within these augmented reality experiences. So when it comes to knowledge work and collaboration, I think that eventually those could be extremely helpful. I think there's going to be a lot of things you can do with just your hands. And it'll be interesting to see once the production quality of the HoloLens, as well as experiences like Spatial, you know, if they're able to get to that level where it's not annoying, where it doesn't always trigger or work when you're using your hands, because the nature of the whole thing is that it has to have a little bit of latency or takes extra processing power to be able to figure all that out. And based upon the trajectory of where everything is going, it's absolutely going to get there. And finally, just to kind of reflect on some of the deeper themes that were coming up in this conversation, there was sort of the more utopic and dystopic potentials, the good and the bad, and to be able to try to keep in mind the negative applications and what is being afforded with all the new amazing potentials and possibilities. And just by diving both into virtual reality and artificial intelligence over the last five years, there seems to be like this mixed bag where it's always some weird combination of both and you're never going to be able to have these purely good applications and purely evil applications.
I think it's more of some strange combination, and if anything it's more about how the humans are using it to be able to work and collaborate with each other. If anything, I find that these new immersive and experiential technologies are representing completely new paradigms for how we make sense of how things work. Just in the conversation, talking to Anand, we're moving away from just linear interfaces where you have to be very specific in the different types of input that you give into a computer right now, either with the mouse or the keyboard, where it's very constrained with what kind of input it can take, and we're moving into much more non-linear and open-ended conversational interfaces and embodied body language and gestures where you could be a little bit more fluid in how you express yourself. And so it's maybe allowing you to tap into more of those right brain creative types of functionalities a little bit easier because you don't have to translate what you're trying to do into a linearized interface and just be able to work with these different computers. And so the spatial computing revolution is this whole way in which the computers are becoming much more natural and intuitive and human, and the computers are reacting to the human behavior rather than the human behavior having to be modulated in order to interface with the computer. So, that's all I have for today, and I just wanted to thank you for listening to the Voices of VR podcast, and if you enjoy the podcast, then please do spread the word, tell your friends, and consider becoming a member of the Patreon. This is a listener-supported podcast, and so I do rely upon your donations in order to continue to bring you this coverage. So, you can become a member and donate today at patreon.com slash voicesofvr. Thanks for listening.