I interviewed Marcello Typrin, Product Director at Reality Labs, at Meta Connect 2024 about the Hyperscape demo. See more context in the rough transcript below.
This is a listener-supported podcast through the Voices of VR Patreon.
Music: Fatality
Rough Transcript
[00:00:05.458] Kent Bye: The Voices of VR Podcast. Hello, my name is Kent Bye, and welcome to the Voices of VR Podcast. It's a podcast that looks at the future of spatial computing. You can support the podcast at patreon.com slash voicesofvr. So I had a chance to attend the Meta Connect 2024 conference and do a number of different demos on demo day, including the Quest 3S, some of their AI demos, and some of their latest in the Ray-Ban Meta smart glasses, and also had a chance to get an early look at the Hyperscape demo that was released as an app the next day. It's essentially like a Gaussian splat cloud-streamed demo, where they're presumably using their cloud-streaming technology in order to render out these really high-fidelity Gaussian splat captures of these different artists' studios, as well as one of the old offices for Mark within the context of the campus there at his headquarters. So that's what we're covering on today's episode of the Voices of VR podcast. So this interview with Marcello happened on Tuesday, September 24th, 2024. So with that, let's go ahead and dive right in.
[00:01:11.320] Marcello Typrin: I'm Marcello Typrin, product director at Reality Labs. We build the underlying technology platform that powers the devices and the experiences that Meta builds in mixed reality, augmented reality, those kinds of experiences, including Hyperscape, which is what you were looking at just now.
[00:01:26.989] Kent Bye: Maybe you could give a bit more context as to your background and your journey into this space.
[00:01:32.079] Marcello Typrin: Yeah, so primarily in product management, I would say probably two and a half decades in this space. A lot of my work has been in zero to one. A lot of background in speech recognition. I led the work and launched the first Echo product at Amazon. Now I'm here at Reality Labs at Meta. Excited about the work we do here. A lot of this is also cutting edge zero to one work. I love places where you're trying to change the world through technology, but also focused on making user experiences better, really making them really resonate with humans. It's not technology for the sake of technology. It's technology that really improves the way we live, the experiences we have, and just how we live our daily lives.
[00:02:16.425] Kent Bye: Great. Maybe you could give a bit more context for Hyperscape and what your remit is for what you're exploring here.
[00:02:21.638] Marcello Typrin: Yeah, the premise for Hyperscape is that we believe that places from the physical world matter. They carry cultural meaning, symbolic, personal meaning, and the ability to actually travel into those places we think is deeply meaningful for people. And it's not just about, hey, I want to go check out, you know, some of the things you saw today, like a studio or a museum, but I want to be able to go to someone's home. When someone invites you into their home, that's a deeply profound gesture. Come on over. I'm inviting you over to hang out. So that's not an easy thing to do. Getting on a plane, getting in a car takes a lot of time, cost, energy. And a 2D video call doesn't cut it either because it doesn't feel like you're there with them. So imagine being able to go, if you invite me over to your home, I could sit down on the couch with you or across the kitchen table from you. I could appreciate what your life is like by just seeing the surroundings that you live in. And then I could see you over time as an avatar. So it'll feel like I'm actually visiting you. And I think that's super important because that'll allow us to form a tighter connection, a stronger bond. And we think that's super important.
[00:03:33.353] Kent Bye: Well, I know there's been a lot of work over the years for different types of volumetric capture using neural networks with neural radiance fields. And now we have Gaussian splats. And so maybe you could talk a bit about settling upon Gaussian splats as the technology to really push forward the cutting edge for volumetric capture.
[00:03:49.392] Marcello Typrin: Yeah, we've tried all those things, and we've seen a lot more success in using Gaussian splats. And when I talk about success, we talk about, does it feel like you're actually there? And we've been able to get that visceral reaction. It's like, oh my God, I feel like I'm actually there. I feel like I actually want to sit down on that blue upholstered couch, or I need to watch out for that table, even though it doesn't physically exist in my own environment. We've had much greater success in building those kinds of experiences and getting those kinds of reactions using Gaussian splats than any other technology that we've been experimenting with. So we're heading down the Gaussian splat path.
[00:04:28.533] Kent Bye: Yeah, so in talking to the developers, it sounds like you've developed your own capture app with a Hyperscape app on a phone to be able to walk around for, say, 60 or 90 minutes to gather enough data. So usually with photogrammetry, you're essentially taking photos, and then that gets processed to be turned into a mesh. But with Gaussian Splats, you're basically doing another type of processing. And I know that there's been a lot of excitement and promise for Gaussian Splats to have this kind of new rendering pipeline system. And as far as I understand, this is being cloud rendered. So it's not being actually rendered on the headset. So yeah, maybe just describe a little bit about, yes, you're able to capture all the data. It may be efficient and small enough. But in order to actually render it, you need to actually have enough power in the GPUs that you have to kind of offload that into cloud rendering. So maybe you could just kind of describe a bit of that workflow.
[00:05:14.980] Marcello Typrin: Yeah, that's right. I think you hit on some of the key points. If you put all that GPU, all that compute horsepower on the headset, it becomes an unwieldy headset. So we're being very thoughtful, very deliberate about where we run these experiences. And in some ways, it makes a lot of sense to put them on the headset. But that adds cost, it increases the product design impact, the physical, or I guess in this case, the impact that it has around your head. It just becomes a heavier device, less ergonomic, less comfortable. So we think a lot about how do we move all that compute horsepower to a place that doesn't impact the headset itself. PC VR is an example from the past, but we like the cloud because that's ubiquitous. Just about everybody has access to a cloud service, internet access. And we think by putting it there, we can get all the GPU benefits without impacting the headset in terms of comfort and ergonomics.
[00:06:05.458] Kent Bye: Yeah, as I was going through this demo, I was able to move around the space and look around. And I didn't see any perceived latency. And so are you able to measure what's the difference if you were rendering this locally in, say, a PC VR versus rendering on the cloud? What type of metrics do you take to see, OK, it's below a certain threshold that might be imperceptible?
[00:06:27.076] Marcello Typrin: That's right. We do have thresholds when you launch this application, which will be rolling out across the United States in the coming days and weeks. We do run network checks. We want you to have a good experience. And we do have a threshold. The way we test that threshold is empirically. We say a certain upload and download, uplink and downlink speed. We get a good experience. People feel comfortable. We don't get jitter. We don't get lagginess and latency. Below that threshold, we tell the user, just like any other service provider providing some kind of an experience, whether it's video or otherwise, it's like, hey, your network conditions are degrading. You should just know that as you move forward. So we do check. And what we're finding is that a huge chunk of the population we have in our footprint in terms of data centers and where we're operating on the edge networks is actually delivering a pretty good experience for most of our users.
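(For illustration, here is a minimal sketch of how a pre-flight network check like the one described above might be structured. The threshold values, function names, and classifications here are hypothetical assumptions for the sake of the example, not Meta's actual implementation.)

```python
# Illustrative sketch only: thresholds and names are hypothetical, not Meta's.

MIN_DOWNLINK_MBPS = 25.0   # assumed minimum for a smooth streamed experience
MIN_UPLINK_MBPS = 5.0      # assumed minimum for pose/input traffic upstream
MAX_LATENCY_MS = 40.0      # assumed round-trip budget to the edge data center


def network_check(downlink_mbps: float, uplink_mbps: float, latency_ms: float) -> str:
    """Classify measured network conditions before starting a streamed session."""
    if (downlink_mbps >= MIN_DOWNLINK_MBPS
            and uplink_mbps >= MIN_UPLINK_MBPS
            and latency_ms <= MAX_LATENCY_MS):
        return "ok"          # start streaming normally
    if downlink_mbps >= MIN_DOWNLINK_MBPS * 0.6 and latency_ms <= MAX_LATENCY_MS * 1.5:
        return "degraded"    # warn the user that quality may drop
    return "unsupported"     # tell the user conditions are too poor


if __name__ == "__main__":
    print(network_check(downlink_mbps=80.0, uplink_mbps=10.0, latency_ms=20.0))  # ok
    print(network_check(downlink_mbps=20.0, uplink_mbps=3.0, latency_ms=55.0))   # degraded
```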
[00:07:15.435] Kent Bye: Well, I know that even with the streaming from PC VR to a Quest, you have the Air Link. And so it's essentially getting translated into a video format. Is that the same thing, where you're basically rendering it with the neural rendering on the cloud, but then essentially sending a video stream down? Is that what you're seeing?
[00:07:31.643] Marcello Typrin: On a frame-by-frame basis, every frame that we render at 72 or 90 frames per second, we send down to the headset. So it's kind of the same thing. Like, every single frame gets rendered, and then we push it down the link from the edge network to your headset.
[00:07:45.731] Kent Bye: And so I guess there's a certain amount of like, when I turn my head, then it would need a new frame. If I'm just looking forward, does it just detect that there's not any changes, like essentially like H.264 smart compression, so that it doesn't need to actually send unnecessary data?
[00:08:01.448] Marcello Typrin: That's right. We do some predictions in terms of where your head is moving in real time. And we know to stream down just that aspect or that chunk of the experience that needs to be rendered in real time. So we're not going to send down the entire thing. That's not a smart way to do this. But we do predict where your head's going to be a few milliseconds from now. And we begin teeing up that content and rendering it and then streaming it down to your headset so it's ready there to meet you in your new pose or your new position as you look around. And then it's not only going to be high fidelity, but you won't experience any latency or lag as you're turning. It'll be there ready for you.
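(As an illustration of the kind of pose prediction described above, here is a minimal sketch that extrapolates head yaw a few milliseconds ahead under a simple constant-angular-velocity assumption. The function name, lookahead, and numbers are hypothetical, not the model Meta actually uses.)

```python
# Illustrative sketch only: constant-angular-velocity yaw extrapolation,
# standing in for whatever pose-prediction model the streaming service uses.

def predict_yaw(yaw_now_deg: float, yaw_prev_deg: float, dt_s: float, lookahead_s: float) -> float:
    """Extrapolate head yaw a few milliseconds into the future."""
    angular_velocity = (yaw_now_deg - yaw_prev_deg) / dt_s   # degrees per second
    return yaw_now_deg + angular_velocity * lookahead_s


# Example: the head turned 2 degrees over the last frame (1/90 s at 90 Hz);
# predict where it will be ~30 ms from now so that view can be rendered on the edge.
predicted = predict_yaw(yaw_now_deg=32.0, yaw_prev_deg=30.0, dt_s=1 / 90, lookahead_s=0.030)
print(round(predicted, 1))  # ~37.4 degrees
```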
[00:08:33.746] Kent Bye: And does that mean that there's also going to be like a Hyperscape-associated app for people to actually capture some of these captures?
[00:08:39.668] Marcello Typrin: Yeah. What you saw today is six curated experiences that we worked with some creators who are excited about the work we're doing. We believe creators, and we've gotten this feedback already, it's like, when are we going to have the capture tool? When are we going to be able to upload it? That's coming. And we have a long roadmap of things that we want to do, which includes one of the first things is like, how do we give creators the ability to do what you saw today and create that kind of thing for themselves?
[00:09:05.413] Kent Bye: OK, and maybe describe a little bit of the unique affordances of Gaussian splats in terms of when you do photogrammetry, you create a mesh, and then you have to create a texture that is pretty static, whereas it feels like there's a lot more dynamic reflections. And it just feels like it's able to handle hair and other aspects a little bit better. So how do you start to think about the unique affordances of Gaussian splats and what you can do in them that you can't do in, say, photogrammetry?
[00:09:30.498] Marcello Typrin: Right, so you hit on some of the interesting things. Gaussian splats gives you the fine details, but it doesn't have the structure. So one of the technologies that we develop is to look at the scene, the representation that Gaussian splats gives you, and then create an understanding of that. Where are the edges of the space? What are the semantics of the space? Where is the chair? And then we put meshes, transparent meshes, on top of them, so textured but transparent, so you can still see the Gaussian splat. But then once you have the mesh, you now have the affordance that all creators in 3D understand how to use. I'm going to take a ball, hypothetically speaking, a virtual one, and throw it against something that is represented as a Gaussian splat with a mesh on it. The mesh allows the ball to understand that this is a surface. It must now bounce. So we're talking about bringing meshes and the more traditional methods of creating 3D experiences and overlaying them on top of the splats so that you can have that interactivity and make the splats a more familiar method of building and creating in 3D.
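(To illustrate the idea of transparent collision meshes layered over splats, here is a minimal sketch of a hypothetical collision proxy that lets a standard physics reflection act on splat content. The class, fields, and values are illustrative assumptions, not Meta's actual API.)

```python
# Illustrative sketch only: a hypothetical "collision proxy" pairing an invisible
# mesh surface with a Gaussian splat region, so standard physics can act on it.

from dataclasses import dataclass


@dataclass
class CollisionProxy:
    label: str            # semantic label inferred from the scene, e.g. "table"
    normal: tuple         # unit normal of the simplified proxy surface
    visible: bool = False # the proxy mesh is transparent; only the splat is seen


def bounce(velocity: tuple, proxy: CollisionProxy) -> tuple:
    """Reflect a velocity vector off the proxy surface: v' = v - 2(v.n)n."""
    vx, vy, vz = velocity
    nx, ny, nz = proxy.normal
    dot = vx * nx + vy * ny + vz * nz
    return (vx - 2 * dot * nx, vy - 2 * dot * ny, vz - 2 * dot * nz)


# A ball falling straight down bounces off the invisible proxy for a tabletop.
table = CollisionProxy(label="table", normal=(0.0, 1.0, 0.0))
print(bounce((0.0, -3.0, 0.0), table))  # (0.0, 3.0, 0.0)
```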
[00:10:37.014] Kent Bye: Have you experimented with any type of multiplayer types of experiences with these Gaussian splats? What I just saw was a solo experience. I'm thinking of something like Horizon Worlds, if you imagine a future where you'd be able to render out a Gaussian splat and have multiple people in there at the same time.
[00:10:52.162] Marcello Typrin: We've done a lot of that kind of experimentation internally. Our vision is that in time, you'd be able to scan your own home, invite your friends over, and your friends don't have to live next door to you physically. They could be halfway around the world. And that's really what this is about. It's about creating experiences in places that are meaningful to you. And we talked about this a minute ago with the people who mean something to you. And we think that's a pretty powerful concept. Now, you mentioned metaverse and Horizon Worlds. We also like that idea. What we really like is the idea of bringing places that are magical and fantastical and only exist in a digital world together with places that exist in the physical world, bringing them next to each other and creating experiences where you can move from one to another in a way that feels very comfortable and natural.
[00:11:39.901] Kent Bye: I know that in talking to the developer, there's a very specific app where you're doing the capture, but I'm wondering if you've also looked at archival footage or archival photogrammetry, looking at things from the past, if that's something you also have processes for, to take, let's say, an existing video and then translate it into a splat.
[00:11:56.547] Marcello Typrin: We're doing a lot of research in how do you take images that exist, and we talk about maybe sparse images. You know, back in the day, even when I was growing up, I didn't take a lot of photos of my bedroom, right? But there are a few. And it's interesting to think about how techniques using generative AI could hallucinate the missing bits and create not just a two-dimensional kind of wallpaper of a space, but also depth, the z dimension. So what we're thinking a lot about from an applied research point of view is how to take these generative techniques and create and hallucinate places that feel real based on images that are actually of places that exist, whether it's your bedroom or, not long ago, the Notre Dame Cathedral, which had a fire incident. Imagine being able to recreate that with images that exist in 2D and create a 3D representation of that.
[00:12:51.559] Kent Bye: And have you experimented with either shaders or generative AI style transfer to be able to give what ends up being a very photorealistic look and feel, but if you're able to add a layer of digital processing to either do shaders or other type of generative AI style transfer?
[00:13:07.475] Marcello Typrin: Yeah, I think you're hitting on something where we can use these captures as a baseline, as a starting point for creating brand new experiences. So we talked about my bedroom. I could use my bedroom as is, but I could also imagine how to expand or augment that, how to introduce maybe the natural lighting in my current situation and make that part of the Gaussian splat experience. So we do think a lot about how to use Gaussian splats and the captures as kind of a baseline and how to use generative approaches to build on that. And it's an interesting way to think about it: creating assets takes a lot of time and energy. Creating a scan of a space is pretty straightforward. If that can be your starting point for brand new experiences and brand new worlds, we think that's a great thing to do.
[00:13:52.933] Kent Bye: And what do you want to experience within these types of volumetric captures?
[00:13:56.774] Marcello Typrin: I want to visit my parents in their home. They don't live nearby. That's what I want to do.
[00:14:05.857] Kent Bye: Great. And finally, what do you think the ultimate potential of these types of spatial computing experiences and devices might be and what they might be able to enable?
[00:14:16.229] Marcello Typrin: We touched on it a few times, and I think it's really about bringing people together. I believe that today we have 2D video experiences, but there's something still missing. There's a sense of a space that's being shared with people. And I think in our brains, that creates a much stronger sense of actually being with someone and connecting with someone than a flat 2D experience. It lets us engage in shared activities, which I can't really do in a 2D experience. And those things, feeling like we're in the same space, engaging in a shared activity that feels like a real one, we don't really think much about them, but I think if we really kind of examined how we feel when we do those things, they create stronger connections, create stronger memories. And I think that's ultimately what we're all about.
[00:15:07.931] Kent Bye: Is there anything else that's left unsaid? Any final thoughts that you'd like to share to the broader immersive community?
[00:15:13.376] Marcello Typrin: No, I appreciate the questions. I think they were fantastic conversations. I especially appreciated, like, why are we doing this? It's not just the technology. It's about bringing people together in a way that really matters.
[00:15:24.608] Kent Bye: Awesome. Well, I really enjoyed the demo. I feel like with the Gaussian splats, I was really excited just to hear about this kind of new rendering pipeline that is able to be efficient enough to have high quality scans. And yeah, I just look forward to kind of moving towards being able to eventually run these locally on headsets without the cloud rendering. But in the interim, I think it's a pretty solid experience to be able to capture these different scenes and to be able to go places without having to go anywhere, if you know what I mean. So, yeah. Thanks again for joining me here today to help break it all down. So, thank you.
[00:15:55.087] Marcello Typrin: Thank you so much.
[00:15:56.411] Kent Bye: Thanks again for listening to the Voices of VR podcast, and I would like to invite you to join me on my Patreon. I've been doing the Voices of VR for over 10 years, and it's always been a little bit more of like a weird art project. I think of myself as like a knowledge artist, so I'm much more of an artist than a business person. But at the end of the day, I need to make this more of a sustainable venture. Just $5 or $10 a month would make a really big difference. I'm trying to reach $2,000 a month or $3,000 a month right now. I'm at $1,000 a month, which means that's my primary income. And I just need to get it to a sustainable level just to even continue this oral history art project that I've been doing for the last decade. And if you find value in it, then please do consider joining me on the Patreon at patreon.com slash voices of VR. Thanks for listening.