Alexander Porter & James George want to democratize volumetric capture with Depthkit. Depthkit combines depth sensor devices like the Kinect or RealSense cameras with high-end digital SLR cameras. They're able to create a 3D mesh from the captured depth sensor data and then fuse it with the DSLR camera footage, which becomes the texture. This gives independent filmmakers and VR experience creators an extremely affordable way to combine commercial off-the-shelf devices for volumetric capture in immersive narratives.
I had a chance to catch up with Alex in July, where he talked about how Depthkit came about through a desire to merge photographic thinking and documentary work with computer science, as well as some of the early experiments with the CLOUDS documentary and Blackout VR.
LISTEN TO THE VOICES OF VR PODCAST
Rough Transcript
[00:00:05.452] Kent Bye: The Voices of VR Podcast. My name is Kent Bye and welcome to the Voices of VR Podcast. So volumetric capture of live action acting is something that is a really hot topic within Hollywood as well as for people who are wanting to create independent content within VR. So there's a lot of new digital light field technologies and photogrammetry techniques that are out there, but they're not really accessible for an independent creator. And so on today's episode I have Alexander Porter, who is the co-founder of Depthkit, which essentially uses a Kinect in association with a digital SLR so that you capture the depth information from the Kinect, and then you're able to take the textures from the digital SLR camera and project them onto that mesh. And so you have high resolution and high color, essentially a DIY solution for doing volumetric capture. And then at some point, the goal is to be able to do that in real time within a virtual environment. So in today's episode, I talked to Alexander about the evolution of this approach of doing DIY volumetric capture with Depthkit and some of the early experiments and findings that they've been discovering in that process. So, that's what we'll be covering on today's episode of the Voices of VR podcast. But first, a quick word from our sponsor. This is a paid sponsored ad by the Intel Core i7 processor. VR really forced me to buy my first high-end gaming PC, and so Intel asked me to come talk about my process. So, my philosophy was to get the absolute best parts on everything, because I really don't want to have to worry about replacing components once the second-gen headsets come out, and the VR min specs will inevitably go up at some point. So I did rigorous research online, looked at all the benchmarks and online reviews, and what I found was that the best CPU was the Intel Core i7 processor. But don't take my word for it, go do your own research, and I think what you'll find is that the i7 really is the best option that's out there. So this interview with Alexander happened in the office of Depthkit and Scatter in New York City on July 16th. So with that, let's go ahead and dive right in.
[00:02:18.844] Alexander Porter: My name is Alexander Porter, and I'm a co-founder of a company called Depthkit, and meanwhile doing experiments and projects in VR with a company called Scatter. My background is actually in photography originally, but it sort of evolved into reality capture and game design and what we're calling volumetric capture.
[00:02:37.940] Kent Bye: So tell me a bit about how Depthkit began.
[00:02:40.882] Alexander Porter: First of all, Depthkit is a volumetric capture tool that we've been working on since 2011, which we're seeing as an exciting way of capturing people for virtual reality and augmented reality. It started for me as part of a collaboration with James, who's my partner in both companies I mentioned, where I had a career in traditional photography and I had some kind of existential crisis that was totally driven by reading media theory and frustration with just the way things were going. I'd read this one book called Towards a Philosophy of Photography by Vilém Flusser, and he was talking about the notion that a given tool can be very constraining, like a camera tells you how to use it. And around that time, James and I were friends and we were looking for ways to work together. James's background is in art and in computer science, and we were thinking of ways to merge photographic thinking and documentary work with computer science, and this was around the time that the first Kinect came out. We started to experiment around that. And so we did one first project where we combined a digital camera with that first Kinect, rigged it up with a motorcycle battery, and sort of walked out into the train system. At the time it was a way to try to visualize advanced imaging systems that were starting to get rolled out in the metro system in New York, partly in response to the London bombings. It was fueled by concerns about safety. But the promise was that it was this futuristic capture technique that would allow computers to identify the sort of proverbial suspicious package and then identify and flag it to the authorities. And so our thinking at that time was, what would it be like to play creatively with a camera like that? And so using the Kinect combined with the camera was our way to start to at least aesthetically deal with that question. So we made a project around that, and this was very much the way that we were working. We were doing sort of critical design projects and that kind of thing. But those explorations expanded into a tool, because as part of that collaboration we needed to create a GUI and an interface and ways to use this camera that we conceived of. And so as an extension of that, we released some of these tools to creative people in the form of something called the RGBD Toolkit. And there was a lot of interest around that that had to do with visual effects experiments. And a lot of people were using it to tell the kinds of stories that we were telling about digital spaces and exploring dance and physicality and 3D spaces. So much of it had to do with anxieties about the future or excitement about the future or somewhere in between.
[00:05:14.223] Kent Bye: So it seems like what you're doing is using the depth sensor camera of the Kinect and then using a digital SLR camera, and then somehow taking that depth data, creating a mesh or point cloud, and then taking the image from the digital SLR camera and projecting it on. So maybe you could talk about what that gives you.
[00:05:34.075] Alexander Porter: Yeah, well put, very well put. Our thinking is that the Kinect emerged out of robotics and the question of where a given rover is in a space, right? And out of that, it became this sort of interaction device. But it has the benefit of creating meshes at a frame rate that's close to a camera. And so the core philosophy is that by embracing the power of that tool, where the computation of the geometry is happening actually in hardware and as a result is very quick, and then combining that with color information from a cinematically oriented camera that has a nice lens and all the contrast and color that quality cameras have, you're able to sort of overcome the limitations of the Kinect or whichever depth sensor you use. And you could say you metaphorically overcome the limitation of the cinematic camera, which is that it's not a volumetric capture.
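To make the idea above concrete, here is a minimal sketch of the back-projection and texture-projection math in Python with NumPy, assuming simple pinhole models for both cameras. Every number and name in it is a hypothetical placeholder for illustration; this is not Depthkit's actual implementation.

```python
# Sketch: back-project depth pixels to 3D points, then project those
# points into a calibrated color camera to find each point's texture
# coordinate. All values are made-up placeholders; a real rig would
# supply calibrated intrinsics and extrinsics.
import numpy as np

# Hypothetical depth-camera intrinsics (Kinect-like focal lengths/center).
FX_D, FY_D, CX_D, CY_D = 365.0, 365.0, 256.0, 212.0

def depth_to_points(depth_m):
    """Back-project an (H, W) depth map in meters into 3D points in
    depth-camera space; pixels with no reading (depth 0) are dropped."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - CX_D) * depth_m / FX_D
    y = (v - CY_D) * depth_m / FY_D
    pts = np.stack([x, y, depth_m], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]

def project_to_color(points, K_color, R, t):
    """Project 3D points into the color camera's image plane. K_color is
    the 3x3 color intrinsics; R and t come from calibrating the rig and
    map depth-camera coordinates into color-camera coordinates."""
    cam = points @ R.T + t             # rigid transform between the cameras
    uv_h = cam @ K_color.T             # pinhole projection (homogeneous)
    return uv_h[:, :2] / uv_h[:, 2:3]  # perspective divide -> pixel coords

# Sampling the DSLR frame at the returned (u, v) coordinates gives the
# color ("texture") for each 3D point or mesh vertex.
```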
[00:06:27.133] Kent Bye: And so what are some of the big projects that Depthkit has been used in?
[00:06:32.014] Alexander Porter: So we used it for a host of different ones. There was this interesting moment where we would encounter a lot of different aesthetic communities, because people would encounter the tool and the aesthetics at different times. And so we did a wave of rap videos, and we did a wave of fashion videos. All of the communities that are obsessed with aesthetic novelty, we sort of did this cruise through all of them, which was very fun and very interesting. I made my first rap video. But I think most notably it was used for a project called CLOUDS, an interactive documentary. And interestingly, this sort of question of making tools or making projects emerged for us around that time, where we were debating at the time of doing the Kickstarter: is this Kickstarter about funding the tool, or is it about funding the tool plus a project, or is it about funding the project alone? And on some level, I was sort of outvoted in the room, and the project CLOUDS emerged. So I would say that's the most notable early project, where it was conceived sort of pre-VR and then matured by virtue of virtual reality. And then more recently, we're working on a project called Blackout, which is in progress, where we've extended our thinking to embracing multiple cameras and capturing people in 360 degrees. The result is volumetric captures which are quite easy to capture, I'd say, and quite affordable. And you have a human being that plays much like a video, where you can put them in Unity or put them in Unreal and press play on them, and the action that was captured is then presented.
[00:08:03.415] Kent Bye: The thing that I think is unique about Depthkit is that the geometry that you're getting back is stylized, kind of like this glitchy, low-fidelity, low-poly, almost noisy aesthetic. And I think that there's a happy balance of not being too close to the uncanny valley, but having it stylized enough without being too noisy or too low fidelity. And so maybe you could talk about that aesthetic choice and the challenge of balancing some stylization without going too far, and also the risks of going down the uncanny valley.
[00:08:38.916] Alexander Porter: Yeah, it's a question we think about all the time. So there's sort of a lot to it. I think our central point is that the ultimate goal is not necessarily verisimilitude, and that different projects sort of demand different aesthetics. And in many of the projects we made to date, we would embrace the inherent technical aesthetics of the approach we're taking, which is maybe a compromised one; there are alternatives to this that are very exciting that use many, many cameras and multi-view stereo, et cetera. So the first projects that we did just embraced the noise and just said, this is the way it is. We would be talking about our projects or presenting them, and there would always be some guy in the back asking, why don't you make it perfect, you know, why don't you fill in the back? And our answer was, because that's not what we're doing right now, and because the particular projects we had actually agreed with the technique. Which is a dodge, but I think it's an important one. And I think thinking that way, at least in the early days of virtual reality, is very important, where you have to, to some degree, concede to the different strengths of the tools you're using, and you have to adapt your thinking to some degree to those tools. And so our excitement now, as we start to take this from a tool that emerged out of an art practice and give it to a more broad group of people who are now very excited about virtual reality, is extending that aesthetic range. Previously it was sort of digital glitchy aesthetics only, and we are starting to push towards the uncanny valley, kind of admittedly, but meanwhile we're excited about creating different filters and different effects and the ability to combine them in creative ways, such that you can expand the expressive range of it and people can holistically build a given world with a given kind of lighting logic and a given story logic for which the person fits. And that may not be a perfect-looking person. It may be blurry or filtered in different ways. But we are also feeling the pressure to actually put our money where our mouth is, or something like that, and step towards at least getting rid of the noise and getting rid of edges and spikes and that kind of thing. And much of that we've succeeded with. But the actual core philosophy is that we believe our tools will enable people who couldn't otherwise afford to, to be part of this ongoing dialogue. And we feel that volumetric capture and volumetric playback is an important part of that dialogue. And so by doing this, even if it's as yet imperfect, we know that many, many more people will actually be able to take part.
[00:11:05.503] Kent Bye: Yeah, the way that I conceptualize it is that looking at the graph of the uncanny valley, you have something that's an abstracted object that doesn't look like a human. And the more that you make that robot look like a human, you get to that point where it falls off into the uncanny valley. And it looks like whenever you try to make a robot look too much like a human, you have kind of this zombie-like feeling. Part of that is because it looks dead. You're expecting it to have all sorts of different social cues and social behaviors and body language and all of that. So it's actually a huge jump to go from something that looks like a zombie or a creepy humanoid robot to what we recognize as being a human. And so because of that, I think the technologies that use photogrammetry and kind of get up to that point, to me, fall into that uncanny valley. They're striving towards crossing the chasm of the uncanny valley, but they're not there yet. And I have to wonder, with volumetric capture in particular, even if you did get the aesthetics right, if you're in a VR experience with them and they're not reacting to you in a certain way, then that lack of social interaction and responsiveness can break presence, and you can be taken out of the experience as your mind keeps telling itself, okay, no, this is not real, this is not real, this is not real. And I think the advantage of something like Depthkit is that you can actually go the opposite direction a little bit more and, you know, create it in more of a fantasy world. And by doing that, my hypothesis is that you would actually have more presence. And I think this is something that a lot of investors and a lot of people think they want, striving towards photorealism, but they don't realize the trade-offs. Just talking to different researchers, one told me that the uncanny valley is multidimensional. So in other words, the more that you strive towards photorealism, you have to go with having photorealistic haptics and sound, and it starts to create this disconnect between having something that looks photoreal but doesn't sound or feel anywhere near that. And so again, that's another break in your mind that creates this disconnect and break in presence. So there's a lot of different trade-offs, a lot of things that I think aren't naturally intuitive, because a lot of people think they want this, but in terms of cultivating a sense of presence, I think you can actually do a lot more by going that stylized route. But also, rather than spend thousands and thousands of dollars on this whole photogrammetry rig, you can start to do some rapid prototyping and just start to play with volumetric capture and storytelling, using something that's beyond just 2D human capture and dealing with all the issues of creating animated characters and stuff. You can actually get sort of that human feeling by doing that direct capture.
[00:14:00.767] Alexander Porter: Yeah, yeah, precisely. And our hope with this is almost to do kind of rapid prototyping at the scale of communities. So there's many, many people. My feeling is that the answers to these questions will come partly out of good design thinking and iteration, but also audiences will sort of meet us in the middle, right? People will start to embrace things that they wouldn't before, in the same way that, you know, we cut our films way faster than we used to, and people are totally okay with that. In some cases they even get very bored when they watch old films. Our thinking is that the answer sits somewhere in between. And also, in the near term, we will create projects like Blackout that are set in social scenarios that to some degree can justify odd behavior, such as the behavior of not acknowledging an actor or a character as present. It is weird in a game engine; it's also weird on the train. And so our thinking is to create contexts where we can, to some degree, normalize this, at least in the near term. And my hope is that there is, to some degree, an emergent genre that embraces these kinds of things, where it's almost like maybe being in a play, where you are enabled to walk up on the stage, or obviously the sort of tired yet wonderful reference of things like Sleep No More. And just one comment on that: speaking of zombies, I think part of the challenge is also setting up the social dynamics in the experience that you're creating such that people are not shocked if someone doesn't acknowledge their presence, in the same way that you'll find yourself in social situations out in the world where, given the context, given the framing, say you walk into a theater, you know that you're not supposed to walk on the stage. There's a certain way to behave. And I remember the first time I went to Sleep No More, I remember watching the audience members and just feeling like they were zombies, because their behavior felt so deeply strange to me. So I guess all that is sort of a long way of saying that I think creativity and design will help us a lot, and then also designing social contexts where this can be justified. And eventually, given a little bit of training and thought, we'll get to a point where audiences are also enabled to participate, where we sort of say to them, look, this is this type of experience, enjoy it as such. And my feeling is that oftentimes when you talk about breaking presence, it's usually something that you may not be aware of that is out of place, and it's not necessarily always about the capabilities of the technology. Oftentimes it's actually about design choices, and maybe the sound is a little bit off, but you will suddenly evaluate the capture because of that. Our thinking is to create these holistic experiences where everything's all wrapped together and the logic is coherent. And my assertion is that if you do that, and if you do that well, people can go there with you. People have enough neuroplasticity and creativity and imagination that they'll actually go there with you.
[00:16:47.819] Kent Bye: What's the sort of model for selling Depthkit? Do you plan on selling a 3D mount for the Kinect sensor that lets you put the digital SLR camera on top of it? Or what's the model that you're going to be moving towards?
[00:17:01.707] Alexander Porter: Yeah, good question. So right now we're in a closed beta program where we're evaluating, as we transition from being sort of artists who release a tool to artists who actually support a community, ask them what they want, and then make tools for them. It sounds subtle, but it's actually a major shift. We're running this closed beta where we are in very close communication with a group of people who are paying for it, and they have all applied. We then have approved a group of people, and we're trying to be very rigorous about casting as wide a net as possible with those people without having too many people present in the beta program. And then we're going through a design process right now where we're finding exactly what the tool needs to be to account for the different communities that want to use these tools. So high-end studios are excited about using it, but also people who are working as individuals want to use it, and they want to use it quickly and affordably. So we're sort of dealing with those questions. In terms of the way cameras are mounted and that kind of thing, we have a new and better mount, and currently the most common sensor we're using is the Kinect version 2. We do support other sensors, but the Kinect is wildly affordable for what it is, and there's a big gulf between it and the next best sensor in terms of price. And so we're keeping ourselves pegged to the goal of making this affordable and accessible.
[00:18:19.521] Kent Bye: So back when I was working on the Oculus Mobile Game Jam, we did some work with FaceShift before it got bought by Apple. With FaceShift, you basically use a depth sensor camera, you act, and then it captures all these different numbers that represent how your face is moving. Now, the thing that I think a lot of motion capture has to deal with is a lot of noise that then has to be cleaned up by hand, and it's sort of a laborious process to fine-tune it. So I imagine that anytime you're doing these types of depth sensor captures, you're getting noisy data, and so there's a kind of pruning process. So tell me a bit about that process of pruning. What kind of tools do you use? How long does it take to clean up the capture before it's ready for prime time?
[00:19:06.903] Alexander Porter: Sure. Yeah, yeah, that's a great question. I loved FaceShift. I thought it was fantastic. We teach a class called Computational Portraiture at NYU ITP and would always be sort of guiding people to use FaceShift, and we encourage people to use all kinds of different tools. So in terms of the question of noise, that's inherent to any kind of capture, photographic or otherwise. In terms of the way that someone would use the tool now, you would set up an external camera plus the Kinect. In the near future, we'll be supporting onboard color cameras as well. But you calibrate an external color camera. Oftentimes, people are using 4K cameras now, so you get incredible resolution in the video. You calibrate those two cameras together, and then you have this kind of hybrid camera, sort of an RGBD camera, which is red, green, blue, and depth. And you use that to film, and you film with both simultaneously. Then you synchronize those two streams, the color and the depth, and that allows us to treat it as a unified whole. You now have this dimensional footage, basically. And so in the tool, you then can segment, you can remove the background and the sides and that kind of thing. And we're in the process of releasing a series of tools that allow you to use the high resolution of the color camera to reduce some of that noise, to reduce the kind of flickering on the edges, and also internally. The net result of that is that you actually have a depth map that's the resolution of your color camera. So you obviously can't invent data where there's no data, but our kind of core philosophy is this concept of sensor fusion, where you take multiple sensors and you take the best of each. If you think about pixels as X and Y and the Kinect offering Z, right, using this kind of exterior color camera, we can deeply trust the color camera for X and Y, because companies like Canon and all the rest have done an immaculate job of making these cameras, and then we sort of trust the depth sensor for Z. And it allows us to reduce a lot of that noise. It's not perfect. It's far from perfect.
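The exact refinement step Depthkit uses here isn't public, but a common technique in this family is joint bilateral upsampling: resize the registered depth map to the color camera's resolution, then smooth it with a filter that treats the high-resolution color frame as an edge guide, so depth edges snap to color edges instead of flickering. Below is a minimal sketch, assuming the depth and color frames are already calibrated and registered, and that the opencv-contrib-python package is installed.

```python
# Hedged sketch of color-guided depth refinement (not Depthkit's actual
# algorithm). Assumes depth_lr is already registered to the color view.
import cv2
import numpy as np

def refine_depth(depth_lr, color_hr):
    """depth_lr: (h, w) depth map registered to the color viewpoint.
    color_hr: (H, W, 3) uint8 frame from the external color camera.
    Returns an (H, W) float32 depth map at the color resolution."""
    H, W = color_hr.shape[:2]
    # Nearest-neighbor resize avoids blending depth values across the
    # foreground/background boundary, at the cost of blocky edges.
    depth_hr = cv2.resize(depth_lr.astype(np.float32), (W, H),
                          interpolation=cv2.INTER_NEAREST)
    # The guide must match the source's bit depth, so use float32 gray.
    guide = cv2.cvtColor(color_hr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    # Joint bilateral filter: spatial smoothing weighted by similarity
    # in the *color* image, which pulls depth edges to the color edges.
    return cv2.ximgproc.jointBilateralFilter(
        guide, depth_hr, d=9, sigmaColor=25.0, sigmaSpace=7.0)
```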
[00:21:04.980] Kent Bye: Yeah, whether it's an aesthetic choice, the time that might be required to reduce the noise, or just being comfortable with the noise, I think the upside of using a tool like that is that it's affordable, and people could use it and start prototyping. The thing I would wonder, though, is whether or not it would be ready for a final production. People might want to experiment with it to get a sense of it, but the question is whether the quality would be good enough to have a coherent aesthetic. And I think that's where the design part would come in, where you'd almost have to have this fusion between the filmmaking and the computer-generated, actually dialing down the photorealism of the film and video and adding a little bit more of a specific design aesthetic to make it coherent across all the different actors, but also the overall environment. So for me, that's the challenge of people making that aesthetic decision.
[00:21:59.109] Alexander Porter: Yeah, yeah, exactly. And whenever that challenge was posed to us, we always dodged it by just saying, whatever, this is our project, you know, we're going to embrace it in the ways we choose. But our feeling now is that as it becomes sort of less ours and kind of belongs to a community, and based on their needs and appetites, their aesthetics are very disparate. And our sense is that we actually have to kind of boldly homestead in the uncanny valley a little bit, so that we have a central point from which people can create their aesthetics. One other thing I'd love to mention is that the approaches that we're taking are all inherently real-time. Since we come from this background of creating a visual effects tool, the metaphors have been this kind of offline thing where you capture, you process, and then you display back in Unity or wherever else. But all of the techniques we're using actually enable us to do real-time streaming of human beings as well. And so I'm very interested in this, where you actually get opportunities to have real actors who are actually responding to your presence, where you are in a room, et cetera, and potentially give an experience that has a combination of actors and non-actors, or canned captures versus actual actors, so that you can kind of confuse people a little bit about that and create social scenarios that are more complex.
[00:23:12.782] Kent Bye: Yeah, right, exactly, sort of having pre-recorded versus live actors. So it's kind of what we were talking about just a few moments ago with Charlie Hughes of the University of Central Florida, where they have a classroom of like 20 people and five of the characters are being acted through a Wizard of Oz actor on the other side. And so it sounds like what you're saying is having a scene where you have these pre-recorded people, but some of them are actually live with real-time interaction. I think that's a really interesting idea, and I think that's actually a very strong use case that is unique from anything else that's out there.
[00:23:49.455] Alexander Porter: Yeah, that's our thinking. And this particular feature is part of why we embrace these sensors now. We know from first-hand experience that there are a lot of very compelling high-resolution sensors that are going to emerge soon, and we are unwilling to sacrifice the ability to do this real-time behavior, because fundamentally, the work we do and the work we've done is in this real-time vein. The joke I've always had about doing these Depthkit renders is that there's this strange, awkward circle you do, where you capture using a tool that's designed for interaction in real time, and then you take it into all of these visual-effects-type tools, and then it ends up just as a video. It has always been an inherent disappointment, and my analogy was, it's almost as if you institute an amazing math and science program, and inspire the youth, and then build a space program, and then send a couple of them into space, send them to the moon. They go there, they pick up a moon rock, they bring it back to Earth, and then they show you that rock, and they say, look, a moon rock. And then someone on Earth is just like, no, that's a rock. It's this elaborate round trip where the result is actually kind of just degraded and common. And so when you do this kind of elaborate round trip using these tools and you end up rendering out a video, there's something just disappointing about that, or there has been to date. And then suddenly the emergence of HMDs and virtual reality as kind of normal behavior has created a moment where having these live actors present in a space actually justifies a lot of our choices, including our choice not to use many, many cameras or anything like that.
[00:25:25.727] Kent Bye: Yeah. And I'm curious, because you have a photography background and think about media at this high level, whether you've thought about, maybe from the perspective of Marshall McLuhan or others, the implications of what this new tool will enable in terms of having depth to the images, which makes it different from what we've had before.
[00:25:47.566] Alexander Porter: Yeah, yeah, yeah. McLuhan's a funny one, because he's sort of accidentally insightful; he said so many things in so many snippets that you kind of retroactively grab onto a couple and they work. In terms of pegging some thinking to him, he often talked about the notion that a new medium will often contain an old one. It'll sort of be like an envelope for an old medium. And my thinking with virtual reality is that we have to some degree naively done that by making spherical video. And that's not a big judgment call. Industries and creative people are willing to say, I'm willing to use the tools that I know, and I'm talking about cinematic communities here, cameras plus, you know, some basic editing, maybe a little bit of something new, in order to make virtual reality experiences. But it's a strange translation from my perspective. I don't want to sacrifice a lot of the thinking and the insight from a century of incredible filmmaking, and also the genuinely interesting and compelling tech in the form of cameras. I don't want to sacrifice that, and my thinking is that doing volumetric capture is a way to actually embrace that. And for me, I find that to be exciting, because I'm able to play in this domain visually. I'm still moving lights around, I'm still dealing with cameras, but the tools that I'm using are very different. Fundamentally, I'm giving up the tyranny of perspective and saying, it's not my problem exactly where you're seeing this from, in favor of creating social scenarios and situations and kind of laying things out spatially. And for me, that's deeply exciting. This all came after a bit of a crisis of faith with photography, where I was desperate to find new ways to use cameras, to make them at least relevant for myself and my creative practice.
[00:27:29.549] Kent Bye: Yeah, Baobab Studios' Eric Darnell told me that the way he thinks about the difference between film and VR is that film is kind of like a director telling their singular perspective of an experience, whereas a VR experience is more about creating an experience that the user can generate their own stories from. And when I think about that, I think about something like Sleep No More, something that would be kind of impossible to really carry off within a film context, because you're really giving people the local agency to explore a hundred different rooms. So you're basically giving them a hundred different rooms and 21 parallel narratives that are happening at the same time, something that doesn't really translate to a linear medium. It's all running in parallel, and you have the choice to figure out what to pay attention to. That's sort of an extreme example, but in Blackout you have this subway train and you're able to kind of tune into different people's thoughts, and so you have that similar type of parallel option to be able to roam around and listen. But there's also, as a director, the aesthetic choice of deciding whether or not to create a more linear experience by triggering things based upon movement. And so the question is whether you just have it running in parallel or whether you have this sequential linear experience of it. So what are some of your thoughts, and what are the debates and tensions that you're actively trying to figure out?
[00:28:59.148] Alexander Porter: Yeah, I mean, yes is sort of the answer to that. We're thinking about this all the time. Our position is that it's all about hybrids, some hybrid between this sort of horizontal mode, which I think of just as like a cocktail party or something where everyone is running in parallel and you can kind of wander around and tap in or tap out, and a more linear experience. We're trying to create something that sits in between. And the metaphor, and it's sort of a bad one, but the metaphor I've been using is that it's a bit like hosting a very small dinner party at your house, where if I turn the lights off, light some candles, and serve dinner, the entire context of the environment, the social environment, totally shifts. Whereas if I suddenly turn the lights on or, you know, wander out, there's all these cues that you can create in the environment that take a sort of horizontal experience through different phases. And so this has been, and I'm sure this will sound naive in the future, but where we are with Blackout right now is making an effort such that all of the cues in the story, what people are saying, the lighting, the sound, et cetera, mean that your experience as a viewer is that you are brought through emotional states throughout the story. You walk onto the train a little quizzical, just kind of dealing with VR at all in general, and then the train breaks down and there's this tone of suspicion and concern. Our hope is that we play that out for a few beats and then transition the environment to this exploratory mode where you have the ability to kind of tap into people's minds. And I'm always reticent to make a case for something before we've tried it at length, and we haven't properly. But I can say at least that that's our thinking at this time. And I would say the kind of just horizontal, cocktail-party experience is not very interesting, because some degree of control, kind of directorial control or creative control over an experience, is important in order to make it powerful.
[00:30:57.067] Kent Bye: Yeah, yeah. And I think there's a big tension and challenge between having interactivity and agency and telling a story. And there's a couple of ways to go. One is, everything's running in parallel and looping and you're triggering it, like it's clearly being triggered. And then there's perhaps subtle triggering that's delayed enough that you're not quite sure whether or not you're actually triggering anything, so it gives you a little bit of not being completely sure. But Chet Faliszek of Valve told me that one of the big challenges of interactive narrative sometimes is that you create an experience and people are triggering something, but they don't actually realize that they're triggering anything. And so if you have branching narratives, then that can be an issue; if it's a linear story, it's not as much. But for you, how do you see this kind of tension between interactivity and character embodiment versus narrative storytelling, and what are some of your explorations in that domain?
[00:31:49.437] Alexander Porter: Yeah, it's sort of an unsatisfying answer, but I think it's unresolved. For me, though, that's profoundly exciting. It means that we get to explore this territory. I've always thought about this in terms of our work with Depthkit previously. We were, for better or for worse, using tools for purposes that they weren't intended for, and there's new territory there to some degree. And one of the profound privileges of working in new territory is that you get to do all the things now that will be a total faux pas in six months. And if you don't give yourself that opportunity in your creative process, you will be completely stopped and nothing will evolve. And I think we're also giving each other that opportunity right now in terms of the community of VR creators, where I find myself relishing when people make mistakes, you know? Because I go, oh, that's very interesting, and it creates insight into maybe why that happened or what could be done slightly differently. And so our thinking with working on Blackout is exploring that, and it's just a process of testing. So we will come up with a tiny thesis about the way this should be done, and we'll have to validate it in the experience, and it works or it doesn't. And I think the fundamental challenge we'll deal with, actually, is how to create a method for, it sounds so simple, but selecting and choosing to hear from a certain person, a method that actually mimics the way we like to focus on things. Because our gaze is not the only way that we focus on things. And I've found, watching people use the current demo we have of Blackout, that oftentimes I want to listen to someone and look at something else. So obviously gaze-based interaction is not all of it; there needs to be some degree of gaze and proximity and all the rest. And other times, you need an interjection in order to move the story along, or in order to maybe shock the viewer out of a kind of lull on some level. And so for me, it's actually comforting that there's no one answer. There's kind of a new palette of interactive modes and techniques that you can use with people, some of which might actually be a little bit uncomfortable, some of which might feel totally natural to the point where they disappear. So no real solutions there, but I'm excited about this.
[00:33:59.193] Kent Bye: Yeah, I think one of the other big challenges is that even if you came up with the perfect method of how it's going to be done five or ten years from now, and you could time travel back and give it to today's audience, they probably wouldn't know how to watch it either. So I feel like the audience is actually evolving in the way that they're watching and experiencing these narrative experiences as well. I kind of think of this evolution of virtual reality as like three legs on a stool, where the technology was being developed but they needed the content, and so they were iterating back and forth: each time there's a new technological capability, it expanded what was possible in the content that was being created. And then with the consumer launch of VR, you start adding in the audience, and so then the market is deciding what is successful and kind of driving what is resonating and what's not. And so it's still this thrashing, still really early days of VR, that's still being settled out. Now that the audience is being introduced to these experiences, there's an additional feedback loop to both the technology as well as the content, mostly the content as the technology starts to settle down. But there are still innovations happening on the technology, still content developing. And yet, if we look at that same analogy of the stool and look at narrative, it's the same challenge: even if you were to be ahead of your time and come up with the perfect solution, the audience might not be ready for it.
[00:35:29.840] Alexander Porter: Yeah, and it also might be lost on us, you know, who knows. The other thing is that it's incredibly context-specific, so we might come up with the perfect answer for Blackout, and it may not be reproducible for every experience. And I'm actually excited about a future where it's not reproducible for every experience, where specific genres emerge, and also where we get to a point where audiences trust the medium at a base level enough that we can innovate and break these rules for a given creative experience, and people are thrilled by that, in the same way I'm thrilled when I watch a documentary that's experimental or extraordinary or way too long or, you know, just different. And I'm very much looking forward to that future, where trust of the creator is actually an under-considered factor, even when you're thinking about things like presence and breaking presence and that kind of thing. I believe there are the very tangible and overt things, like the techniques, and does it work, is it a good story or a bad story, too loud, too soft. But then there is, do I trust the sentiment and the aesthetic and the approach of the person who created this thing? And I actually believe that if there is some degree of trust there, if they trust the creator, you will allow more. You will allow more issues, problems, or whatever, and treat them as creative choices rather than shortfalls. And I'm excited about getting to a point where audiences start to trust the people that are creating this stuff and empower them to break the rules, or empower them to do things in weird ways.
[00:36:55.152] Kent Bye: And finally, what do you see as kind of the ultimate potential of virtual reality, and what it might be able to enable?
[00:37:02.870] Alexander Porter: That's a funny one, isn't it? I think social experiences, for me, have to be the most exciting outcome of this. I like to say that the most immersive, exhilarating, and terrifying experiences I've ever had have all been social, and I'm looking forward to having those kinds of experiences in my life outside of virtual reality, but also in my life inside of virtual and augmented reality. And in doing the work we've done on Depthkit, the most exciting thing has been the unpredictability of what evolves and what comes out of communities when they're enabled to interact with each other and talk to each other. And I think that there's a natural future for the presence of volumetric people, or other kinds of people, socializing in virtual reality experiences. And I'm just thrilled to watch the weirdness and the exciting communities and cultures and odd behaviors that emerge.
[00:38:02.967] Kent Bye: Awesome. Well, thank you so much. Yeah, thanks so much. So that was Alexander Porter. He's the co-founder of Depthkit and Scatter. So I have a number of different takeaways from this interview. First of all, one of the most striking things about this interview is Alexander saying, we know that there are going to be a lot better depth sensor cameras coming out here soon. And I think that makes total sense. Something like the Kinect camera was built for a very specific use case. There was a version one that was bundled with the Xbox consoles, and then version two didn't get automatically bundled, and I think it didn't have as much commercial success as Microsoft may have been hoping. A lot of the Kinect team that was working on that eventually went over to HoloLens, a lot of their efforts kind of moved over from that same technology stack, and kind of the third iteration of the Kinect sensors is embedded within the HoloLens. So I think that their current solution with the Kinect is something that is decent and looks okay. I think that there is a lot of noise that happens with that, and there's a bit of an aesthetic decision that they have to make in order to work with it: either they clean it up, or they come up with some sort of algorithmic process to make it a little bit less jumpy. But when you're in a VR experience, the thing that I really noticed is that if you're able to dial down the realism of the rest of the scene and have a bit of a stylized character, then I think it can actually work really well. Especially if they start to use some of the artificial intelligence style transfer techniques that have been coming out. I know that Prisma is one of the apps that uses some of these style transfer techniques, where essentially you say, hey, take the style of this type of painting as an input, and then, through a lot of image processing, transfer that style onto another image. I think if they're able to do that type of process with the avatar, doing kind of dynamic stylization of the avatar in combination with real-time streaming, they're going to have a really compelling use case for doing these kinds of live theater type experiences. But it also gives access to a lot of independent creators who frankly don't have the resources to use some of these higher-end volumetric capture systems, like perhaps 8i, or some of these digital light field cameras like the Lytro Immerge, or even a higher-end 360-degree camera rig, either from Jaunt or from Google's Jump cameras. So these are solutions that are out there, but they're not really readily accessible for the average user. With something like this, if people already have a really high-end digital SLR camera, they're able to get this mount and use a Kinect sensor and, for a very affordable price, start to do volumetric capture and really experiment with what's possible.
I think in the future, they'll move away from just a static experience of some of this volumetric capture toward some of the more interactive entertainment possibilities, whether it's a kind of branching narrative that's being triggered, or real-time capture, which sounds like the eventual goal for the Depthkit project. So I think for independent creators who are willing to give up the tyranny of perspective, something like the Depthkit camera could be an affordable way to get bootstrapped into creating VR experiences. Especially if you start to capture people and then put them into these virtual realms, you wouldn't have to worry about going out and shooting on location. You can shoot everything within the confines of your studio and then create whatever kind of imaginal virtual reality narrative experience that you want. So that's all that I have for today. I just wanted to thank you for joining me on the podcast today. And if you enjoy the podcast, then spread the word, tell your friends, and become a donor at patreon.com slash Voices of VR.