#738: Imverse’s Real-Time Volumetric Capture with Voxels

The Swiss-based company Imverse built a voxel rendering engine that can do real-time volumetric capture. It powered the Elastic Time experience at New Frontier last year, and this year they collaborated with artist Maria Guta on Interlooper, which allowed you to record and loop segments of your own embodiment. I had a chance to talk with co-founder Javier Bello Ruiz about their real-time volumetric capture solution, some of the neuroscience inspiration for their project, and why he thinks voxels are the future of volumetric capture.

LISTEN TO THIS EPISODE OF THE VOICES OF VR PODCAST

This is a listener-supported podcast through the Voices of VR Patreon.

Music: Fatality

Rough Transcript

[00:00:05.412] Kent Bye: The Voices of VR Podcast. Hello, my name is Kent Bye, and welcome to the Voices of VR podcast. So in my previous interview with Maria Guta, we talked about the content of Interlooper, which is this experience where you're able to do a volumetric capture of yourself, but you start to record yourself and loop it on top of each other, and you're interfacing both with pre-recorded holograms and with actors who are interfacing with you in real time. The technology that was facilitating this came from Imverse, and it's able to do this real-time volumetric capture using depth sensor cameras that are translated into voxels, these volumetric pixels. So it's very much like having a low-fidelity, Minecraft-like experience, where there are a lot of blocky cubes building up your body in a way that you can tell is yourself, just because it's correlated to how you're moving in real time. It's definitely very low fidelity and stylized, but the advantage of that is that you're able to do these types of real-time interactions that you wouldn't be able to do if you were trying to do a higher-fidelity depiction. And I think eventually they're going to be able to add all sorts of different shaders and things like that on top of it, so that you're able to have these really wild virtual embodiments of yourself. So I had a chance to talk to one of the founders of Imverse, Javier Bello Ruiz, and he talks about the cultivation and development of this technology and how it fits into the larger volumetric capture ecosystem. So that's what we're covering on today's episode of the Voices of VR podcast. This interview with Javier happened on Saturday, January 26th, 2019 at the Sundance Film Festival in Park City, Utah. So with that, let's go ahead and dive right in.
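
To make the pipeline Kent describes a bit more concrete, here is a minimal, hypothetical sketch of how a single depth camera frame can be turned into voxels: back-project each pixel through pinhole intrinsics into a 3D point, then quantize the points into a regular grid. This is not Imverse's actual code; the function name, intrinsics, and the 2 cm voxel size are illustrative assumptions.

```python
import numpy as np

def depth_to_voxels(depth, fx, fy, cx, cy, voxel_size=0.02):
    """Convert a depth image (meters) into occupied voxel indices."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    valid = depth > 0                                 # ignore missing depth readings
    z = depth[valid]
    x = (u[valid] - cx) * z / fx                      # back-project to camera space
    y = (v[valid] - cy) * z / fy
    points = np.stack([x, y, z], axis=-1)
    # Quantize each point to a grid cell; duplicates collapse into one voxel.
    return np.unique(np.floor(points / voxel_size).astype(np.int32), axis=0)

# Example: a synthetic 480x640 frame where every pixel reads 1.5 meters.
frame = np.full((480, 640), 1.5, dtype=np.float32)
occupied = depth_to_voxels(frame, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(occupied.shape)   # (N, 3) array of occupied voxel coordinates
```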

[00:01:53.085] Javier Bello Ruiz: So I'm Javier, I'm the CEO and co-founder at Imverse. We are a Swiss 3D graphics company creating software for content creation in virtual reality, mixed reality, and filmmaking.

[00:02:06.901] Kent Bye: Great, so I first saw the experience that you had last year, which is called Elastic Time, so maybe you could talk a bit about the first project that you were showing here at Sundance and what you're able to do with your volumetric capture holographic technology there.

[00:02:19.646] Javier Bello Ruiz: So in Elastic Time, last year, we were capturing your body in real time, so you were part of the interactive movie, and you were in control of a black hole. It was a mixed reality documentary about astrophysics. So while you were in control of the black hole, you could bend space and time inside of the observatory from Harvard. And at the same time, you were able to see yourself inside of this interactive experience.

[00:02:49.720] Kent Bye: So how did this software come about? What was the motivating factor and the story for why you created this?

[00:02:55.587] Javier Bello Ruiz: So actually we were creating virtual reality for a neuroscience research laboratory in Switzerland, and we saw that content creation tools were especially complicated for people like psychologists, but in general it's a process that is complicated for creators. It's very expensive and you require special equipment, so we decided that we wanted to create an alternative. We wanted to create software to simplify the content creation pipeline. So we were mixing our background in computer science and voxel 3D graphics with the learnings from neuroscience to make experiences in which you really feel immersed and present through embodiment, for example by seeing your own body, but also where you can create the experience itself with our software in a much easier way by using voxels rather than polygons.

[00:03:53.399] Kent Bye: What were some of the specific insights from the neuroscience research that you were taking and then trying to apply?

[00:03:58.982] Javier Bello Ruiz: So for example, I can speak of two or three examples. One is that you can remember things better if your body is involved. So an experience will be more meaningful for you if you are interacting with your own body, if you feel present in the space. That's something very important. Another one is that virtual reality or mixed reality is a multi-sensory medium. It means that you not only rely on the visual information, but on the interactivity that you can have with your own body, on the control that you can have over the story. We always look at our experiences as a multi-sensory integration that has to be as close to reality as possible, playing with all the visual effects and all the crazy things that you can do in virtual reality.

[00:04:50.426] Kent Bye: Right. And so there is something about my experience of Elastic Time where, because you're bending the shape of the environment, you're in some sense technically changing my experience of time, because there's a space-time continuum. And so when you're warping space like that, it starts to give you either time dilation or time acceleration, where it's changing your perception of time. I started to experience that a little bit, especially when you're starting to rewind and play through it. But I'm just curious to hear about your own direct experience of being in these environments and starting to warp space-time in these different ways, and what your experience of that was.

[00:05:23.427] Javier Bello Ruiz: Well, we have to say that it was an artistic representation of what it can mean to be close to a black hole and to experience this bending of space and time. It's true that the lead artist of the project that we collaborated with, Mark Boulos, did a residency at the Harvard Center for Astrophysics. So he got to know some of the astronomers, and he tried to understand how he could represent this phenomenon, which is very complicated for regular people to experience, and translate it into a virtual reality experience, for the user to understand better what it could mean to be in this time manipulation and space bending.

[00:06:05.337] Kent Bye: And so I guess from last year, from Elastic Time, you have another experience here at Sundance called Interlooper. And so maybe you could tell me a bit about how that project came about.

[00:06:14.510] Javier Bello Ruiz: So Interlooper is a collaboration between our company Imverse and Maria Guta, the lead artist. We met her during different festivals. She was actually a curator of a virtual reality festival in Switzerland, and I think that her creative side was matching very well with our technology, because she was looking for something in which you could experience your digital self, how seeing your body or seeing multiple copies of yourself in a space could affect your experience. And at the same time, she wanted to perform live inside of this virtual space. So it was the perfect fit with our capture system. That's what we wanted to do as well. We wanted to show that you can be the protagonist of your own movie, that you can have like a 3D Skype inside of an interactive experience. So we see that as the future of entertainment, in which your interactivity, your presence, and your decisions inside of this space are shaping your own experience. So it's something different for every person that tries the piece.

[00:07:23.142] Kent Bye: And so I'm wondering if you could talk a bit about the voxels and the choice to use voxels, because when I think about a voxel I think of a metaphor like Minecraft, where you have all these blocks and it looks very blocky, but the smaller and smaller you make the voxels, I guess the closer it becomes to like a 3D pixel. But it also has a very low-fi, pixelated effect, where you're still able to get a sense of your embodiment, but it's definitely stylized, in the sense that I wasn't tricked or fooled into believing that this was actually my body. It's like, this is a digital representation, but it's still connected to my body enough that I could start to adopt it as if it was my own body, because I was able to see that synchrony. But I'm just curious to hear about the decision to use voxels, and what voxels are, and what they enable you to do.

[00:08:09.634] Javier Bello Ruiz: So, as you said, a voxel is a volumetric pixel. We consider them as atoms. In that sense, you can represent 3D space or a 3D element in a way that is closer to reality. That's why we are using the metaphor of atoms. Because what happens with traditional 3D rendering with polygons is that every object, every visual effect that you plan, has to be adapted to the connections between these polygons. In the case of voxels, the connectivity is always the same. They are atoms that you can manipulate always in the same way. So, for example, it's making it much easier to create a visual effect, because once you create a visual effect, it's a mathematical simulation that you can apply to every object. You don't have to consider how that object is made, with how many polygons or in what shape. So that is one of the things that is simplifying content creation very much for us, and that's why we believe that voxels are going to be a big part of the future of 3D graphics. But in this case, for the volumetric capture, for the real-time capture of your body, it's also helping us to have the live capture, the real-time capture, because when you are using polygons for that, you have to recalculate all the time how the polygons are connected to make your body a solid object. With voxels, again using the metaphor of atoms, we just need to connect all of them together, coming from the cameras that we set up around you, and that makes your body look solid. It's true, as you said, that right now the resolution still seems a bit blocky, but we are sure that with the new cameras coming on the market, we are going to have much better resolution that could really feel like you have your real body inside of the space. So we work with any camera on the market that has a depth sensor, so by establishing collaborations with tech companies, we believe that we are going to have great quality, to be able to see yourself as if you were in reality inside of the experience.
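
Two of the points Javier makes here, that fusing several cameras is just merging voxel "atoms" into one grid, and that a visual effect is a rule applied uniformly to every voxel with no mesh topology to rebuild, can be illustrated with a small sketch. This is a hypothetical illustration under the assumption that each camera's output has already been converted into indices in a shared world-space grid; none of these function names come from Imverse's engine.

```python
import numpy as np

def fuse_cameras(voxel_sets):
    """Merge occupied voxel indices from several calibrated depth cameras,
    each already expressed in the same shared world-space grid."""
    return np.unique(np.concatenate(voxel_sets, axis=0), axis=0)

def wave_effect(voxels, voxel_size=0.02, amplitude=0.1, t=0.0):
    """Apply one displacement rule to every voxel center. Because each voxel
    is an independent 'atom', nothing has to be re-triangulated."""
    centers = (voxels.astype(np.float64) + 0.5) * voxel_size
    centers[:, 1] += amplitude * np.sin(centers[:, 0] * 10.0 + t)
    return centers

# Usage: two hypothetical cameras see an overlapping part of the body.
cam_a = np.array([[0, 0, 0], [1, 0, 0]])
cam_b = np.array([[1, 0, 0], [2, 0, 0]])
body = fuse_cameras([cam_a, cam_b])     # the shared voxel appears only once
print(wave_effect(body, t=0.5))         # the same rule runs over every atom
```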

[00:10:18.304] Kent Bye: And my impression is that you've had to create your own engine to be able to drive these voxels. Do you foresee having a voxel format that could be imported into something like Unreal Engine or Unity? Or do you feel like this is a technological roadmap where you have to kind of roll your own engine, and then if people want to create an experience here, they have to use your specific tools in order to do that?

[00:10:42.850] Javier Bello Ruiz: Well, obviously for us what is most important is the creators. So yes, we have our own 3D engine that is working with voxels, but we want to make it possible for them to export what they created in our engine and to use it with Unity or Unreal or other 3D modeling tools or other game engines that currently work with polygons. We truly believe that in the future voxels are going to be more relevant than polygons for 3D creation, but we want to be as compatible as possible right now while we make the transition towards voxels. So it means that we could have these export functions or plugins connected with the different software, but we really expect that in the future people will adopt our technology to create 3D graphics in general, not only for virtual reality, but that you will have in your computer or in your smartphone a piece of our technology that will help to display 3D graphics on your screen.
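
One way such an export to a polygon engine like Unity or Unreal could work, sketched here purely as an assumption rather than as Imverse's actual exporter, is to emit a quad for every voxel face that is not shared with an occupied neighbor, which a game engine can then consume as an ordinary mesh.

```python
import numpy as np

FACES = {  # face normal -> the 4 corner offsets of that cube face
    (1, 0, 0):  [(1, 0, 0), (1, 1, 0), (1, 1, 1), (1, 0, 1)],
    (-1, 0, 0): [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0)],
    (0, 1, 0):  [(0, 1, 0), (0, 1, 1), (1, 1, 1), (1, 1, 0)],
    (0, -1, 0): [(0, 0, 0), (1, 0, 0), (1, 0, 1), (0, 0, 1)],
    (0, 0, 1):  [(0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)],
    (0, 0, -1): [(0, 0, 0), (0, 1, 0), (1, 1, 0), (1, 0, 0)],
}

def voxels_to_quads(voxels, voxel_size=0.02):
    """Return world-space quads for every exposed voxel face."""
    occupied = {tuple(v) for v in voxels}
    quads = []
    for v in occupied:
        for normal, corners in FACES.items():
            neighbor = tuple(v[i] + normal[i] for i in range(3))
            if neighbor in occupied:
                continue  # face is interior, never visible, so skip it
            quads.append([(np.array(v) + c) * voxel_size for c in corners])
    return quads

# Two adjacent cubes share one face, so 10 of their 12 faces are exposed.
quads = voxels_to_quads(np.array([[0, 0, 0], [1, 0, 0]]))
print(len(quads))  # 10
```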

[00:11:41.194] Kent Bye: I've done a lot of different interviews with people talking about volumetric capture in VR. So there's the Windows Mixed Reality capture, where someone like Metastage is doing this with lots of digital SLR cameras that are capturing someone performing on a stage, and then they're able to do this post-processing and produce stuff from that. There's Depthkit, which is doing a very similar kind of low-fidelity capture, but I think it's less about using voxels and more about creating these texture video files and creating ways to put them onto WebVR or into Unity, to almost create a map that captures the depth information but blends it into a specific object that can then be imported in. Then you have regular motion capture, which I think is another huge option, in terms of just getting the raw data for the points on the body and then translating all the texture and information on top of that. And so it seems like there are different advantages and disadvantages to each of these approaches. If you wanted to do something that's interactive and dynamic, then maybe something like motion capture lets you do that. I'm just curious, from your perspective, as you look at this landscape, what are some of the advantages or disadvantages of voxels compared to some of these other volumetric capture approaches?

[00:12:54.033] Javier Bello Ruiz: Well, I don't think that we have to choose one or the other, in the sense that I see that avatars can be very important for experiences in which you want to be somebody else. Voxels, in our case, are providing this live capture, this possibility of being an active part of the experience with your own body. And volumetric capture with high-fidelity recording is also very interesting for capturing actors and placing them inside of the experiences. So, in that sense, I think that people, when creating the experiences, should choose what fits better with what they want to transmit to the public. What we see as a great advantage of voxels is that we can actually import those other formats into our engine and also use them. So you don't necessarily need to use our volumetric capture for everything. You could use Metastage to record some actors and place them inside of our engine. You could then use our capture system to integrate your body in real time, and then you could have an avatar, because you want to have a special character from a superhero movie and it has to look that way. So what voxels are giving us, in that sense, is the possibility of importing all this different data and transforming it in real time into our voxel structure, because that's what it's about. It's a very flexible structure that we created to manipulate everything and to create the 3D space and the 3D experience. So you choose what you want to use, but we think that voxels are going to be the data structure that is going to help the most when you want to create the experience and when you want to simplify the content creation pipeline.
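
As a rough illustration of what "importing other formats into the voxel structure" might involve, here is a hypothetical sketch that voxelizes a triangle mesh, such as a pre-recorded actor asset, by sampling points on each triangle and quantizing them into the same grid used for the live capture. The sample count and voxel size are arbitrary assumptions, and this is not necessarily how Imverse's importer works.

```python
import numpy as np

def mesh_to_voxels(vertices, triangles, voxel_size=0.02, samples=200):
    """Voxelize a triangle mesh by random barycentric sampling."""
    occupied = []
    for i0, i1, i2 in triangles:
        a, b, c = vertices[i0], vertices[i1], vertices[i2]
        r1, r2 = np.random.rand(samples, 1), np.random.rand(samples, 1)
        flip = (r1 + r2) > 1.0          # fold samples back inside the triangle
        r1, r2 = np.where(flip, 1 - r1, r1), np.where(flip, 1 - r2, r2)
        points = a + r1 * (b - a) + r2 * (c - a)
        occupied.append(np.floor(points / voxel_size).astype(np.int32))
    return np.unique(np.concatenate(occupied), axis=0)

# A single triangle about 10 cm on a side becomes a handful of voxels
# that can then be merged with the live capture coming from the cameras.
verts = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0]])
print(mesh_to_voxels(verts, [(0, 1, 2)]).shape)
```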

[00:14:42.055] Kent Bye: So your company's called Imverse, I-M-verse, so like immersive and inverse, but with an M instead of an N. So is Imverse creating just a software solution, or are you creating a whole bundle with 3D depth sensor cameras, whether it's like a Kinect version 2 or whatever the latest depth sensor camera is going to be? Do you foresee that people are going to do a bit of plug and play, in terms of setting up whichever depth sensor cameras they prefer, or is this something that you are planning on selling as a bundled solution?

[00:15:13.679] Javier Bello Ruiz: So we imagine it in different ways. We want to use our core technology to really, as I said before, be integrated at the level of the graphics card or your operating system, and then build our game engine on top of this. And to do that, we are planning a suite of applications on top of this game engine to address different parts of the content creation pipeline. So, for example, we have Imverse LiveMaker, which allows you to create a 3D model, a volumetric space, from a single 2D picture. So in comparison with photogrammetry, you don't need to take thousands of pictures; you just need one picture, and our software will allow you to create the 3D model. And with Imverse LiveStage, our volumetric capture system, for now we are exploring collaborations with content creation studios. So in the future, yes, we imagine bundling it with different camera systems, but for that we have to work on our partnerships and collaborations with companies like Microsoft or Intel to see how best to address the distribution. Again, our main concern is to make it easy for the content creators. So if they prefer to have a custom system, we can provide something more custom. But it would be ideal for our software to be compatible with any depth sensor on the market.

[00:16:36.993] Kent Bye: Great. And so for Imverse, what are some of the biggest open questions that you're trying to answer, or open problems that you're trying to solve?

[00:16:46.135] Javier Bello Ruiz: One open question that we want to answer is obviously the future of location-based entertainment. We see that it's becoming a fast-growing market, and we think that even if our 3D graphics could be applied to many things, we want to apply it to a dynamic medium like virtual reality and mixed reality, because that's where the voxel solution can bring the most to dynamic and interactive experiences. But what we want to see as well is how the adoption of these voxel solutions is going to go. Is it going to be easy for the different companies that are used to polygonal rendering to adopt our solutions? We want to talk to them, we want to collaborate with them, to understand how we could make the integration as easy as possible.

[00:17:38.924] Kent Bye: Great, and finally, what do you think the ultimate potential of virtual and augmented reality is, and what might they be able to enable?

[00:17:48.711] Javier Bello Ruiz: It's a difficult question, or a very easy one, in the sense that you can do anything. We believe that this is the future of entertainment, or at least part of it, in which it will be personalized entertainment, where you can be the protagonist, where it's a social experience with your friends, something that you can experience across different locations. We feel that that could be very special for entertainment. But obviously, as I mentioned before, we were first creating virtual reality for neuroscience research. So there are a lot of applications for neurorehabilitation, for helping neurological patients to recover faster, because you can create close-to-reality environments and control the parameters in a way that can help them recover faster. So the applications are endless: training, business meetings across different countries. It's difficult to grasp how much the future is going to change thanks to virtual reality and mixed reality. Of course the adoption will happen first in different industries and in different solutions, but I truly believe that it's going to be something as important as, or more important than, using your smartphone today. One example that we always speak about is that now a kid will probably ask you, how were you doing this before having a smartphone, you were really reading a map in a book? So we really think that in the future people will say, how were you doing that without virtual reality? You really had to look at a screen, closed in a space with chairs? That seems very strange.

[00:19:24.183] Kent Bye: Is there anything else that's left unsaid that you'd like to say to the immersive community?

[00:19:28.645] Javier Bello Ruiz: Well, what I have to say to them is that we are always looking to speak with all the content creators, to understand what your problems are, to understand how you can better transmit your stories to the public, and to work together for what we believe is the future of entertainment and 3D graphics.

[00:19:49.796] Kent Bye: Awesome. Great. Well, thank you so much. Thank you very much. So that was Javier Bello Ruiz. He's the CEO and co-founder of Imverse. So I have a number of different takeaways from this interview. First of all, there are many different volumetric capture solutions out there, and I actually have a number of interviews about the different technologies, including Christina Heller from Metastage, which I did at Magic Leap's LeapCon, and the founders of Depthkit, which will be airing later in this series of interviews that I did at Sundance, as well as an interview with Andy Serkis about the different types of motion capture that he was doing, which I also did at Magic Leap's LeapCon. But in this case, it seems like the strength of this type of volumetric capture is that it's very low fidelity. So the advantage is that you're able to do real-time interactions that would just be impossible to do with any other technology. And I think there are advantages to that, especially when you talk about virtual telepresence, or if you're doing these real-time, crazy looped experiences that are very artistic. But in terms of the neuroscience, I actually think that there are a lot of really compelling aspects of having this sense of embodied cognition. How can that be used for neurorehabilitation? You're actually able to put yourself into these different environments and see your body's depiction, and sometimes you're able to do this changing of your body. I don't know how easy it is to manipulate the underlying movement of your body, but a lot of the principle of neurorehabilitation is that you may only have a very restricted range of motion, and within a virtual environment you're able to amplify that in some ways. So to what degree could you rig some of these real-time volumetric capture experiences and then actually change the visual depiction in real time, to give you the visual feedback in your mind that you're able to do things that you can't actually physically do yet? I think that's one of the major concepts of neurorehabilitation: that you're able to do that symbolic translation, to take a small range of motion and extend it over a large range, but give you that visual feedback to train your brain and use neuroplasticity to actually rehabilitate faster. But in terms of the other solutions that are out there, the things that come to mind are both the Intel Studios capture stage as well as Depthkit. Depthkit, I think, is going to be releasing some real-time streaming solutions at some point as well, because they're able to release a whole series of tools that let you use commercial off-the-shelf depth sensors to capture these scenes and do this artistic translation to put them into virtual reality. So there's this spectrum between photorealism and more abstract depictions of reality that are close enough to the volumetric capture to give you a sense of what the thing actually is. And the Intel capture stage, I think, also uses voxels, because they're able to capture 10,000 square feet at the same time, and they're able to take these volumetric voxel-based captures and then do all sorts of shaders and visual depictions on top of them. The experience called Runnin', which I had a chance to talk to Kira Benzing about, features Reggie Watts.
They were able to capture 12 or 13 dancers at the same time, and then they were able to copy and paste them, so it made it feel like you were at a dance party with like 50 people dancing on the walls and the ceilings and whatnot. But that's just an example of another experience that I think was using voxels to depict these volumetric captures, where you're able to do all sorts of sophisticated shaders and make them look super artistic and stylized. And that's more on the abstract end. I think that our brains are, in some ways, going to be better suited to believe that these abstract depictions are real than the photorealistic ones. I think the photorealistic ones are going to have their place, and they're likely going to be in the context of augmented reality, where they're juxtaposed with a normal photorealistic environment, so that it's going to be more of an alignment to have an AR experience with this kind of Metastage technology, which is like a Windows Mixed Reality capture. Christina Heller is the CEO of Metastage, and there were a number of different Metastage types of experiences being shown at Sundance. But that's a photorealistic experience, and when you see that within a virtual reality environment, I think it's still hard for your mind to believe, because your brain just knows that you're in a virtual reality environment, and it's easier for it to accept the reality of something that's a little bit more stylized or abstract. Which is a bit of the reason why you see companies like Pixar and these different animation studios, whenever they're showing humans, not making them photorealistic. They're actually very stylized in their own cartoony way, and it's just a subtle subconscious cue to your brain that you're not going to hold it to the same standards as you would if it were a photorealistic human. Because if you start to do that, then you start to fall into this weird uncanny valley, where your brain is actually expecting all these additional levels of emotional signaling from the face that you're not able to get with lower-fidelity technologies that can't actually capture the full complexity of the human face. So I see a similar type of tension when it comes to these volumetric capture solutions: where on the spectrum are you going to land, on the photoreal end or on the more abstract end? This is certainly way on the low-fidelity, abstract end, especially when you're able to do the real-time capture. So it's interesting to hear Javier say that voxels are going to be a huge paradigm shift when it comes to moving away from polygons. And yeah, it'll be interesting to see where that goes, especially as the volumetric pixels get smaller and smaller: are you able to depict different things that would be way more difficult to depict if you were using polygons? And I guess there's a deeper question in terms of complexity, and what the trade-offs are in terms of how complex the geometries are that these different approaches are able to show. Are you able to show more complex geometries given these voxels? But it seems like the big strength of what they're really focusing on, at this point at least, is just doing real-time human capture.
And so what does it mean for you to have these real-time, voxel-based volumetric depictions of yourself? And how much does it take for you to believe that these voxel representations are good enough to allow yourself to accept this different level of embodiment? So, that's all that I have for today, and I just wanted to thank you for listening to the Voices of VR podcast. If you enjoy the podcast, then please do spread the word, tell your friends, and consider becoming a member of the Patreon. This is a listener-supported podcast, and I do rely upon your donations in order to continue to bring you this coverage. So, you can become a member and donate today at patreon.com slash voicesofvr. Thanks for listening.
