Alexx Henry is a photographer who has built an array of dozens of cameras to produce extremely high-resolution captures and avatars of people. He's based out of LA and has a number of clients from the movie and entertainment industry, but his vision is to democratize this process for independent game and virtual reality experience developers with his xxArray project.
Alexx talks about how this is really a two-step process: first the capture using his xxArray photogrammetry rig, and then the creation of optimized avatars, either at film-industry quality or as a much lower-poly version optimized for virtual reality. From a photographer's perspective, you always want to capture at the highest quality and then downsample from there if you're only producing a low-resolution version for the web. In the same way, he advocates keeping an extremely high-resolution capture (around 22 gigapixels of texture data) on hand in case you need that additional resolution later.
He talks about some of his visions for putting yourself into a virtual reality game or experience, but also some of the implications for identity and self-esteem of being able to have a more objectified experience of your own body. He describes the shift in self-image that one of his friends experienced after seeing his high-resolution xxArray avatar within virtual reality.
One of the big debates we have within this interview is about the tradeoffs of photorealistic and hyperreal avatars in VR. It sounds like it would be amazing, but there are many tradeoffs with the uncanny valley, and it has the potential to send you off into a pit if you don't have an equal amount of fidelity in the social behaviors and cues, interpersonal interactions, eye gaze, and overall believable movements and behaviors. If anything is off, then it can look creepy or uncanny. My interview with Richard Skarbez is probably the most comprehensive one I've done on the uncanny valley; he argues that the uncanny valley is n-dimensional.
Alexx is a clear advocate for high-fidelity avatars and believes that there's a lot of FUD and BS around our concepts and understanding of the uncanny valley. It shouldn't be seen as an unapproachable boogeyman, and he showed me the following example during the interview of how believable an avatar can be within a virtual environment.
Jimmy's Avatar Gets Angry from alexxhenry on Vimeo.
At the end of the day, I'm glad that there are people like Alexx who are bravely challenging the status quo and providing a technology stack for people to get a super high-resolution capture scan and avatar of themselves. I think there are a lot of really interesting possibilities for what could be done with self-image and identity, especially as the technological hurdles around the uncanny valley are slowly figured out and solutions provided.
Theme music: “Fatality” by Tigoolio
Subscribe to the Voices of VR podcast.
Rough Transcript
[00:00:05.412] Kent Bye: The Voices of VR Podcast.
[00:00:12.008] Alexx Henry: My name is Alexx Henry, and I run a company called BlueVishnu. We've built a system called the xxArray, which is a 3D photogrammetry-based capture system. We capture humans in ultra-high-detail 3D and we make avatars. We're interested in working with all sorts of folks, from developers to regular consumers, but we believe that you should have access to your own avatar, and it's important to us to make that available to everybody.
[00:00:42.023] Kent Bye: Great. So tell me a bit about the technology. What does it require for someone to get one of these high-resolution capture scans, I guess you could call it?
[00:00:50.290] Alexx Henry: So that's actually a good way of phrasing it, because the capture and the avatar are separate. The avatar is not an incredibly realistic version of you; an avatar is a representation of your likeness. Or at least that's how we define avatar. So your avatar can be a complete transformation of you. It could be a cartoon, it could be a set of photons, but we believe that it should be derived from a very realistic and accurate capture of your physical likeness. So that's kind of a complicated way of saying we're focused on the capture, capturing really high-detail and very accurate scans, but we're also focused on the avatar creation portion of that, which is much more of a transformative and subjective process.
[00:01:34.898] Kent Bye: I'm curious about the technology and what it takes if somebody actually wanted to get it. You take all these photos, and then I imagine there's a whole post-processing pipeline. So maybe just walk through what it would take to actually get your scan out of this system and then turn it into an actual avatar.
[00:01:52.985] Alexx Henry: It's super easy. First, get 90 cameras together. No, I'm just kidding. We've done a lot of the hard work of making this multi-camera array as efficient as possible, and we're also doing the hard work of doing all the processing. The technology is called photogrammetry, and we've built a system that is basically a room full of cameras pointing at you that are all oriented in a very specific way. We did not invent photogrammetry. It's been around in some form for decades; it's been available since the 1960s. But the idea is basically this: if you take more than one photograph of an object or human, captured from different positions at the same time, then you can look at those photographs and pick out different points. The computer will pick out different points and do that for every photograph. And if you have enough points from the same part of the subject that multiple cameras can pick up on, then you can actually place each point in 3D space. If you do that with enough points, then you can create a 3D point cloud, and from there you can put together the geometry: you can connect the dots to make a mesh. And because we use high-resolution cameras to do that, you've got ridiculously high-resolution imagery; we've got 22 gigapixels of texture data that we can reproject onto that mesh once we solve for it. So we're doing that part. That's kind of the hard part, getting all those cameras to work together, and then the full pipeline that takes you from a capture all the way to an asset that's usable. We're taking on all of that, and we're making it available to as many people as possible. This is not a terribly unfamiliar process for a movie studio or visual effects house, because 3D scans are very much needed for digital doubles in movies, and AAA games are starting to use photogrammetry much more. So those are already our clients, and we provide work for business clients. The thing is, it's an expensive process, so it's not the kind of thing that an indie developer can really get their hands on, because most people charge several thousand dollars for this kind of thing. We want to make this available to as many people as possible, so we're disrupting the pricing. We're running a Kickstarter, but the general idea is, Kickstarter or not, we're making it available to folks at a very affordable price so that you can start developing with these high-resolution assets yourself.
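To make the triangulation step at the heart of what Alexx is describing a bit more concrete, here's a minimal sketch: once the same feature point has been matched in photographs from multiple calibrated cameras, its 3D position can be solved for with a least-squares Direct Linear Transform. This is a generic illustration, not xxArray's actual code; the triangulate_point helper, the camera matrices, and the pixel values are all hypothetical toy inputs.

```python
# Minimal sketch of multi-view triangulation (DLT), the step where a
# feature matched across calibrated cameras is placed in 3D space.
# All camera parameters below are toy values, not real rig data.
import numpy as np

def triangulate_point(proj_mats, pixels):
    """Recover a 3D point from its pixel observations in several cameras.

    proj_mats: list of 3x4 projection matrices (intrinsics @ extrinsics)
    pixels:    list of (u, v) observations of the same feature, per camera
    """
    rows = []
    for P, (u, v) in zip(proj_mats, pixels):
        # Each view adds two linear constraints on the homogeneous point X:
        # u * (P[2] @ X) = P[0] @ X  and  v * (P[2] @ X) = P[1] @ X
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # Least-squares solution: right singular vector with the smallest
    # singular value, then dehomogenize.
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]
    return X[:3] / X[3]

# Toy check: two pinhole cameras one unit apart along x.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), [[-1.0], [0.0], [0.0]]])
X_true = np.array([0.2, -0.1, 5.0, 1.0])
pixels = [(P @ X_true)[:2] / (P @ X_true)[2] for P in (P1, P2)]
print(triangulate_point([P1, P2], pixels))  # ~ [0.2, -0.1, 5.0]
```

Repeat this for enough matched features and you get the dense point cloud Alexx mentions, which is then connected into a mesh with the high-resolution texture data reprojected onto it.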
[00:04:26.666] Kent Bye: So yeah, I guess in terms of VR, performance is a big consideration. So if you have tens of thousands of points and vertices, how does it get optimized for VR?
[00:04:37.732] Alexx Henry: We're getting hundreds of thousands of points. Our highest resolution is in the several millions before it gets cleaned up, so our stuff is very dense. The capture is very dense. But part of the process of making that asset work for developers is creating multiple levels of detail. Currently, we're processing everything from 2K on the low end for poly count all the way up to half a million. I have a 3D print in my hand that has really nice detail. This is about the saturation point for what this printer is able to do, and it was printed out at half a million polygons. The same thing is available for the texture maps and the normal maps and all of that: we've got multiple levels of detail that we can provide for those as well. So we're getting the really high resolution, but we're also creating lower-resolution assets. My background's in photography, and even if I knew something was just going to live on the web, I would still want to shoot with a medium format digital back, because I want the source to be as high-end as possible. It's going to make a difference. Same thing with this. Even if you're only going to run a 2K-poly model in a game engine, the normal maps, the displacement maps if you can use them, the roughness, all of that comes from the very high resolution. So you're still able to extract pore-level detail if you're able to display it. You can use the maps; you don't have to actually run the high poly to get that effect.
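As a rough illustration of the level-of-detail step, here's a hedged sketch that decimates a dense scan mesh down a chain of polygon budgets. The use of Open3D, the file names, and the specific budgets are my own assumptions for illustration, not BlueVishnu's actual pipeline; in their workflow the fine surface detail lost at low poly counts also gets baked into normal and displacement maps from the high-resolution source.

```python
# Hypothetical LOD chain for a dense photogrammetry mesh; Open3D,
# "scan.ply", and the triangle budgets are illustrative assumptions.
import open3d as o3d

source = o3d.io.read_triangle_mesh("scan.ply")  # dense source capture

# Decimate from a half-million-triangle asset down to a ~2K-poly budget
# that a real-time engine can comfortably run.
for budget in (500_000, 100_000, 20_000, 2_000):
    lod = source.simplify_quadric_decimation(target_number_of_triangles=budget)
    lod.compute_vertex_normals()  # rebuild shading normals after decimation
    o3d.io.write_triangle_mesh(f"scan_lod_{budget}.ply", lod)
```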
[00:06:10.552] Kent Bye: And so what do you imagine as the primary use cases for people getting these high-resolution captures? What's the biggest thing you see people wanting to do with these?
[00:06:20.701] Alexx Henry: I'm going to not answer that question. Instead, I'm going to answer the question of what the most interesting things are, because we get a lot of press pickup on "you're going to be able to play yourself in a video game." And that's cool. That's super cool. I definitely want to see myself in GTA 5, running around doing things that I wouldn't be able to do in real life. But it's the things that we would never have thought of, that developers are doing on the fringes, that will have an incredible effect. We would never have thought of them in a million years, but those are the really interesting things that I think are possible. I mean, the stuff that you're doing, for example. We'd love to have realistic avatars in that space. There are ways of connecting that are not necessarily a game but an experience. Seeing how people use these avatars to connect in ways that I wouldn't have thought of is just going to be the most exciting thing. I'll give you another example; well, this is just a very trivial example. A friend of ours, Yehuda Duanez, is a wonderful artist. He was one of the folks that we captured fairly recently. And when we had him meet his avatar... now, he's a very skinny guy. Nobody would accuse him of being overweight, but you're still seeing yourself in a form that you're not used to seeing. So he could use a week at the gym. I can say that. You won't be upset with me, Yehuda, will you? He's got a couple of love handles. The process of him actually seeing his own body outside of himself was cathartic, because he was able to remove this level of attachment and objectify himself in a way that was incredibly healing for him. He was able to let go of all the hang-ups he had about his own body that he would have in a mirror and just see it for what it is.
[00:08:15.956] Kent Bye: That's really interesting. And before we started recording, we were talking a little bit about catharsis theory in film. What is catharsis theory and how does it relate to 2D versus 3D?
[00:08:25.562] Alexx Henry: Sure, you're getting me into my film history background. Catharsis theory is a big part of film theory, and it's basically about empathizing. For film particularly, one of the powers of the medium is that you place yourself in the role of the main character, the protagonist; when he or she makes a win, you feel that through catharsis. You're emoting with the protagonist cathartically, and that's one of the reasons why film is such a powerful art form. VR has taken that to a completely different level. It's not just catharsis. You are getting a level of presence that puts you there. And that's the real power of virtual reality: it breaks down that final layer that film was only able to cross through empathy, and it can actually place you into an environment where you literally experience something directly, as opposed to through catharsis. I think that's awesome.
[00:09:23.933] Kent Bye: Yeah, it's really interesting, and I've definitely experienced that in terms of actually being there versus sort of the projection or the distance. I feel like there can be a lot more violence in 2D films than you can bear within a 3D environment. Even in the London Heist Sony demo, you're ducking behind things and shooting people, and for me that was really weird. I felt like, oh my god, this is way different than shooting people in a 2D medium. I'd never felt that hesitation about participating in violence in a virtual environment until that Sony demo. It was kind of disturbing, yeah.
[00:10:00.709] Alexx Henry: That's actually very interesting. I hope that's the case. I hope we don't get desensitized to that, because that can be very powerful to sort of not just be passively viewing violence like it's nothing and actually feel the real impact for what it is. Crazy. Yeah. I certainly hope that's the case.
[00:10:17.215] Kent Bye: Yeah, no, if that leans towards experiences with less violence, I think that's great as well.
[00:10:22.202] Alexx Henry: Well, and again, one of the things that we're trying to do is put you into the experience. So whether we're successful or not, somebody is going to make a path for you to be in the experience, for the people that you love to be in the experience. And that's going to have a very different effect on you when you're actually a part of the experience, when you are the protagonist, or when somebody that you love is in the movie or in the whole experience. I think that's going to be a big game changer.
[00:10:48.544] Kent Bye: Yeah, so to get back to it: there's an exchange that I had with one of your former community managers, going back and forth talking about this technology, and one of my initial gut reactions was, whoa, this is really high fidelity, and it's dangerous to try to cross the uncanny valley when you're working with virtual reality. The thing that I've found is that the uncanny valley is n-dimensional, meaning that if you go low fidelity, then you can have correspondingly low expectations for the VR experience. But as soon as you start to go higher and higher fidelity, then all of the social behaviors, all of the blinking, all of the interactions must be coded in, all the sound, all the haptics; your body kind of expects it to be like reality. And with VR, we're kind of far away from that. Or at least it takes a lot of effort to avoid falling backwards, flat on your back, looking up from the bottom of the uncanny valley. So I'm curious what your take on that is, in terms of how to deal with the uncanny valley with this technology.
[00:11:50.610] Alexx Henry: Yeah, I'm going to go ahead and say that the uncanny valley is... I can't curse? A lot of people treat it as a boogeyman, and it's not. Sure, there is a context where you can view characters and representations where it's really creepy, but it's all about context. In general, the way that people describe the uncanny valley is as this boogeyman that you just have to avoid. And it's not. It's really not a boogeyman. It's all about context. A really talented developer, John Hable, has a blog called Filmic Worlds. He released a blog post within the last month or so where he framed the uncanny valley as FFA rejection. I don't remember exactly what FFA stands for, so I'll look that up in a second. The general idea is that there's a part of your brain that is trained to recognize faces. It's in the fusiform gyrus, and there's just an enormous amount of processing power devoted to that. We don't really understand exactly what that algorithm is or exactly what those points are; or at least the neuroscientists that I've spoken to haven't figured it out exactly, but there's a lot of good work being done in that space. But if your fusiform gyrus rejects something as a face, then it's going to feel uncanny if it's supposed to look like a face. So what John argued, for the first time, was that the uncanny valley is not this boogeyman; it's really the rejection by the facial recognition part of your fusiform gyrus. It's a great blog post, definitely worth checking out, and it's a great example that it's all about context. And I would argue against what you just said, that the closer you get to higher fidelity, the deeper you dive into the uncanny valley. Again, it's all about context.
[00:13:47.232] Kent Bye: Well, just to clarify that one point, it's just that once you move from low fidelity into higher fidelity, then all dimensions of your experience have to meet those expectations of the high fidelity.
[00:13:58.573] Alexx Henry: Context comes into play because, again, when people describe the uncanny valley, it's not just one thing. Something can be uncanny just in its animation. In fact, often in my experience, when people see something that's very uncanny, a lot of it has to do with how it was rigged; it was just poorly rigged. A lot of animation is being run with mocap data, which is very realistic. So, to go back to the earlier point, animation itself can have its own uncanniness, and textures can have their own uncanniness, but it really all depends on the context. I don't mean to cheat by saying you have to experiment, but you have to experiment. I don't agree that higher fidelity gets you into a trap where you now need to have eyes that blink and eyes that water, and haptics, and all of this feedback. All of that will help, but I don't think it's a cavern that we need to go and cross like it's typically proposed to be.
[00:14:58.880] Kent Bye: Yeah, from my own personal experience of working with facial animation: in the process of putting something together in 10 days, I could have used another 10 or 20 days just to clean it up, because there are so many things that break coherence because they look creepy. The eyes didn't track, so the guy looked psychotic. That was my thought: he looks psychotic because he's not blinking. Just the fact that he's not blinking, those little social cues; we're very attuned to that. And so it's a break in presence, in my experience. When you have a sense of presence and coherence and immersion, as soon as there's one thing that's off, the house of cards can fall down very easily.
[00:15:35.277] Alexx Henry: So I'm going to show something to you right now, which is perfect for this medium of a podcast because nobody gets to see. So I'm actually showing an animation that was done from one single T-pose capture. And it's an animation of somebody who's walking very naturally, turns his head, turns into the Incredible Hulk, and walks off stage. It was created by an incredible CG artist named Matt Dirksen. But the interesting thing about this is it was a single T-pose capture. There's no blinking that happens. There's no facial animation. There's no movement in the face. But I would argue that this was contextualized so that you don't need it. You feel like he's a full breathing, walking, blinking character. So, what do you think?
[00:16:21.917] Kent Bye: Well, what I would say is that this is a 2D medium and it looks amazing. But if this was in VR and I was engaging with it, it might look creepy, because maybe he's not... I would feel like a ghost. I'll take that challenge. So, here's the thing. If I'm interacting with this character, then maybe I would expect that if I go like this, then, oh, he's not reacting to me. So I feel like I'm not really here. To have that sense of being really here, really present, and plausible in the sense that everything makes sense, then with something that high fidelity maybe you have to have all the different social cues of interacting with this person. And that's sort of like making AI that is convincing; there are certain social cues that, if they're not being met, could be the thing that actually makes it more creepy or uncanny.
[00:17:07.563] Alexx Henry: Certainly, but with any new medium you want to understand the limits, and with any technological limits that you have, you can either focus on the weaknesses or the strengths. In this case, I would argue that you want to focus on the strengths. Moore's law says that we'll get all of those things that you're talking about. We'll get the haptics. Those will come together. Sorry, I got very distracted by that thing. There's like a GoPro robot walking of its own accord. That's creepy. Yes. That's uncanny. That's very uncanny. Apologies. But again, my argument is about the context. If you're able to focus on the strengths of the medium, you can do some really incredible stuff.
[00:17:47.973] Kent Bye: Yeah. And I think it's brave because it takes a lot of work to get to that other side of the uncanny valley. It's a brave thing. And I think that people need to be doing it. It may not be my path, but I'm glad that, you know, this is something that you're taking on and pushing forward with.
[00:18:01.520] Alexx Henry: Yeah, we don't have all the answers. I mean, what we're doing, I'll just be very clear, we're doing some experimentation, but our main focus is to provide these high-resolution captures that are going to work not only for now, but for years to come. We want to make this available so that people can experiment. I think it's going to be great to see how people use them.
[00:18:22.951] Kent Bye: Great. And finally, what do you see as the ultimate potential of virtual reality and what it might be able to enable?
[00:18:28.994] Alexx Henry: To make a human connection is very powerful. We talk about this magic three, and the first two are really the things that we do. One, we capture; we capture in very high resolution. Two, we transform; that's what happens when you take all those captures and parameterize them, so that you can morph or change the body or animate them. And then the third is really what happens if you take those two and do something incredible with them, combining them in a way that's transformative. And that's the most interesting thing. I would love to see what a developer like you does with our captures, and I would love to support that. Because if you can make a connection that doesn't exist in any other medium, that couldn't be done in film, couldn't be done in a poem, couldn't be done in a song, then you're really using this medium at its most powerful. Awesome.
[00:19:23.687] Kent Bye: Anything else that's left unsaid that you'd like to say?
[00:19:26.288] Alexx Henry: You know, there's a lot of talk about how we're at the beginning of a new era. As a film history student, I studied the early days of film, and I got fairly intimate with the early experiments that were happening in filmmaking, and the same thing is definitely happening today. Keep an open mind, get experimental, and stay foolish, as Jobs would say, and make big things happen. Because this is a time where, just with will and some spit and elbow grease, you can do something that nobody else has ever done. You can find a new way to connect with somebody through this medium. You don't need a big budget; you just need to be experimenting. So that's my final thought. Awesome.
[00:20:07.053] Kent Bye: Thanks so much.
[00:20:07.853] Alexx Henry: Awesome. Thank you so much.
[00:20:09.447] Kent Bye: And thank you for listening! If you'd like to support the Voices of VR podcast, then please consider becoming a patron at patreon.com slash voicesofvr.