#124: Rod Haxton on VisiSonics’ RealSpace 3D audio licensed to Oculus, personalized HRTFs, & their Audio Panoramic Camera

Rod Haxton is the lead software developer for VisiSonics, which created the RealSpace™ 3D Audio technology that Oculus has licensed to put 3D audio into VR.

My experience is that having 3D audio in a VR experience is a huge component for creating a sense of immersion, especially when you're able to go beyond panning the audio between the left and right channels as you turn your head. With RealSpace™ 3D Audio, they're able to go beyond panning to simulate elevation and whether the sound is in front of or behind you. They process audio in a way that's analogous to doing ray-tracing for the ears: they take true material audio reflections and do calculations based upon Sabine's reverberation equation.
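For reference, Sabine's reverberation equation (the textbook form; the source doesn't specify exactly which variant VisiSonics uses) estimates a room's reverberation time RT60, the time for sound to decay by 60 dB, from the room's volume and the absorption of its surfaces:

```latex
% Sabine's reverberation equation (metric units)
% V   = room volume in m^3
% S_i = area of surface i in m^2
% \alpha_i = absorption coefficient of surface i's material
RT_{60} \approx \frac{0.161\, V}{\sum_i S_i \alpha_i}
```

The material coefficients Rod mentions later for glass, brick, or carpet play the role of the α terms here: more absorptive materials shorten the reverberation time.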

Our ears filter sound in a way that helps us locate the sound in space. Everyone’s ears are different, and VisiSonics can create a specific profile for your ears in what’s called an HRTF, or head-related transfer function.

They have a database of HRTFs, and use a default profile that works pretty well for 85% of the population. Rod talks about how VisiSonics has patented a fast-capture process for a personalized HRTF where they put speakers in your ears and have an array of microphones around the room. He sees a vision of a time in the future where you’d go into a studio to capture the HRTF data for your ears so that you could have a more realistic 3D audio experience in VR.

Rod also talks about:

  • Special considerations for spatializing audio & a tool that they’ve developed to evaluate how well a sound will be spatialized.
  • Oculus’ SDK integration of RealSpace™ 3D Audio technology
  • Unity integration & VisiSonics direct integration
  • Options available for 3D audio that are provided by their SDK
  • Maximum number of objects that you could spatialize & what’s a reasonable number
  • Future features planned for the RealSpace™ 3D Audio SDK
  • Unreal Engine support coming soon
  • Originally funded by the DoD to help develop a way for newly-blinded soldiers to do wayfinding
  • How Tesla is using their panoramic audio cameras to improve the sound profiles of cars
  • How Rod helped get RealSpace 3D audio into a game engine & how they connected with Oculus at GDC 2014
  • How they’ve developed a panoramic audio camera to be able to visualize how sound propagates
  • Good examples of 3D audio integration can be found in Technolust & demos from Unello Design & Bully! Entertainment
  • How poorly-implemented HRTFs had given them a bad name over time

This week, VisiSonics announced that Unity 5 integration is now available in their latest v0.9.10 release.

Theme music: “Fatality” by Tigoolio

Subscribe to the Voices of VR podcast.

Rough Transcript

[00:00:05.412] Kent Bye: The Voices of VR Podcast.

[00:00:12.012] Rod Haxton: My name is Rod Haxton. I'm the lead software developer at VisiSonics. We do 3D audio. It came out of the University of Maryland, out of 10 years of research by Dr. Ramani Duraiswami and Dmitry Zotkin. They've figured out how to do true 3D audio with HRTFs. I'm the software developer who wraps that all up and puts it into a Unity plug-in. We also have a Wwise plug-in, and our technology's been licensed by Oculus.

[00:00:40.997] Kent Bye: Nice, and so maybe talk a bit more in terms of like what this enables you to do. What's the difference between just putting a sound in like Unity and then how does it sound different when you use your plugin?

[00:00:51.947] Rod Haxton: Okay, so our technology works over earphones. The difference with Unity or any other 3D audio that doesn't use HRTFs is that they basically do panning left and right, and you don't get elevation or front or back; the sound is usually just plastered right in the middle of your head. So with our technology, you're able to get elevation, sounds in front of you, behind you, exactly where they are. That's why we call it RealSpace 3D, because it's actually real. The technology has been around for 20 or so years, but it hadn't been done right, so HRTFs had gotten a bad name; now it's being done properly. So basically, we take the true material reflections in the environment. If you have a room with glass or brick or carpet, we take those material coefficients and run them through our algorithm. It's basically like ray tracing with vision, except now it's being done with ears. We calculate all the energy and the reflections of sound and how it propagates. There could be a million and one permutations, but we know the proper ones and focus on those, and then pipe them back in through the earphones to your ears.

[00:02:04.106] Kent Bye: And so maybe you could describe and explain what the acronym HRTF means.

[00:02:08.769] Rod Haxton: It's head-related transfer function. It deals with the fact that everyone's ears are different. In the way sound propagates through an environment, your ears play a role, as do your shoulders, your neck, the environment. And so based off of Sabine's law of how sound propagates and reverberates off of materials and how long it takes to dissipate, we handle those calculations and then pump them back in. The genius behind that is Dmitry Zotkin.

[00:02:41.929] Kent Bye: I see. And so it seems like the ear has some sort of like way of modulating and filtering out frequencies that, you know, depending on where it's at in elevation or in yaw, left or right, it sort of is able to locate it based upon our own sort of filtering of frequencies is what it sounds like. And this is a way to just sort of mimic that. Is that correct?

[00:03:02.138] Rod Haxton: Right, everyone's ears are different. You have some people with big ears, small ears; they're cupped, they're close to their heads. And so we have a database of HRTFs, and the default one we use works for 85% of the population. In our lab, we have patented a fast-capture way of capturing everyone's HRTF, and in the future we plan to modify that so that it could be in, like, a Best Buy or GameStop, and people can go in and get their own personalized HRTFs. What that does for us is then we get a database of that, and we can sell it to, like, Dolby or anybody else. So folks can go in and get their HRTFs, and then videos, movies, and games can plug in their own personalized HRTF, and it'll be the best for them.

[00:03:53.197] Kent Bye: So not only the IPD of the inter-pupillary distance of how far your eyes are apart, it sounds like you have a profile signature of someone's ear to be able to determine how they actually hear, you know, the filtered frequencies to be able to locate sound in XYZ coordinates. And so, what kind of values or how many numbers would it take to be able to describe somebody's ear profile in that way?

[00:04:14.691] Rod Haxton: Well, the current way of doing it is there are microphones in someone's ears and an array of speakers around them, maybe up to 200 or 300, and that pumps it in, and so then it's captured for that person. What we've done is just reverse that technology: now the speakers are in your ears and the microphones are around you, and we have a patent on that. And so where the current way of taking HRTFs may take two to three hours, ours takes 15 minutes. Everyone's HRTF is different, and it's also changing: when you're 20 years old, your HRTF is different from when you're 50 years old, since your hearing starts to decrease a little bit. It's sort of like those little tone applications where, oh, a 12-year-old can hear this, but a 30-year-old can't hear that. So it's based off of the HRTF.

[00:05:11.320] Kent Bye: I see, yeah, and it seems like, you know, Brian Hook was talking at Oculus Connect about implementing the VisiSonics plugin and, you know, some audio considerations, and he was saying that if you just put a pure sine wave into this, it's not going to work very well. It actually needs to have a broader range of frequencies in order for it to be filtered down and located. So maybe you could talk about what kinds of sounds and inputs will work with an HRTF and which ones will not.

[00:05:38.124] Rod Haxton: Okay, yeah. So first of all, we do wave files; we'll probably be moving over to OGG and MP3 as well. But to get the best spatialization in your sound, the sound energy should be over 20 Hz, because anything below that doesn't spatialize very well. We could spatialize the sound wave, but it just wouldn't sound correct. Say if it's over to the right side of your head, you may point maybe 10 degrees off, because it's not going to spatialize properly. So we have a tool where developers can just run their wave file through it, and we show them the energy source and say, okay, this would be great for spatialization. And if it's not, then they can bump up the energy in their wave file to get proper spatialization.
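VisiSonics' analysis tool isn't public, but the underlying idea, checking what fraction of a clip's energy lies above some cutoff frequency, can be sketched with a plain DFT. This is a toy illustration (a real tool would use an FFT, windowing, and per-band analysis; the function name and threshold here are assumptions):

```python
import cmath
import math

def energy_fraction_above(samples, fs, cutoff_hz):
    """Fraction of a signal's spectral energy at or above cutoff_hz.

    samples: list of float audio samples
    fs:      sample rate in Hz
    Uses a naive O(n^2) DFT over bins 1..n/2 (DC excluded), so keep
    the input short; fine for a sanity check, not production analysis.
    """
    n = len(samples)
    total = above = 0.0
    for k in range(1, n // 2 + 1):
        # k-th DFT bin corresponds to frequency k * fs / n
        X = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                for i, s in enumerate(samples))
        e = abs(X) ** 2
        total += e
        if k * fs / n >= cutoff_hz:
            above += e
    return above / total if total else 0.0
```

A clip scoring near 1.0 would spatialize well; a score near 0.0 means most energy sits below the cutoff, matching Rod's point that such a source will localize poorly.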

[00:06:29.432] Kent Bye: And so yeah, Oculus just announced at GDC this week that they have a new audio SDK, and then VisiSonics, you have your own Unity SDK. So maybe talk a bit about, you know, what should people be using? Should they use the Oculus one, is that going to be the same, or should they go direct-to-source, or what would you recommend?

[00:06:47.256] Rod Haxton: Well, I'm not fully up to date on our licensing agreement with Oculus, but I know that Oculus has licensed our technology, they have our low-level engine, and they've wrapped their SDK around that. So if you're using the Rift, I guess you just use Oculus' SDK because it's already built in. But our plugin works outside of the Rift as well. So if you're doing a 3D game, or if you have a 2D game or any kind of application that you want 3D sound in (it doesn't have to be a game), and you're not using the Rift, like we're here partnering with OSVR and trying to work with other HMDs, then you could use our technology. I haven't had a chance yet to download the Oculus SDK to see how it compares to or differs from our plug-in, but since it's RealSpace underneath, you'll get the same effects.

[00:07:36.422] Kent Bye: I see, yeah. And so, you know, I guess part of OSVR is trying to actually, you know, roll it once so it's a part of all of these other different platforms and not just the Oculus Rift. But, you know, if we take a step back and look at your SDK and what type of functions that you're exposing to developers, what are some of the things that a developer would be able to do to be able to spatialize audio?

[00:07:55.565] Rod Haxton: Okay, so if you're familiar with Unity, our plugin works the same way. We have an audio listener, we have an audio source, so in a matter of, I'd say, 10 minutes, you would have 3D sound in your game. The same way you'd put an audio source on an object in Unity, you put a RealSpace 3D audio source on that object, and you can just keep the defaults, place a wave file on it, and you've got 3D sound in your game. We also allow scripting, so we have an API. On our audio sources you can have more than one audio clip assigned to that audio source, so if, say, you have dialogue and you want different avatars to say different things, then through scripting you can control that and play the next audio clip. Our API allows you to load audio clips on the fly. You can play the sound, stop the sound, pause the sound. We let you know the length of the audio, so if you want to cut into the clip, we allow that. You can set the min-max range from the UI or through the API. There's a vast API there; it's in the manual. If you download our plugin, you get examples. You can do prefabs, you can instantiate on the fly.

[00:09:09.976] Kent Bye: Is there a trade-off between distance for how close something is to you relative to whether or not you should just do panning or do a fully spatialized audio?

[00:09:21.520] Rod Haxton: Yeah, on our plugin, if you wish, you can toggle between non-3D and 3D. We have a min attenuation range as to how close you want to be to the object to hear the sound, and we also allow you to set a max attenuation range. You can set whether you want a linear roll-off or a logarithmic roll-off. So you could set the max distance to be 10 meters, and as you walk away from the audio source, the sound will attenuate lower, but you still get the spatialization as to the orientation of where you are. If you want a fast, hard cut-off, then you can just set it to a linear roll-off, and as soon as you hit that max range, it'll just cut off. But if you want a smooth roll-off, then you would select a logarithmic roll-off. You can also set up a virtual room. It's a shoebox-style room, so you have a cube: a ceiling, a floor, a front wall, a back wall, a left and a right wall, and you can set the material coefficients for glass, wood, plaster, and then we handle all the calculations on that. So you can set your environment up, and then the sound sources take on the properties of that room. The objects can move, they can be stationary, whatever.
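As an illustration of the two roll-off modes Rod describes (this is a sketch, not VisiSonics' actual code; the function name and the exact inverse-distance law are assumptions), a distance-attenuation curve might look like:

```python
def attenuation_gain(distance, min_dist, max_dist, mode="logarithmic"):
    """Return a gain factor in [0, 1] for a source at `distance` meters.

    linear:      full volume at min_dist, fading to silence at max_dist,
                 then a hard cut-off beyond it.
    logarithmic: inverse-distance law; the gain halves each time the
                 distance doubles, and never fully cuts off.
    """
    if distance <= min_dist:
        return 1.0  # inside the min attenuation range: full volume
    if mode == "linear":
        if distance >= max_dist:
            return 0.0  # hard cut-off at the max attenuation range
        return 1.0 - (distance - min_dist) / (max_dist - min_dist)
    # logarithmic (inverse-distance) roll-off
    return min_dist / distance
```

The listener-orientation spatialization Rod mentions is independent of this gain: the HRTF filtering still applies; only the level changes with distance.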

[00:10:38.012] Kent Bye: What's the maximum number of audio objects that you would want to spatialize in a given scene?

[00:10:43.083] Rod Haxton: On the PC version, I just ran a test instantiating 3D audio sources and got up to probably 60 or 70, but you'd never play all those at one time; we just wanted to see what we could do. It might be able to do even more. And we handle the calculations so that if you're not in that sound source's range, then we don't process the sound. So you can have pretty much unlimited sources if you place your sounds out, because in the real world, you're not going to have 20 objects all around, or else you're just going to get noise anyway. So it's up to the developer or the audio engineer to properly place their sounds in a location where they want them to be heard, and not everything needs to be 3D sound. You can still have your ambient sounds as 2D, or you can make them 3D. It's just up to the creative design of the developer or the audio engineers.

[00:11:37.630] Kent Bye: So, what are some of the new features or what's on the horizon in terms of what's not implemented yet?

[00:11:42.497] Rod Haxton: Okay, currently we don't have obstruction and occlusion. We have a beta of that working, so probably in our next release we'll have occlusion and obstruction. We plan to add generative audio, procedural audio capabilities, to the plugin, and even refine the engine to have a faster fill rate. Right now, each sound we process is probably 2% of the CPU, so we're not heavy on it, but if we could get even faster, we're looking at that. We'll probably incorporate more HRTFs into the database. Currently we have five, and those work for the majority of the population, but once we get our own capturing of individual HRTFs, then it's beyond the limit. I keep a long list of things I want to add, and then we just prioritize what we're going to put in.

[00:12:40.089] Kent Bye: What is the performance hit on, you know, obviously virtual reality is really hitting the edge of what the CPU and the GPU are able to do, so what are the implications of performance when you use something like HRTF?

[00:12:52.640] Rod Haxton: We haven't found any drawbacks, because we process at a very low rate. We know when a listener is out of range, so we stop processing those audio sources. Like I said, each individual sound source is probably 2% of the CPU. We could probably get it to run on GPUs, but we're going through the underlying audio engine of Unity, which is FMOD, and we don't want to get into the business of having to write our own audio engine for every platform. So we like the fact that Unity has FMOD and we can sit above that, and they handle all the low-level front work of working with every platform. That way we can stay platform-agnostic. And we also have a Wwise plug-in so we can do the same: just let them handle that low-level work, and we stay above that and do what we do best.

[00:13:50.783] Kent Bye: Is there planned support for Unreal Engine 4, then?

[00:13:53.785] Rod Haxton: Yeah, through Unreal Engine 4, we have the Wwise plug-in. And so that's how developers can get into UE4. We'll be releasing that on our web page in a couple of weeks. We're in beta with it right now. So we're just working with some select developers right now and getting beta feedback and trying to get that optimized and working properly with Wwise.

[00:14:18.992] Kent Bye: And you mentioned that the researcher had been working on this for like 10 years. And so virtual reality, I guess, has been around since the 60s. But was he using this in a VR context? Or what was some of this research being used for before it really caught on to VR?

[00:14:32.160] Rod Haxton: OK, so yes, Dr. Ramani Duraiswami, for the last 10 years at the University of Maryland, had been doing this sound work. It originally started with a DoD project for newly blinded soldiers: the DoD came and asked how you could help newly blinded soldiers find their way through an environment. So Dr. Duraiswami took his technology and made a prototype, and they took it out on the street, and soldiers could find their way around, basically like with a hearing aid, with this 3D sound. From that, the company started. They have a spherical camera: five video cameras and 64 audio mics, and it can take panoramic video in 360 with the sound included. Most of the cameras now on the market, like Jaunt's or others, just do video, and then you have to mix in the audio afterwards; ours does it live, right there. The camera is also used in the auto industry, where we visually show the user how the sound's propagating through the environment. So for carmakers like Tesla, they use it for squeaks, rattles, and bumps. Cars come off the production line with these noises, and they want to know where they're coming from, so they can add dampening properties so that the driver who's just paid for this luxury car doesn't hear little squeaks and rattles. They can visually see where the sound is coming from, and then they can easily go in and modify it and get it to stop doing what it's doing. We can also use it in concert halls for better acoustics: you can actually clap your hands or say something and see how the sound propagates through the environment, showing all the reflections where it's bouncing off, and through colors we show the intensity of that sound and how it dissipates over time. That was developed with the CTO of the company, Adam O'Donovan. He and Dr. Duraiswami did their PhD work on the camera, and Dr. Duraiswami and Dmitry Zotkin did it for the sound, and so all three started the company.

And so from the camera, once I was hired and saw the capabilities of 3D sound, I asked why it hadn't been placed into, like, a game engine, and Adam mentioned that they had been trying to do it but didn't have the resources. So, I'd say two and a half years ago, the first attempt we did was we integrated it into the Unreal Engine, and I did a demo for that. Sensics saw that demo and wanted us to do a demo for them for I/ITSEC. So we did it in Unreal and showed the capabilities of what we could do, and at that time, Brendan Iribe happened to be down at I/ITSEC with Oculus, so we showed it to him there and he listened, but I guess at that time they were still focusing on their graphics. And so our CEO Greg Wilkes came on board, and last year at GDC we were in guerrilla mode. We were just walking around finding people, grabbing them, trying to get them to listen. I had a little shirt on that said RealSpace, touch me if you want to hear a demo, and we were just grabbing everybody. Then Greg happened to grab Brendan, and Brendan grabbed Palmer, and they came over and listened to it, and from there, negotiations I guess began. So we started with Unreal, but Greg decided that there's a big indie user group around Unity, so we stopped our focus on Unreal and went to Unity and started developing the plugin for that. And that got us more eyeballs and more beta testers. So we've been working with a few select indie developers who are doing some really cool stuff. We've been working with Blair Renaud; he's doing Technolust. We worked with Aaron Lemke of Unello Design, Eden River. We're working with Joe Chang; he's doing Phantasmo. We've worked with a couple of other HMDs that are using our technology. I'm not going to say which right now, because I don't know if I have the liberty to say, but a few of them are testing us out as well.

[00:18:57.447] Kent Bye: Yeah, being able to visualize the sound wave propagation and slowing physics down and seeing that in a fully immersive virtual reality I think would be amazing just to sort of see how sound actually moves and it sounds like you have that. Do you have like full 3D models where you'd be able to actually throw that into a VR environment and actually see that happen?

[00:19:15.513] Rod Haxton: Yeah. And one other capability of the camera that I didn't mention is surveillance. We can put the camera right in this room right now, with everybody talking around us, and we do a technique called beamforming, where we focus the microphones on just one area and block out all the other sounds, and we can pick up, say, 30 meters across the hall, the conversation people are holding. So it can be used in that regard as well. We had talked to some casinos in Las Vegas; Adam and Greg spoke to them, and they said, yeah, this is great technology, because we could hear if people are cheating, but what we don't want to hear is if they're using illegal money, because that would force us by law to report them to the authorities, and we want their money, so we don't want that. But yeah, being able to visualize the sound as well, that's why the company is called VisiSonics: you can visualize sound. And there are a couple of film production companies that have been talking with Adam and Greg and Ramani about using our camera to capture live video. What we still have to add, and are working on, is a plug-in for, say, Pro Tools or the tools that these developers or audio engineers use. They can already capture it, but if they want to place more 3D sound objects in, we are going to allow them to edit those into the videos or film they have taken using our camera, which right off the bat gives you a 360 panoramic view. And Adam was recently in China, where we had sold a camera, and he placed the camera on the Great Wall of China and filmed right there, so it's almost like a virtual tour experience you can go into. I think he's placed that up on the Oculus Share site, so people can download that and experience it as well.
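The beamforming Rod describes is, in its simplest form, delay-and-sum: delay each microphone's signal so that sound arriving from the chosen direction lines up in time, then average the channels, which reinforces that direction and smears everything else. A toy version with precomputed integer sample delays (a sketch under those assumptions, not the VisiSonics implementation, which would use fractional delays and many more channels) might be:

```python
def delay_and_sum(channels, delays):
    """Steer a microphone array by delay-and-sum beamforming.

    channels: list of per-microphone sample lists.
    delays:   integer sample delay for each channel, chosen so that a
              wavefront from the look direction arrives time-aligned
              after the shifts are applied.
    Returns the averaged (beamformed) signal.
    """
    # Only the region where every shifted channel still has samples.
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(usable)
    ]
```

Signals from the look direction add coherently (gain near 1), while off-axis sources add with mismatched alignment and are attenuated, which is how one conversation can be picked out across a room.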

[00:21:03.304] Kent Bye: Has there been any VR experiences that really do an excellent job of integrating sound that you've seen that you'd recommend?

[00:21:09.966] Rod Haxton: Yeah, like I mentioned, Blair Renaud's Technolust. He's doing a cyberpunk-type game, a really great game. He's put so much attention to detail into it, and he uses the 3D sound very well in there; it gives it more of that immersive feeling, like you're actually in this cyberpunk world. Aaron Lemke's Eden River is more of a relaxation, meditation type of environment. Actually, I forgot to mention Bully! Entertainment. They've done this demo that we're doing here, Pure, and it's also more of a relaxation, float-around experience. Bob Berkeley and Chris Apple and their team did a great job of showcasing our 3D sound audio. It's basically up to the creative guys; we're middleware, so we don't do a great job of doing demos. But I guess one of the demos that I did caught a lot of people's attention, and that was the Tuscany villa that Oculus made. We took that, and I placed a speaker inside the villa, we 3D-sounded the water fountain, and I got a little crazy and put a helicopter sound on the butterfly. That allowed people to easily see the capabilities of RealSpace 3D and get them away from thinking that HRTFs were bad. Because before Oculus was in talks with us, on their site people were saying, okay, well, this is great, you guys are doing great video, great graphics, but what about audio? Because that's the other half of it. And so I got on their forum and said, well, check out RealSpace 3D. And people were like, well, that's just HRTFs, and that doesn't work, and it'll never work; it's been tried before. And so actually it sort of scared us as a company, where we got away from saying it was HRTF, and we were like, we should say it's real sound, or it's proper sound, or it's this and that. And it's like, well, you know, then that just skews it. We know that we do it right. Let's take ownership of that and say that HRTFs are better. And so now, once Oculus had licensed us, all the chatter of, oh, HRTFs don't work, went away.

And we're not the only ones in this space; there are a couple of other companies working in this space. So HRTFs are getting ready to make a proper comeback, just as VR is. You know, it was panned in the 90s, and everyone was afraid of doing it until Oculus stepped back out and Palmer showed that VR is alive and well. Now you see it's just taking off, and that's great for us, because the more HMDs, the better.

[00:23:48.265] Kent Bye: And finally, what do you see as the ultimate potential for virtual reality and what it might be able to enable?

[00:23:54.088] Rod Haxton: The sky's the limit, you know. There are so many applications that VR is going to take off in: not just games, but medicine and healthcare and architecture, even game modeling. I'm sure with Maya and all these other 3D graphics tools, you'll be able to go in and model your stuff in VR. It's just whatever the creative brain comes up with. There's some young kid right now, 15 years old, in his mother's basement, and he's already crafting something that's going to be revolutionary in the next 20 years. So it's just a matter of everyone's creative juices, and there are so many things right now that people are doing: film, or you look at AltspaceVR, where you can go in and socialize. You know, in my brain there are so many possibilities, and it's just great to be around during this time, as VR is back on the forefront, and to help be a part of this new movement. Great.

[00:24:58.508] Kent Bye: Well, thanks so much. All right. Thank you.
