#399: OSSIC & 3D Audio as the Next Frontier of Immersion

Two years ago, not very many people were thinking about how important audio was going to be for consumer VR. Jason Riggs was pitching what would eventually become OSSIC by claiming that it was going to be the Oculus of audio headphones. Two years later, his prophecy came true when his OSSIC X headphones raised over $2.7 million on Kickstarter, surpassing Oculus as the largest virtual reality crowdfunding campaign ever.

There’s clearly a lot of demand for high-end, immersive audio for VR, and Jason’s vision for the future of spatialized audio has started to be realized. Beyond its VR applications, part of the OSSIC X’s appeal was that it could also have an immediate impact on existing 2D games with spatialized sound, as well as recreate the sound of a home theater system.

I had a chance to catch up with OSSIC founder and CEO Jason Riggs at the SVVR Conference, where we talked about the technology behind OSSIC, dynamic HRTF measurements, how to quantitatively and qualitatively measure the accuracy of their 3D audio solution, and the challenges facing a potential open standard for 3D audio formats that contain audio objects.

LISTEN TO THE VOICES OF VR PODCAST

Here’s the Kickstarter video that inspired over 10k contributors.
https://www.youtube.com/watch?v=MKHGxVG_7W8

Subscribe on iTunes

Donate to the Voices of VR Podcast Patreon

Music: Fatality & Summer Trip

Rough Transcript

[00:00:05.452] Kent Bye: The Voices of VR Podcast. My name is Kent Bye, and welcome to The Voices of VR Podcast. I'm continuing on with the audio theme this week. Yesterday I had Dolby Atmos and today I have Jason Riggs. He's the CEO and founder of OSSIC headphones. So if you haven't heard of OSSIC, it's these headphones that are able to recreate a theater sound within your headphones. And so it's being able to spatialize sound in a way that's way more sophisticated than most headphones that are out there. So, with virtual reality, the video is very important, but the audio just is able to really sell the scene and take the level of immersion and presence to the next level. So, OSSIC is just coming off of a very successful Kickstarter campaign. In fact, the largest VR Kickstarter in history, just raising $2.7 million to be able to bring these headphones to market for both PC, but also mobile games as well. So we'll be talking about all that and, frankly, really geeking out about audio. Jason is a huge audio nerd, so we'll just be diving into it today on today's episode of the Voices of VR podcast. But first, a quick word from our sponsor. Today's episode is brought to you by The Virtual Reality Company. VRC is creating a lot of premier storytelling experiences and exploring this cross-section between art, story, and interactivity. They were responsible for creating the Martian VR experience, which was really the hottest ticket at Sundance, and a really smart balance between narrative and interactive. So if you'd like to watch a premier VR experience, then check out thevrcompany.com. So this interview with Jason Riggs happened at the Silicon Valley Virtual Reality Conference that was at the end of April. So with that, let's go ahead and dive right in.

[00:01:59.703] Jason Riggs: Cool. I'm Jason Riggs. I'm the CEO of OSSIC. And we are really working on immersive and accurate 3D audio. And how do you do that in personal devices like headphones and earphones?

[00:02:11.147] Kent Bye: Great. So you just had a very successful Kickstarter. So maybe you could talk a bit about that and what that's going to enable you to do.

[00:02:18.554] Jason Riggs: Yeah, I mean, that was awesome. We're super excited that all 10,200 people showed up and bought some headphones. And one of the cool things about it is letting us tackle more platforms than we originally intended. So at first we were like, hey, we're going to solve a full 3D on PCs and on Macs. And now we're going to be able to also tackle Android and iOS and consoles, too. So, yeah, that's really helping us from a resource standpoint, being able to tackle this across platforms.

[00:02:46.735] Kent Bye: So maybe you could tell me about why are your headphones special? Like, what are you doing that makes it unique to VR?

[00:02:54.059] Jason Riggs: Yeah, so almost all 3D audio rendered over headphones and earphones is based on this sort of head-related transfer function model. And all that really means is for a sound at a given point in space, there are different signals that we get on each ear. And every point in space has different signals. But the other vector that's quite interesting is that that head-related transfer function is different for every person. So we have a bit of a snowflake thing going on with our anatomy. So human ears vary two to one in all their dimensions. And to date, all the 3D audio algorithms and all the kind of headphones that have done virtual surround sound have all been based on a one-size-fits-all algorithm, and it just doesn't work. All of them work quite well to get sounds out to the side, but when we get to the front and the back and up and down, there's a lot of confusion. And that's because we rely on the asymmetries of our individual anatomy to differentiate in that plane. And what we're doing that's a little bit unique is the headphones themselves are a sensor bed that measures your head size, your ear spacing, and calibrates to the shape of your ear so that we get your individualized head-related transfer function in it. And that makes the audio come from the right direction, it reduces confusion, and it makes it sound better.

[00:04:09.065] Kent Bye: And so does that mean that every time you put on a pair of OSSICs that it's going to do some sort of scan of your ear and then feed that series of numbers or however you kind of boil it down? What's actually happening there?

[00:04:20.887] Jason Riggs: Yeah, sort of. We synthesize the HRTF and there are different components that add up to it, right? So, things like your head size and your ear spacing determine things like interaural time delay and interaural level difference. And so, the way we address those components is when you put this thing on, it measures how far apart your ears are. And that gives us a pretty good proxy for both your head size and your ear spacing. And that does go into the algorithm and it calculates these levels and delays dynamically. The ears we address a little bit differently, and that's maybe a long explanation, but we could scan your ear. We could measure it. In fact, we could put microphones in it. But if we did that, the first thing we'd have to do is uncalibrate the headphone that's on you. So when you put a headphone on, especially an over-the-ear headphone, there is a speaker in it, and it interacts with your ears and gives you some part of the cues you would get with a sound in space. So a partial HRTF of your pinna, that's your outer ear, you get some of that from a headphone. And so, what we're doing is we have an array of high frequency drivers in there and we in real time steer the high frequency component to hit your ear from the right angle. And what that allows us to do is not have to do that first uncalibration step, if you will, but also lets us do that in real time so that you don't have to do a scan or a measurement of your ear. Part of it is measured, but mostly what we're measuring is the head size and the ear spacing, and then the ear part is like real-time interaction to get the spatial cues of your ear just like you would from any sound in space, and then we have some correction from kind of far field to near field to make that work with this little grid of drivers, but that's how it does it.
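
To make the geometry Jason describes a bit more concrete: a common textbook approximation for how head size maps to interaural time difference is the Woodworth spherical-head model. The sketch below is only an illustration of that model, not OSSIC's actual calibration algorithm; the function name and constants are assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # meters per second, roughly, at room temperature


def itd_woodworth(head_radius_m: float, azimuth_rad: float) -> float:
    """Estimate the interaural time difference (seconds) for a far-field source
    using the Woodworth spherical-head approximation:
        ITD = (r / c) * (theta + sin(theta)),
    where theta is the source azimuth measured from straight ahead."""
    return (head_radius_m / SPEED_OF_SOUND) * (azimuth_rad + math.sin(azimuth_rad))


# Example: ears measured ~18 cm apart gives a ~9 cm effective head radius.
# A source 45 degrees off to one side arrives a few hundred microseconds
# earlier at the near ear than at the far ear.
itd = itd_woodworth(0.09, math.radians(45))
print(f"ITD ~ {itd * 1e6:.0f} microseconds")  # roughly 390 us
```

Interaural level differences would be handled analogously, and measuring each listener matters precisely because plugging a single average head radius into a formula like this is the one-size-fits-all shortcut Jason argues against.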

[00:05:58.032] Kent Bye: Is there any software component or SDK, or is this something that is kind of like plug-and-play, you can plug the OSSIC headphones into any experience and then it's gonna just sound better?

[00:06:08.237] Jason Riggs: Yeah, I mean it depends. So there is, we're building SDKs for all the game engines and everything, and that's really a workaround to some extent to the problem that there haven't been standards that have been adopted by the industry for an output from a game beyond five and seven channels. So, there are a lot of things. There are higher channel count formats, you know, there are proprietary things like Dolby Atmos, and in the cinema space, DTS:X, and on the broadcast side, MPEG-H, which itself is a hybrid of formats to support more channels, different order ambisonics, and object-based sounds. But that's kind of in flux. What is the way that experiences are going to pass out 3D content? And so we have kind of the old paradigm, which is like, hey, stereo, five-channel, seven-channel. So the way we address that is absolutely what you described. If you have a five-channel or a seven-channel output for an existing game, we virtualize the room and the seven speakers at the correct location, and we give you a sort of theater-in-headphone experience that goes way beyond what you would get with today's virtual surround sound headphones. So for your existing music, your existing games, your existing movies, that's how we're gonna address the world. Now, that's kind of a bridge, and so we have some demos here, we're playing 5.1 music, and it's pretty impressive when you really get that virtualized right and it sounds like a theater. That being said, the future is going beyond 5.1 and beyond 7.1. So in the demo we're giving here, it's the Secret Shop demo, it's called, by Valve, and they were nice enough to give us access to that demo and let us plug everything in. In that case, we have both a 7.1 environmental mix going on in the background and about 50 objects that we render on 572 virtual locations and blend between. That's a very high-resolution spatial thing where we can get quite precise with the objects. So today the answer is those things are getting plugged into game engines. And that will be one solution, but the challenge is everyone's plugging in different things and some of them kind of work and some of them don't work as well or have limitations. I think in the future we'll start to standardize around 3D outputs so that they won't have to be one-off things plugged into experiences, but all games and experiences can start to say, hey, we output 22 channels, we output direct access to our objects, we output third-order ambisonics, whatever that is, and then we'll be able to address that with that high spatial resolution kind of uniformly as an industry.
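
As a rough sketch of the "virtual speakers in headphones" idea described above, the code below renders a legacy multichannel bed binaurally by convolving each channel with a head-related impulse response for that speaker's canonical direction. It is a generic illustration under stated assumptions (a hypothetical hrtf_for_direction lookup and fixed 512-tap impulse responses), not OSSIC's pipeline, which additionally personalizes the HRTF and physically steers drivers.

```python
import numpy as np

# Nominal azimuths (degrees) for a 5.0 speaker layout; LFE omitted for brevity.
SPEAKER_AZIMUTHS = {"L": 30, "R": -30, "C": 0, "Ls": 110, "Rs": -110}
HRIR_TAPS = 512  # assumed length of each head-related impulse response


def hrtf_for_direction(azimuth_deg, hrir_set):
    """Hypothetical lookup: return (left_ir, right_ir) for the measured
    direction closest to the requested azimuth."""
    nearest = min(hrir_set, key=lambda az: abs(az - azimuth_deg))
    return hrir_set[nearest]


def virtualize_bed(channels, hrir_set):
    """Render {speaker_name: mono_signal} into a binaural stereo pair by
    convolving each channel with the HRIR of its virtual speaker position
    and summing the results at each ear."""
    n = max(len(sig) for sig in channels.values()) + HRIR_TAPS - 1
    left, right = np.zeros(n), np.zeros(n)
    for name, signal in channels.items():
        ir_l, ir_r = hrtf_for_direction(SPEAKER_AZIMUTHS[name], hrir_set)
        left[: len(signal) + len(ir_l) - 1] += np.convolve(signal, ir_l)
        right[: len(signal) + len(ir_r) - 1] += np.convolve(signal, ir_r)
    return left, right
```

A real-time renderer would typically do these convolutions in the frequency domain and crossfade impulse responses as the head moves, but the structure is the same.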

[00:08:27.878] Kent Bye: So, what would your ideal solution for some sort of standardized format be? Like, what are all the things that you would need to be able to carry out what you're doing with this?

[00:08:37.781] Jason Riggs: I mean, I think we're agnostic to what that format is, but more spatial resolution is better. So, I mean, in the simplest case, we can think of most of this as mapping onto a sphere. Now, indeed, there are a lot of things we're also doing with depth, right? So it's not just a sphere or a shell. But a lot of those depth things can be encoded onto the signal. So things like level and the reverb and the reflections can still be encoded onto that sphere. And so a lot of these are kind of spherical formats. So on the channel side, we could go up from 5 channel and we can do 11 channel or 22 channel or 44 channel with certain locations in space kind of rendered onto this sphere, right? But doing speakers beyond like a 5.1 or 7.1, which is just a planar horizontal mix and now adding height potentially above and below. Ambisonics is a different way to do that. So that's by nature a spherical format that can increase in order and can give more spatial resolution as we go to higher order ambisonics. That works too. Or having direct access to the object. I don't know what the standard will be or what will sort of get adopted or locked into, but in any case, what we want is more spatial resolution. If we think about it, in most points in space, we can hear within two degree accuracy. So if we fill the whole sphere out with that, we may need something like 3,000 points in space to get up to the limit of our perception. Most content out in the world is stereo, especially in music. In games, at most, it's mostly 5.1 or 7.1. So it'd be like if we had a visual display that had 2 or 5 or 7 pixels. That's not a lot of resolution. And so from a spatial resolution standpoint, I think the main thing we need to think about is that we need more. But the ideal case would be adopting one of those formats or some of the standards that are coming out for how to pass direct information about the objects outside of the game engine, so that the rendering piece and the peripherals and the headphones and the things that we use can evolve on their own. And it doesn't have to be one thing that each person who creates an experience or game solves and plugs in on their own, because that's kind of a mess.
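
Ambisonics, one of the spherical formats Jason mentions, encodes the sound field into spherical-harmonic channels rather than discrete speaker feeds, and higher orders carry more spatial resolution. Here is a minimal first-order encoder for a mono source at a given direction, using the AmbiX (ACN/SN3D) convention; it is an illustrative sketch rather than anything specific to OSSIC.

```python
import numpy as np


def encode_first_order(mono: np.ndarray, azimuth_rad: float, elevation_rad: float) -> np.ndarray:
    """Encode a mono signal into first-order ambisonics, AmbiX convention
    (ACN channel order W, Y, Z, X with SN3D normalization)."""
    w = mono                                                  # omnidirectional
    y = mono * np.sin(azimuth_rad) * np.cos(elevation_rad)    # left-right
    z = mono * np.sin(elevation_rad)                          # up-down
    x = mono * np.cos(azimuth_rad) * np.cos(elevation_rad)    # front-back
    return np.stack([w, y, z, x])  # shape: (4, num_samples)


# Example: a one-second 1 kHz tone placed 90 degrees to the listener's left.
sr = 48000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 1000 * t)
bformat = encode_first_order(tone, np.radians(90), 0.0)
```

Decoding that B-format signal binaurally, or to however many real or virtual speakers are available, is where the rendering device gets to apply its own spatial resolution, which is why a format like this can stay independent of the playback hardware.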

[00:10:35.365] Kent Bye: Yeah, and it sounds like, you know, in a lot of these domains with these mediums, there's often a proprietary solution. So in this case, there's Dolby Atmos with their whole system where they're taking all these different channels, but also having the ability to mix it within this virtual environment, and then they're able to kind of encode that down into this format and then deliver it eventually into the headphones or the speakers, whatever the case might be. But on the other end is more of the approach of doing some sort of open standard in some ways. And it seems like OSSIC would be advocating for that. I would see at least two big different use cases: either you're coming from virtualized sound, which is coming from a game engine, or coming from live-action recorded ambisonic field recordings that are then somehow mixed together. So it seems like there needs to be another layer of a tool set as well in order to start to even mix some of these and then deliver them.

[00:11:26.634] Jason Riggs: Yeah, I mean, there's absolutely even a spectrum between those two. I mean, if we look at the way a concert is recorded, right, we could have an ambisonic or a sound field mic, but that's not really the norm right now. So we might also want to combine that with close mics so that the person who's doing the engineering can sort of mix these. So now we're talking about a paradigm of 3D live capture combined with production, which is like a spectrum. And then on the other side, like what you're mentioning, in game engines, we're taking a lot of mono recorded sounds and we're physically placing them in this virtual environment and trying to recreate that. So that's one end of the spectrum. Pure 3D capture is another end of the spectrum. Especially in the pure 3D capture, I would say none of the tools are mature right now. People come to us all the time like, what 3D mic do we use and what sound field mic do we use? And we're like, well, we have this, we're trying this. And like, cool, now how do we put all that ambisonics into something? And it's like, well, most of that workflow is not there. And so, yes. And I mean, we're agnostic to the format. We want to support all the formats, right? Because we want full access. But we do have some challenges with people who have proprietary formats. We've gone to some of those companies and we've been asking for a year, will you give us a decoder so that we can decode your format? And the answer is like, oh, we don't have a decoder. We haven't built the software one. Those are all on chips and receivers or whatever. And it's like, that can't work. That can't be the answer. So you look at some of these that are coming out in games and the answer is, oh, you need a $30,000 home stereo to hear it. Like, that's insane. So yeah, we are in some ways advocates of open standards and being a little agnostic to that. You know, I think one exciting thing about headphones as a 3D rendering device is that it can free us from the constraint of Xboxes in our room and all of this physical hardware, and ultimately it has the ability to render really high-resolution 3D audio. Like in this demo we're doing, so we have a 7.1 bed, but as I mentioned, we have 572 virtual speakers that we're panning between, like no one's gonna have 572 speakers at home, right? But you can do that on headphones, and when it works right, I think ultimately headphones are going to be able to recreate 3D experiences that you just can't do on any grid of speakers.

[00:13:28.603] Kent Bye: Yeah, to me it seems like one of the biggest challenges for OSSIC moving forward is trying to get people the tools that they can use to actually produce content that uses the full potential of what the headphones are capable of, because, you know, you're able to take, you know, Valve's demo and be able to integrate it and use the SDK, and people are doing games, but I'm thinking in terms of, like, live-action film, and there's a big challenge of, like, what tool set do they use in order to actually produce the format that's going to feed in to give the best audio experience.

[00:13:59.070] Jason Riggs: Yeah, I mean we, I'll give you a different example. So we just joined this music tech incubation program with Abbey Road Studios and they have this program called Abbey Road Red and they're figuring out like what is the future of music, right? And so we go to Abbey Road and it's like the first recording studio in the world, right? The first dedicated recording studio, and you walk in the door and it's like first stereo recording, 1931, you know, the plaque on the wall. It's like all this history of recording innovation. They have the same question, which is like, okay, what do we do beyond five channels and beyond this thing? What is the future of creating music in full 3D? Totally different, but again, the tools, in some ways, VR games have an advantage because game engines have already been a dynamic mixing thing where the character and the person moves and rotates and does all that, so the mixing has been dynamic. If we get into the film side and the music side, that's a new paradigm that you can turn your head around in the mix and it needs to react to you, right? And so to some extent, these tools are going to collide, the workflows are going to collide, and we're going to have to solve it all. And maybe games are a little bit ahead of that, but, you know, agreed. I think the exciting thing for us, though, is that, like, even this demo we have, you know, so back to Abbey Road, we're demoing Dark Side of the Moon here, right? And it's like a recording that was done in quadraphonic in 1972. So it's like 45 years old and most people have never heard it in quadraphonic or surround sound. Because why? Because people didn't have quadraphonic record players or four speakers, or they didn't have a Super Audio CD player and five speakers, right? So the cool thing is on headphones we can now play that as it was intended to sound all around you, and I mean I think that's cool. And it turns out there were about 6,000 discs recorded in DVD-Audio and Super Audio CD with surround sound music that a lot of people have never heard. And then of course on the game side we have 5.1 and we have 7.1, and these can be way more immersive than people have heard these things with virtual surround headphones. So I think that's like the bridge to the future is to just take things that are more immersive than stereo and let people experience them and go like, wow, this is pretty cool even in five channels. But to your point, the end of that bridge is like high resolution, high spatial resolution, a fully immersive 3D experience with height and depth, and breaking the mold of X channels or speakers.

[00:16:12.227] Kent Bye: And so are you doing any head tracking when you have them on, are there any IMUs to be able to detect position, or is all that information coming from the virtual reality head-mounted display?

[00:16:20.909] Jason Riggs: Yeah, I mean, we do. We do have that built into the headphone. And so if you're experiencing any content without an HMD, OK, so you want to watch movies on a screen, you want to listen to a music-only experience, we have a tracker built into the unit. And yeah, it's super important, OK? I mean, not only does it give you directional cues and it kind of helps resolve confusions, you just can't get good out-of-head depth without a tracker. There's something about the illusion as soon as things turn with your head and your frame of reference that make you not believe that you're in an experience. I mean, imagine visuals in virtual reality if you didn't have tracking. It would just be a screen stuck to your eyes. And the sense of depth and the sense of presence could not exist without things seeming fixed in the environment. Audio is exactly the same. So you need the tracking, the sounds need to stay fixed in space. As soon as you turn that off, it starts to destroy the illusion of depth and kind of presence that you're in an environment and just feel like you have something attached to your head. So, super important. With HMD use, we're doing the tracking in the game engine and we're just using the tracking off the HMD. So like in this demo we're doing with the Vive, we turn our tracker off. We're using the Vive's tracker, it's doing the mix that way. But if we're mixing 5.1 music or 5.1 movies that you're watching on a flat screen, we still want to have the tracker. Even if your movement's minimal, if you're watching a movie and you're taking a seat, just the fact that you may turn your head five degrees this way, it's enough to really give it that fixed perspective and make the center dialogue lock onto the screen. Even without a lot of movement, just having the perspective being right and being fixed to the room helps with the illusion.
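
The head-tracking compensation Jason describes amounts to re-expressing each room-fixed source direction in head-relative coordinates before every binaural render, so the scene stays put while the head turns. Below is a yaw-only sketch; a real renderer would use the full orientation quaternion from the IMU or HMD, and this function is an illustrative assumption rather than OSSIC's code.

```python
def world_to_head_azimuth(source_azimuth_deg: float, head_yaw_deg: float) -> float:
    """Convert a source direction fixed in the room into the head-relative
    azimuth used for HRTF rendering. Positive angles are to the listener's
    left for both the source direction and the head yaw."""
    relative = source_azimuth_deg - head_yaw_deg
    return (relative + 180.0) % 360.0 - 180.0  # wrap into (-180, 180]


# Example: center-channel dialogue locked to the screen at 0 degrees.
# If the listener turns their head 5 degrees to the left, the dialogue must
# now be rendered 5 degrees to the listener's right to stay on the screen.
print(world_to_head_azimuth(0.0, 5.0))  # -> -5.0
```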

[00:17:53.335] Kent Bye: What's one of your favorite experiences that you've had with audio?

[00:17:57.940] Jason Riggs: Okay, I mean, so the Dark Side of the Moon one, part of why we're demoing it, okay, I happen to love that album and it's like one of the Surround Sound albums that was great, and I think in part because it was originally mixed on Quadraphonic for Surround Sound. I mean, in fact, when they played that thing live, they had speakers behind the audience and they invented these joysticks to do 3D mixing to spin the sound around the people, right? So it was supposed to be that way. And so all the voices and the clocks and the effects in that album are supposed to go around you. And so many people have heard it in stereo and they've just missed the way it was originally recorded. And so when we finally got a, we dug up a super audio CD player and pulled it in and plugged it into our thing. And when we played it, I started playing the album and I'd heard the album a lot before on speakers, but. We started playing and our CTO Joy came up and was trying to interrupt me with something and I was just like, hey, get out of here. I literally listened to the whole album from the beginning to the end on the headphones because it's an awesome album. It just made me happy. We had the headphones dialed in and it was working and it just sounded like you were in a kick-ass listening room with the speakers around you. Just being able to listen to that whole album and experience it in Surround Sound was awesome on headphones.

[00:19:03.603] Kent Bye: Well, it seems like audio is often like an afterthought in virtual reality. And so from your perspective, what does audio give you when it comes to adding that extra sense of presence?

[00:19:14.507] Jason Riggs: It is an afterthought. I mean, definitely my original pitch when we started the company not quite two years ago was I was like, you know, we're going to be Oculus for the ears. And it was, I had this graph I kept showing, which was visual immersion that went up like this and the audio like not. And like, audio is going to be the big story two years from now, everybody, you know, so luckily people are talking about it and that's happening. But what it gives you, think about it this way. Our auditory senses can take in 3D and we can perceive sounds within two degrees of accuracy almost everywhere in the sphere. Okay, it's different at different points in the sphere, but the point is, we have a natural 3D sense and that's sound, not visuals. Okay, yeah, we can perceive 3D with visuals, but when we put the head-mounted display on, we can only see about 10-15% of the sphere. So it's still a rectangle. It's a really great rectangle that we can move around and have depth perception and explore a 3D world, but we can only take in 10-15% of the sphere. So the other 85% of the sphere, the way we're going to take in that space and we're going to experience it is through sound. So, A, the sense of presence in the environment, just taking in that environment, knowing we're in a cathedral, like sound. You know, that's how we know it in real life, right? How we take it in, the reverberation and the sense of space. It's gonna be super important for presence. But the other part is storytelling. We saw so many bad experiences where they didn't have spatial audio in them and people missed, there was a character behind them or something above them. Even this Secret Shop demo that we have from Valve, the first version with no spatial audio, at the end a monster comes in and rips off the roof and roars at you. We gave that demo without spatial audio and found 90% of the people never looked at the monster. They just looked around, but it was very unnatural to look straight up. That's the last place you look. And when we put the spatial audio in, it flipped. About 90% of the people looked right at it. So the most subtle way to direct attention and do storytelling in 3D is to use the 3D sense we already have, which is sound, which can take in that sphere and know that we should direct your attention and turn around. And so I think that's what it's going to really bring to the table is the natural way to explore 3D and direct attention without kind of forcing it with big blinking lights and a lot of spotlights and weird things to move you around.

[00:21:19.990] Kent Bye: A lot of times when people are buying headphones, the thing that they look at is like the frequency response, of like how well does it deal with low frequencies and high frequencies, and they look for like this kind of flat line. And, you know, so for you, as you're kind of developing these new immersive audio headphones, what are some of those graphs of the future or real metrics that you're trying to boil down to tell the story of what the capabilities are?

[00:21:43.323] Jason Riggs: Yeah, so those things are neat, but headphones, depending how you measure them, you can totally change the shape of that curve, right? What ear coupler you use, how you measure them on a head, and if you look at every headphone out there in the world, they pretty much all say, like, frequency response 20 hertz to 20 kilohertz. It's like a joke. You know, one thing that we look at that's quite different is we have this interesting collection of dummy heads and ears. We have about 30 sets of ears and seven heads and four torsos, and we can Mr. Potato Head them together to create all kinds of anatomy sets. And we take those anatomy sets and we measure actual speakers out in space at two meters at every point in the sphere. And then we do the same measurement with our headphone when we're trying to reproduce those points in space. And we make sure that we can get the exact same spectra measured at a person's ear canal with the headphones as we can on the speakers. So we match all the objective cues that would tell you where it is. And then, once we get that right, we go through subjective panel listening where we have people do blind testing and we put the points in space and they say where they believe they come from. And we do correlation scores where we see what is the accuracy of this putting things in space. So I think those are the kind of metrics we're going to be talking about when we're talking about can headphones recreate things accurately spatially. And it's just maybe a set of metrics that people aren't familiar with, but it is interaural time differences, level differences, and frequency responses, but for each point in space. And they're all different, and those are the cues that let us know where something is in space.

[00:23:09.674] Kent Bye: Well, the interesting thing to me is that you're starting to get into the realm where you're no longer able to just kind of do this quantifiable measurement, that you actually kind of have to mix the things that you can physically measure with the subjective qualitative experience. And so to me, that seems like it's a bit of a challenge as a company to be able to produce something that does require people actually using it and testing it and seeing it. So as you're doing this quality assurance, how do you ensure that, you know, what you're producing is actually working across a large spectrum of people?

[00:23:37.385] Jason Riggs: Yeah, we test it. But I mean, I think to some extent that is the history of audio, right? It's like you can't design any kind of audio product by the numbers. But I would say that there is a blend of art and science there. But fundamentally, the spatial hearing mechanism, it's pretty well understood. And I mean, there is 20 years of research supporting it. One of the challenges is that we each have different anatomy, and so that's the challenge we're trying to solve: in the past, all of these things have been a one-size-fits-all HRTF algorithm, and it doesn't make sense when a population of human adults' ears vary in all dimensions two to one. So there is no algorithm that's going to work well for that range. And so what we do is we test it on a lot of sets of anatomy with a lot of dummy heads and make sure that we get the calibration right. And then ultimately have people listen, say where the sounds are coming from, and distill that down into an accuracy score. And what we see is with the generic things, they only get to about 20 to 30% accuracy. And we're getting 90 to 95% accuracy across the whole range of people with different anatomies. There is a lot of science behind it, but ultimately, you kind of need to check it with people and make sure you're not fooling yourself.
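
One plausible way to distill the blind listening tests Jason describes into a single accuracy number is to measure the great-circle angle between the presented and reported directions and count a trial as correct when it falls within some tolerance. This is purely an illustrative scoring scheme with a made-up tolerance, not OSSIC's actual test protocol.

```python
import numpy as np


def angular_error_deg(az1, el1, az2, el2):
    """Great-circle angle in degrees between two directions, each given as an
    (azimuth, elevation) pair in degrees."""
    a1, e1, a2, e2 = np.radians([az1, el1, az2, el2])
    cos_angle = (np.sin(e1) * np.sin(e2)
                 + np.cos(e1) * np.cos(e2) * np.cos(a1 - a2))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))


def localization_accuracy(trials, tolerance_deg=15.0):
    """Fraction of (presented, reported) trials where the listener's reported
    direction lies within tolerance_deg of the presented direction."""
    hits = sum(
        angular_error_deg(p_az, p_el, r_az, r_el) <= tolerance_deg
        for (p_az, p_el), (r_az, r_el) in trials
    )
    return hits / len(trials)


# Example: three trials of ((presented az, el), (reported az, el)) in degrees.
# The last trial is a classic front-back confusion and counts as a miss.
trials = [((30, 0), (33, 2)), ((0, 45), (10, 40)), ((150, 0), (30, 0))]
print(localization_accuracy(trials))  # 2 of 3 within tolerance -> ~0.67
```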

[00:24:45.606] Kent Bye: So what do you want to experience with audio then?

[00:24:49.368] Jason Riggs: You know, I mean, I think ultimately we would love to create some of the future of 3D audio experiences that, you know, there's different things, right? Like audio, even in the music side, it's about audio reproduction. Fundamentally, that's where this all started. So can we recreate, you know, the sensation that you were in one of the great concert halls with the symphony in front of you and you can pick out every instrument exactly where they are and take in the sense of space. I think that's possible, but that really hasn't been possible on these sort of channel-based formats in the past. Or being in a coliseum, or being in one of the great cathedrals with a choir singing and an organ, right? This is a very 3D spatial experience that we take in, and I think the formats, you know, I'm giving you a few music examples, but fundamentally those are examples about you being somewhere, right? And understanding, I'm in a cathedral. This is what a cathedral absolutely sounds like. And that's what audio can bring to VR, right? It can bring that sense of space that's like, oh my God, like, you know, the hair standing up on the back of my neck when the organ plays, you know, you usually don't get that with reproduced sound yet. So I think, you know, we could argue if the point is about being indistinguishable from reality, like maybe. I think if we had the ability to make something indistinguishable from reality, we could probably make things that were hyper real and surreal and beyond reality too. But that first thing is like kind of how for us, it's like how do we get sounds really accurately in space so everyone hears them at the right spot. And once we can do that, we can create 3D sound fields. And once we can do that, we can create an audio sense of presence on its own. And so that's kind of where we're going with it. But I think also, freeing these things up for the content creators to create amazing experiences. Like, we were playing with the Wave guys over there, who have this, you know, 3D DJ thing. It was pretty cool. And I think that's an exciting thing. Like, how would we create music in 3D? How would we mix it? How would we create an electronic or synth experience if we could just put sounds anywhere? And that's the question we've had. Like, what's the Tilt Brush experience for audio, right? Like, what would it be if we just gave you a kind of audio-acoustic sandbox and let you go to town and build stuff and play with things spatially? I think there's some fun stuff on the horizon.

[00:26:56.405] Kent Bye: I know there's a number of different audio plugins for Unity where they're starting to handle a lot of this spatialized audio. And also in talking to Dolby Atmos, they have a full simulation of being able to set the size of the room and the material properties and reflection properties. Is that something that you think, as a company, OSSIC would develop as your own solution or SDK? Or is this something that you're going to be relying on these third-party formats or perhaps an eventual open standard to be able to do everything with the reflections and the room size and everything like that?

[00:27:28.333] Jason Riggs: I mean we're building that suite of tools only to make sure that we can be plugged into everything. We're also talking with a lot of the tools companies about interfacing with their tools. I mean fundamentally the business that we're building is how do we deal with the interaction between the device and the human and the variability in the human to get accurate 3D audio. But those plugins are all important. And so for us, it's like being open, you know, making sure that we can, to whatever extent all these different things are plugged into different experiences that we can interact with them. And I think that's kind of the bridge between the 5.1 and the 7.1 until we get some of these standards. But yeah, we want to be agnostic to that. We want to play with everybody to the extent that they would let us. It would be cool if some of those guys would make decoders public and let people actually access their stuff. But, you know, hopefully that's the future.

[00:28:18.766] Kent Bye: And finally, what do you see as kind of the ultimate potential of virtual reality and what it might be able to enable?

[00:28:25.390] Jason Riggs: I mean, you know, a lot of the things we're starting with right now are kind of around entertainment, right? And so it's like, what is the 3D virtual reality parallel to our existing world of games and cinema and music? And that's where we're starting. I think the potential goes way beyond that. It's probably early to say what the maturity looks like in that but of course both in VR and AR there's going to be huge opportunities for communication and telepresence and all these other things which like we haven't dug into deeply. We have people coming to us all the time with like just military things and simulation and training and like all kinds of applications that are way off our radar, but that require accurate 3D audio as part of it. So, I don't know. I mean, I think the world's wide open. It's kind of the Wild West, like we're just getting started here. I think we're going to see a lot of experiences that maybe didn't exist in those old paradigms that are going to be opened up. But for the most part, for us right now, it is entertainment and getting VR cinematics and games and figuring out what is possible with 3D music. I think that's where we're starting.

[00:29:28.693] Kent Bye: Is there anything else that's left unsaid that you'd like to say?

[00:29:31.675] Jason Riggs: You know, hey, thanks to everybody who backed our Kickstarter campaign. That was huge for us, and it's helping us go wider with it and support all these different platforms and hopefully make a little dent in the universe around 3D audio. Yeah, if you didn't back it, we have pre-orders on our website. But, yeah, I mean, you know, we just want to raise awareness around 3D audio and we believe the audio is going to be a really important piece of this and, you know, exciting to see where we'll be a year from now with some of the 3D audio experiences because it's evolving pretty rapidly.

[00:30:00.866] Kent Bye: Yeah, it's interesting just to kind of bring it back to what you said about wanting to be the Oculus of sound, in that Oculus raised like $2.5 million. And maybe you could talk a bit about how much you raised on Kickstarter.

[00:30:10.835] Jason Riggs: We raised 2.7. So I think it is the biggest Kickstarter VR campaign yet. And so, you know, that's awesome. And again, you know, thanks to everyone for supporting it. You know, I think it shows that people are interested in audio. Of course, we have a little bit maybe of an advantage over some of the visual pieces of VR. If you imagine when Oculus came on, it was like, cool, you know, it shipped and you had this, like, Tuscan villa kind of demo, right? Like, all that, everything had to be created. One of the cool things on the audio side is we do have a lot of immersive experiences already today, even if it could be better, that are the music, that are the games today, that are the things today. And so I think that's kind of helping us with this bridge to the future, the fact that people can get these headphones and have a kick-ass experience on Counter-Strike with more accurate 3D audio than they've ever had before. And then they're also going to use it with their Oculus and Vive, right? Whether they have the Oculus yet or the Vive yet or not, that's where it's going. But they can also have some pretty cool immersive audio experiences with the other content. And so I think that's one exciting thing about the audio side, is that people are going to use this with visual VR. But I'm excited about, what is audio VR? What is the 3D music experience? What is the 2016 equivalent of 1972's Dark Side of the Moon that was experienced all around you? What can we do now when we unlock the full 3D? And I think we're going to see some pretty kick-ass audio-only experiences and a sort of music equivalent of VR. And what is that? It's not just the visuals, of course, a sense of presence. I think visuals and audio can tie together in an amazing way. But some of the coolest experiences I've seen, like we saw this Notes on Blindness, I don't know if you've seen that one, or like some of the things where you tone down the visuals a little and you put the person in the dark and you take some of that away and then let them experience the audio, you know, it's kind of fun. It's kind of fun, because sometimes, like even the demo we have here, the Secret Shop one, it's like overwhelming. There's a lot of visual things, there's a lot of interactive pieces, so you're moving around, you're playing with things, you're taking things in, things are jumping out and popping at you and there's arrows and frog's tongues and jack-in-the-boxes and, like, you know, the audio's cool, but that's like a sensory overload sort of experience. And I think one thing that will be neat is to see how people explore with the great audio visuals, like toning the visuals down a little, you know, putting you in a room and shutting the lights off, right, and all of a sudden taking it in that something's approaching you, and maybe there's lightning or a flash and a little hint of the visuals, and then letting you explore the audio. I think that's where the 3D audio is gonna get people really, really excited, when it gets dark. Awesome.

[00:32:36.201] Kent Bye: Well, thank you so much Jason.

[00:32:37.362] Jason Riggs: Hey, no. Thank you. It was a pleasure.

[00:32:40.058] Kent Bye: So that was Jason Riggs. He's the CEO and founder of OSSIC, which is trying to do for audio what the Oculus Rift did for VR. So I have a number of different takeaways about this interview. First of all, I've had a number of different demos with the OSSIC headphones, and the one at the Silicon Valley Virtual Reality Conference I think was a lot better than the one that I had previously at GDC. And the sound was really good. And the spatialization I think came through a lot stronger at SVVR than it was when I first had seen it at GDC. But I think the thing that's really interesting about a project like this is that they kind of sold it in a way that's immediately applicable to existing 2D games, but also to audiophiles and music lovers, because you're essentially able to recreate a theater sound within the headphones. And I had a chance to listen to the quadraphonic recording of The Dark Side of the Moon, and it did sound really amazing. And I'm overall really excited about the potential for headphones like this to be applied and integrated within VR experiences. Now, the thing that makes me a little bit cautious about how far this is going to actually go is that you're going to have to actually integrate the SDK and a lot on the software side within the experience in order to actually use the headphones. What that essentially means is that that's going to limit the number of VR developers, who are already kind of strapped for time and energy to be able to finish the features they've already got. It's just yet another SDK and piece of hardware that they have to buy and integrate and do quality assurance testing on. So getting the software adoption, I think, is going to be a big challenge, and that's part of the reason why I really wanted to talk about the larger open standard. Essentially, where I think we eventually want to get to is the point where these game engines are able to natively export some sort of open standard format that is able to deliver these audio object sound files, which include both the original sort of wave files from the individual sources and the information to kind of dynamically mix it all together. So this is something that clearly Dolby Atmos has already figured out, and you can kind of get a sense from this interview that Dolby Atmos wasn't all that interested in collaborating and kind of handing over the keys to their kingdom to OSSIC. And so I'm not actually really surprised by that. That is just their business model. That's their approach, their business. They're trying to stay in business by creating a technology that they're licensing out. And so that's just kind of par for the course. And like I mentioned in the Dolby Atmos interview, Neil Trevett was just talking about how any sort of successful open standard has a proprietary competitor. So in the audio space, I think Dolby Atmos is kind of the leading proprietary solution to be able to do something that works. It's going to have integrations for all the different Hollywood theaters, and you're going to have to pay a license to be able to use it per piece of media. They have all sorts of business models where you have to kind of deal with their salespeople to get more information about that.
But in terms of OSSIC's perspective, I think in order for them to be really successful and get to the next level, there's going to have to be a big push for open standards when it comes to audio. And that, I think, is still something that's quite a bit up in the air. And I'm sure that's actually already happening, I just haven't heard of anything specifically yet. If you're working on that, then please do reach out. I'd love to hear more about it and kind of feature it here on the podcast, because I think it's something that's both important and interesting to the overall VR industry. So the other takeaway that I had here is just that there was a lot of really super geeky audio speak in this podcast. In fact, I'll probably have to go back and listen to it again and Google some of the stuff that I'm not completely aware of. If you listened to this and got lost at any point, don't worry, I didn't catch completely all of it either. But I got the overall gist, and I really love where OSSIC is going and want to really support their initiative of trying to create something that I think is the next frontier. We started with the visuals, with the HMDs; I think audio is the next frontier in terms of really creating these high-fidelity experiences. And then motion tracking, I think, is going to be a huge part, with different innovations that are happening in that realm. And then finally, haptics. I think those are going to be the main sensory inputs that we're going to have that are going to drive the next generation and future of immersive experiences in virtual reality. So with that, I wanted to just thank you for joining me here on the podcast and geeking out about audio. We're halfway through our series this week focusing on audio within VR. So if you enjoy this, then please do spread the word. Let your friends know. You can follow me on Twitter @kentbye. I'm actually at the International Joint Conference on Artificial Intelligence this week, and so you can check out Voices of AI and the Voices of AI podcast that I hope to be starting up here soon. And yeah, help spread the word, tell your friends, leave some reviews on iTunes. And if you want to help out financially, then that would be awesome. Please go to patreon.com slash Voices of VR.
