#494: OSSIC CEO: The Future of Music is Immersive & Interactive

OSSIC debuted their latest OSSIC X headphone prototype at CES this year with one of the best immersive audio demos that I've heard yet. OSSIC CEO Jason Riggs told me that their headphones do a dynamic calibration of your ears in order to render out near-field audio that is customized to your anatomy, and they had a new interactive audio sandbox environment where you could do a live mix of audio objects in a 360-degree environment at different heights and depths. OSSIC was also a participant in Abbey Road Studios' Red Incubator looking at the future of music production, and Riggs makes the bold prediction that the future of music is going to be both immersive and interactive.

LISTEN TO THE VOICES OF VR PODCAST

We do a deep dive into immersive audio on today's podcast where Riggs explains in detail their audio rendering pipeline and how their dynamic calibration of ear anatomy enables their integrated hardware to replicate near-field audio objects better than any other software solution. When audio objects are within 1 meter, they use a dynamic head-related transfer function (HRTF) in order to calculate the proper interaural time differences (ITD) and interaural level differences (ILD) that are unique to your ear anatomy. Their dynamic calibration also helps to localize high-frequency sounds above 1-2 kHz when they are in front of, above, or behind you.
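
To put rough numbers on the ITD and ILD cues described above, here's a small sketch using the textbook Woodworth spherical-head approximation rather than OSSIC's calibrated model; the head radius and the ILD scaling are illustrative assumptions only.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def woodworth_itd(azimuth_rad, head_radius=0.0875):
    """Approximate interaural time difference (seconds) for a far-field source,
    using the classic Woodworth spherical-head model.
    azimuth_rad: 0 = straight ahead, +pi/2 = directly to the right."""
    return (head_radius / SPEED_OF_SOUND) * (np.sin(azimuth_rad) + azimuth_rad)

def toy_ild_db(azimuth_rad, freq_hz):
    """Very rough interaural level difference in dB: head shadowing is weak at
    low frequencies and grows with frequency. Purely illustrative numbers."""
    shadow = np.clip(freq_hz / 4000.0, 0.0, 1.0)   # saturates around 4 kHz
    return 20.0 * shadow * np.sin(azimuth_rad)      # up to ~20 dB hard left/right

if __name__ == "__main__":
    for az_deg in (0, 45, 90):
        az = np.radians(az_deg)
        print(f"{az_deg:>2} deg  ITD {woodworth_itd(az) * 1e6:5.0f} us   "
              f"ILD @ 3 kHz {toy_ild_db(az, 3000.0):4.1f} dB")
```

A personalized HRTF effectively replaces these coarse formulas with anatomy-specific filters, which in OSSIC's case are estimated from sensors on the headphone rather than measured in a lab.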

Riggs says that they've been collaborating with Abbey Road Studios in order to figure out the future of music, which he believes is going to be both immersive and interactive. Audio production spans a spectrum that ranges from pure live capture to pure production, which happens to mirror the difference between passive 360-video capture and interactive, real-time CGI games. Right now the music industry is solidly rooted in static, channel-based audio, but the future tools of audio production are going to look more like a real-time game engine than the existing fixed-perspective, flat-world audio mixing boards.

https://www.youtube.com/watch?v=UQY7rm8WyI8

OSSIC has started by figuring out the production pipeline for the passive, pure live capture end of the spectrum first. They've been using higher-order ambisonic microphones like the 32-element em32 Eigenmike microphone array from mh acoustics, which captures a lot more spatial resolution than a standard 4-channel, first-order ambisonic microphone. Both of these approaches capture a sound-sphere shell of a location with all of its direct and reflected sound properties, which can transport you to another place.
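
For context on why the 32-element array captures more spatial resolution: a full-sphere ambisonic signal of order N carries (N+1)² channels, so first order is 4 channels while a 32-capsule microphone has enough capsules to estimate up to fourth order (25 channels). A trivial sketch of that relationship:

```python
def ambisonic_channels(order: int) -> int:
    """Number of channels in a full-sphere (3D) ambisonic signal of a given order."""
    return (order + 1) ** 2

# First order (B-format) = 4 channels; a 32-capsule array like the em32 Eigenmike
# has enough capsules to estimate up to fourth order (25 channels).
for order in range(0, 5):
    print(f"order {order}: {ambisonic_channels(order)} channels")
```

Higher orders narrow each spatial component's effective beamwidth, which is the extra resolution being referred to here.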

But Riggs says that there's a limited amount of depth information that can be captured and transmitted with this type of passive, non-volumetric ambisonic recording. The other end of the spectrum is pure audio production, which can create volumetric audio that is real-time and interactive by using audio objects in a simulated 3D space. OSSIC produced an interactive audio demo in Unity that is able to render audio in the near field, at less than 1 meter of distance.

The future of interactive music faces a tension similar to the one between 360 videos and interactive game environments: it's difficult to balance the user's agency with the process of creating authored compositions. Some ways to incorporate interactivity into a music experience are to allow the user to live mix an existing authored composition with audio objects in a 3D space, or to play an audio-reactive game like Audioshield that creates dynamic gameplay based upon the unique sound profile of each piece of music. These approaches engage the agency of the user, but neither of them provides any meaningful way for the user to impact how the music composition unfolds. Finding that balance between authorship and interactivity is one of the biggest open questions about the future of music, and no one really knows what that will look like. The only thing that Riggs knows for sure is that real-time game engines like Unity or Unreal are going to be much better suited to facilitate this type of interaction than the existing tools of channel-based music production.

Multi-channel ambisonic formats are becoming more standardized for the 360-video platforms on Facebook and Google's YouTube, but those platforms still only output binaural stereo. Riggs says that he's been working behind the scenes to enable higher-fidelity outputs for integrated immersive hardware solutions like the OSSIC X, since the platforms currently aren't using the best spatialization process to get the best performance out of the OSSIC headphones.
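
These platforms typically collapse ambisonics to binaural stereo by decoding to a ring or cube of virtual loudspeakers and convolving each feed with a generic HRTF, which is exactly the stage Riggs wants hardware-aware renderers to be able to intercept. Here is a minimal, assumption-laden sketch of that idea, with placeholder HRIRs standing in for a real measured set and a deliberately simplified first-order decode:

```python
import numpy as np

def decode_foa_to_virtual_speakers(wxyz, directions):
    """Very simplified first-order decode: point a cardioid-ish virtual speaker
    at each direction. wxyz: array of shape (4, n_samples) holding the W, X, Y, Z
    channels; directions: (n_speakers, 3) unit vectors. Real decoders also handle
    normalization conventions and energy preservation."""
    gains = np.hstack([np.full((len(directions), 1), 0.5), 0.5 * directions])  # (n_spk, 4)
    return gains @ wxyz                                                         # (n_spk, n_samples)

def binauralize(speaker_feeds, hrirs_left, hrirs_right):
    """Convolve each virtual-speaker feed with a generic HRIR pair and sum to stereo."""
    left = sum(np.convolve(feed, h) for feed, h in zip(speaker_feeds, hrirs_left))
    right = sum(np.convolve(feed, h) for feed, h in zip(speaker_feeds, hrirs_right))
    return left, right

# Example: 8 virtual speakers at the corners of a cube, placeholder 2-tap "HRIRs".
corners = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]) / np.sqrt(3)
wxyz = np.random.default_rng(0).standard_normal((4, 4800))   # stand-in B-format audio
feeds = decode_foa_to_virtual_speakers(wxyz, corners)
hrirs = [np.array([1.0, 0.5]) for _ in corners]
stereo = binauralize(feeds, hrirs, hrirs)
```

Once the signal is baked down this way with a generic HRTF, downstream hardware like the OSSIC X can no longer apply its own anatomy calibration, which is why Riggs wants access to the ambisonic stream itself.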

As for formats on the pure production end of the spectrum, there is no emerging standard yet for an open, object-based audio format. Riggs hopes that one will eventually come, and that there will be plug-ins for OSSIC headphones and software to be able to dynamically change the reflective properties of a virtualized room, or to dynamically modulate properties of the audio objects.
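
No such open standard exists yet, but an object-based format would essentially pair each sound with metadata that a renderer (or an OSSIC-style plug-in) could act on. The data structure below is purely hypothetical, just to illustrate the kinds of fields (object position, gain, room reflectivity) such a format would need to expose:

```python
from dataclasses import dataclass, field

@dataclass
class AudioObject:
    """Hypothetical per-object metadata; not an existing standard."""
    name: str
    position: tuple            # (x, y, z) in meters
    gain_db: float = 0.0
    directivity: str = "omni"  # e.g. "omni", "cardioid"

@dataclass
class RoomModel:
    """Hypothetical room description a renderer could use for early reflections."""
    dimensions: tuple = (8.0, 5.0, 3.0)   # meters
    surface_absorption: float = 0.3        # 0 = fully reflective, 1 = anechoic

@dataclass
class SceneFrame:
    """One frame of an object-based mix: sources plus the room they sit in."""
    time_s: float
    objects: list = field(default_factory=list)
    room: RoomModel = field(default_factory=RoomModel)

frame = SceneFrame(time_s=0.0,
                   objects=[AudioObject("vocal_stem", (0.0, 1.5, 0.2)),
                            AudioObject("drum_stem", (-2.0, 1.0, 0.0), gain_db=-3.0)],
                   room=RoomModel(dimensions=(12.0, 9.0, 4.0), surface_absorption=0.45))
```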

As game engines eventually move to real-time, physics-based audio propagation models where sound is constructed on the fly, Riggs says that this will still need good spatialization from integrated hardware and software solutions; otherwise, it'll just sound like good reverb without any localized cues.

At this point, audio is still taking a backseat to the visuals with a limited budget of 2-3% of CPU capacity, and Riggs hopes that there will be a series of audio demos in 2017 that show the power of properly spatialized audio. OSSIC's interactive sound demo at CES was the most impressive example of audio spatialization that I've heard so far, and they're shaping up to be a real leader in immersive audio. Riggs said that they've gotten a lot of feedback from game studios that they don't want to use a customized OSSIC audio production solution; they want to keep their existing production pipeline and have OSSIC be compatible with it. So VR developers should be getting more information about how to best integrate with the OSSIC hardware in 2017 as the OSSIC X headphones start shipping in the spring of this year.

Rough Transcript

[00:00:05.412] Kent Bye: The Voices of VR Podcast. My name is Kent Bye, and welcome to the Voices of VR podcast. So I went to the Consumer Electronics Show this year, and one of the most impressive demos that I saw all week was from OSSIC. They're a 3D immersive audio company, and for the first time, they had an interactive demo that really showcased the power of their hardware and software audio spatialization solution. So OSSIC raised over 2.7 million dollars on Kickstarter and they were showing some of the OSSIC X prototypes that were off the production line. But they also took the time to create an entire interactive audio demo. Up to that point, a lot of their demos that they had been showing were using repurposed demos from like Valve's Secret Shop. And to be honest, it wasn't really a great demonstration of what their audio platform is really capable of. This time they had this interactive audio demo that just really blew me away. It was really the first time that I was able to start to shut my eyes and really start to locate different objects in space. I had a chance to talk to the CEO of OSSIC, Jason Riggs, where we really take a deep dive into immersive audio and the importance of the calibration steps that they're doing in order to start to work with the near field. And OSSIC has also been participating in Abbey Road Studios' Red Incubator, where they've been able to explore the future of music production that's a lot more immersive and interactive. So we're talking about the latest immersive audio innovations from OSSIC as well as the future of music on today's episode of the Voices of VR podcast. But first, a quick word from our sponsor. Today's episode is brought to you by the Silicon Valley Virtual Reality Conference and Expo. SVVR is the can't-miss virtual reality event of the year. It brings together the full diversity of the virtual reality ecosystem, and I often tell people if they can only go to one VR conference, then be sure to make it SVVR. You'll just have a ton of networking opportunities and a huge expo floor that shows a wide range of all the different VR industries. SVVR 2017 is happening March 29th to 31st, so go to vrexpo.com to sign up today. So this interview with Jason happened at the Consumer Electronics Show happening in Las Vegas from January 5th to 8th, 2017. So, with that, let's go ahead and dive right in.

[00:02:39.105] Jason Riggs: Hey, Jason Riggs, CEO of OSSIC, and we're working on immersive 3D audio. And particularly for VR, you know, it's really in some ways the killer app for immersive and 3D audio, right? We've broken the rectangle, we're putting you into the 3D world, and audio is your 3D sense. And so what we're working on is how can we really get that accurate and immersive spatial audio that also sounds great. So one of the big bottlenecks is spatial audio is based on the head-related transfer function, and a lot of the core that we're doing is really how to calibrate to the individual's HRTF through smart hardware and sensors within that that can learn about you. And so what we're demoing at the show here is the latest version of the OSSIC X. This is kind of the first tooled version that's coming off of the early assembly line, and we'll be launching that product in spring. We did a big Kickstarter back in April and have been doing some pre-orders after that, so everyone's eagerly awaiting this first unit. So this is kind of our first demo of the tooled one with the full features and the microphones and all the next pieces on it. The other demos we're showing at the show, so we're starting to do more of our own content and build some experiences to really showcase what's possible in 3D audio. And so we have a little OSSIC sandbox experience, which is a song with different objects that represent individual stems, and you can pick them up, move them around your head, move them around in space, activate them. So that's something we'll be showing here with HTC and their mixed reality setup with kind of the green screen. But I think we realized early on that it was going to be important to start to build some of our own demos to just really showcase what audio could be and could do in VR. And so this is kind of the first one where we have a full experience that we built, and people can play with the sounds. And in some ways, it's almost like a Tilt Brush for audio, but just let you manipulate the objects, let you manipulate the sounds, move them around in space. And it's pretty neat. One cool thing we did in this one is the head-related transfer function actually changes. In general, people have used these data sets that are all measured at one meter. And when you get out past a meter with the head-related transfer function, it's fairly consistent in terms of the deltas and delay and whatnot. So it does a good job at representing a shell. And then you use cues like volume and the propagation model and direct reflected sound and level to give you cues about how close something is. But in the near field, the HRTF actually changes, right? So if we bring something up to our right ear, it'll become very dominant. And so this version, now we have full integration where we have also depth-based HRTF. So when you're in that meter and you pick the object up and you bring it up to your right ear, if you kind of imagine clippers and you're getting a haircut, that effect you get when it comes by your right ear, you can get that. And I think that's really cool for presence when you start to pick up and interact with sounds and really bring it up to your ear and just feel like it's right there.

[00:05:17.707] Kent Bye: Yeah, I think having a good audio demo does really require a very specific software experience, but also potentially the hardware. I don't know, I haven't done this demo with other hardware to be able to know for sure, but what I can say is that this is the first time that I've been able to really locate sound in an experience in a way that I could close my eyes and look around and locate objects, and when I looked and said, okay, that's where I think it was, I could open my eyes and see that it was there. And so I feel like this is a good baseline for other hardware solutions to measure against. What I noticed in particular is that the high frequencies are really easy to locate. And the other ones I could generally get, but that one especially stood out. But maybe you could talk about that, what you were trying to do in terms of designing that to see the different ranges of frequencies, and you're able to throw it to different depths, and you're able to move your body around it. So yeah, maybe just what you're trying to accomplish with that.

[00:06:13.151] Jason Riggs: Yeah, so I mean, the concept for the demo was really simple. It was just like, let's make an audio sandbox where we can manipulate sounds. In part because some of the demos we'd shown, we'd integrated with other people's games where they weren't designed with audio being the first piece of it. So there were so many things happening and so much. interaction and graphics that it was harder to concentrate in the audio. So we actually built this first tool. We're like, well, let's just build something simple where we put some objects in space and you can pick them up and you can put them wherever you want and you can bring them close and you can bring them far. And so we actually built a kind of like an internal tool for us to do testing and experimentation in VR. Then it was kind of fun, and so we're like, cool, let's build some graphics around it, let's put a little interaction and some physics in it, and make it something that people can use to also understand 3D audio and have their own experimentation. I mean, what you queued in on in the high frequencies is interesting, because if we really look at why the hardware and the calibration is important, it's really two things. It's getting the localization accuracy right and getting the sound quality right. But the thing about the localization accuracy that's interesting is, So, any generic HRTF algorithm can do a pretty good job about the right and the left. The right and the left is easy because we have ITD and ILD, so we have big differences in delay and level that let us locate the sound at the right and the left. So, at high frequencies on the right, our sound is occluded by our head, and so that creates this delta in level from the right to left ear, and then the delay path around it. And so, low and high frequencies, pretty easy to get right and left. But, when we get in front of us, above us, behind us, this whole conical sort of toroid, this donut in the middle, we actually rely on the asymmetries of our anatomy and the shape of our pinna to get that. And so, the pinna really come into effect when we start getting up above 1,000 Hz and even above 2,000 Hz. And that's one of the reasons that having high frequency components in the sound really makes it easier for us to localize the front, the back, the up and down, if we give you the right cues for your anatomy. And so the zones where you see the difference with the hardware doing the calibration are this central zone, front, back, up, down. And definitely high frequencies are one of the key parts that enable us to enable that differentiation. But that's just humans, right? If we had a low frequency sound in space, sometimes it's really hard to tell if that's in front of us or above us or behind us. It's not just a challenge on headphones, it's a challenge in the real world that our sensors are lined up left or right. So, when you're localizing things with the generic algorithm, the right and left are strong, and so if you move your head around and you're turning a whole lot, you can find it, but the way you're doing that is turning the left ear or right ear towards that sound and getting it into a zone that's more accurate. The challenge with that is we don't keep moving. It also fails us for height, because to localize that way with height, we'd have to tilt our head left, tilt our head right, kind of bringing our ear down by our shoulder, which is a behavior I've never seen anyone do in VR, and it's quite uncomfortable, actually. 
So there's this sort of idea that head tracking can help fix some of these ailments, and it does, in the case where the person is moving in the right plane, and they're moving within the interval of the sound. So, the gunshot goes off, head tracking will not help you localize it. If someone is talking for a minute and you keep turning your head around, you'll find them, but that's not the behavior we want. The behavior we'd like to have is, hey, something interesting is happening behind me and I instantly turn to it and I'm looking at where I expected it to be. anything else sort of has to form a disconnect in presence where you have to keep wobbling around to locate things. That's the reason that the individualization is important is to get these other zones and those zones are important. Now the other piece of it is sound quality and so like if you do the music demo we have set up that's a great one where you listen to music or something with the voice or things that were very sensitive about sound quality. And the problem when you listen to a generic HRTF model that's not your ears, is the spectral cues are happening at the wrong frequencies. And so what you get is a little bit of this, I don't know if this will play on the microphone, but it's basically comb filtering. So right now I'm just putting my fingers in front of my mouth, so if I do that again you'll just hear kind of how my voice changes. It's that same sort of thing. When we hear the reflections through someone else's ears at the wrong frequency, it boosts some frequencies, it cancels others that we're not used to. And that's what gives kind of the sound quality. And so our founders have been working in the audio space for 15 years. We did a lot of the top-selling gaming headphones and we licensed a lot of these HRTF-based surround sound algorithms from the Dolby's and DTS and SRS's of the world. We made these sort of generic algorithms. And they do something, okay? They get the sound out of the side. But the problem that we found with gamers was that more than half were turning them off because they said, we don't like the sound quality, it sounds weird, or it doesn't give me any advantage over stereo. And the reason it doesn't give you an advantage is because they mostly just give you this right and left again, but they fail in the other zones, and the other zones is what you would like to have. Left and right you can get with stereo headphones, you don't need a 3D algorithm to get that. So those are really the biggest deltas. But we absolutely want to make sure, like what we're looking at and kind of our vision is, hey, we want to bring immersive audio to all content and all platforms. And we think VR, it's not the only thing we're addressing, but it's sort of one of the best and highest use cases. And it's the exciting part around the conversation of 3D and immersion that I think is driving the interest. But you should expect your headphones to deliver a home theater experience if you're watching a movie on your tablet. And that's not what the world is today. We think VR is a great thing that's going to excite people about possibilities in music, possibilities in cinema, possibilities in gaming, and that these things are all kind of intersecting and reflecting back, right? So we're getting all the music studios asking us about what is the future of 3D music and what should we be doing to capture concerts, right? And so that's why we're most excited about VR. 
But we're definitely going to start putting out more of these demos to sort of pave the way of what is going to be possible with audio, because audio and hearing is our 3D sense, and it's just going to be crucial in the 3D worlds of the future.
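
The comb filtering Riggs imitates with his fingers is easy to reproduce numerically: summing a sound with a slightly delayed copy of itself carves periodic notches into the spectrum, which is roughly what happens when pinna reflections arrive at frequencies your brain isn't calibrated for. A quick sketch with an illustrative, pinna-scale delay:

```python
import numpy as np

fs = 48000
rng = np.random.default_rng(0)
x = rng.standard_normal(fs)                      # 1 second of white noise

delay_samples = 24                               # ~0.5 ms, a pinna-scale reflection delay
y = x.copy()
y[delay_samples:] += 0.8 * x[:-delay_samples]    # direct sound plus one delayed reflection

spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), 1.0 / fs)
band = (freqs > 500) & (freqs < 1500)
measured = freqs[band][np.argmin(spectrum[band])]
# Comb notches sit near odd multiples of fs / (2 * delay_samples) = 1 kHz, 3 kHz, ...
print(f"Expected first notch near {fs / (2 * delay_samples):.0f} Hz, measured near {measured:.0f} Hz")
```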

[00:11:56.598] Kent Bye: Yeah, one of the things that you had mentioned earlier that I hadn't ever thought about or recognized or heard anybody else talk about was the convergence factor when you get beyond a certain distance. I know that with our stereoscopic vision, people talk about these different zones. There's things in the near field and then a middle range, and then at some point, whether it's 20 to 30 meters or somewhere in there, the pixels that you have to render for the left and right eye are essentially the same. You don't have to have any stereoscopy. It doesn't give you anything that you're actually going to be able to perceive. It sounds like there's something similar with the ears, that there's some sort of sphere at a certain range where our ears can no longer tell any localization difference. So are there discrete zones that you feel are at different distances?

[00:12:38.657] Jason Riggs: Yeah, so we can perceive angular localization in different planes, but with kind of two-degree accuracy all over in space, so we're pretty good about this angle thing. That's where the head-related transfer function really comes in to allow us to properly spatialize the sound. And so, by that I mean, if a sound is supposed to be at a certain azimuth and elevation at this point in space, and we do blind testing, people perceive it at that point in space. So that's the first thing we're solving. So the depth thing, there's really a series of different cues that we use to perceive depth. And it turns out we are pretty good about it. If the sound is familiar to us, so like a human voice, if we're in a talking voice, we know how loud a human voice should be. So intensity alone gives us a first cue. If it's an unfamiliar sound, intensity won't be as good, right? But something like voice, we know how loud it normally is, we know the difference between talking and screaming. If we're a talking voice, we know it should be about this level, and it gives us one sense of depth. The other thing that's very powerful is the propagation model that we use. And so the key there is really direct to reflected sound. If I'm close to you, you're going to hear more direct sound. If I'm far away, you're going to have a higher ratio of reverb in the space. Now of course that interacts with how reverberant the space is too. So there's kind of a couple dimensions here. but in a dynamic sort of volumetric VR experience where we're moving up to things and we're moving far away, we cue into that change in direct or reflected sound. That does a great job at depths kind of beyond a meter. But to specifically answer your question about the zones, really those two things are the main things that are working when we're out past a meter. So usually HRTFs are measured at a meter in a laboratory and anything beyond that meter they look pretty similar. And the cues of the interaural time delays and the level differences and all these spectral things, that's kind of accepted as a far field out beyond a meter. When we get into the near field though, it sort of changes because the ears are on the side of our head. So it's not like we have ears all around our head. So if I bring a sound up in the near field right in front of me, like to my nose, it's still equidistant to the ear. So that the front and the back, the central zone, don't look as different from one meter as the sides do. Now, if I have a sound at the left, and it comes into this one meter zone, this ear starts to become very dominant, right? Because this ear is offset, so what we're getting is a big change in the interaural intensity difference, because this object's getting really close to my left ear, it's getting super loud. It's maybe similar on the right ear as it was in the far field. If we really start to think about in VR where you can pick up sounds, or you can lean your ear down to a sound, we want to be really accurate. And it really gives you that feeling. You expect it, but then it's that sort of surprising thing when you pick it up and you can interact with it. I think it's kind of fun. So this is the first demo where we built that depth-changing HRTF into it to really address those effects within that meter, which is great for you leaning your head into things, picking up objects, moving them in.
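
To put rough numbers on the distance cues Riggs lists: the direct path falls off about 6 dB per doubling of distance while diffuse reverb stays roughly constant, so the direct-to-reverberant ratio shrinks with distance, and within about a meter the level at the near ear climbs much faster than at the far ear. A toy model follows; all constants are assumptions, not OSSIC's renderer.

```python
import numpy as np

def direct_level_db(distance_m, ref_db=0.0):
    """Inverse-square falloff of the direct path, relative to 1 m."""
    return ref_db - 20.0 * np.log10(max(distance_m, 0.05))

def direct_to_reverb_ratio_db(distance_m, reverb_db=-15.0):
    """Diffuse reverb level is roughly constant in a room, so the ratio
    shrinks as the source moves away."""
    return direct_level_db(distance_m) - reverb_db

def near_field_ild_db(distance_m, ear_offset_m=0.09):
    """Crude near-field ILD for a source on the interaural axis: compare path
    lengths to the near and far ear (ignores head shadowing entirely)."""
    near = max(distance_m - ear_offset_m, 0.01)
    far = distance_m + ear_offset_m
    return 20.0 * np.log10(far / near)

for d in (0.2, 0.5, 1.0, 2.0, 4.0):
    print(f"{d:>4} m   D/R {direct_to_reverb_ratio_db(d):5.1f} dB   near-field ILD {near_field_ild_db(d):4.1f} dB")
```

The ILD column illustrates the point about the one-meter boundary: the level difference balloons as the source approaches an ear and becomes negligible beyond roughly a meter.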

[00:15:31.977] Kent Bye: Yeah, there's the famous barbershop demo that this is probably the closest that I've experienced that within a VR experience. I haven't seen any other technology or software be able to really replicate that yet. But in terms of the pipeline, you put in a single audio sound, maybe you could talk through the different steps it has to go through to eventually get to your left and right ear, all the different algorithms to actually do all this magic of the spatialization that you're doing as well.

[00:16:00.020] Jason Riggs: Okay, I'll take a stab at that one. That one's a big one. Well, let's talk about maybe the content creation side first, and then a little bit about the rendering side and how it gets to your head if we split it into those two phases. And there's different paradigms here. So if we think about content creation, if I drew a spectrum, on one side I might draw something that's like pure live capture. And by that I mean, could we capture a full 360 event in real time the way it happened? And that's one sort of paradigm. And then at the other end, we may do something that I'll call pure production. And by that I mean, can we take individual sounds, maybe things that are recorded as high-fidelity mono sources or stems, and then we're now going to take those and we're going to put them wherever in space. So, if we're doing volumetric VR, where the user can move around within the experience, at the end of the day, a lot of the production elements are happening within the physics engine of the game engine. The whole way this works is that these intensities and positions and everything are happening. Now, with VR, no matter where we are in the spectrum, there's an interactive component of the human, in the simplest form, and again, maybe now it becomes sort of a matrix, right? But if we think about non-volumetric experiences, so something where we're creating a shell, so it's 360 cinema or something, but non-volumetric, the user can't move around, the user still can rotate. So at minimum, we need to take this shell that we're building of also the sound, not just the video, and we need to make sure that when they turn their head, it rotates counter to them, so it seems fixed with the video elements. That's maybe the simplest, least interactive VR audio experience. And that lends itself really well to blending with these sort of sound field mics that capture a real-time shell. So we're doing a lot of work, we did this project with Abbey Road Studios with the new Abbey Road Red Incubator, and we're just working on the future of music. So for six months we've been doing a lot of live capture, high order ambisonics, we have a 32 microphone ball, and so in that case you're really trying to directly capture the sound in a sphere. And what's great about that is that you get all of the reflections and all of the propagations also accurately captured spatially. So the coolest experiences we've done are where you're recording in a big studio like Abbey Road Studio 2 with an orchestra or a cathedral or something. What's amazing is not just getting the direct sound to be from the right angles, but getting that absolute sense of proper reverberation and space. When you sit in that type of recording, you actually feel like it doesn't matter, close your eyes, you're in a different space. And you can sense exactly how big that space is, exactly what that feels like. So that's going to be amazing for VR when we get the immersion and the sense of space. If you walk into a coliseum or arena and the hair kind of stands up on your neck and the crowd cheers and you're like, this is an arena without any visuals, I mean just the audio. But anyway, this is one sort of side of it is trying to capture things in 3D. On the production side, what we're doing, and I'll maybe just talk a little bit more on that side, but like the experience you're seeing here is in Unity. Most of these have some audio middleware. Unity uses FMOD as the default one, but FMOD and Wwise are the two big middle layers there.
So we're building plugins that go into that middleware that then you can use across Unity or across Unreal. We talked to a lot of the game companies from the big AAA guys to indie folks and I mean the key takeaway for us was really they don't want to disrupt their workflow. They don't want to change how they do everything. So they don't want us or anyone else to build a solution that's like, hey here's a whole new set of tools that everyone has to learn. So what we're trying to work on on that workflow and content creation side is how to make a plugin that is the least disruptive to the way that they're familiar doing it, right? And so the first plugin that we're building to do this, that's part of this experience here, basically goes into FMOD and Unity and it cues the position of everything in there. And so if you're a content creator, the way it works is you identify which objects you want to be rendered as discrete objects, and the ones that you don't go into a background channel mix, and that's just sort of for efficiency. And so we have one demo where we have 300 objects, and we have about 50 being live at a time, and that takes maybe 4 or 5% of CPU, and so that's a reasonable mix. But basically, we render those objects then discretely, and so for each one of them, we're in real-time calculating the position of it. We have a really accurate HRTF set that's something like 500 angles, and then also depths on top of that. And so for each one of those objects, it's discretely rendering that at the correct position and the correct depth. So in this one where we have four or five objects, those are all being rendered as objects. Now, we can do that in a generic sense with any kind of hardware where we provide the HRTF data set. And in this case, we're doing all of that processing within the game engine. The version that you're seeing here that's a little bit unique is when we're using the OSSIC hardware with it, we're doing the real-time calibration to your individual anatomy. And so that's sort of a different layer that sits on top of that. And so there's input from the sensors on the OSSIC, which is measuring your head size and ear spacing. That's going in to change what those HRTF set is, effectively, to match your head size and your body size. And then for the headphone, in real time, we bring eight channels out just for high frequencies, and the high frequency component is happening in real time and it's steering around your ears, if that makes sense. So wherever each one of those objects is, it's fed into that eight channel array from the right angle, and that just makes sure that the high frequencies hit your pinna or your outer ear from the right angle. And that's what gives you those right spectral cues. So it's definitely not as simple as just putting some speakers in a headphone, and in fact that's not really what the speakers are doing, but they're doing just the pinna piece of high frequency interaction. We could put a microphone in the headphone and capture that one time and then we could move it all over to the game engine and just play it back on stereo headphones. But because that was kind of a weird behavior to require people to put microphones in their ears and do this setup, our goal was really to do it in real time. And so that's why we do the ear part in real time with multiple channels. As we go forward though, what we're working on is really how can we bring this kind of calibration and technology to everyone if you have our hardware or you don't have our hardware.
Okay, so the future for us is first these headphones. We're also working on how do you solve that for small form factors or in-ears or things that will make even more sense to be integrated with HMDs or VR and AR systems of the future. And then finally, how can we give you the best experience in software? And so, if you don't have OSSIC headphones yet, can we give you a great generic starting point? Can we give you a method, if that's taking a picture of your ears, if that's doing something really simple, to get first level anatomy calibration before you have the hardware? So that's kind of our vision is to just solve this for everybody with whatever their systems are. And one method is great if you really care about audio and you want to spend a couple hundred dollars or more. We'd love you to buy an integrated OSSIC solution and we want to be the leaders in immersive 3D audio. For everyone else we'd still like to help solve this and so we're working on kind of the software pieces and all the back end and all the tools and everything to build up the ecosystem, because we need to get people excited about creating good audio experiences in VR and we need to make it easy for them to create good audio experiences in VR so that people hear it and then start to demand it and that we get this sort of virtuous cycle going. For us, our high-level thing, our whole strategy since we started it, I mean the first piece is we want to be the leaders in immersive 3D audio. And the first brand tenet and focus for our strategy is help everyone build this ecosystem. Help push things to more immersion, help push the audio part forward. In our first pitch we would show people: our company deck was a big line with video. Here's what's happening in video, this hockey stick, right? We've just brought the world the most immersive video experience. You can now take a cell phone and stick it on your face and have a level of immersion that no one in the world ever had. Cool. And then the audio one, like dot, dot, dot, you know, behind it. Audio always follows video, right? Like silent pictures. I mean, movies didn't used to have audio, right? So the whole history. And so for us, it's like bringing that hockey stick to audio because we think it's a huge part of the equation. In some ways, it should be a bigger part than it was in the rectangle world. The rectangle world audio was like, oh, it's cool to add immersion, add a little surround sound, give you something extra. But now we're throwing you in the 3D world, and even though you have these great 3D visuals with depth perception, you're just still limited. A HoloLens, you can only see 5% of the sphere. You know, a great HMD, you're seeing 15% of the sphere. How are we gonna let you know all the other amazing things that are happening in the other 85%? And audio is that. It is our 3D sense, so.
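
The flow Riggs describes (tag some sources as discrete objects, keep the rest in a background bed, and pick an HRTF by angle and depth bucket for each live object, subject to a CPU budget) can be sketched roughly as follows. The HRIR database, the 50-object budget, and the nearest-angle lookup here are stand-ins, not OSSIC's actual plugin:

```python
import numpy as np

class ToyObjectRenderer:
    """Illustrative object renderer: nearest-angle HRIR lookup, a near/far depth
    bucket, and a live-object budget, loosely following the pipeline described above."""

    def __init__(self, hrir_db, max_live_objects=50):
        # hrir_db maps (azimuth_deg, elevation_deg, depth_bucket) -> (hrir_L, hrir_R)
        self.hrir_db = hrir_db
        self.max_live = max_live_objects

    def _nearest_key(self, az, el, dist):
        bucket = "near" if dist < 1.0 else "far"
        keys = [k for k in self.hrir_db if k[2] == bucket]
        return min(keys, key=lambda k: abs(k[0] - az) + abs(k[1] - el))

    def render(self, objects, bed_stereo):
        """objects: list of (mono_signal, az_deg, el_deg, distance_m), loudest first."""
        out_left, out_right = bed_stereo[0].copy(), bed_stereo[1].copy()
        for sig, az, el, dist in objects[: self.max_live]:        # budget the discrete objects
            hrir_left, hrir_right = self.hrir_db[self._nearest_key(az, el, dist)]
            gain = 1.0 / max(dist, 0.25)                          # simple distance attenuation
            for out, hrir in ((out_left, hrir_left), (out_right, hrir_right)):
                wet = np.convolve(sig, hrir) * gain
                n = min(len(wet), len(out))
                out[:n] += wet[:n]
        return out_left, out_right

# Usage with a fake HRIR set: one object placed 0.6 m off to the right.
rng = np.random.default_rng(1)
hrir_db = {(az, el, b): (rng.standard_normal(64), rng.standard_normal(64))
           for az in range(0, 360, 30) for el in (-45, 0, 45) for b in ("near", "far")}
renderer = ToyObjectRenderer(hrir_db)
bed = (np.zeros(48000), np.zeros(48000))
left, right = renderer.render([(rng.standard_normal(4800), 90.0, 0.0, 0.6)], bed)
```

In the real system described above, the calibration layer would additionally adapt the HRIR set to the listener's measured head size and steer high frequencies across the headphone's multi-driver array; this sketch only covers the generic software path.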

[00:24:30.460] Kent Bye: In terms of audio production, the future of music, I see that similar spectrum, whether it's live, traditionally composed music, maybe it's environmental where you're hearing it in a specific space, but it's still generally not changing the way that music is composed. And I feel like at the other extreme of going completely digital, you start to have a little bit more ability to do completely new compositions that are doing things that may be physically impossible to do in a real space in terms of where you're placing the different audio objects that people are listening to. And so you start to think about, when you go from a 2D to a 3D composition mentality, what kind of things can you do? And then you add in the interactivity, which is a different thing. In the film world, the split between films and games is really this level of agency and interactivity versus authored narratives, where you have a very specific passive experience of just receiving it. But what happens when you open it up to that level of interactivity? What can you start to do when something that used to be an authored experience is now interactive, and what does that do to the future of music when you introduce agency into the equation? What have you started to discover in terms of the audio production pipeline and working with Abbey Road Studios? It feels like it's kind of starting at the 2D, replicating that, and then eventually moving into this new paradigm. I imagine that we're going to replicate what we've done before, before we start to really break out into what's new. But I'm just curious to hear some of your thoughts of where this is going.

[00:26:01.558] Jason Riggs: Yeah, I mean, that's super interesting. In fact, just a couple nights ago, I was listening to the radio and was listening to this interview with Brian Eno, who had done all this, you know, early sort of environmental work with everyone from, you know, David Bowie and Coldplay and U2. The interesting part, at the end of the interview, they're like, what is the future? And he's like, this gaming stuff, all this is going to be composed in an interactive sort of thing. And they're like, cool, what are you doing with it? And he's like, hey, I'm 67. I don't know what the hell to do with this stuff, but this is the thing. This is it, right? It's going to be this. And for us, again, at this beginning thing, really for us right now, if I'm out pitching people and talking about what the opportunity is, it's just two things. The future is interactive and immersive. You just have to believe this. Your advertisements right now on Facebook are on 360 videos, right? So look at this whole stack going from VR, but now reflecting back down into Facebook advertisements, right? I mean, so we think that that 360 content, and you know, Facebook has said this, in five years that 360 is going to be as native as video is today. And look how fast even video has replaced text, right? And so, for us, interactivity and immersion is it. And I think one of the challenges we see, and we've thought about how do we solve this, when we go into the music space, there are a lot of people who are building things with a very old set of tools, right? These mixers and sliders and digital audio workstations and things that were designed around a paradigm of a handful of channels and speakers and a very fixed perspective on the world, okay? At the other end, we have people in the game engine where everything was built around interactivity from day one. And so for us, the whole world of creation is going to start to look a lot more like game engine creation than it looks like the old way of doing audio things. Now, that brings up a lot of opportunities, and then part of the challenge is really, are people willing to change that paradigm and adapt and understand that they need to move over to this other side? Because we can build all kinds of tools for sliders and for the old world, and we do need to build workflows and tools to kind of bridge that gap. And I think those will be relevant especially for the more 360 cinema and 360 experiences that are less interactive, where you're just turning, and it's easier to adapt that fixed perspective. But the possibilities and things that are going to be introduced on the other side are going to be amazing for the creators. And I think it all starts to fall more into this game paradigm. And so we've been bringing some people in and kind of building our own small internal studio. Kedar here, who had his little studio and we hired him, and he's originally Blackout VR, and he built this demo for us and a lot of that experience. We just hired Sally Kellaway, who's an evangelist for FMOD, and basically her whole career and everything she believes in is just interactive audio, right? Interactive and immersive audio. You know, she's been doing game engine design and that was kind of her undergrad work and then went and did her master's in spatial audio and headphones and sort of this piece of it.
But, you know, we're definitely trying to align ourselves both between the people we hire and the people we're working with and finding these next-generation cutting-edge artists who want to do something that pushes the boundary of interaction and immersion. I would say a lot of the first things we've been doing at Abbey Road are on the workflow on the other side. How do we get traditional music made in this space, but also really capturing the immersion in the 3D and the environments in the space. So, more on the immersion side. We haven't got to go as deep, I would say, on looking at what the opportunities are on the interaction side. But for us, that's the corner. Interactive and immersive. And I think the opportunities are going to be huge, not just in the traditional VR space. I mean, think about music right now. It's commoditized. Great, we have stereo music and everyone can get a $9 a month service where they all get the same thing and the same experience that we've had for a long time. But what about the experiences where you can absolutely be at the concert and have the live experience? What about the experiences where you can have a perspective that you've never had before? One of the test recordings we had done was this men's choir, I think the Westminster Chorus, and we dropped one of these 32 microphone arrays on top of the conductor's head. And so it's right up behind him. You are the conductor. The choir is basically a half circle around you on different layers. and I was listening to the recording, and the choir was amazing. You're listening like, oh my, this is, here I am, I'm in this space, I'm in the choir. But what blew me away is when the music stopped. It was like outtakes, they were doing the rehearsal, and they were doing recording. Conductor stops them, and you hear everybody like clearing their throat and coughing, and you can pick every single individual out, and just this kind of, and you're in this space, and you just feel like you're there, like you feel like you need to be quiet, you know, and you're feeling, And then it starts again, right? And it's like that experience you've had with live music or being in an orchestra or whatever, but it's just so compelling to hear it reproduced and transport you there. So I think that's kind of the range of immersive things we're going to see, but some of the interaction elements, even simple things like that, being able to be at the conductor's seat and have an experience that you never had with music before, being able to be in the choir, being able to move around. just things that we could start to open up for music. Of course, having the actual soundtrack be interactive, I mean, it's kind of, like this is a really simple demo, but you have four stems for music, you're moving them around in space, you can turn them on, but you're kind of mixing them in real time, spatially, and interacting with them. And so I think we're gonna see more of that type of highly interactive music experiences, where it's like, I wanna emphasize this, I wanna do this. I mean, for us, we play around with this all the time, and we're sort of like, I almost think like mixing, mixing in 3D. Why would we use sliders and a big board? For sure mixing and creation of audio in 3D has to be a virtual or augmented reality experience where we move the sounds around in space and put them where we want them and make them louder or make them quieter and so even building the tool sets. 
So I think we need to be thinking about what the tool sets are right now for nine-year-old kids. so that 10 years down the road, no one is using these kind of archaic things and plugging wires into whatever, and it's just totally not. Of course you just take your sound and move it around in space, and of course this is how it is, and you can build things where you look at it and emphasize different things, and you know, so it's exciting. I don't have all the answers, but I think that the palette is wide open for people. We just need to make the tools to make those sort of things easy for the group's wisdom to start to show what's possible with it.

[00:32:21.290] Kent Bye: Yeah, but in terms of the other side of production, what kind of output formats are you going to do to be able to replicate the live performances? Because I know that you could do an ambisonic recording, but is that standardized to the point where you could actually attach that to a MOV file and be able to watch a video experience that has that level of spatialized audio? I'm not sure if all the middleware or technologies are really in the pipeline to have some sort of standardized codec to be able to handle spatialization where you're looking and able to translate that, or is it already there? Are there standard enough ambisonic recordings so that you can basically take that and input it into either a game engine or some sort of player that feeds into the OSSIC headphones so that you can get that same sense of spatialization?

[00:33:09.573] Jason Riggs: Yeah, so I mean, a couple years ago, it was just absolutely the Wild West and not clear like what all the formats are going to be or what's happening. And so on the 360 side, ambisonics is starting to kind of win as the default. And I mean, part of the reason is that Facebook is supporting it and building tools to use it. Google and YouTube are supporting it. You can upload your ambisonic recordings onto YouTube now. One challenge is that they come out with kind of this built-in binaural render that's not amazing maybe, but those are things, you know, we're talking with all of these folks to see if we can kind of push and make it so that you can both have the pipeline to put the ambisonics in but also get the ambisonics back so that we can kind of evolve how it's rendered on the other side. Windows put support in for ambisonics a long time ago and no one ever used it. It was almost like an Easter egg. So we've been working with Windows to actually sort all that out and make sure it does what it's supposed to do. So ambisonics is cool. So ambisonics is basically a spherical format that was created in the 70s and it's an open format. And what's neat about it is the spatial resolution can scale up and down uniformly on a sphere. So it's sort of based on, if you look at the actual orders of it, it looks like electron shells. And the math is all sort of based on these different orbital patterns and how you can add them on to each other to increase the resolution. But it has some limitations. So a lot of what we're hearing now about ambisonics is first-order ambisonics. And first-order ambisonics is still not a lot of spatial resolution. So first-order ambisonics is sort of like a four-channel format that kind of gives you six directions in a simple way to think about it. So it's three orthogonal dipoles to get a little nerdy, and then one monopole, and then by subtracting those four channels, you can imagine that you could also get, when you combine a dipole and a monopole, you get a cardioid. And so a cardioid is like a first-order shape but not super directional, okay? This kind of heart shape that focuses in one area of pickup. So first-order ambisonics, and when we talk about ambisonics, there's different things. We talk about a microphone that picks up in this pattern, we can talk about it as an actual format of recording in this, and then we can, of course, decode it back and play it on some kind of renderer, headphones, speakers, whatever. So this is kind of the capture, the format, and then how we decode and playback are all important. Because you can make great ambisonics sound really bad with bad decoding and bad output systems. But the challenge with first order is it's still a little bit fuzzy, right? Because we have kind of these vague six directions that we get out of it. But it would kind of be like a six-channel array on the faces of a cube. It's kind of the resolution we can get. Now, a lot of the recordings we're doing, like we're using this Eigenmike, which is a 32-mic ball, you can do third or fourth order with it. That's a very different thing. And it needs a lot of work and the workflow is not perfected there. But when it's done right and you get that thing working right, it sounds pretty amazing. And so we just need to be careful with ambisonics because it's similar to saying channel-based audio. So channel-based audio, we could be talking about mono, stereo, we could be talking about a 22-channel ball.
These are really different spatial resolutions, right? So for us, the main paradigms we're sort of looking at are channel-based things, okay? And so of course we have stereo, we have 5.1, we have a lot of 11.1 content, we have a 20-channel speaker ball that we mix things in, right? So we can do that with scaling of channels. Ambisonics is just a great way to uniformly scale up and down, and so we like high-order ambisonics, and we think that's gonna be it, but both of those channels and ambisonics struggle with how they convey depth. There are some things we can do with both of them to do that, but I would tell you that the standard decoders and encoders, and the way people have thought about it, are mostly only thinking about angle. And then of course we can use intensity and reverb to give you some of those depth cues, but you're never going to get that accuracy, especially in the near field. You're rendering a shell, and the information you pass through that shell doesn't know anything discrete about the distance of those objects. So you have to kind of encode everything with the levels and the reverb to give you that sense of depth. So, I think it's great for 360, it's great for giving you this immersive, what we're used to in the cinema, which is being in that hemisphere, being in that sphere. I'm not sure that it will be as easy or as powerful for fully interactive volumetric VR experiences. In that paradigm, the object-based rendering can start to make a lot more sense, because you really have the individual control of the depth. But the challenge we see there is that if everyone makes a plugin that only spits out binaural audio, and that binaural is bad, and that binaural doesn't know about what the rendering device is, and doesn't know about your anatomy, we can lose all the benefits of the object rendering anyway, impair the sound quality, and make something that we can't fix later. It's kind of stuck. You know, not to mention that headphones and earphones may not be the only thing. The reason we should want to use them is when we get 3D audio right on headphones and earphones, we have an infinite amount of points in space. We're not limited by the number of boxes or speakers. But that being said, there are a lot of things we can do with beamforming and sound projection when we project the sound to your ears and layer the binaural thing on that don't necessarily require the physical headphone or earphone. My point is, rendering hardware and human interaction is going to change and evolve to make this better. And we want to make sure that the formats we use and how we think about creating things allow the interfaces to evolve and get better on the other side. And so that's why we don't recommend two things. One, anything where you're recording with binaural heads or binaural pinna, you can't get rid of that. That causes cancellations in the sound. It will always sound like that person's ears. You can't fix it. Don't make recordings with ears. Okay, we put the ears on after. Put crappy ears on after. But make sure that you recorded it with a sphere or with a ball or with something. We can do all the spatial ear effects in post-production. So that's one recommendation. And then the second one is don't make experiences that only have binaural as an output. It's fine to have that as an output. Have it as an output. Hey, you probably have headphones, probably go ahead and use this. It kind of works with them. Cool, that's an output.
Just make sure that we think about what are the other ways to do this. Can we also have an ambisonic output? Can we also have a plugin that allows either object information, metadata about the objects, other people to access the things with their plugins, or just a high-order ambisonic or multi-channel count? And I think a lot of this is going to get resolved by the middleware folks and we're working with them to help support these different things. So as long as there's kind of a flexible approach that allows everything else to evolve, I think we'll be in a good space. So it's not done yet. It's still super messy. It's not easy for everything to do these things right. But there are things that are getting aligned around and we're in a better position than we were two years ago.
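
Riggs's description of first-order ambisonics as one monopole plus three orthogonal dipoles maps directly onto how a virtual cardioid microphone is steered out of a B-format recording: blend the omni channel with the dipole channels projected onto the look direction. A minimal sketch, ignoring the channel normalization conventions (FuMa vs. ambiX) that real decoders have to respect:

```python
import numpy as np

def virtual_cardioid(w, x, y, z, azimuth_rad, elevation_rad):
    """Steer a virtual cardioid microphone out of first-order B-format.
    w is the omnidirectional (monopole) channel; x, y, z are the figure-eight
    (dipole) channels. 0.5 * (omni + dipole) gives a cardioid pointed along
    the look direction. Normalization conventions are ignored here."""
    dx = np.cos(elevation_rad) * np.cos(azimuth_rad)
    dy = np.cos(elevation_rad) * np.sin(azimuth_rad)
    dz = np.sin(elevation_rad)
    return 0.5 * (w + dx * x + dy * y + dz * z)

# Pointing six of these at the faces of a cube recovers roughly the "six vague
# directions" of resolution described above for first-order material.
```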

[00:39:54.797] Kent Bye: Well, the other thing about the physics engine is that right now they're doing real-time physics for the visuals and the interactions, but yet talking to Ming Lin of the University of North Carolina, she's working eventually towards a real-time audio engine. You know, she's working on using machine learning to detect material properties of different spaces and then be able to have that very specific profile and then be able to do real-time sound interactions because right now I think a lot of the material properties within the object-oriented game engine is not quite as realistic as you might hear it and I think that real-time physics is not exactly the same physics as you have in the real world but it's in real time and so it's close enough so I imagine that perhaps with audio being at a higher frequency there may be a little bit more uncanniness that has to go through that uncanny valley before we get to that level of fidelity that we actually hear and can't actually tell the difference whether or not it was created within an engine or created within real life. But I think we're maybe 5, 10, and I don't know how long it's going to take to get there. But I think that's the roadmap is to start to move to real-time audio engines. And so I'm just curious to hear some of your thoughts on that.

[00:40:58.678] Jason Riggs: Yeah, so imagine if we split how we're going to create audio in any of these physics engines into two phases. One is really the physics of propagation. So propagation, occlusion, all the reflections that are happening in the space. If I take a source, it makes a sound. Some of that gets directly to you. That direct sound spatialization is really important to figure out where it's coming from. We cue mostly on the direct sound. But we also use the reflections to help us localize, but absolutely use them to pick up the sense of space, what kind of environment we're in, and all of this. What we're seeing with the really, as I mentioned, when we're doing these recordings in a church or a cathedral or something with a very specific sonic signature for all those reflections, when you get it right on there with this live capture and all of those come in correct spatially, I mean, just absolutely filigree there. The hair on your neck stands up. You were in a cathedral. That's what it should be like in the virtual space. It's nothing like that today. Now, real-time or not, because there's a whole spectrum of things that we can do. So, we can do pre-rendering on graphics cards. There's a trade-off between CPU and memory. Right, we could do incredible ray tracing pre-renderings if we had to if we don't have the CPU power. If we have a CPU or we have dedicated audio hardware. I mean, I built computers back in the 90s, you know, and I remember having an Aureal 3D card that was doing a pretty cool job of this when all the audio was in hardware and then it kind of all got integrated in Windows and that died. We actually took a pretty big step backwards. So there was a lot of work of this happening in the early days and now you've got folks like NVIDIA and AMD who are doing a lot of this propagation rendering. Here's the thing we found though. All the propagation in the world doesn't help at all if you don't have good spatialization. If all these reflections you can't tell where they're coming from, it's lost. I believe that accurate spatialization and great propagation is going to be a 1 plus 1 equals 10. but first you have to get the sound coming from the right spot. There's no point figuring out every single reflection and all of that, because what it sounds like without spatialization is just good reverb. And good reverb is great, it's good to know if you're in a bathroom or if you're in a dead space, you can get that with reverb, but that's all it sounds like. When all of the reflections and spatial cues come in accurately, it's a lot more than that. It actually helps you not just localize the sound, but just almost use your human sonar to perceive exactly what that space is like, to feel like you're absolutely in that. So I think that's the holy grail for immersion. I think the thing we need to do to make that not 20 years is build the demo. you know, build the demo that people hear where they come into it and they're like, oh, yeah, yep, I'm in a cathedral, I'm in an arena, this is what the thing is, you know, and so it's getting all of these players together. And so we're chiseling at it piece by piece, you know, maybe next year by CES we'll have one with amazing, crazy propagation integrated with the spatialization and be able to start to show how those elements combine.
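
The "propagation plus spatialization" point can be illustrated with the classic image-source method: each wall reflection behaves like a mirrored copy of the source that arrives later and quieter, and each of those copies should be handed to the spatializer from its own direction rather than summed into a flat reverb. A toy shoebox-room sketch with assumed dimensions and absorption:

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def first_order_image_sources(source, listener, room_dims, absorption=0.3):
    """Direct path plus the six first-order wall reflections of a shoebox room,
    via the image-source method: each reflection is a mirrored copy of the source.
    Returns (delay_s, gain, virtual_position) tuples."""
    images = [np.array(source, dtype=float)]            # the direct path
    for axis in range(3):                                # mirror across each pair of walls
        for wall in (0.0, room_dims[axis]):
            img = np.array(source, dtype=float)
            img[axis] = 2.0 * wall - img[axis]
            images.append(img)
    listener = np.array(listener, dtype=float)
    out = []
    for i, pos in enumerate(images):
        dist = np.linalg.norm(pos - listener)
        gain = (1.0 / max(dist, 0.1)) * (1.0 if i == 0 else 1.0 - absorption)
        out.append((dist / SPEED_OF_SOUND, gain, pos))
    return out

# Each virtual position would then be handed to the spatializer (HRTF rendering)
# so the reflection is heard from its own direction instead of as generic reverb.
for delay, gain, pos in first_order_image_sources((2.0, 1.5, 1.2), (4.0, 2.5, 1.6), (6.0, 5.0, 3.0)):
    print(f"delay {delay * 1000:5.1f} ms   gain {gain:.2f}   from {pos}")
```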

[00:43:46.257] Kent Bye: Yeah, so I guess what I hear you saying is that right now you could take a mono sound input and do the propagation modeling as well as the spatialization, and that with good hardware you'd be able to get most of the way there. And it sounds like doing a real-time physics engine on the audio side would mean that you could start to model different objects, so that if you wanted collisions or things bumping into each other, you would no longer have to record those sounds; they could naturally emerge from the environment. So you'd have the audio sources coming from these physics simulations rather than from recordings, which is where we're at right now.

[00:44:23.062] Jason Riggs: Yeah, I mean, that's a whole different level. The same thing we're doing with physically informed algorithms to learn about the human ear, imagine that the other way around. We worked with some great people; we did some early work with a guy who's now at Google doing some of this, and he did his PhD thesis on things like modeling a drum. So you don't record the drum sound, you model the drum itself, and now as you turn it and tap on different parts of it, you get a different sound. Audio should absolutely be rendered in that sort of way, and it's going to make it more interactive and more exciting. The challenge, and why I brought up the demo, is that audio always takes a back seat to the visuals. So we have to deal with this thing, first off, of getting a fraction of the CPU. Broader and longer term, it's about getting the Intels and the AMDs of the world not only to want to build for it, because they all want to make chips, but to get people to want to buy a dedicated audio chip or a more powerful CPU and allocate something to audio, because the audio is important. And the way to do that is to show them the audio is important. So the chicken-and-the-egg is to start building some of these things, get them in front of people, blow them away, and have them absolutely demand it. That's the cycle that needs to happen so this isn't a 20-year thing where, yeah, audio's always cool, but we always get 2% of the CPU and no one really cares. So, step by step, but I think it's all of those parts: the propagation, the occlusion, the physical models for each piece, and accurately spatializing it so you hear it all coming from where it's supposed to. That's the future.
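
The drum example Riggs mentions maps naturally onto modal synthesis, so here is a toy sketch of that general idea (not the actual model from the PhD work he references): a struck membrane approximated as a handful of exponentially decaying sinusoids, where the strike position changes how strongly each mode is excited. The mode frequencies, decay rates, and weights below are made-up illustrative values.

```python
import numpy as np

SAMPLE_RATE = 44100

def struck_drum(modes, strike_weights, duration=1.0):
    """Toy modal synthesis: a drum hit approximated as a sum of
    exponentially decaying sinusoids. Each mode is (freq_hz, decay_per_s);
    the strike weights model how strongly a hit at a given spot excites
    each mode, which is what changes as you tap different parts of the head."""
    t = np.arange(int(SAMPLE_RATE * duration)) / SAMPLE_RATE
    out = np.zeros_like(t)
    for (freq, decay), w in zip(modes, strike_weights):
        out += w * np.exp(-decay * t) * np.sin(2 * np.pi * freq * t)
    return out / np.max(np.abs(out))

# Illustrative (made-up) membrane modes: frequency in Hz, decay rate in 1/s.
modes = [(110.0, 6.0), (175.0, 9.0), (232.0, 12.0), (290.0, 16.0)]

center_hit = struck_drum(modes, strike_weights=[1.0, 0.2, 0.4, 0.1])
edge_hit = struck_drum(modes, strike_weights=[0.3, 1.0, 0.6, 0.8])
# center_hit and edge_hit are normalized float arrays ready to write out as audio.
```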

[00:45:48.661] Kent Bye: Awesome. And finally, what do you see as kind of the ultimate potential of virtual reality and what it might be able to enable?

[00:45:56.546] Jason Riggs: Yeah, I mean, that's a big one. I don't know if I can even do that one justice, but I'll give you one scenario in my mind. When we started, we did a kind of ten-year vision for us, and we asked, what is this audio thing really going to look like? And we said, you know what, you're going to have something, whether that's hardware or a little thing that sticks in your ear, but you are absolutely going to be able to control your experience and bring in any kind of media. For me, VR is sort of the ultimate escapist medium of replacing reality with a different one. Whether that's travel, whether that's anything, whatever reality we want to imagine, you can replace your current one at this instant with something else. And for audio, for us, it was that: all of your entertainment, all of your experience, anything, we need to make sure that your world can disappear and you can have an accurate version or a hyper-realistic version of some other kind of world, whatever that is. Then the augmented reality version for us was the blending between the two. From an audio standpoint, it's the same thing. Can we give you something as realistic as what you experience? I mean, experience this show: we can hear people everywhere, and you know how big this thing is just from standing here, right? You get a sense of the scale and all of this. You don't get that in VR today. So being able to transport you so that you're at the show, and you close your eyes and you're like, yep, I'm at CES, I get it, this is what it is. That's the end goal for us. A lot of people smarter than me are going to figure out all the different things we want to do when we can transport you anywhere, put you in any kind of experience we've never even thought of yet. Of course, bringing you into simulations of reality, but also letting you do things that you can't do in reality, I suspect that's all part of it. But then the interesting part for us is the other end of that spectrum as we blend the two and do the mixed reality version. What does that enable? If I'm sitting in my living room, can I press a button and have the most amazing home theater system come up out of my floor, so it sounds like I have million-dollar speakers in my room and I see them but my wife doesn't, and I'm having this amazing experience that's totally different from the one she's having? All those kinds of things. But all these little pieces have to be solved.

[00:47:52.976] Kent Bye: Awesome. Well, thank you so much. Cool. Hey, thanks a lot. I appreciate it. So that was Jason Riggs. He's the CEO of OSSIC, and they were showing off their latest prototype, the OSSIC X, at the Consumer Electronics Show. So I have a number of different takeaways about this interview. First of all, this was by far the best immersive audio demo that I've had a chance to hear. It's difficult for me to attribute whether it was the hardware and the specific calibration that was happening in my ears, or whether it was just a great audio software demo. What I can say is that I was able to shut my eyes, localize sound just by turning my head around, and then open my eyes and see where the objects were located. It was also really amazing to be able to take these audio objects, put them right around my head, and really hear that near field. I haven't heard any other headphones or demos do that as well. It'd be great, at some point, for OSSIC to release this demo and potentially allow other plugins to put their audio spatialization solutions in there, so you could compare and contrast their headphones with normal headphones and really see the impact of an integrated hardware solution that's customized to my anatomy versus generalized solutions versus other software spatialization solutions.

So Jason is the first person to really talk to me about the importance of depth when it comes to audio spatialization. What he's saying is that they're actually using a depth-based head-related transfer function, the HRTF, so that as you move your head around, anything that's within a meter of you is handled by a dynamic calibration of your ear, with those parameters fed back into the software. That lets it apply this depth-based HRTF to the sound, so it actually sounds the way it would if it were within one meter of you. It sounds like most audio spatialization algorithms assume that the objects being spatialized are one meter away or beyond, and once you get closer than that, Jason's claiming, a lot of these algorithms start to break down. With the OSSIC X headphones, the anatomy of your ear is measured dynamically and fed into the algorithm in real time, so that when you have objects in the near field, you can actually really hear them there. I can definitely say that I've never heard any other audio demo do that accurately, and it just sounded amazing.

So Jason says that there are a number of different cues we listen for in order to spatialize sound and tell how far away something is. That includes everything from the volume, the propagation model, and the ratio between the direct and reflected sound, to the level; all of these give you different clues. And the hardware calibration is important because, in order to get the localization and the sound quality right, they have to do specific calculations of the ITD and the ILD, the interaural time difference and the interaural level difference: essentially the differences in delay between your two ears and the differences in intensity when things are close to your face.
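
For a feel for the numbers involved, here is a small illustrative sketch (not OSSIC's calibration math): the classic Woodworth spherical-head approximation for far-field ITD, plus a purely geometric illustration of why the level difference between the ears grows sharply once a source moves inside roughly one meter. The fixed head radius below is exactly the kind of population average that a per-listener calibration would replace.

```python
import numpy as np

HEAD_RADIUS = 0.0875    # meters; an average value, not a measured listener
SPEED_OF_SOUND = 343.0  # m/s

def woodworth_itd(azimuth_rad):
    """Far-field interaural time difference for a rigid spherical head
    (the classic Woodworth approximation). Azimuth 0 is straight ahead,
    pi/2 is directly off one ear."""
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (np.sin(azimuth_rad) + azimuth_rad)

def geometric_level_difference_db(distance_m, azimuth_rad):
    """Purely geometric stand-in for ILD: compare inverse-distance loss at
    the two ears. Real ILD also depends on frequency and head shadowing,
    which this deliberately ignores; it only shows how the difference
    grows once the source is inside about one meter."""
    ear_offset = HEAD_RADIUS * np.sin(azimuth_rad)
    d_near = max(distance_m - ear_offset, 0.01)
    d_far = distance_m + ear_offset
    return 20.0 * np.log10(d_far / d_near)

print(f"ITD at 90 deg: {woodworth_itd(np.pi / 2) * 1e6:.0f} microseconds")  # about 0.66 ms
print(f"level diff at 0.25 m, 90 deg: {geometric_level_difference_db(0.25, np.pi / 2):.1f} dB")
print(f"level diff at 2.00 m, 90 deg: {geometric_level_difference_db(2.00, np.pi / 2):.1f} dB")
```

Even this crude geometry shows the effect Jason describes: the level difference at 25 centimeters comes out several times larger than at 2 meters, which a generic far-field HRTF never reproduces.
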
In the higher frequencies, the shape of your ear also helps determine how that spatialization happens, so they're able to detect the asymmetries of your ears in order to better spatialize high-frequency objects when they're in front of you, above you, or behind you. Jason was also saying that with most HRTF solutions, gamers tend to turn them off because there's no added benefit over basic stereo. His explanation is that an HRTF that isn't correct for your ear just imposes a different spectral profile on which frequencies get filtered out, so it sounds muddied and the sound quality isn't as good, and the gamers just end up turning it off.

So OSSIC has been working with Abbey Road Studios' Red Incubator program to really look at the future of music, and Jason says that the future of music is going to be immersive and interactive. The immersive part, I think, is going to be universal across the two ends of the spectrum, and the interactive part is going to be on one end of it. On one end of the spectrum is the existing pure live capture, which is non-volumetric and passive, essentially putting up these higher-order ambisonic microphone balls. They happen to be using the Eigenmike em32, which has 32 microphone capsules and captures third- or fourth-order ambisonics, essentially a sphere of audio. With that, you get a much higher level of spatial resolution as you turn around and localize sounds, but Jason's saying it actually doesn't do a great job with depth and with telling how far away different objects are.

The other end of where audio is going is more like a game engine. This is the volumetric, interactive, pure-production end, the object-based audio model, where you place audio objects within a 3D space and spatialize them from there. You also get an interactive component, where you can dynamically participate with your agency within a specific experience. The demo OSSIC was showing at CES really showed the capabilities of that, and right now it was a bit more like a game environment than listening to an authored music composition. This is a tension and a dynamic that I think mirrors the one between films and games. What's the difference between a game and a film? In one you're able to express your agency and interact with an environment, and a film is much more passive, where you're really just receiving a story. So much of music composition has been in this paradigm of completely passive consumption of whatever music is put forth. So what does it mean to introduce a level of interactivity and agency into that experience? One example would be if you were the conductor, able to do a live mix and point to people to sing louder or softer. That's just one level where you have some dynamic, interactive mixing, but this is one of the biggest open questions: what does it mean for music to be interactive?
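
For readers curious what the capture end of that spectrum looks like in code, here is a short sketch of how a mono source would be encoded into first-order ambisonic B-format, the simpler four-channel cousin of what a 32-capsule array like the em32 records. Conventions vary; this uses FuMa-style W weighting, and the tone and angles are illustrative choices, not anything from the demo.

```python
import numpy as np

def encode_first_order_ambisonics(mono, azimuth_rad, elevation_rad):
    """Encode a mono signal into first-order B-format (W, X, Y, Z) using
    FuMa-style weighting, where W carries a 1/sqrt(2) gain. Higher-order
    capture adds further spherical-harmonic channels on top of these four."""
    w = mono * (1.0 / np.sqrt(2.0))
    x = mono * np.cos(azimuth_rad) * np.cos(elevation_rad)
    y = mono * np.sin(azimuth_rad) * np.cos(elevation_rad)
    z = mono * np.sin(elevation_rad)
    return np.stack([w, x, y, z])

# Example: a 1 kHz tone placed 45 degrees to the left and slightly above.
sample_rate = 48000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)
bformat = encode_first_order_ambisonics(tone, np.deg2rad(45), np.deg2rad(15))
print(bformat.shape)  # (4, 48000): the W, X, Y, Z channels
```

The extra channels of a higher-order recording buy spatial resolution as you turn your head, but as Riggs points out, the captured shell still carries limited depth information.
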
The immersive part, I think, is clear: you put on the headphones and you feel completely transported to that place. It captures the sound field with all the direct and reflected sound, and it sounds like you're actually in a cathedral when you shut your eyes. That's what Jason says: he can shut his eyes and it makes the hair on the back of his neck stand up. So this realm of passive audio production has been stuck in fixed perspectives, using mixing boards and these kinds of linear pipelines to produce audio, and that is the dominant paradigm right now. OSSIC is essentially saying we need to create the tools to move into a realm where the music is actually generated within more of a real-time game-engine type of environment. They want to build the tools so that the nine-year-old today can produce the music ten years from now, when they're nineteen, and create these fully immersive and interactive soundscapes.

The final point I wanted to make is that right now there's an assumption that any spatialization that's happening gets rendered out to a binaural feed, essentially a left and a right channel. But Jason's saying that some of these audio hardware headphones, like the OSSIC X, may be able to handle higher-order ambisonic outputs. So not only might they want these higher-order ambisonic feeds, but when you go to the other extreme of the object-oriented model, there could be a software layer where you want to dynamically interact with some of the metadata, the objects, and the material properties through some sort of plug-in within the rendering pipeline, such that it's not just rendering out to a stereo feed, but you can dynamically change the room properties, the material properties, or the reflections. Being able to dynamically interact with the objects and the material and room properties is something that the Dolby Atmos format allows you to do, so perhaps eventually this is something that OSSIC would also be able to handle, once there's a standardized format that is open enough for people to interact with. But that's something that's more distant on the horizon. Right now, everybody's just assuming that you want the stereo output, and Jason's essentially saying that in the future we're going to want more and more sophisticated ways to get closer to the raw data and have more of an impact on how it's actually rendered.

So that's all that I have for today. I just wanted to thank you for listening to the Voices of VR podcast. And if you enjoy the podcast, then please do spread the word, tell your friends, and become a donor to the Patreon. Just a few dollars a month makes a huge difference. So go to patreon.com slash Voices of VR. Thanks for listening.
