Oliver Kadel’s 1618 Digital specializes in spatial audio sound design and engineering across linear immersive storytelling and interactive immersive experiences. Kadel also co-hosts The Immersive Audio Podcast with Monica Bolles, and I was recently featured in episode #109. I sat down with Kadel to get a survey of spatial audio formats and production pipelines, ranging from third-order ambisonics and object-based audio in game engines to the audio formats for Apple Immersive Video, Dolby Atmos, and MPEG-H Audio, as well as emerging open standards like the Immersive Audio Model & Format (IAMF), which is “an audio container specification designed to revolutionize immersive audio experiences across a wide range of applications, including streaming, gaming, augmented reality (AR), virtual reality (VR), and traditional broadcasting.” We talk about some of his favorite immersive audio productions, his process of getting a Ph.D. studying the impact of spatial audio on cognitive load, and some of the future trends of spatial audio.
Rough Transcript
[00:00:05.458] Kent Bye: The Voices of VR podcast. Hello, my name is Kent Bye, and welcome to the Voices of VR podcast. It's a podcast that looks at the structures and forms of immersive storytelling and the future of spatial computing. You can support the podcast at patreon.com. So in today's episode, we're going to be doing a deep dive into spatial audio with Oliver Kadel. He's a sound designer, a sound engineer, and founder of 1618digital.com. And he's been working on a lot of different immersive storytelling projects over the years. I've met him at Venice Immersive, which has featured a number of his projects over the years. They've also been featured on Oculus TV, now Meta Quest TV, but also on Apple Vision Pro. He's been working on the Adventure series and doing the audio on those. And so he's got a broad range of experience working with both ambisonic formats and all these emerging formats like MPEG-H. There's a discussion around the emerging open-source format, the Immersive Audio Model and Format, that's being edited by Google, but also game engine audio with Unity and Unreal Engine and some of the specific considerations of creating spatialized audio there. He also runs the Immersive Audio Podcast and has published over 100 episodes. I actually had interviewed him, and then he interviewed me, and so I'm on a previous episode that aired at the end of last year. I'll put a link in the show notes so that you can go listen to that. I was sharing a little bit more of my own personal experiences with spatial audio, but I wanted to get a lay of the land of what's happening in the realm of spatial audio on today's episode of the Voices of VR podcast. So this interview with Oliver happened on Wednesday, December 11th, 2024. So with that, let's go ahead and dive right in.
[00:01:47.109] Oliver Kadel: My name is Oliver Kadel. I am a sound designer and sound engineer. I work predominantly in spatial and interactive audio for immersive media, such as VR, AR, mixed reality, 180, 360, immersive films, gaming, virtual training and location-based installations, you name it. I'm a founder and audio lead at 1618 Digital. We are based here in London. We've been going for just over 10 years now. We're a small team. We offer a full production pipeline, from location sound recording to sound design and post-production to game engine sound implementation. We've just recently partnered with Bleed. Here at Bleed, we are a client-facing post-production facility that services traditional productions as well, such as feature films, docs and TV, indie productions, music and so on, and now also immersive content. I'm also conducting postgraduate research as part of my PhD at the AudioLab at the University of York. My thesis is on spatial audio's impact on cognitive load and memory retention in the context of virtual training in VR. And similarly to you, Kent, we also have a podcast, very imaginatively called the Immersive Audio Podcast. Myself and my co-host, Monica Bolles, talk to industry leaders, companies, academics, and artists about all things spatial audio. It's been going for almost seven years, and earlier this year we celebrated our 100th episode anniversary at South by Southwest in Austin.
[00:03:26.424] Kent Bye: Very cool. And so maybe you could give a bit more context as to your background and your journey into the space.
[00:03:33.348] Oliver Kadel: I was born and raised in Latvia, in the Baltics. I came to the UK in 2004 to pursue my education in sound engineering, essentially. Initially, I was based in Kent, which is a county in England. Then I moved to London in 2010 when I applied for university. After completing my bachelor's degree, I was offered a scholarship for my master's in 2013. This is where I started doing production sound and sound to picture. During my studies, I got into working with surround sound mixing for films and music and also recording ambisonics. Around that time, this early experience with spatial audio led me to some opportunities to get involved with immersive media productions back in the early days of XR, around 2015. At the time, 360 film productions for brands, documentaries and training were very common. I still call it the golden era of 360. But very quickly, the industry started to shift its focus to interactive content, and the ratio between linear productions and interactive flipped over the years.
[00:04:52.212] Kent Bye: And so when you were starting to get into spatial audio, one of the things I've noticed as I've been covering XR over the years is that there isn't necessarily a standard audio format for spatial audio. If you're using a game engine, then you might use Unity or Unreal, and then there's ambisonics, there's Dolby Atmos, and I know that you've been working with a number of different types of formats. So before we start to dive into some of the projects, I'm wondering how you see the different formats that are out there, and whether you feel like there are some emerging standards coming out in terms of immersive audio or ambisonics. I know that YouTube and Facebook each had their own spatial audio formats that they were supporting in the context of 180 or 360 video. But I'd love to hear some of your ground setting in terms of the different standards or formats that you see in the work that you're doing in spatial audio, ranging from the existing surround sound into the more spatialized and then object-oriented sounds as well.
[00:05:56.145] Oliver Kadel: We definitely have seen an evolution of audio formats and also the emergence of new ones. I guess it's useful to look at it from a couple of angles. As you've mentioned, there's a game engine workflow paradigm where we've got real-time object rendering, spatialization with binaural decoders and maybe even virtual acoustics and so on. And then we also have this linear post-production paradigm where we use formats such as Dolby Atmos, Ambisonics, MPEG-H. Sometimes we use a combination of multiple formats. For example, Facebook 360, which is supported on the Meta platform, can support two parallel playback streams: one 2D, which essentially acts as a headlocked stem where you can place non-diegetic music or narration, in addition to an ambisonic stem where you can spatialize your diegetic elements and ambient tracks and so on. So I guess we're now far better off in terms of the diversity of these formats. It's just a case of understanding what works where and choosing the best options. And sometimes it's simply a case of knowing how to navigate a particular platform for publishing and distribution and working with that particular format.
[00:07:23.737] Kent Bye: Yeah, it seems like where you're going to be publishing dictates what formats are even supported. Because I know that there were some ambisonics that were supported on, say, YouTube, and then I got some comments saying that it was degraded or wasn't working anymore. So you have the object-oriented, let's call it the game engine approach, where you can put individual audio sounds anywhere in 3D space. And then Dolby Atmos is sort of doing that, but it's a proprietary format that is only encoded and decoded if you have a license for it, which you start to see a lot more on Apple Vision Pro. And then ambisonics seems like a format that has been around for a long, long time, but has seen quite a resurgence when it comes to actually being able to render it out with these 6DoF- or 3DoF-tracked headsets, which can take those individual audio streams, four different audio streams that are kind of muxed together, to create a sound field. And then I haven't really come across MPEG-H. And so maybe you could explain a little bit: what is MPEG-H, and how does that fit into the rest of these different formats?
[00:08:30.285] Oliver Kadel: Well, I think the difference is that some of these formats and codecs are not open source. So they need to be licensed by equipment manufacturers and software application developers in order to use them. That comes with a cost, and therefore the mass adoption of those codecs could be limited by that. Dolby Atmos is definitely one of the most ubiquitous ones, one of the most successful ones, because it has been widely adopted in the cinema industry and then subsequently by Apple Music and beyond. MPEG-H was developed by the Fraunhofer Institute. I don't want to go through the list of areas and devices where it's currently being adopted, because I don't want to be inaccurate, but obviously it's not as ubiquitous as Dolby Atmos, for example. I do know that Sony's 360 Reality Audio format uses MPEG-H with their proprietary binaural decoder. Ambisonics, on the other hand, is probably one of the most ubiquitous and widely used formats, appreciated and utilized across all of the above, because you can integrate ambisonics into your production pipeline regardless. In my opinion, it's a very easy format to work with, from capture, as in location sound recording alongside the camera, all the way to implementing ambisonics as part of your post-production process in a DAW. And we've seen the evolution of tools for ambisonics and the increase in spatial audio resolution, so the use of higher-order ambisonics is very common as well. Apple Spatial Audio uses ambisonics, and with the new Dolby tools you can integrate ambisonics and do conversions with encoding and decoding. So yeah, there's no easy answer or easy summary here. I think we see a number of options that have advantages or disadvantages on a technical level, but ultimately these technologies are being adopted and pushed by certain technology companies that control publishing and distribution platforms. And you're absolutely right, YouTube spatial audio is currently broken. I was really surprised to see some of the pieces that were published over the years no longer decoded. So they completely abandoned the support. But there's a newly announced codec that is currently being developed by Google and its partners, which is called the Immersive Audio Model and Format, IAMF. I hope that maybe spatial audio support will be re-implemented with the new codec, and it's going to be an open-source codec. So it would be interesting to see how that impacts things going forward.
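As an aside for readers curious how the "four streams muxed together" that Kent mentioned actually encode direction: each first-order channel is just the mono source weighted by where it sits on the sphere. Here is a minimal sketch, assuming the AmbiX convention (ACN channel order, SN3D normalization); the function name and tone example are illustrative, not something discussed in the interview.

```python
import numpy as np

def encode_foa(mono, azimuth_deg, elevation_deg):
    """Encode a mono signal into first-order ambisonics
    (AmbiX convention: ACN channel order W, Y, Z, X with SN3D
    normalization). Returns a (4, n_samples) array, the four
    parallel streams that get muxed into one sound-field file."""
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    w = mono                              # omnidirectional component
    y = mono * np.sin(az) * np.cos(el)    # left/right
    z = mono * np.sin(el)                 # up/down
    x = mono * np.cos(az) * np.cos(el)    # front/back
    return np.stack([w, y, z, x])

# Example: a 440 Hz tone placed 90 degrees to the listener's left.
sr = 48000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)
bformat = encode_foa(tone, azimuth_deg=90, elevation_deg=0)
print(bformat.shape)  # (4, 48000)
```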
[00:11:16.184] Kent Bye: OK, well, that's, I guess, good news and bad news, because there are some projects that I had done a number of years ago, and then I saw a bunch of comments, because I have an ambisonic sound test that I did on YouTube, and all these people were commenting, oh, it's broken, it's broken. So then I was like, oh, crap. I also did another experiment where I did Beethoven's Fifth Concerto with the circle of fifths, and so each note that was being played was basically ambisonically put into a 3D sound field. But when I think about ambisonics, I think about it as kind of analogous to a 3DoF headset, where you're able to have your head rotation, but you're not able to move through space, just because the sound field is centered on one location. Whereas with something like a game engine, you're able to position these sound sources and sound emitters in 3D space so that you could actually have a 6DoF experience. And I'm wondering about your experiences with Dolby Atmos, whether you feel like that's sort of a hybrid, or whether it's more about recreating that sound field. I guess it sort of depends on your speaker system, if you have a soundbar or if you actually have a Dolby Atmos-enabled array of speakers. But even then, usually if you're watching TV, you're not really moving around through 6DoF space. So I'm curious to hear your take on the experience of some of these different spatial audio formats, and whether Dolby Atmos is actually more similar to ambisonics in the sense that it's creating a sound field, but you really can't move through it in terms of how it's currently being implemented across these different platforms.
[00:12:51.357] Oliver Kadel: Well, first, I just want to make a quick comment about YouTube. There are other third-party websites and small independent platforms that allow you to upload ambisonic content, for example, for test purposes or to showcase something. And the old-school manual sideloading of ambisonics onto a headset is also available as an avenue. But going back to the point, yeah, you're absolutely right. I personally think the use of ambisonics in the game engine environment is maybe not as common as in, for example, immersive films, where it's a very convenient and computationally efficient format to work with. And it's perfect for head rotation with a head-mounted display for 3DoF content. That's not to say that ambisonics can't be used in a game engine. Obviously, I think most sound designers and developers use the object placement and spatialization paradigm, maybe with virtual acoustics. But sometimes you do have a library of ambisonic recordings of different environments, which can be a very efficient way to convey a sense of space and ambience without necessarily being bogged down by the 6DoF conundrum, because if it doesn't contain complex spatial cues baked into a particular place within the sound field, then it's not an issue. Anything that requires a visual reference and interactivity, or that complexity of an object or an event, you would convey through the use of objects rather than ambisonic files. And largely speaking, there are third-party tools that can support ambisonics decoding in game engines, although maybe only up to first order, which is four channels, four components. I believe Blue Ripple Sound can do in-app ambisonic mixing up to third order, 16 channels. But there are also middleware options such as Wwise that can allow you to go even higher than that. And then going back to Dolby Atmos, for me personally, I think different projects and different use cases require different tools and different formats. It's just as simple as that. Certain formats just excel for particular use cases. Even from game engine to game engine, you probably have a different set of tools that you like to use, and it's the same for other examples. But what I like about Dolby Atmos is the ease of switching between two mixing paradigms: one on speakers, where you can do something for a TV release or theatrical release, and then you can switch to headphones and work on something that is for a bespoke application or a streaming service.
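A quick note on why ambisonics is "perfect for head rotation": compensating for listener yaw is just a small per-sample operation on the sound field, with no need to re-render individual sources. A minimal sketch, again assuming first-order AmbiX channel ordering; the function name is hypothetical.

```python
import numpy as np

def rotate_foa_yaw(bformat, yaw_deg):
    """Rotate a first-order AmbiX sound field (channel order W, Y, Z, X)
    to compensate for listener head yaw. W (omni) and Z (vertical) are
    unaffected by rotation about the vertical axis; X and Y rotate like
    a 2-D vector. This is why ambisonics is so cheap for 3DoF playback:
    head tracking is a tiny matrix multiply, not a re-render of sources."""
    psi = np.radians(yaw_deg)
    w, y, z, x = bformat
    x_rot = x * np.cos(psi) + y * np.sin(psi)
    y_rot = y * np.cos(psi) - x * np.sin(psi)
    return np.stack([w, y_rot, z, x_rot])

# Demo: a field with a frontal source (signal on X) ends up on the
# listener's right after they turn their head 90 degrees to the left.
sig = np.ones(4)
field = np.stack([sig, 0 * sig, 0 * sig, sig])  # W, Y, Z, X
print(rotate_foa_yaw(field, 90.0).round(3))     # Y becomes -1: source now to the right
```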
[00:15:40.543] Kent Bye: What I've seen is that the output format determines what type of tools and processes you're using. And so I did want to ask a question around Apple Immersive Video, because I know that you've had a chance to do some of the spatial audio within the context of some of these Apple Immersive Video experiences, with the Apple immersive series being one of the first that had been released. So in terms of the types of spatial audio that Apple Immersive Video supports, can you elaborate a little bit on the different types of formats that you tend to be drawn to for that specific output format?
[00:16:17.478] Oliver Kadel: I personally believe that interoperability of these spatial formats, whether they're open source or proprietary, is going to be key to success and a kind of collaborative spirit across the industry. And if we're discussing codec functionality more specifically, again, from the content creator's point of view, I think the flexibility and the ability to incorporate multiple features, say a 2D playback stream, higher-order ambisonics and object rendering, would be the ideal scenario, because then you tick all the boxes. It's good for content creators, and it's good for consumers, because the end product is of an appropriate standard, and perhaps even a newly improved standard. These are the key things that I think we should be aiming for as an industry as a whole.
[00:17:11.738] Kent Bye: Because when you're watching an immersive video, they don't really want you to have your head moving around anyway, because it gives you sort of a bad experience when you are doing a 6DoF action on 3DoF content, since it breaks the whole experience. So I can imagine that all the formats are optimized towards that, and then maybe even stuff beyond that is going to be kind of thrown in there. So...
[00:17:32.836] Oliver Kadel: I'll say one thing on that quickly. Despite the 180 video format, the audio still remains 360, because you don't want to have an acoustic hole, some sort of auditory gap, behind you. So 3D audio implemented within a 3DoF experience really helps to sell the immersion and cultivate a sense of presence and space, even though visually it's 180. And with the lower hemisphere as well, all of that is a good thing. It's unlikely that you would see some kind of front-based audio format that neglects the other dimensions succeeding in this kind of context.
[00:18:14.298] Kent Bye: Okay, that's a good point. In terms of 180 immersive video, I've noticed that in some of the experiences you get penalized if you turn your head, and so you get trained to just look straight ahead without even turning left or right, because if you turn your head left or right too much, you start to see the edge of the film. I think if it was 270, it would encourage a little bit of turning your head without the penalty of feeling like you were seeing the edge and breaking that visual immersion. But it makes total sense that the audio would not face those same constraints, so you get that full 360 audio sphere. Okay, well, I wanted to also have you go through your own career working with spatial audio and give me some of the highlights in terms of the different projects, either big projects that you think are worth mentioning, or milestones, or trends within the industry. So yeah, I'd just love to hear you give a little bit of a retrospective of the different types of projects that you've worked on in your career working in the XR industry.
[00:19:11.710] Oliver Kadel: First of all, I must say it's been quite a ride. Anybody who's been involved in the XR industry over the years will have seen this extraordinary evolution from the Oculus DK1, DK2 times in 2013-15 to the Apple Vision Pro now, and everything that happened in between. The technological shift in audio has not been dissimilar. In the early days, there was a complete lack of tools for spatial audio post-production, and even on the recording side the options were very limited compared to what has come onto the market since. I remember the emergence of Two Big Ears, which was subsequently acquired by Facebook, now Meta. This was one of the pivotal moments, when these tools were made publicly available for free to the community, which made the process of 3D mixing and encoding immersive films much more accessible for anyone. And there was a direct outlet to platforms such as YouTube and Facebook. Of course, there were other tools such as Blue Ripple Sound, and others quickly emerged afterwards, and there's a healthy selection of different choices at the moment. It's also worth mentioning that besides YouTube and Facebook, there were also platforms like Steam and PlayStation, and still are. A lot has changed since. As you already mentioned, YouTube spatial audio is currently broken, and not that anybody cares about posting 360 videos on there at the moment. The new Immersive Audio Model and Format, the IAMF codec, is being developed by Google and its partners, and it's going to be open source. We just covered the codecs, and it's going to be interesting what impact that's going to have. Facebook has since become Meta, which fully embraced this future vision of extended reality and released a number of head-mounted display models, which I believe is still the largest count compared to other hardware makers. Obviously, the latest Quest 3S is fantastic value for money for an entry-level mixed reality headset. Over the years, we worked on a ton of projects that were released on the Meta platform, and we've been lucky enough to work on a whole list of immersive experiences with David Attenborough about natural history. These were commissioned by Meta and produced by Atlantic Productions. Now, again, we see a resurgence of immersive films on the back of the Apple Vision Pro release. It's an important moment for the industry, as it really demonstrates further opportunities for linear immersive storytelling, and I believe we'll see more of that growth in the near future. Contrary to maybe some people thinking that the era of linear immersive content is over, perhaps we see a different scenario unfolding. What I really enjoyed in particular is the diversity of the briefs. As I've said before, anything from natural history to VR animations to gamified experiences to branded installations to education and virtual training. I think the world of immersive content is as wide and as rich as life itself, and it continues to develop that way. I know that we met in person a couple of times at Venice Immersive Island, which always showcases the latest and greatest from the community. You see how diverse the projects have been, how innovative. As an industry, we're maturing, and we see fewer gimmicks. We truly see experiences that wouldn't be possible without spatial computing and immersive technologies. And that's really exciting.
Yeah, on a personal level, we've been lucky to work on a whole list of immersive experiences with David Attenborough, and those included First Life, Kingdom of Plants, Micro Monsters and Conquest of the Skies. These were 180 and 360 footage-based and animation-based 3DoF experiences about natural history. In terms of sound design and creative freedom, this is as good as it gets. You get to sonify these worlds and these events and objects that operate or exist on non-human scales in terms of size or time. And I think it's really satisfying if you can create that illusion that essentially suspends disbelief. If you pair it with spatial audio and all the creativity that comes with that, it's a particularly enjoyable process. From a sound design point of view, these were maybe my favorite projects to work on. Another unique aspect for audio professionals who work in immersive, compared to traditional media where things are more structured and departmentalized, is that we get involved early with projects and maybe sometimes even contribute creatively to planning a project and an experience, which often involves traveling to the location. You know, I've got a whole bunch of crazy stories, from being detained by the Turkish army and then having tea and coffee with the Turkish army commander, all the way to being in the jungle with a female guerrilla fighter, dancing to communist music, all kinds of craziness. So these are unforgettable experiences that you get to go and experience and record, to be there and feel the place and feel the emotions of those people and those stories, and then take care of that. Not only the technical necessities, but also continuing to develop and enhance whatever that is, whether that's the story or the emotion, those more nuanced things, in post-production, and seeing the project through to the end. It's a very unique and privileged position to be in, and probably by far the most enjoyable and satisfying aspect of the work we do.
[00:25:25.839] Kent Bye: So speaking of Venice, there have been a couple of pieces shown at Venice that end up being a little bit more Unity-based, or at least game engine-based with 6DoF sound, like Alice in VR Wonderland or Letters from Drancy. I'm wondering if you could speak to the process that you go through when you work with pieces that are more Unity-based and how that changes your workflow in terms of the audio production, whether you are still able to add ambisonic audio mixes, or whether you tend to lean more upon the affordances of the game engine that allow you this more object-oriented sound design?
[00:26:04.221] Oliver Kadel: Yeah, it's a great question. And we can talk about Letters from Drancy, which was part of three films about the Holocaust commissioned by the Illinois Holocaust Museum & Education Center, near Chicago, and produced by East City Films. This is a perfect example. It was an extremely rewarding and enjoyable experience to work on these immensely powerful films, but also a very delicate topic, with contributors who are very old. The titles are Letters from Drancy, Walk to Westerbork and Escape to Shanghai. These films are roughly 20 minutes long each, and they were produced by a large crew all over the world across a period of 18 months. In terms of capturing sound and interviews for the scenes that were captured on camera, that was specifically 360. We would go and record audio on location with ambisonics and spot mics, using radio lavalier microphones, with the ambisonic microphone placed as close to the camera as possible, because that essentially represents the position of the sound field, your field of view, but also capturing a bunch of wild tracks that could then be useful for post-production. So in a linear paradigm, essentially the scene is built on an ambisonic recording, which may or may not have spatial cues baked in. If it does contain diegetic elements, like an interviewed person on camera, then obviously that would be in sync. A radio spot mic would be spatialized as an object according to its position in 3D space and aligned with the same position in the ambisonic recording. And then that could be further enhanced or supplemented with narration, music, or additional sound design elements. These films had a mix of linear 360 footage, with the sound recorded and post-produced the way I just described, and they also had 6DoF elements, which were made with motion capture and animation. Obviously, that would have been done in Unity; in fact, all these films were compiled in Unity. So this is where it changes a little bit, because essentially these are memories, these are imaginary elements. You start working on them with digital silence, in a vacuum. You have a lot of creative freedom, I suppose, working closely with the directors, Darren Emerson, Charlotte Mikkelborg and Mary Matheson, to develop a soundtrack that meets an objective. This is where we would rely on ADR and sound design, whether recorded specifically for the project or taken from the libraries; you essentially recreate that scene from scratch. In one of the more memorable scenes, our protagonist is escaping from one country to another, and they're crossing the border in the back of a truck. There's a Nazi soldier who's searching through the truck with a torch, the dog's barking, and there's a recording of a real actor speaking in German, and you're sitting there in the back of the dusty truck with them, feeling the weight of the situation and the risk. Obviously, a hyper-realistic sound design approach and mix would be the way we would convey that feeling. Contrast that with a scene in the same film where the mother is taken away by militia to the concentration camp, which is a very sad scene. There was not much else other than the voice of Marion Deichman, our protagonist, narrating this experience, because she's talking from her own memories, plus music and just wind. You're just feeling the emptiness and the purity of that moment. The way we would switch to the Unity workflow would be the following.
So if we know that the experience will largely be experienced in a seated position, even though it's technically 6DoF, the audience wouldn't be walking around; they would essentially just be looking around from the same spot. So we can use that to our advantage and pre-mix everything to 360 video, rendering the 360 video from Unity as a 360 VR animation. In a similar way to what we would do with live captured footage, we would spatialize objects, use whatever we want, package it all in, and send it back to the game engine. If, however, a piece did involve six degrees of freedom in a more meaningful way, then we would export individual sound effects and stems in as granular a way as possible, where every footstep, every voice, every element would exist as a separate sound object, which would then be attached to a visual object in Unity and rendered in real time according to your position in space.
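For readers unfamiliar with what that real-time, per-object rendering involves, here is a toy sketch of the core math: distance attenuation plus panning by azimuth relative to the listener. It is a deliberately simplified stand-in for what a game engine's spatializer actually does (no HRTFs, occlusion or reverb), and all names in it are hypothetical.

```python
import numpy as np

def render_object(mono, source_pos, listener_pos, listener_yaw_deg, ref_dist=1.0):
    """Toy 6DoF object renderer: attenuate a mono stem by distance and
    pan it by azimuth relative to the listener's facing direction.
    Coordinates are 2-D for brevity: x is forward, y is left."""
    dx, dy = np.subtract(source_pos, listener_pos)
    dist = max(np.hypot(dx, dy), ref_dist)
    gain = ref_dist / dist                        # inverse-distance law
    az = np.arctan2(dy, dx) - np.radians(listener_yaw_deg)
    p = (1.0 - np.sin(az)) / 2.0                  # 0 = hard left, 1 = hard right
    left = mono * gain * np.cos(p * np.pi / 2)    # equal-power pan law
    right = mono * gain * np.sin(p * np.pi / 2)
    return np.stack([left, right])

# Example: a footstep stem placed 2 m to the listener's front-left,
# re-rendered every frame as the listener's position and yaw update.
sr = 48000
step = np.random.default_rng(1).standard_normal(sr // 10) * 0.1
stereo = render_object(step, source_pos=(1.5, 1.5),
                       listener_pos=(0.0, 0.0), listener_yaw_deg=0.0)
print(stereo.shape)  # (2, 4800)
```

This is why each footstep and voice must ship as a separate object rather than a baked mix: the gains and angles depend on where the listener is standing at playback time.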
[00:31:31.601] Kent Bye: Yeah, that's really helpful to hear. As you were speaking about Letters from Drancy, those memories of being in the back of the truck were so vivid, just because it is such an intense scene in the context of the narrative, but also your vision is so occluded that you don't see a lot, yet you actually hear a lot of that sound design. So yeah, that moment had already been in my mind as you were talking about these different experiences, and I'm really glad that you took the time to elaborate on that and on your workflow and process. And so when I think about ambisonic audio, you had talked about the different orders. So first order would be four channels, so that's four wave files that get muxed together to create a sound field. Second order would be nine channels, and third-order ambisonics would be 16 channels. And I've noticed that most of the places you're distributing to only support maybe first order, maybe second order. But I'm not sure if you've ever found the higher orders worth it, or if it depends on the display format; like if you had an array of speakers, then maybe it makes sense to do third-order ambisonics. But I'm curious, from your perspective, whether you tend to only do first order with four channels, or whether you go up to higher orders and then mix down to four channels. I'd love to hear some of your thoughts on the different orders of ambisonics that are out there, and, as a spatial audio expert, what you've tended to see in your own processes.
[00:33:02.219] Oliver Kadel: Absolutely. So I would say that first order would be the most ubiquitous. However, the difference between first order and third order is substantial, and even an uninitiated audience will be able to perceive the auditory difference very clearly. There's more depth, there's more detail; it's obviously always worth implementing if the technology and the platform allow you to do so. In most cases, even when we know that the experience is being delivered as first-order ambisonics or contains elements of first-order ambisonics, we still do post-production in higher-order ambisonics, at least third order, to future-proof it. But often it's also the case that the experience is being distributed across multiple platforms, so we just have to adapt deliverables to different scenarios. When it comes to higher-order ambisonics beyond third, this is my personal opinion; I know some industry experts would agree with me, some may disagree. I'm talking in the context of efficiency and what's available out there as more of an industry standard, rather than technically advanced academic research where things are being done at a completely different scale and for a different purpose. But when it comes to everyday life, I would say that beyond third-order ambisonics we start to face the law of diminishing returns, where the auditory difference is almost negligible or imperceptible to a regular consumer and therefore not worth all the additional work and resources needed to execute it. And in fact, speaking on a commercial level, the ability to publish something beyond fifth order is pretty much non-existent. So, generally speaking, most of the projects that we've worked on and that are currently out there in the world are centered somewhere between first, second and third-order ambisonics, with additional elements such as a 2D playback stream, for example. In the world of game engines and audio middleware like Wwise, you can work with up to fifth order, I believe. So that option is also available in the world of gaming or interactive experiences made with game engines. There's still a clear distinction between the game engine and digital audio workstation workflows.
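A quick reference for the orders discussed here: a full-sphere ambisonic mix of order N carries (N + 1)² channels, which is where the four-channel and 16-channel figures above come from. A one-liner to verify:

```python
# Full-sphere ambisonic channel count grows quadratically with order:
for order in range(1, 6):
    print(f"order {order}: {(order + 1) ** 2} channels")
# order 1: 4, order 2: 9, order 3: 16, order 4: 25, order 5: 36
```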
[00:35:23.652] Kent Bye: Okay. Yeah, that makes sense. You tend to see something similar where you shoot in the highest resolution that you can, and then you may downmix it to whatever the output format is going to be, but you want to future-proof it a little bit. That's really helpful to hear, because I was only familiar with first order, and so it's great to hear that you're looking at third order and then potentially even more as you go forward. One other question I want to ask: when we see a lot of different experiences on the festival circuit, there are a couple of them that have had audio as a centerpiece, like installations at IDFA DocLab that come to mind, where you actually have an array of speakers that goes beyond just the headset. But by and large, I see most of the display through a VR headset. A number of years ago, there were the Bose AR Frames, and now we have the Ray-Ban Meta smart glasses that have more spatial audio. But I still don't see a lot of audio-specific exhibitions or projects. And I'm curious, from your perspective, when you start to think about the real milestone audio installations or projects with spatial audio, what are some of your favorite spatial audio experiences or projects that you've seen that push the edge of what's possible with spatial audio?
[00:36:39.295] Oliver Kadel: Yeah, it's a good question. Without mentioning anything too specific, I think the most powerful spatial audio experiences that are audio-centric, or without any digital visual display, are possibly live music of any genre or venue setup, art installations and performances, whether they're headphone-based or speaker-based, and also theater, immersive theater and various amalgamations in that space. But I agree that in the XR world we tend to see more where spatial audio is combined with visuals, projections and videos or animations. Maybe that's just a natural side effect of immersive content: people think of other media, other technologies first, and then sound kind of dances along to complement that. Maybe the use case for audio-only immersive experiences is limited. That's not to say that it doesn't have any value, but I think it's maybe more effective, more powerful, to make something that combines multiple technologies, i.e. engages multiple senses, because we're entering a realm where that has a cumulative power. For this effect of being immersed and present in an experience, you need to engage as many senses as possible. And this is why we also see the addition of haptics and olfactory elements and all kinds of other interesting elements, even onboarding, anything that can enhance the experience and make it more compelling.
[00:38:20.655] Kent Bye: One last question before we start to wrap up. I wanted to ask you about your PhD research, looking at the effectiveness of spatial audio for learning and cognitive load, and just the impact of spatial audio in the different research that you're doing as you get your PhD. I'd love it if you took a few moments to elaborate a little bit more on what you're finding and what the impact is when you start to look at spatial audio in the context of these immersive learning environments.
[00:38:47.615] Oliver Kadel: I've been involved with my research for three years. I recently had my progression panel. The next phase involves data analysis, which is perhaps one of the most exciting elements, but also more technologically complex to work with. So my study on the impact of spatial audio on cognitive load and memory retention was deployed about 18 months ago. My industry partner is Bodyswaps. It's a soft skills training company based here in London. They do a whole range of soft skills training scenarios, from job interviews to racial equality to public speaking and presentation skills. And this is the training module that we collaborated on and decided to implement the study in. It's a very audio-rich learning experience with multiple chapters, different subjects and different non-playable character coaches delivering different types of content. There are multiple self-assessment points where you self-assess and self-measure before and after the learning experience, and it also includes a number of simulations where you can practice and deliver what you've learned, and the device records your voice and video within VR so that you can look at yourself afterwards. So it's very data-rich, anything from gaze data to success rates to qualitative elements of the exit survey, where we asked questions like how clear it was for the user to receive the instructions from the coaches, how natural the environments they spent time in felt, and what the repetition rate versus completion rate was. The list goes on and on; there's a lot of data, and it's been anonymously collected from those participants who opted in. We also have a control group, and the next phase is essentially to analyze the data. We're using Python and MATLAB to look for any statistically significant patterns and differences between the two versions of the experience. But this is something that's currently in process, and whatever the results, I'm sure to present them to the community, not least as part of my PhD commitment, but also for the community at large so that we can look at that element together.
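For readers curious what looking for "statistically significant differences between two versions" can look like in practice, here is a minimal, hypothetical sketch in Python using SciPy. The group means, sizes, and variable names are invented for illustration and are not the study's actual data or method.

```python
import numpy as np
from scipy import stats

# Hypothetical between-group comparison: retention scores from a
# spatial-audio condition versus a control condition. The numbers
# below are simulated placeholders, not study results.
rng = np.random.default_rng(seed=0)
spatial = rng.normal(loc=0.74, scale=0.10, size=60)   # treatment group
control = rng.normal(loc=0.68, scale=0.10, size=60)   # control group

# Independent-samples t-test: is the difference in means significant?
t_stat, p_value = stats.ttest_ind(spatial, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # conventionally significant if p < 0.05
```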
[00:41:24.136] Kent Bye: Awesome. And finally, what do you think the ultimate potential of XR, spatial computing, and spatial audio might be, and what might it be able to enable?
[00:41:36.083] Oliver Kadel: It's a great question, and it's a difficult one. There were so many great ideas voiced on this podcast previously. I'll say one or two things. I think the use of spatial audio will continue to grow and mature across our industry. This means higher spatial audio resolution and high-quality spatial audio rendering across different publishing and distribution platforms and methods. Inevitably, over time, it will become an industry standard, which ultimately improves the overall quality of applications, art experiences, entertainment, or indeed even the efficacy of education and training. As for users, I see more elaborate personalization features, and not just personalized HRTFs, or head-related transfer functions. We're all essentially neurodivergent to some extent, and therefore we have different preferences in pretty much all areas of our lives. How we consume and navigate things in the digital domain, whether it's fully immersive or something banal or just functional, is really no exception. A good example could be Apple's AirPods Pro earbuds. Paired with your phone, you can now get access to very advanced audiometry analysis, which typically would be done by a medical professional at an external facility for a good fee. But now you can perform that in minutes at home and benefit from personalized playback and even hearing loss compensation, which essentially functions as a high-end hearing aid. We can take that further and think about hearing loss prevention as part of this digital reproduction of spatial audio over headphones, powered by machine learning, because I think that's an inevitable element in the equation. Or perhaps optimized 3D audio presentation, which can reduce cognitive load, or augment certain properties of the sonic stimuli conveyed to our brain in ways that can enhance memory. So I think there's still plenty in our field that remains unexplored, and it's an exciting time to be around.
[00:43:50.755] Kent Bye: Awesome. And is there anything else that's left unsaid that you'd like to say to the broader immersive community?
[00:43:56.383] Oliver Kadel: Yeah, lastly, in my personal view, I believe the immersive audio sector is in good shape. Over recent years, we saw a lot of activity in research and development, close collaborations between academia and industry, and global tech giants investing heavily in in-house expertise, IP and hardware development. We've seen this convergence across different fields of our industry. Now spatial audio isn't just for XR and immersive entertainment, but also for education and training, music and podcasting, telepresence, live sound, theater, even virtual prototyping and simulations and civil engineering. The list goes on and on. And I think the general view of content makers and users is that expectations and standards have risen. Thankfully, when we speak at conferences and industry events, we're no longer discussing why stereo sound is not good enough for VR; those kinds of basics are in the past. And that's a good thing for everybody, for the larger XR community and for general consumers, who will have the ultimate impact on whether or not this whole movement can become fully mainstream. And this hopefully creates more professional job opportunities for current and future generations of audio professionals.
[00:45:21.488] Kent Bye: Awesome. Well, Oliver, thanks so much for joining me here today on the Voices of VR podcast. Audio for me is one of the most powerful aspects of immersive experiences. When it's done really well, it's invisible, and it's operating at this unconscious level; when I go through immersive experiences, sometimes when I come out, the audio is not at the top of my mind in terms of speaking about it. And yet when it's done really, really well, I feel like it can have this huge impact when it comes to creating these deep levels of immersion. So it's sort of a blessing and a curse to be working in a field where, if you do it really right, no one really notices the impact of it. But I can definitely say that, as you were talking about the scene from Letters from Drancy where the visuals were so sparse, the thing that made that scene so powerful was all the sound field and audio sound design of that experience. I also feel like audio is probably the most underappreciated element in the larger industry, especially when it comes to the amount of CPU that's dedicated to creating it. But I think over time we're going to see spatial audio take more of a centerpiece role as the industry recognizes its importance in creating this deep level of immersion. And yeah, it's just really fascinating to hear your perspective on all the different things that you're working on in the space and all these amazing projects that you're a part of as well. So I highly recommend folks go check out the 1618 Digital website to see some of the different experiences to go check out for themselves. And yeah, I just really appreciate you taking the time to help break all this stuff down today here on the podcast.
[00:46:55.551] Oliver Kadel: Thank you so much.
[00:46:56.946] Kent Bye: So that was Oliver Kadel. He's a sound designer and sound engineer and the founder of 1618 Digital. So I have a number of different takeaways about this interview. First of all, it's just really interesting to hear some of the different workflows and pipelines. He's doing a lot of work with linear media and immersive stories, and so there's a lot of use of ambisonics, which we don't hear a ton about when we hear about Unreal Engine or Unity-based projects because those are so object-oriented. With ambisonics there's first order, second order, third order; he says he's usually producing stuff in third order. First order is four channels, second order is nine, and third order is 16, since the channel count is the square of the order plus one. He said the tools go all the way up to fifth order, which is 36 channels, but that there are kind of diminishing returns once you get beyond third-order ambisonics. And so when I watched the Apple Immersive Video, because it's 180, I don't find myself moving my head around as much, and so sometimes when I'm listening to it, my perception isn't good enough to really identify all the different spatialization, just because it's easier to identify when you are moving your head around and can really hear it. Because I'm holding such a headlocked position, it's harder for me to perceive the fully immersive spatial dimensions of that audio. In some of the other productions he's worked on, say the Unity-based projects or some of the stuff he's done with David Attenborough, because it's more of a 360 environment, it's easier for me, at least, to perceive some of the different spatial audio that's being produced. Although he did say that even though you are watching 180 video, there's no constraint of having just 180 front-facing audio; you can actually be immersed in the full audio field. And so, yeah, I think, again, audio is one of those things that's very subtle. When you do it wrong, you really notice, and when you do it really correctly, it can be operating at an unconscious level. We talk a little bit more about that in the conversation that I have with him and Monica Bolles on the Immersive Audio Podcast. I guess one of the other things that I was really excited to hear was that there's work on an open standard, since there hasn't been one out there. And so this Immersive Audio Model and Format, IAMF, is super exciting; it's great to hear that there's going to be a little bit more standardization around that. It's a little unfortunate that YouTube has a broken implementation of ambisonic audio, and maybe as time goes on they'll start to implement more and more of this, especially as they start to get into Android XR. It was also really interesting to hear a little bit more of the history and some of the very early plugins that let you start to produce and author spatial audio within the context of these game engines. And yeah, I think with the Apple Vision Pro, there's been a little bit more of a resurgence of linear immersive video. It's mostly 180 video, but even within the context of the Meta Quest, you can dive into Meta Quest TV and find all sorts of different 360 and 180 video productions.
You can go check out 1618 Digital's website and see the list of all the different projects that they've worked on over the years to find some of these different experiences. The David Attenborough ones are ones that I particularly enjoyed, and I've had a chance to see them at the Venice Immersive Film Festival, as well as Letters from Drancy; the whole trilogy was also really well done. So that's all I have for today, and I just wanted to thank you for listening to the Voices of VR podcast. And if you enjoyed the podcast, then please do spread the word, tell your friends, and consider becoming a member of the Patreon. This is a listener-supported podcast, and so I do rely upon donations from people like yourself in order to continue to bring you this coverage. So you can become a member and donate today at patreon.com slash voicesofvr. Thanks for listening.