I interviewed The Vivid Unknown creator John Fitzgerald at IDFA DocLab 2023. See more context in the rough transcript below.
This is a listener-supported podcast through the Voices of VR Patreon.
Music: Fatality
Rough Transcript
[00:00:05.452] Kent Bye: The Voices of VR Podcast. Hello, my name is Kent Bye, and welcome to the Voices of VR Podcast. It's a podcast that looks at the structures and forms of immersive storytelling and the future of spatial computing. You can support the podcast at patreon.com/voicesofvr. So this is episode 8 of 19 of looking at different digital storytelling and immersive nonfiction pieces at IDFA DocLab 2023. Today's episode is with The Vivid Unknown by John Fitzgerald. So this is a three-channel video art installation that is basically an AI film that's been trained on the 1982 documentary Koyaanisqatsi by Godfrey Reggio, which, quote, hypnotically explored the ever-increasing tension between nature, humans, and technology, unquote. It's a film that is really observational. There's no narration, just the juxtaposition of these images of humanity and nature and the relationship between the two, man's impact on nature. And it has this amazing musical score by Philip Glass. And so John Fitzgerald got a hold of Godfrey Reggio, the director of Koyaanisqatsi, and was able to get permission to train an AI model based upon the different images that are shown throughout the course of Koyaanisqatsi. And so then he created this three-channel installation that has like 12 hours of footage that is created from this AI model, and the audience is able to go up and move around. So it's tracking the different audience members' positions in space and kind of averaging them out in some sort of background algorithm, so the way that the collective audience is moving around and interacting with the piece also changes the pace and the ways that the piece is actually shown. So there's a whole other collective agency dimension of this installation, but at the core, it's a three-channel video art installation that is showing these Stable Diffusion-esque translations of Koyaanisqatsi. So that's what we're covering on today's episode of the Voices of VR podcast. So this interview with John happened on Tuesday, November 14th, 2023 at IDFA DocLab in Amsterdam, Netherlands. So with that, let's go ahead and dive right in.
[00:02:13.931] John Fitzgerald: Hi, my name's John Fitzgerald. I'm an artist and filmmaker, primarily working in immersive and interactive content.
[00:02:23.235] Kent Bye: Maybe give a bit more context as to your background and your journey into the space.
[00:02:28.318] John Fitzgerald: Yeah, I studied filmmaking in college and immediately was attracted to the documentary form, and at the same time was working in After Effects and like 2.5D compositing and image creation. I spent a bunch of years traveling and worked with a bunch of different film communities in Bolivia and afterwards in Brazil, and returned back to New York. I guess it was about 2014 that I co-founded Sensorium with another artist and filmmaker named Matthew Niederhauser. So we've been based in New York with our company there for about 10 years. We have an interesting model. It's a bit different than a lot of other companies that rushed into VR. We just kept a really small team, he and I, no overhead, and we would kind of juggle commercial work and then creative projects. We would pay ourselves with the commercial work and then get to do our own kind of experiments, first in VR and 360 video, and then AR. We created a number of projects that have traveled to Sundance, IDFA, Tribeca, SIGGRAPH, and other festivals like that.
[00:03:37.084] Kent Bye: Well yeah, it may have been exactly a year ago that we also talked here at IDFA DocLab, where you were here with Sensorium with this motion capture studio. And since then I also had a chance to visit the Onassis ONX Studio there in New York City, where there was the annual summer exhibition of the different projects, and so we had a chance to sit down and talk a little bit more about the future work that you were each working on, because artificial intelligence and machine learning and the different generative AI tools had really exploded over the past year or so, and so there were a lot of projects that were there. And I just sat down with you and Matthew; Matthew was talking about Topal Mainzer, and you were talking about the project that's here now at IDFA DocLab. So yeah, maybe you could just give a little bit broader context for this piece that you're showing here and how that came about.
[00:04:29.494] John Fitzgerald: Yeah, so I'm here at IDFA this year with a project called The Vivid Unknown, which I created with the filmmaker Godfrey Reggio. It's based on a film that he released in 1982 called Koyaanisqatsi. Yeah, a bit of background on my personal journey into making work with artificial intelligence. A few years ago was really the first time I started experimenting with GANs and StyleGANs and building imagery generated through machine learning processes. There was a, you know, big explosion of tools that became accessible through Midjourney and DALL-E, and I experimented with those, but was always really more interested in controlling parameters, you know, inside of more open source software and experimenting with Stable Diffusion. And yeah, I guess to dive in, I saw this film Koyaanisqatsi 20-plus years ago when I was in college, and I probably saw it dozens and dozens and dozens of times, and it had a big impact on me, the original film. It's a pretty compelling look at the impact of industrialization on the planet through the vantage point of, well, it's a film that doesn't have any words. It has an epic Philip Glass score that takes you through these moments of meditative, experiential, profound connection to the content, and then other chaotic moments. And it looks at the violence of nature in its raw form, but also the way that humans have settled on top of nature and in a way enacted a violence against nature, and in turn how machinery and technology enact a kind of violence against humans as well. And I was fortunate enough to meet Godfrey about a year ago through a mutual friend that grew up in Santa Fe, where he's lived since the 60s, and Godfrey is this enigmatic, compelling character. It's been one of the joys of my life to interact with him. He's a wise sage who's just imparting knowledge every time he opens his mouth. He was a monk for about 14 years and had no interaction with technology at all. He kind of says that he grew up in the Middle Ages, away from technology, and he saw a film by Luis Buñuel, Los Olvidados, while he was still practicing as a monk, and it was something that he just started to watch over and over again. He was working with street gangs in Santa Fe, and he was asked to leave the Catholic brotherhood that he was a part of, and he began working on a project with the American Civil Liberties Union to raise awareness. The first actions of his non-profit were around media literacy and kind of anti-surveillance. You know, it was the 70s and there was a lot of techno-anxiety about things that were developing, and so he started creating a lot of work that ended up becoming this film, Koyaanisqatsi. I know that was a bit of a broad background to give context to Godfrey's work, but I think it's important because, yeah, the film, when it came out in 1982, was kind of scoffed at by a lot of critics and scholars, but it also had an immediate cult following. And he ended up completing this trilogy of the Qatsi films, the first one being Koyaanisqatsi, which in Hopi means life out of balance, and then Powaqqatsi, which was released five years later, which means life in transformation, and then Naqoyqatsi, which was released over a decade later, which is life as war. So that's the context for his films. I don't know which entry point you want to jump in from there, because that was a very broad answer.
[00:08:19.953] Kent Bye: So yeah, I definitely recommend folks check out Koyaanisqatsi. Koyaanisqatsi is a film that is very poetic in the way that you watch it, from both the Philip Glass score but also this meditation on the relationship between humanity and nature, or more precisely the impact of technology on nature. How would you describe this as a genre, as a film? Because it's very much like an observational documentary. There's a lot of shots that are durational, that you're watching unfold, but, like you said, there's no narration, so there's no diegetic elements that are helping describe it. It's just all associative links, and how the juxtaposition of all these things together is giving the sense of this industrialization and its impact on nature.
[00:09:04.513] John Fitzgerald: Yeah, there's been a lot of different words to describe it. Some people call it an essay film or a poetic film. It's also been described as an experimental film, and Godfrey actually bristles at that. He's like, it's not an experiment, it's experiential. So he's been using the word experiential to describe the style of the film, you know, since before that was a buzzword in our industry. But it's a very unique project, and there wasn't much of a precedent for this style of filmmaking before it was released. And yeah, he worked very closely with an incredible cinematographer named Ron Fricke, who was taking apart cameras and learning how to build the possibilities for under-cranking and over-cranking, so the film features a lot of altered speeds. And Ron went on to direct a couple of other films that are in the same style of having no words, a film called Baraka and another one called Samsara. And yeah, the durational approach, or the altered-timing approach, became, it's an interesting thing, because it became a style that was used by a lot of advertising firms after the film came out. And Godfrey laughs about this all the time, because he's like, you know, we were describing the beast, and the beast turned around and used our description of the beast to promote itself. So he's got a lot of funny things to say about that and how it actually became a kind of touchstone for an approach to making this kind of storytelling. And he was invited to direct commercials for Nike and, you know, many other commercial enterprises, and at the time that he was asked, Godfrey turned them down. He chose not to. And I think that's just a statement about his conviction and approach to this kind of storytelling.
[00:10:55.974] Kent Bye: Yeah, so there's a lot of what we would call now time-lapse shots, but maybe at the time they were more innovative in terms of how the technology needed to be adapted to be able to shoot this, because a lot of stuff at that time was very much still using direct film and film cameras. But yeah, I guess let's jump forward. Obviously you had a huge inspiration. You watched it dozens and dozens of times. You had a chance to meet him. And so with this explosion of AI, maybe talk about this confluence of how it came to be that you were able to collaborate with him and to use the Koyaanisqatsi film as source material to train a Stable Diffusion model to be able to create The Vivid Unknown that's showing here at DocLab.
[00:11:37.893] John Fitzgerald: Yeah, when I first met Godfrey, I was compelled to ask him if he wanted to make something. And I thought he would say, you know, he was busy with other things, but he invited me to his studio and was really interested in the work that I had been making with Sensorium. And he didn't really want to get in the weeds of the actual technology of VR headsets or AR interfaces. In fact, he said, I don't want to make anything in a headset ever. He's like, I don't even want to put one on. But he encouraged me to think deeply about what his films were and whether there was a place that we could collaborate together. There were a few requirements, in that he had always released his films and given away all of the distribution rights afterwards so that he could have complete creative control over the film. And so he said we can't actually use any of the footage from Koyaanisqatsi. But I was able to talk to his lawyer and find out, could we actually use it to train a model, which, you know, didn't actually represent any of the footage that was released in the films, and yeah, we got the green light to do that. I guess this was about March of this year. I was ideating on how to use DreamBooth and exploring ways that I could build open source models, when I started talking with a friend of mine in New York, Dan Moore, who's an incredibly talented creative technologist, and I knew he had been experimenting with a lot of GAN-style art, and his day job is also at NVIDIA, so he knows a lot about GPUs and processing power. And Dan and I began unpacking some of Godfrey's archives, images that he had shot during the time of Koyaanisqatsi, and images from the Koyaanisqatsi film itself, which became the core of the model that we used to build The Vivid Unknown. And I was really interested in locking it into a very specific style and time period and having it not include anything after the release of Koyaanisqatsi. I didn't want it to be kind of futuristic AI. I just felt it would be a bit slippery to begin opening up to imagery that was not narrow, and we came back to this idea that in using footage that's all pre-internet, all footage that was captured in the 70s and 80s, it became a kind of fertile place to explore what we could do with it. We used natural language processing, this open source software called CLIP, to decode the visual style and structure and put language against what the source material of the actual Koyaanisqatsi film is, which became the basis for a series of prompts that Dan and I used to start generating imagery in what became The Vivid Unknown. But at that point we were just seeing what it was and what it would look like and seeing how malleable it would be. Obviously, you can't input a video and output a video. So, you know, we were extracting frames and then generating frames. At that point, I was having weekly or every-other-week calls with Godfrey and showing him some of the weird experiments that were coming out of this. He loved it. He said, keep going. Like, I don't know what it is, it's really interesting. He'd be sitting at a studio in Santa Fe with his friends, watching these bizarre image clips that we were sending him, and it got to a point where we actually sent a batch of images and he selected about a thousand images that he hand-curated, which became the dominant keyframes for a lot of the footage that we generated.
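For readers curious what that frame-extraction and CLIP step can look like in practice, here is a minimal sketch, assuming ffmpeg and the open_clip package. It scores extracted frames against a handful of candidate descriptions rather than writing free-form captions, so it is a simplified stand-in for the captioning workflow John describes; the file names and phrases are placeholders, not the project's actual data.

```python
# Sketch: extract frames from a source video, then score each frame against
# candidate descriptions with an off-the-shelf CLIP model. Everything named
# here (paths, model choice, phrases) is illustrative.
import subprocess
from pathlib import Path

import torch
from PIL import Image
import open_clip

SOURCE_VIDEO = "source_footage.mp4"   # hypothetical local file
FRAME_DIR = Path("frames")
FRAME_DIR.mkdir(exist_ok=True)

# 1. Extract one frame per second with ffmpeg (assumes ffmpeg is installed).
subprocess.run(
    ["ffmpeg", "-i", SOURCE_VIDEO, "-vf", "fps=1",
     str(FRAME_DIR / "frame_%05d.png")],
    check=True,
)

# 2. Load CLIP and rank a few candidate descriptions per frame.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

candidates = [
    "a time-lapse of clouds over desert canyons",
    "an aerial view of dense city traffic at night",
    "rows of machines on a factory floor",
    "a crowd of people moving through a city street",
]
text_tokens = tokenizer(candidates)

with torch.no_grad():
    text_features = model.encode_text(text_tokens)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    for frame_path in sorted(FRAME_DIR.glob("frame_*.png")):
        image = preprocess(Image.open(frame_path)).unsqueeze(0)
        image_features = model.encode_image(image)
        image_features /= image_features.norm(dim=-1, keepdim=True)
        scores = (image_features @ text_features.T).squeeze(0)
        best = candidates[int(scores.argmax())]
        print(f"{frame_path.name}: {best}")
```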
And the approach for The Vivid Unknown was that we were looking at buckets of content that either depicted nature, or depicted the impact of humans against nature, and also these machines and this now-ancient technology that was the state of the art in the 70s and 80s. And that became the source material for The Vivid Unknown. We generated about 12 hours of content. It has a story arc that follows the Koyaanisqatsi film, but Dan and I were experimenting with a way to build an interactive engine that could respond to viewers in the space. I've been loosely calling it a collective viewing experience, which I guess takes me back to when I first saw Koyaanisqatsi, that I wanted it to feel larger than life and like something that we could step into, and this is my first attempt at displaying it on a larger format. Here at IDFA we have a multi-channel installation. So it's three 14-foot projections that are set up...
[00:16:26.078] Kent Bye: It's like 120 degrees apart or so. It's like yeah, yeah.
[00:16:30.645] John Fitzgerald: And that's kind of the limits of the space of what we had to work with here. But thank you to the IDFA team, to Caspar for believing in this early prototype of a project, and to his team, Chip, who's been bearing with me as we experiment with some of the technology we're using for the first time here. We wanted to create this interactive engine. Dan built a pretty complex system in openFrameworks that uses machine learning libraries with the sensors and can track the skeletal positioning of viewers in the space. So each person that walks in gets ID'd, and we start to track their engagement, their movements around the space, their proximity to certain screens and to each other, and then also their look direction, their gaze at a screen. And each viewer is then averaged across, so that the system has a state that either advances or goes backwards based on how much the viewers are moving around and how they're looking at the work. And it kind of models the story arc of the film Koyaanisqatsi, in that as more and more people come in and as they look at the screens, the footage itself starts to play back double time and then triple, and it starts getting faster and faster and faster. Then it reaches a kind of breaking point where we're experimenting with this idea of pixel rot, or data deterioration. The images start to lose fidelity and the system breaks down, and then it returns back to its original dormant state, which is this kind of state of nature that we were exploring with the footage. But as much as I'm describing it as a system that works exactly that way, I didn't really want it to be like a funhouse that you would step into and have a mirror reflection of yourself on the screen. It didn't seem like that served the purposes of the film Koyaanisqatsi. When we first started generating the footage, it was really beautiful to look at, and I really liked how we could present this kind of alternate form of the film, but the film itself has a bit of a sinister feeling about what the state of technology is, and so I also kind of wanted it to be a little disconcerting that you can't really control it. It is moving in a direction that is going forwards or backwards, and you as a viewer can't have a direct impact on it. That was where we brought, I think, a bit more cohesion together in the approach of the presentation.
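To make that state logic a bit more concrete, here is a minimal Python sketch of the collective-state idea: per-viewer activity averaged into one value that pushes a shared state forward or back, which then maps to a playback-speed multiplier and a breakdown level. The real engine is an openFrameworks system driven by depth-sensor skeletal tracking; the thresholds, state count, and field names below are purely illustrative assumptions.

```python
# Illustrative sketch of a collective viewing state machine: average the
# audience into one engagement value, advance or decay a shared state, and
# map that state to playback speed and image degradation.
from dataclasses import dataclass

@dataclass
class Viewer:
    movement: float      # 0..1, how much this person is moving
    facing_screen: bool  # whether their gaze is roughly toward a screen

NUM_STATES = 6           # hypothetical number of escalation states

def collective_energy(viewers: list[Viewer]) -> float:
    """Average the audience into a single 0..1 engagement value."""
    if not viewers:
        return 0.0
    per_viewer = [
        0.5 * v.movement + 0.5 * (1.0 if v.facing_screen else 0.0)
        for v in viewers
    ]
    return sum(per_viewer) / len(per_viewer)

def step_state(state: int, viewers: list[Viewer]) -> int:
    """Advance when the room is active, decay toward dormant when it isn't."""
    energy = collective_energy(viewers)
    if energy > 0.6 and state < NUM_STATES - 1:
        return state + 1
    if energy < 0.3 and state > 0:
        return state - 1
    return state

def playback_params(state: int) -> tuple[float, float]:
    """Map state to (speed multiplier, image-degradation amount)."""
    speed = 1.0 + state                           # 1x, 2x, 3x, ...
    rot = 1.0 if state == NUM_STATES - 1 else 0.0  # full breakdown at the top
    return speed, rot

# Example: an empty room stays dormant; a busy, attentive crowd escalates.
state = 0
for _ in range(4):
    state = step_state(state, [Viewer(0.8, True), Viewer(0.7, True)])
print(state, playback_params(state))
```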
[00:19:05.958] Kent Bye: So in this presentation of The Vivid Unknown, you have essentially these three screens. There's the one screen that's facing the audience, and there's a big theater, so it's in a location where people can just sit down, and there's a little bit of a stage there. And then from that main screen, there's two other screens that are around 120, 150 degrees away, so you see the three-channel installation. And there is this experience of going in, and if you are on the stage, at least my experience was that I felt like I was in the way of people watching this as a piece, but you were modeling being up in that area to interact with it. But just like you said, I couldn't find any distinct traces of my agency to know that it was even interactive. Even if you as the director of the piece told me and described how the interactivity was working, in just the way that you described it now, I wasn't able to decode that. So in the absence of those elements of feedback, it ends up being like, oh, what's the purpose of trying to interact with it if it's not listening to me? But there is a collective agency. I did find that when there was a group of people there, there was a sense of collective agency, but even then the logic of the interaction isn't so clear as to know how you could coordinate amongst different people to see what that interaction is. So I felt like it's in this weird liminal space where, yes, it is interactive, but the agency is so occluded and diffused that it almost feels non-interactive at that point, because I can't discern any traces of my agency or even the collective agency, and when you do have collective agency, it's then even difficult to see its impact. I saw it go faster and get louder, and it glitches out once it goes too fast, and then you see it go back. Yeah, I think for me the more compelling parts were looking at the images and the striking nature of those images, rather than any of the interactive components, because of that difficulty of trying to find any traces of my own agency within it.
[00:21:04.451] John Fitzgerald: Yeah, it's a prototype. It's definitely something that we hadn't really experienced. Fortunately, as you mentioned earlier, I work closely with the Onassis ONX Studio in New York, where we have a series of projectors, three projectors set up in a space, and I've been testing it, but never with the full impact of a festival audience, you know, where people are coming in and out. So yeah, I have found myself explaining a lot of things, which is okay too. I mean, I think it's just been a wonderful experience to see how all different kinds of people interact with it. I do think that in the future I'd like to explore other ways that the interaction could have more of a one-to-one impact, but it's something that we're still prototyping and developing. There are like eight things that I would like to build further to create more immersion and explore degrees of interactivity as well. So yeah, hopefully we'll be talking again at some point in the future with a different version of it. I also wanted to add, on the way that the images progress through states, from a natural state to a more chaotic, glitched-out form: I worked closely with Matt Tierney, an artist and composer, and with Dan, of course, because Dan was the central engine fueling the whole pipeline. The three of us explored ways that we could incorporate field recordings. Dan also built this granular synth engine that deconstructed MIDI interpretations of Koyaanisqatsi. And then we were also testing MusicGen, which is the AudioCraft sound generation tool. There's something called the melody model, where you can basically input content and it will generate audio based on that. And so we broke down the major thematic moments of the original score and then had the AI interpret them.
[00:23:03.366] Kent Bye: When you say break down, do you mean from a MIDI perspective, or a note perspective, or are you giving it descriptions, or what do you mean break it down?
[00:23:10.378] John Fitzgerald: Oh, giving it like 30-second sound inputs, and then it can generate infinite content based on that. And one of the funny things is that the longer you let it run, the less it sounds like something that's palatable. You can hear the breakdown of the music. It can hold a key, but it also starts to lose rhythmic patterns and it gets very bizarre. But that was also fun and something that we'd like to incorporate in future versions of this, where you can actually have the positions of the viewers trigger certain elements that are related to the generated music as well. We chose late in the process to keep Godfrey's voice present in the installation. He and I have been recording audio interviews over the past year, and we extracted certain things that he said that become these kind of booming voice-of-God moments that interject the space with a feeling of almost like a warning that Godfrey issues. It's similar to an introduction that he gave for a screening of the film Koyaanisqatsi earlier this year at the Royal Geographic Society in the Netherlands, and it just turned out that that clip was my favorite presentation of his approach to introducing the film. He talks about leaving terra firma and becoming techno firma, and it's all these Godfrey soundbites that ended up being part of the composition that we wanted to present for The Vivid Unknown.
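For context on what that melody-conditioned generation step can look like, here is a hedged sketch using Meta's open-source AudioCraft library and its MusicGen melody model: load a short excerpt, condition on it, and write out a generated continuation. The file name, text description, and duration are placeholder assumptions, not the project's actual settings.

```python
# Sketch of melody-conditioned generation with AudioCraft / MusicGen:
# condition on a ~30-second excerpt and generate new audio in that vein.
import torchaudio
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-melody")
model.set_generation_params(duration=30)  # generate 30 seconds per call

# Load a short excerpt to condition on (hypothetical local file).
melody, sr = torchaudio.load("score_excerpt.wav")

descriptions = ["minimalist orchestral arpeggios, slowly building"]
wav = model.generate_with_chroma(descriptions, melody[None], sr)

for idx, one_wav in enumerate(wav):
    # Writes e.g. generated_0.wav at the model's native sample rate.
    audio_write(f"generated_{idx}", one_wav.cpu(), model.sample_rate,
                strategy="loudness")
```

Re-feeding the model's own output as the next conditioning excerpt is one way to reproduce the drift John describes, where the key roughly holds but the rhythm gradually falls apart.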
[00:24:50.286] Kent Bye: Yeah, and on the opening night you were saying that it was having trouble progressing through the audio, and so it kept looping in a way where I kept hearing the same clips again and again. But approximately how long, audio-wise, would it be if you were to play the audio clips back-to-back? And what determines how those are played? Because you said it's like 12 hours of footage, so about how many audio clips are spread out in there? Because they kind of randomly pop in with these moments of reflection and insight and these little aphorisms that he has that are in the spirit of Koyaanisqatsi. So yeah, maybe you could describe how many audio clips are in there.
[00:25:27.023] John Fitzgerald: Yeah, the audio is something that Matt Tierney built that's composed of these generated music compositions and also a kind of mixing and mastering of field recordings. The field recordings, I think, are about 30 minutes. The stretched, granulated synth elements are something like seven minutes, and then these kind of compositional chunks that follow the speed at which the images start to go faster and faster are probably about 12 minutes of music. And how much of Godfrey? I think it's about four minutes of clips if it were to run back-to-back, but we also didn't want it to be a linear progression of his voice. But I do have hours and hours of Godfrey dropping knowledge, and it's been interesting working with him because he's been delivering these masterclasses all over the world. He has so much contextual material to present for the project. It's overwhelming to break down. I tried to write out some notes before that I wanted to share with you, and I'll have to share them later because it's so much content. But yeah, with the audio we wanted to build these kind of additive layers, so that it could go from state zero, advance to the next state and layer on top, and then remove some of the lower-state content, so it kind of scales up, pyramids up, and then back down again.
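A tiny sketch of that additive layering idea, purely illustrative: each escalation state adds a stem on top and fades out the oldest one, so the mix pyramids up and back down. The stem names and the windowing rule are assumptions, not the installation's actual mix logic.

```python
# Illustrative gain map: the current state's stem plus the previous one are
# audible; anything older is faded out, so the mix stacks and then thins.
STEMS = ["field_recordings", "granular_synth", "generated_score", "godfrey_voice"]

def layer_gains(state: int, window: int = 2) -> dict[str, float]:
    """Return a 0/1 gain per stem for the given escalation state."""
    return {
        stem: 1.0 if state - window < i <= state else 0.0
        for i, stem in enumerate(STEMS)
    }

for s in range(len(STEMS)):
    print(s, layer_gains(s))
```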
[00:26:59.282] Kent Bye: Yeah, so going back to the AI pipeline and the workflow, because I feel like that's a big part of all these AI projects. You mentioned DreamBooth. Maybe you could give a bit more context for what DreamBooth is and what it was able to enable for this project.
[00:27:12.866] John Fitzgerald: Yeah, within Stable Diffusion there's a way for you to kind of build models, and DreamBooth is one of those ways that you can start building unique models. You can also base them on other models, or you can adapt something called a LoRA, which basically doesn't involve such a heavy amount of input images. We kept ours pretty unwieldy and heavy, so there isn't any actual real-time generation that you see in the experience; it's all been pre-rendered. And actually, the way we were able to do that was we partnered with an amazing GPU cloud compute company called ML Foundry. They do very advanced server distribution of GPUs. Actually, through Matt Tierney I was able to connect with their company, and Matt's one of the producers on the project as well, and he brought them in to see an early demo of the piece, and they had never supported an artist's work in the past. It's a new company; the founder is a genius who worked at a lot of different AI companies before and has developed this cloud compute system. Yeah, like I said, they had never supported an artist project before, and I was lucky to be in the room to present it, and so they gave us a node of eight super fast GPUs that we were able to have running over the course of two or three weeks to generate the majority of the content. We'd been doing it in piecemeal format before on GPUs that Dan had at his studio and that I had at the studio, but we wouldn't have been able to render this amount of footage without the support of ML Foundry.
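As a rough illustration of that pre-rendering step, here is a minimal diffusers sketch that loads a DreamBooth-style fine-tuned Stable Diffusion checkpoint and renders stills from text prompts. The model path, prompts, and sampler settings are placeholders; the actual footage was rendered offline in bulk on rented GPUs, not through this exact script.

```python
# Sketch: batch pre-rendering stills from a fine-tuned Stable Diffusion
# checkpoint with Hugging Face diffusers. Paths and prompts are illustrative.
import torch
from diffusers import StableDiffusionPipeline

MODEL_DIR = "path/to/finetuned-model"   # hypothetical DreamBooth output dir

pipe = StableDiffusionPipeline.from_pretrained(
    MODEL_DIR, torch_dtype=torch.float16
).to("cuda")

prompts = [
    "a time-lapse of clouds rolling over desert mesas",
    "an aerial view of a crowded freeway interchange at dusk",
]

for i, prompt in enumerate(prompts):
    # num_inference_steps and guidance_scale are standard diffusers knobs.
    image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
    image.save(f"keyframe_{i:04d}.png")
```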
[00:28:53.258] Kent Bye: Yeah, so I know that, going around for a while, people were able to take a number of their own headshots or photos of themselves and be able to train on them. And was that how DreamBooth started? Was it people creating their own images of themselves?
[00:29:06.586] John Fitzgerald: Yeah, a lot of those pipelines were related to that. If you go on YouTube, there's these moments that you can track when things get popular on Reddit, and they come to define certain moments of this recent AI period, where you see people training animations of their own faces, or whatever is the hottest thing that hits Reddit or Instagram. And then, yeah, there's usually hundreds of creators that pop up YouTube tutorials where you can immediately dive in and try it out, which has been really kind of a fun thing. You can go very deep into some of these open source processes, and you can spend a long time experimenting with a lot of the tools that are out there. That was one of the things I was doing.
[00:29:53.155] Kent Bye: So I know with these models, sometimes there's a lot of models that you can download and use for Stable Diffusion. And so were you able to, from scratch, just train on images from Koyaanisqatsi and other images that you may have gotten from Godfrey? Or were you using some sort of other foundation where you could give it other prompts and be able to generate a frog or something, even if it wasn't trained on that? So does it have some sort of core foundation, or is it just basically from scratch?
[00:30:18.977] John Fitzgerald: Yeah, it's from scratch. I've experimented with other ways of adding elements of Godfrey's images to other models, but Dan was able to build this from scratch. It was a pretty unique process, so that it is all imagery that's kind of trapped in this moment. But also not all the prompts work for the footage. And so I've heard a lot of people be like, oh, that's so odd, it almost looks like an animation that's taken over, because a lot of it has photoreal textures and qualities, and then at certain times the model itself kind of breaks down in its ability to replicate the thing that you're prompting it to make. So there are elements of it that start to look almost design-y, kind of a linear animation style. I chose to keep those in. I kind of liked that it wasn't a perfect representation of Koyaanisqatsi, and we ended up leaving that in because it hopefully plays into the story we're developing when you actually see it projected on the screens.
[00:31:17.803] Kent Bye: So I know within Stable Diffusion there is like a CLIP interrogator, where you can upload an image and say, tell me, if this were being prompted, what the words would be, and it tells you what those prompts are. And that is something that is probably coming from all sorts of other models and trainings, for whatever data is training the CLIP, and so there are some other model influences that I imagine are seeping in, unless you weren't taking the suggestions from CLIP. I guess when you're training a model, are you taking both the image as an input and the CLIP text as an input? Because you're associating the text prompts, because eventually you need to be able to prompt it, so it has to be able to interpret what it is. And so how much detail do you have to give? Is just one word better, or do you add the full CLIP description? So yeah, I'd love to hear, with such a constrained data set, because most of these are trained on huge amounts of hundreds of thousands or millions of pictures, but this is a very constrained number of pictures, how much detail do you have to add as you're adding the text metadata for each of the images that you're training the model on?
[00:32:21.338] John Fitzgerald: Yeah, most of the prompts that the images are generated from are just under 20 words. They're descriptive to a point. Some of them are even less, like 10 words. Sometimes it works better when it has longer descriptions, and sometimes that seems to lead to more confusion. And I think it prioritizes a certain part of the front of the prompt. But yeah, there isn't an exact answer to that. It's kind of a spectrum; a long sentence is kind of the prompt. Yeah, and CLIP, contrastive language-image pre-training, is an open-source model that you can use to kind of generate all these text descriptors.
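One concrete reason the front of a prompt tends to win out: the standard Stable Diffusion text encoder is CLIP, whose tokenizer truncates input at 77 tokens, so anything beyond that is simply dropped. A small check, with a placeholder prompt rather than one from the project:

```python
# Count CLIP tokens for a prompt; Stable Diffusion's text encoder only
# sees up to the tokenizer's max length (77 tokens for this checkpoint).
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

prompt = ("grainy 16mm aerial footage of a city grid at night, long exposure "
          "light trails, dense traffic, tilt-shift, high contrast")
tokens = tokenizer(prompt, truncation=True, max_length=tokenizer.model_max_length)
print(len(tokens["input_ids"]), "tokens (limit:", tokenizer.model_max_length, ")")
```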
[00:33:02.447] Kent Bye: Did you find yourself often using the default CLIP description, or were you modifying things to be a little bit more specific?
[00:33:08.698] John Fitzgerald: mostly left them exactly as they were. We then also like categorized them to talk about whether it was like machines that it was describing or whether it was nature, so that we could then build content that would allow you to interact with certain elements that were related to just specifically nature or just specifically crowds and people or just specifically the military, you know, different categories of the footage in Kuena Skatzi.
[00:33:35.186] Kent Bye: In these categories, did you have to have different models for those categories? Or is it all just in one model and just prompted with a phrase or a keyword or a category at the beginning of the prompt to help the model take care of that?
[00:33:46.076] John Fitzgerald: Yeah, all using the same model. It didn't need to be chunked down into something smaller. It was accessing the elements that were related to what we were prompting it with.
[00:33:55.922] Kent Bye: So then you train the model and you have all these, I guess, the metadata from each of the images that you're training on, around a thousand images, with CLIP. So then as you're generating The Vivid Unknown, how do you go about figuring out how to prompt it, or what words do you use then? Is it just the same, just kind of feed it back? It's like an Ouroboros, like train it in and just put in the exact same prompts and see what comes out?
[00:34:19.688] John Fitzgerald: Yeah, that was what we did. The idea of having Godfrey curate some images was kind of about giving it a bit of a singular touch, but the rest we kind of left to it recursively reading itself and describing back what it was seeing.
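A compact sketch of that recursive loop: caption an image, feed the caption back into the fine-tuned model as a prompt, caption the result, and repeat. BLIP stands in here as the captioner (CLIP on its own scores descriptions rather than writing them), and the model path and seed frame are placeholder assumptions.

```python
# Sketch: a caption -> generate -> caption loop, with BLIP as a stand-in
# captioner and a hypothetical fine-tuned Stable Diffusion checkpoint.
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline
from transformers import BlipProcessor, BlipForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
captioner = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base").to(device)

pipe = StableDiffusionPipeline.from_pretrained("path/to/finetuned-model").to(device)

image = Image.open("seed_frame.png")          # hypothetical starting frame
for step in range(3):
    inputs = processor(image, return_tensors="pt").to(device)
    out = captioner.generate(**inputs, max_new_tokens=30)
    caption = processor.decode(out[0], skip_special_tokens=True)
    print(step, caption)
    image = pipe(caption, num_inference_steps=30).images[0]
    image.save(f"loop_{step}.png")
```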
[00:34:37.293] Kent Bye: Well, I have to say that, you know, watching The Vivid Unknown, there is a sense that I feel the spirit of Koyaanisqatsi in it, especially the music and the visuals, and especially with Godfrey coming in and giving his voice-of-God moments of insight and reflection and knowledge drops throughout the course of the piece. So there is this kind of essence, a spiritual essence in some ways, or like a remix. Or how do you think of what it is that you've created, then?
[00:35:00.891] John Fitzgerald: Yeah, I think anyone that's had a chance to sit down in the projection space, in the installation, who knows the film, feels a kind of relationship to the original imagery. You can almost notice bizarre forms of the film itself, and I think, in that it's not the original, in that it deviates from that, it introduces this kind of interesting latent space between the real version of the film and this bizarre manifestation of it. I originally was editing some of the footage to the original score, and so for me it feels really connected to it, and I tried to break myself away from it. You know, there's a little bit of an inside-baseball feeling when you're so close to the footage itself and to the original source material, so I was also trying to put myself in the shoes of a viewer that didn't know about the original film or had never heard the score before. And so I think that was a framework we were trying to find as well, that I wanted it to have a bit of the essence of the original, and I hope that it stands as a work that's an homage to the original work, but that you don't need to have seen Koyaanisqatsi to have a connection to what this work is.
[00:36:17.720] Kent Bye: Well, in terms of, you're talking about editing, but you're not necessarily using Adobe Premiere or something and outputting it into a linear film, because it's got all these chunks and it's almost got a dynamic nature for how fast things could be blended together, and there's three screens, so it's a multi-channel installation. And so when you talk about the editing, do you make a cut in Premiere and then export that into, like, openFrameworks? Or what's the output for actually taking these clips that you're generating from Stable Diffusion, these video clips, and then how do you sequence them or organize them? Because it's a bit of a linear film that has 12 hours of footage, which is quite a bit. So, like, what's the format in which you're editing it or putting it all together?
[00:36:59.272] John Fitzgerald: Yeah, the system that Dan built is what allows us to see so much footage. Basically, once we've generated these files from exported videos from Stable Diffusion, we can convert these MP4s into another codec called HAP that allows us to play these kind of higher-resolution videos without crashing the GPU. So yeah, these bits of footage are things that I've looked at in Premiere with music next to them, before we had the hardware for the installation. You know, I was kind of previewing things, and also so that I could share content with Godfrey so that he could look at it at his studio. He's got a big, like, 80-inch TV, and he sits in front of it and turns up the music really loud and watches the clips. And so, yeah, we really did generate a bunch of these videos, essentially, and Dan's built this really complex engine that I'd love to show you if you want to see, too, behind the screen. I can show you kind of how it works and the chunks of footage that it's reading from in order to serve the content that's on the screens.
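For reference, the MP4-to-HAP conversion John mentions can be batch-scripted with ffmpeg, assuming a build that includes the HAP encoder. The directory names below are placeholders; this is a generic sketch, not the project's actual pipeline.

```python
# Sketch: batch-transcode rendered MP4s to HAP, a GPU-friendly codec commonly
# used for multi-channel playback. Assumes ffmpeg with the HAP encoder.
import subprocess
from pathlib import Path

SRC = Path("renders")       # hypothetical folder of generated MP4s
DST = Path("renders_hap")
DST.mkdir(exist_ok=True)

for mp4 in sorted(SRC.glob("*.mp4")):
    out = DST / (mp4.stem + ".mov")   # HAP is typically wrapped in a .mov container
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(mp4),
         "-c:v", "hap", "-format", "hap_q",   # hap_q trades file size for quality
         "-an", str(out)],
        check=True,
    )
    print("wrote", out)
```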
[00:38:05.543] Kent Bye: Okay, cool. I'll be free at 6, so I'll take a look. So, what's been Godfrey's reaction? Has he been able to see the final version, or did you send him a rough cut? I mean, it's 12 hours of footage, and I don't imagine he watched the whole thing, but, you know, just curious, like, what his reaction was.
[00:38:20.795] John Fitzgerald: Yeah, I was able to go out to Santa Fe and visit him at his studio three weeks ago, and we've still made a lot of tweaks since then based on some of the input that I got from him. So I was able to show him a lot of the footage. He hasn't seen it in its current iteration with the interactive engine playing. I've been sending him lots of screen recordings of how the engine works and then footage of people in front of the screen, so he kind of gets a sense of what the feeling is. But that's one of my big goals for the project in the short term, to find a way to present it at a venue in Santa Fe so that I can bring him there to see it. He's advanced in years. He says he's a thousand months old; he's like 83 and a half, and so it's been tough for him to travel recently. Even this year, he's been super productive. He completed a new feature film that just premiered at the Museum of Modern Art in New York in September. It's called Once Within a Time. It's another project that he made with Philip Glass, and it's a really wonderful film. If you're interested in Godfrey's work, I highly suggest seeing this film. I think it's distributed by Oscilloscope, and it's at IFC theaters around the country right now. So yeah, he's been super productive this year, and I hope that I can show him the film in this version. I started building a little scale model of how the projections work so that, even if we don't get it to a proper venue in Santa Fe in the short term, I can come out with a multi-projection system and just demo it for him in the studio, because I'm interested to hear what he thinks about it. He loves the footage. He sends me these videos all the time that show him responding to the work. It's funny, it's kind of like the video reactions you watch on YouTube; this is Godfrey sending me video reactions. And yeah, he's such a warm spirit. He's just very excited to see what we can do with it. I'm hoping that we can get it to a place where he can actually watch it in its full installed form as well.
[00:40:27.005] Kent Bye: Yeah, I know Meow Wolf has a venue there. It might be good to go there. And I know they do different music performances there. So that's probably very much in the spirit of the immersive realm.
[00:40:35.897] John Fitzgerald: Yeah, we've been talking to them, and Godfrey's done a few masterclasses over there. When I was out there last month, he actually did a masterclass at Meow Wolf, and he opened it with a short excerpt of some of the machine learning tests that I'd been making with him just to give them a sense of some of the things we've been working on, which was such an honor. And I didn't expect it, but he was like, oh, do you mind playing your video first? I was like, sure, I'm happy to. Yeah.
[00:41:03.975] Kent Bye: Nice. And finally, what do you think the ultimate potential of all this immersive media, with XR and AI and the blending of this multidisciplinary interfusion of all these media, might be, and what might it be able to enable?
[00:41:20.602] John Fitzgerald: Well, I think, first off, it's so hard to see a lot of this work in the true form that a lot of artists want to present it in. And so I love what IDFA is able to do here, to present such a wide range of totally different works; only at IDFA DocLab could you have such an expanse of experimentation. Thank you. Thanks for including us and for letting us run this experiment. And I think it's hard to say where all of this goes. I feel like it's interesting how machine learning technology keeps jumping through these short bursts of things that almost define the moment they're created in. And you can kind of mark how Stable Diffusion has changed art in the past three months or six months or nine months, because there are these kind of bumps in the road of like, oh, this is a big output of content that looks really similar. And I think in a way our project is dated to this moment. Even though it's referencing other footage, it kind of feels very much of this style of things that are being created right now. So I'm interested to see where we can go. I don't really know what the answer is on that. I'm interested in other ways that we can have viewers interact with the work, other forms of projection, other formats for installing it, whether it's, you know, a diptych of screens that work against each other. I've also been interested to show a little bit more under the hood of what's happening. I loved all your questions about how the model got made, because I feel like that would also be really interesting for an audience, especially at a festival like this, you know, where people could get to see what's happening and what decisions were made to impact the final output of the work. So I think, yeah, I'm looking to find new ways to reveal how we made this thing. In a way, we were kind of just rushing to get to the deadline. But in other forms of it in the future, I'd like to reveal more of that process and other ways that we can have human involvement in the creation of work with machine learning processes.
[00:43:30.023] Kent Bye: Awesome. Is there anything else that's left unsaid you'd like to say to the broader Immersive community?
[00:43:35.882] John Fitzgerald: Yeah, hopefully we'll be showing demos of this when we get back to New York. And as I mentioned, I've been working really closely with the Onassis ONX Studio. We have a really incredible production-exhibition hybrid space in New York. And yeah, the doors are always open. Send us a message if you're interested in seeing some of the capabilities there. We work with a lot of creators from a pretty diverse background of not only immersive and interactive video, but theater and sound and painting and all of the many things that go into storytelling in this moment in time. So yeah, thanks again to Onassis for sponsoring and supporting the work and helping me get here to show it. I don't really know what the model is for selling this in the future. We haven't really gotten to that. This is more about presenting it in this form. And we're always looking for exciting ways that it can be distributed across other institutions and venues. So I'm interested to have more conversations around that as well.
[00:44:34.434] Kent Bye: Awesome. Well, certainly ONX Studio is at the center of a lot of the innovations, especially projects here at the immersive festivals, with Totalmancer and Kinetic Diffusion by Brandon Powers and Aaron Santiago and, yeah, with your piece here with The Vivid Unknown. Lots of really great experimentation coming out of that studio and, yeah, just a community there and a place to see a lot of that innovation. Yeah, I just really enjoyed The Vivid Unknown, and just to see all the dimensions of the audio and the visuals, and yeah, to also hear a little bit more about the process of how it was created. So yeah, it's on the bleeding edge of what's possible, and yeah, you're right, there is this kind of Stable Diffusion aesthetic that is still in there, but yet it's trained on a very specific data set from the 80s that's got its own look and feel and aesthetic, which I think can be a little difficult to control. So having a smaller model, you have a little bit more control of doing that. So anyway, thanks for taking the time to share a little bit more about your process and journey.
[00:45:27.269] John Fitzgerald: Yeah. And thanks to you, Kent. I know that this is such a Herculean effort, to sit down and actually get on the record, you know, such an impactful document of the artists creating in the space. So thank you for the big effort that you give to the community.
[00:45:42.702] Kent Bye: Yeah, you're quite welcome. So thank you. So that was John Fitzgerald. And he had a piece called The Vivid Unknown at IDFA DocLab 2023, which was a three-channel video art installation and AI model that was showing different scenes from the groundbreaking 1982 documentary Koyaanisqatsi by Godfrey Reggio. So I have a number of different takeaways about this interview. First of all, well, it's another piece here in 2023 where we see this intersection between these generative AI tools and exploring how to start to use them in the context of telling stories, or kind of remixing different stories to reflect upon, you know, going back to 1982 and looking at a lot of those different scenes. And so there's a retro feel. And so you have an AI model that's only trained on that imagery and then remixing it with this late-2023 Stable Diffusion aesthetic that we've all seen, which is kind of like this Waking Life painterly abstraction. What's interesting is that as we continue to move forward with AI tools, we're going to have new aesthetics, and, you know, we start with this kind of surrealistic painterly type of Stable Diffusion. I think it's going to get a lot more photorealistic as we move forward and then get surrealistic, hyper-realistic, even greater detail than we could imagine with any cameras. So it's a really interesting snapshot as to the evolution of some of these different AI tools. And Godfrey seemed to be completely on board with this type of experimentation, and that it was in the same type of spirit of the original Koyaanisqatsi, which was trying to reflect upon humans and our relationships to technology. So it's kind of a remix of something that premiered back in 1982, running it through all these different AI tools, and showing it again at IDFA DocLab in this art installation that has these other dimensions of interactivity. I found that there weren't a lot of traces of agency when I was interacting with the piece, and I found that I needed to have a lot of explanation as to what are the rules, what are the mechanics, so that I could try to at least find some traces of my agency as an individual as I was interacting with the piece, which I was able to do at different times. I was kind of dropping in and out at different times throughout the festival, playing around with it a little bit. I found that when there was a group dynamic, then it's a little bit more difficult to have the group act in unison or to have that kind of feedback loop where you're able to really see that you're interacting with it. So I think there's something about having those traces of agency, where you can really see that you're interacting and having some control, because otherwise it just feels completely random, and whatever you're doing isn't either being registered or put into some feedback mechanism. And so having traces of agency for collective agency is something that's a really hard problem that I'm not sure anyone has quite solved just yet. But that's a noble effort to try to add a whole other dimension, where this could just be an exhibition of this art, but to try to have these other dynamic interactive components as well. And I feel like it's still very early in where that might go, because typically when I've seen that, there's a little bit more of a one-on-one interactive component on that.
So I feel like there's maybe some other priming, either explaining to the audience what they have to do to be able to interact with it, or instructions, or just other feedback mechanisms that are there in real time for you to comprehend it and to know how that's going to change how you interact with it. So, yeah, overall, just another example of what's been happening with this intersection between artificial intelligence, these large language models and generative AI, and how you could start to train a very specific model on some image sets and start to use that in the context of telling a story or creating an art piece like The Vivid Unknown, plus a whole other sound dimension and sound design that we talked about, as well as these voice-of-God moments of insight and reflection from Godfrey that are also sprinkled throughout the piece as well. So that's all I have for today. And I just want to thank you for listening to the Voices of VR podcast. And if you enjoy the podcast, then please do spread the word, tell your friends, and consider becoming a member of the Patreon. This is a listener-supported podcast, and I do rely upon donations from people like yourself in order to continue bringing this coverage, so you can become a member and donate today at patreon.com/voicesofvr. Thanks for listening!