#1327 “Voice in My Head” Remixes Your Inner Monologue with AI

Voice in My Head is a provocative immersive exploration by Kyle McDonald and Lauren Lee McCarthy that takes large language models to the logical extreme of remixing and modulating your inner monologue. Audience participants are invited to customize a ChatGPT character that clones their voice and speaks to them via an earbud as they walk around the DocLab festival in Amsterdam, having casual conversations with festival attendees for half an hour. The AI listens to everything you and your interlocutors say, then pipes in every minute or so with feedback based upon your instructions, thereby creating a new, modulated voice in your head.

The piece made me reflect upon my own inner dialogue, and upon how surprisingly compelling it is in exploring the social, ethical, and privacy boundaries of technology, but also how art contexts like this create a safe space for earnest exploration of some of the new potentials of immersive technologies and contextually-aware AI. I had a chance to unpack it all with creators McDonald and McCarthy in Amsterdam, a few days into the world premiere exhibition at IDFA DocLab.

This is a listener-supported podcast through the Voices of VR Patreon.

Music: Fatality

Rough Transcript

[00:00:05.452] Kent Bye: The Voices of VR Podcast. Hello, my name is Kent Bye, and welcome to the Voices of VR Podcast. It's a podcast that looks at the future of spatial computing. You can support the podcast at patreon.com slash voicesofvr. So this is the last of a brief little mini-series that I'm doing looking at different artists over the course of time. So in this episode, I have Lauren Lee McCarthy in collaboration with Kyle McDonald. In the previous episode, I had Lauren talking about her piece called Surrogate, which was at Sundance in 2022, where she was exploring the idea of what it would mean to potentially become a surrogate to carry someone's baby, and to create an application where they were able to control many different aspects of their lives. A lot of the work that Lauren does is blending the boundaries between what's real, what's virtual, what's a story, what's a performance. And so this piece, called Voice in My Head, was in collaboration with Kyle McDonald, and it premiered at IDFA DocLab. You essentially go into this sound booth and train this AI: you describe to it the voice in your head, your inner monologue, and how you would like to shift or change it. So it's exploring this idea of what it would be like to have an AI that is sort of like a therapist, able to monitor your conversations and your behaviors over the course of the day, or to replace the voice in your head with an AI voice that has a much more specific personality that you're able to craft and control. It's a half-hour experience where you're walking around talking to different people, and it's through those conversations that the AI is piping in and trying to give feedback based upon how you would like to create this remixed inner monologue and voice in your head. So that's what we're covering on today's episode of the Voices of VR podcast. So this interview with Kyle and Lauren happened on Monday, November 13th, 2023 at IDFA DocLab in Amsterdam, Netherlands. So with that, let's go ahead and dive right in.

[00:02:06.850] Kyle McDonald: My name is Kyle McDonald. I'm an artist based in Los Angeles, and I create a lot of open source software and immersive installations. I've been doing this for about 15 years. I came from music and performance, and then recently have gotten into sailing and technology.

[00:02:27.697] Lauren Lee McCarthy: Hi. I'm Lauren Lee McCarthy. I'm an artist also based in LA. I work with a lot of different media. I think performance is really central to my practice. And I'm also a professor at UCLA.

[00:02:42.409] Kent Bye: Great. Maybe you could each give a bit more context as to your background and your journey into making this type of creative work.

[00:02:48.592] Kyle McDonald: Sure. I mean, I started with kind of generative art and creative coding in the early 2000s with Processing, and got into interactive stuff originally through Max/MSP for experimental electronic music, and then openFrameworks for computer vision. That community really led me to a broader scene of media artists around the world that are at places like IDFA DocLab, where we are now, and Ars Electronica, those kinds of festivals. And from there, I have been making this kind of work for, I guess, maybe the last 12 to 15 years.

[00:03:26.773] Lauren Lee McCarthy: Yeah, I have a background in computer science and also in visual art. And I started out just trying to figure out how to put those two things together. I'd say my work is very personal, so a lot of times it kind of follows the things that I'm feeling confused about in my life. I've also spent a lot of time working on open source software. So for about eight years, I created and led a project called p5.js, which is an open source platform for making creative work with code on the web.

[00:03:57.587] Kent Bye: Great. So you have a piece here called Voice in My Head that's showing at IDFA DocLab. And during DocLab you had a chance to present a number of previous projects that were kind of leading up to this. So I'll let you start with describing how this project came about, and if you want to begin with some of the previous inspirations that were leading into it, I'll let you decide where to begin the story.

[00:04:21.560] Kyle McDonald: Well, we'd already been working for a while last year on this project called Unlearning Language, which builds on an even longer history of Lauren and I collaborating together over the last nine years. We've always been thinking about technology and the way it mediates our relationships and society, and with AI coming into prominence over the last few years, and the public becoming broadly aware of it in their everyday life with tools like ChatGPT, we wanted to reflect on those tools and technologies and their influence on our lives. With Unlearning Language... do you want to give a quick description of what we were doing, Lauren?

[00:05:01.888] Lauren Lee McCarthy: We were thinking a lot about language and how human language may be different from machine language. So it was an installation that was trying to train people to communicate in a way that was undetectable to machines.

[00:05:14.490] Kyle McDonald: So every time that you would say something that it could understand, it would give you some feedback to let you know it understood, and it would try and convince everyone to communicate with each other in a way that it would never understand. We're imagining this future where the machines are sort of looking back on the past to the moment where we were uninterpretable to them, the moment where we hadn't adapted ourselves so fully and they hadn't come to understand us so well, to think about a future where we became the most human humans. And I think this piece came partially from that project, but also from some of the personal work that Lauren's done. In some ways, I was reflecting on speech recognition and kind of always-on audio tracking, and wondering what it would look like if we had the ability to reflect on every aspect of our life the way that we would with a therapist, but with a therapist that was kind of always there. And then this was also connected to another project we worked on called People Keeper, where we were looking at the way that we related to people and maybe didn't always know which relationships were good or bad for us. You want to talk about People Keeper?

[00:06:28.197] Lauren Lee McCarthy: Yeah, People Keeper is an app that would use your biometric data to determine how people around you or people in your life made you feel, and then it would take action on your behalf. So it could auto-delete your contacts or schedule times to hang out, depending on whether the person made you feel good or bad. So I think we were thinking about projects like that, and then also I've been doing a series of AI works. So for example, I did a piece called Lauren where I was performing as a human Alexa. So I would install cameras and devices and then watch and remotely control different people's homes for them as they lived in them. So I think this piece really came out of a lot of that work, and we were thinking about the way that synthetic media is being increasingly pumped into our consciousness, whether we know it or not, and what it would look like to take that to an extreme. And so in this piece, the premise is: what if AI replaced your thoughts, your inner monologue, completely? It's intended as both a sort of exploration but also a provocation, too. So it's sitting somewhere between something dystopic and something that we're earnestly exploring.

[00:07:36.341] Kent Bye: Yeah, it's a visceral experience to go through. And I'd love for you to talk about some of the core technologies that you're using to actually do this, because there have been lots of different innovations in AI with large language models and ChatGPT, but also with this ability to take someone's voice and clone it, and all these different services that are now available that you can start to use. So I'd love to hear if there was a catalyst turning moment where you were like, OK, we have everything that we need to actually pull this off, and if it was just a slow iteration, or at what point you realized that this could actually work.

[00:08:06.029] Kyle McDonald: Well, I guess we should also just say what's happening with it, too. Basically, there's this sound booth. You have an AirPod in your ear and you carry a phone with you. You go into the booth and have a kind of interview with an AI, and then it takes that interview, which is about your internal monologue and how you talk to yourself, your inner voice, and it analyzes everything that you said and tries to become a better inner voice for you. And I'll say there's kind of a trick, if you get the chance to try it: it will actually clone your voice, kind of unexpectedly. When you're giving those answers, it will use that as sample material to have your inner voice be in your spoken voice. It turns out not everyone's inner voice is their spoken voice. Some people told me that their inner voice is always shouting very loud, for example, or very quiet, or has a different accent. Yeah, so then you'll go out into the world from the booth and have these conversations with people, your friends, strangers, whoever, and it will be there as a persistent overlay, as this augmented reality on top of your own inner voice, replacing your inner voice for about 20 to 25 minutes. Yeah, so that's the piece, and it really varies from person to person what they experience, because it's very responsive to what you ask and what you want. For some people it's really intense, and I think it's because they're more honest with it; for other people it's kind of a lighter experience, I think because they give sort of generic answers and don't really engage with it as deeply. So it depends on how much you give to the piece. Yeah, do you have any more thoughts on that?
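
To make that flow concrete, here is a minimal sketch of the session loop Kyle describes: a booth interview, then roughly 25 minutes of walking around while the system interjects about once a minute. This is not the artists' actual code; every interface name here (mic_stream, earbud, asr, llm, tts, persona) is a hypothetical stand-in, and only the timing constants come from the interview.

```python
# Hypothetical sketch of the walk-around phase; not the artists' code.
# Timing constants follow the interview ("about every minute", "20-25 min").
import time

INTERJECT_EVERY_S = 60
SESSION_LENGTH_S = 25 * 60

def run_session(mic_stream, earbud, asr, llm, tts, persona):
    """Listen continuously, interject with feedback periodically."""
    transcript = []
    started = last_spoke = time.time()
    while time.time() - started < SESSION_LENGTH_S:
        text = asr.transcribe(mic_stream.read())   # cloud speech recognition
        if text:
            transcript.append(text)
        if time.time() - last_spoke >= INTERJECT_EVERY_S:
            # Feedback is shaped by the booth interview (persona) and
            # the recent conversation, per the participant's instructions.
            remark = llm.complete(persona, transcript[-20:])
            earbud.play(tts.speak(remark))         # participant's cloned voice
            last_spoke = time.time()
```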

[00:09:51.408] Lauren Lee McCarthy: I think an interesting part of the project has been doing the research and talking with people about their inner voice. And my favorite question to ask people is: give me your best impression of the voice in your head. So that's actually one of the prompts in the onboarding session for this work. I think it's really interesting. It gives a window into people's psyche; whenever I hear someone do their impression, I'm surprised, but it also makes sense, and I feel like I'm understanding them in a new way. And I guess, as part of the piece, on one hand it is critical of these technologies. It's not necessarily suggesting we'd like all our thoughts to be replaced by AI; I think the idea of that feels quite dystopic. On the other hand, maybe there is something to taking a moment and really reflecting on the voices that we're listening to day to day. And so I guess the question is, could that prompt some more awareness? And even if not, what if you use this thing and it actually does feel better than the normal voice in your head? How do you deal with that kind of dissonance?

[00:10:55.463] Kent Bye: Yeah, I actually found that that was the most difficult question, to do an impression of my inner voice, which made me reflect on how maybe I don't have enough awareness of my inner voice to actually mimic it. For me, I used the experience as an opportunity... I guess I had a theory of, well, maybe my inner voice is very analytical, critical. I'm sometimes very transactional in my relations, like when doing interviews. And so I instructed myself to pay more attention to the emotional channels of a conversation rather than the informational channels. But to do an impression of that in the moment, I had trouble articulating it. So I'd love to hear if a catalyst for this project was your own inner voices, and if you'd be willing to share how you would describe them, because I actually found it really difficult to describe my own inner voice.

[00:11:43.219] Kyle McDonald: Yep. I remember when Lauren sprang this question on me without warning, and I had the same experience of, oh wow, actually, what is my inner voice? How do I imitate that? I think I gave you something like: let's see, my inner voice is sort of, oh, what's that over there? Huh, that's interesting. How does that work? Oh, why? Why is it doing that? Oh, cool. Oh, okay. Let's see. So this goes like this. Doop, doop, doop, doop, doop. And that's sort of my inner voice. It's kind of always putting things together and trying to understand what's happening in the world, and trying to, I guess, always be learning to be grateful in new ways. I would hope that's what's happening. I think sometimes it's what's happening. And one of the funny things about feeding that through speech recognition (because I've tested the system a bunch, so I've been asked that question a lot and had to do my own impression of it a lot) is the way the speech recognition tries to translate that into text: the way that it puts punctuation in certain places, the way that it transcribes something like doop, doop, doop, right? And whenever I see that transcription, it reminds me of the boundary between the kind of pure, organic quality of what it means to be a living person, and the kind of rigid, synthetic quality of what it means to be computational, or to transcribe something even just into written language. It's already such a static, quantified paradigm. So I guess that's been an influence for me working on this project.

[00:13:15.083] Lauren Lee McCarthy: Yeah, I guess I think a lot about the voices in our heads. I mean, meditation is also a part of my practice, and I think maybe it comes from that, just kind of reflecting on what you're hearing and what your experience is. In general, I feel like I have a pretty loose grasp on reality, like it's a light hold. And so I get curious about how, if the voice in your head changes, how does that change your experience or understanding of reality and what's going on?

[00:13:45.518] Kent Bye: Yeah, so in this experience you're giving people an opportunity to remix or modulate their inner voice. At least for me, especially after the experience, I started to really pay attention to my inner voice more. I was trying to characterize it better, so if I were to do the experience again, maybe I'd have better responses about what my inner voice is. But there's this training part where you're in the sound booth answering these questions, prompting the AI in some sense to understand the directions you're trying to go. And I actually thought it did a really great job of summarizing what I had said; it felt like I was really being listened to and heard. I think the challenging thing was to then be set off into the world to make random conversations with people that I know, because there's a certain amount of a social contract there. I disclosed to some people: okay, I'm in the middle of an immersive experience, and sometimes I might pause to listen to what this has to say. Sometimes I would do that, and other times I felt it would disrupt the social contract of the conversation to say, hold on, stop what you're saying right now, because I've got to listen to what this AI is saying, because then it would be ruining the other, deeper purpose of me really tuning into the emotional vectors of what people were saying. It would be contradictory to the intent that I had. So I found myself constantly challenged between the intent of the experience that I wanted to have and the relationship that I had with the people I was in conversation with. I don't know if that's a common piece of feedback.

[00:15:14.892] Kyle McDonald: That is. I mean, a lot of people come back and say, wow, it really felt like it was interjecting a lot; it was hard to pay attention to both at the same time. And we changed the frequency at which it interjects because of that. But I would hope a lot of people walk away from that not just being annoyed with the experience, but instead using it as a moment to reflect on the difficulty of trying to be yourself in the face of these systems that are constantly giving us feedback and shaping us. Maybe we don't think about it that way normally, because the systems that are part of our everyday life are designed to be somewhat transparent and sort of out of the way. But they really are the same thing. They're kind of modulating our thoughts and feelings and our interactions. And maybe only in this kind of artful context, where we're not trying to make a frictionless experience, can this difficulty of trying to be yourself in the face of these modulating systems become more obvious.

[00:16:12.193] Lauren Lee McCarthy: Yeah, and I think there are also lots of times when you may not be using this piece, but the voice in your head is distracting you from being really present with the person that you're with. So it's also a reflection on that, and I think it's a lot about awareness: where are you placing your attention? Noticing that when the voice is talking, you could listen to it or you could listen to the person in front of you. That's true in your everyday life too, but sometimes I think it's a little easier to slip into listening to your own thoughts without really realizing it.

[00:16:44.370] Kent Bye: Yeah, as I was reflecting on this piece, I was wondering if you've thought about a kind of immersive theater remix of this, where you would have an actor who would know how the inner voice was interacting with someone, or at least be able to have a conversation that's contrived, with someone you don't know, with emotional beats and moments where the AI is interjecting, so that it's an opportunity to really stop and listen. Because there is something about being in a context where there are people that I know and I'm kind of roaming around; but there could be more of a curated experience that shapes different opportunities, with an interactive, immersive theater actor who's maybe even aware of what the prompts and intentions are, to help take it to the next level of really reflecting on someone's inner voice and their experience of that.

[00:17:37.172] Kyle McDonald: Yeah, I mean, I love those ideas, and we've talked about a lot of different directions this could go. It's also related to another project Lauren had done, Waking Agents, where you've got folks lying on these pillows getting messages from confederates behind the scenes; they feel like they're talking to a machine, but they don't realize that there's a human in the loop controlling the conversation. And I think in other projects we've done together there are also actors involved, and that dynamic is super interesting: a mixture of computational, machinic feedback and human-controlled feedback. This piece really lends itself to some of that just because of the way it's built technically. The infrastructure is all based on audio chat, basically, so at any moment we could join the audio call and jump in and say, hey, we're the admins here, how's your experience going, or something like that. If we had actors doing that, it would be a really surprising experience. So who knows? We're always trying to build on the infrastructure from previous projects to make the next one. So I don't know.

[00:18:46.407] Lauren Lee McCarthy: I mean, for me, I'm really interested in the way that our everyday life is a bit of a performance. So I don't know that adding more theater to it makes that point. I think I would actually like to see the experience get less performative and more into the everyday. So, rather than walking around a festival and trying to have conversations with a bunch of people, I talked to some people that did it in pairs. They would each train one and then just walk around together and talk with each other, like a couple doing that. And at some point they even switched and listened to the other person's voice while they continued their conversation. And I thought that got kind of more interesting, because rather than having these awkward conversations with people that may or may not understand what's going on, it was more like a conversation between the four of them: the two people and the two voices.

[00:19:33.329] Kent Bye: Oh, wow. That's really quite fascinating, because during the DocLab preview of this piece, you had the conceit where you were giving a talk, Kyle, presenting different information, and then Lauren's inner voice would interrupt and interject in a similar way, and you would pause and let it say its thing. And then similarly, Lauren, when you were presenting, Kyle's inner voice was interjecting. So could you elaborate on whether those voices were trained on your inner voices, or whether they were trained for that very specific performative context? Because it seemed uncannily prescient in terms of reflecting some of the deeper thoughts I would have had about these things. So yeah, I'd love to hear some elaboration on that.

[00:20:09.501] Lauren Lee McCarthy: Yeah, those were trained on our voices. It was a similar process.

[00:20:14.946] Kent Bye: Yeah, I heard your voices, but I don't know if it was prompted in the same way as this piece was prompted.

[00:20:23.968] Lauren Lee McCarthy: Yeah, it was similar. I mean, it was the same way. I guess when I was answering the questions, I was thinking about the context. So I had asked mine to be slightly antagonistic; I knew I was doing this with Kyle, just for fun. But yeah, it was the same process. I think that format maybe lent itself especially well to these kinds of interjections, especially the fact that we were able to pause when the voice started talking. So you really got that mix, rather than it all happening simultaneously.

[00:20:55.859] Kyle McDonald: Yeah, it was the same. It was the same system. The only difference was that we trained our voices separately in advance instead of doing it in the booth. But yeah, I think the audience can tell when something's, you know, on the spot versus like pre-recorded.

[00:21:12.733] Kent Bye: Well, yeah, for me, what I was really struck by were some of the deeper reflections on, you know, the antagonistic aspect of thinking about the data privacy issues. Meta has put forth their vision of what they call contextually-aware AI, which is essentially this: you would have not only the audio, but all the visual tracks, to be able to track everything that you do. And they have this thing called Ego4D, which tracks all these different things to say: what are your memories? Who are you talking to? Who's saying what? It's basically this omnipresent, omniscient AI that's trying to track all these things. It's for the purpose of having new user interfaces that are contextually relevant, but it's also used for surveillance capitalism, in what I think is violating Helen Nissenbaum's principles of contextual integrity, meaning that there's a specific context for information, rather than being aware of all contexts at the same time. So I'd love to hear any reflections on the deeper contextually-aware AI and privacy implications of this type of technology.

[00:22:13.916] Lauren Lee McCarthy: Well, I think there are a lot of privacy concerns. I mean, even for this piece, we did research what each API and tool we were using actually did with the data, and tried to share that with people. Whether or not they cared was another question, and I think that's a challenge of thinking through the privacy issues: there are a lot of concerns, and a lot of information and data is being taken from people to build these systems without their real awareness. But on the other hand, I think so many people have a sort of attitude these days that that's just how things work, and they're not particularly concerned. My students will sometimes say, why should I care about whether my data is kept private or where it's going? So yeah, I think the tools that you're mentioning just push further in that direction. And I think it's a problem.

[00:23:06.265] Kyle McDonald: Yeah, one of the things that's kind of funny, that I feel like not everyone understands about data privacy around AI, is that your data is just not enough. If you are using one of these tools, it's not enough data for any of these companies to find it very useful. If you've got a search engine like Google, where maybe you're getting 50,000 queries every second, then maybe you can do something with that. But if you're a company that's just doing speech recognition, or a company that's doing voice cloning, usage of that service is not enough data for them to actually put into research and production and all that stuff. The way that they collect their data happens way before that; it's from the whole internet, and from lots of other content that they can get their hands on. So I think a lot of the time we're worried about what happens when we use these services, but from what we learned from looking into them, the majority of them don't really care about the data that you're giving them when you use the service. They've already got it.

[00:24:10.718] Kent Bye: Well, I guess to push back on that just a little bit: in large language models, there's this whole process of reinforcement learning with human feedback to tune them in some ways, but there are people out there who, quote-unquote, jailbreak those large language models and get to information that is hidden. The thing that I think is probably of concern is that in some of the conversations I was having, I was talking to people about information that, as a journalist, would be, say, off the record or embargoed. But the AI has no awareness as to how to treat that information. And so the concern, with Helen Nissenbaum's contextual integrity, is that there's information you tell your doctor because it's information that your doctor should know, and when you're in a bank, you have information about your finances that the banker should know. But when you start to blur those lines of contextual integrity, all of a sudden the AI isn't aware of what those boundaries are, and it may be repeating back sensitive health information to other people, if you're sharing it or if someone's able to get access to it. The idea is that if this omnipresent AI is out there gathering all this information, it has no sense as to what is private and what's not, and the risk is more around some of that information getting into the wrong hands and being used for nefarious purposes.
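
One way to make the contextual integrity principle concrete is a sketch like the one below, where every remembered fact carries the context it was shared in, and a guard refuses to surface it anywhere else. This is purely a hypothetical illustration of Nissenbaum's idea, not anything this piece or any real system implements; all names and labels are invented.

```python
# Hypothetical illustration of Nissenbaum-style contextual integrity:
# each fact remembers the context it was shared in, and recall only
# surfaces facts within that same context. Invented for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class Fact:
    content: str
    context: str          # e.g. "doctor", "bank", "off_the_record"

class ContextualMemory:
    def __init__(self):
        self._facts = []

    def remember(self, content: str, context: str) -> None:
        self._facts.append(Fact(content, context))

    def recall(self, current_context: str) -> list:
        # Information flows only inside its original context;
        # "off_the_record" never flows anywhere.
        return [f.content for f in self._facts
                if f.context == current_context
                and f.context != "off_the_record"]

memory = ContextualMemory()
memory.remember("blood pressure is high", "doctor")
memory.remember("embargoed product launch", "off_the_record")
print(memory.recall("bank"))    # [] -- health info doesn't reach the bank
print(memory.recall("doctor"))  # ['blood pressure is high']
```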

[00:25:21.784] Kyle McDonald: I mean, something to remember about these systems is that they don't think, they don't reason, and I think that kind of social behavior requires some level of reasoning. It's not something that can be modeled the way that a lot of other linguistic interactions can be modeled. So if you're asking an AI to write a short story, it's going to do a great job, because that's a well-defined thing that it has a lot of examples of. If you're asking an AI to act correctly in a given situation, it may not understand what's correct and what's incorrect in that situation. But I don't know... I guess I was just thinking this might be an inherent limitation of how large language models work. But one of the things we've been doing with this project is getting ChatGPT to act very differently depending on what people ask for, right? So maybe part of what's going on with this kind of contextually-aware AI stuff is that we have been building models that are designed for everything, and not prompting them for the situations we want them to be in, and for what we want them to be in those different cases.

[00:26:34.165] Kent Bye: Yeah, one of the other aspects of this piece is that you're disclosing, in a statement posted on the side, all the data privacy details; most of it seemed to be that either the data is deleted immediately or it's used for engineering purposes to solve various problems. And I guess the other interesting aspect of that is that in some ways it's disclosing the pipeline of tools that you're using, just so that people are aware as well. As I've had different discussions with folks about AI, pipelines end up being a big discussion. So I'd love to hear any reflections on the pipeline that you're using here.

[00:27:09.175] Kyle McDonald: Yeah, just briefly: I think when people have this kind of experience, they're thinking, oh my god, AI has come so far, look at this, it can talk to me like a human, or something like that. But that part of the pipeline, that part of the infrastructure, is really small in terms of the overall complexity of everything. Actually, the hardest part of this installation is just real-time audio streaming, and getting it to work on a device where people aren't accidentally unpairing the AirPod from the iPhone. It's really basic stuff; it's UX questions, and audio is still complicated in 2023. I would say that's about half of the engineering that went into this: just getting the audio working correctly. And then another 25% is the speech recognition, which is happening in the cloud. And that last 25% is split evenly between ChatGPT and the speech synthesis that we're using from ElevenLabs. Getting real-time streaming speech synthesis is also its own challenge. But all of that stuff basically gets plugged together on a server that's running in the cloud, and it's orchestrating everything. You know, as artists, we're constantly thinking about how to keep something running for a long time, so that maybe in the future we can show this piece again. In the past, we've built a lot of stuff that works in the browser, and then after a few years it's impossible to maintain. I'm hopeful for this one in terms of the back-end infrastructure lasting a little longer. But the thing that will change first is that OpenAI will deprecate this version of ChatGPT at some point in the future. Then we'll get a different version, and we'll never get the old one back. So this version of the piece is going to be constrained to this moment in time, where we have access to this ChatGPT API. In the future, it'll be different.
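
For readers curious what the voice-cloning leg of that pipeline might look like, here is a minimal sketch against ElevenLabs' public HTTP API circa 2023 (instant voice cloning plus text-to-speech). This is an assumption-laden illustration, not the artists' actual code; the key, voice name, and file handling are all placeholders.

```python
# Hedged sketch of the voice-cloning leg: upload the booth-interview
# recordings as clone samples, then synthesize feedback in that voice.
# Endpoints follow ElevenLabs' public API circa 2023; everything else
# (key, names, error handling) is a placeholder.
import requests

API = "https://api.elevenlabs.io/v1"
HEADERS = {"xi-api-key": "YOUR_API_KEY"}  # placeholder credential

def clone_from_interview(sample_paths):
    """Create an instant voice clone from the participant's answers."""
    files = [("files", open(p, "rb")) for p in sample_paths]
    resp = requests.post(f"{API}/voices/add", headers=HEADERS,
                         data={"name": "inner-voice"}, files=files)
    resp.raise_for_status()
    return resp.json()["voice_id"]

def synthesize(voice_id, text):
    """Render one interjection as audio bytes in the cloned voice."""
    resp = requests.post(f"{API}/text-to-speech/{voice_id}",
                         headers=HEADERS, json={"text": text})
    resp.raise_for_status()
    return resp.content  # audio bytes streamed out to the earbud
```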

[00:28:55.098] Lauren Lee McCarthy: That's part of the piece: it will change over time. The language of it is reflective of the systems and the tools that we have right now, so as those APIs and tools change, the experience will also change, and that's kind of built in. It was also interesting programming this piece, because I would ask Kyle, oh, can we make it do this or that, and how do we do that? And he'd say, well, we just tell ChatGPT to do that. And I was like, oh, right. So it was a really different experience from some of the things we've coded in the past. Any idea we had for it, we could kind of just tell ChatGPT: oh, make that more concise, or do this. Even the onboarding session, the script for it, came from me asking ChatGPT: okay, imagine you're the voice in my head, run me through an onboarding session where I train you. And then, you know, working with those questions.
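
A minimal sketch of how that kind of request could be posed through the OpenAI chat API, using the standard Python client. The system message paraphrases Lauren's description; the model choice and exact wording are assumptions, not the project's actual prompt.

```python
# Hypothetical sketch of generating an onboarding script the way Lauren
# describes: just ask ChatGPT to play the role. Prompt wording and model
# are assumptions, not the piece's actual configuration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": ("Imagine you're the voice in my head. Run me through "
                     "an onboarding session where I train you: ask me, one "
                     "at a time, the questions you need in order to become "
                     "my inner voice.")},
        {"role": "user", "content": "Begin the onboarding session."},
    ],
)
print(response.choices[0].message.content)  # first onboarding question
```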

[00:29:44.707] Kyle McDonald: We've had this idea, because ChatGPT has actually been part of the creative process for this project: maybe one day we could write a single page of text of what we want a piece to be, and have it not only brainstorm with us, but actually write the entire script, write all of the code, build the infrastructure, and have it be a kind of minimal-instruction art piece, like a Sol LeWitt or something, where you just hand it a single paragraph or page and then everything else becomes emergent from there. But where it is now: we've been really balancing between being kind of on the rails versus open world with this kind of experience, because if you let an LLM take the reins completely, it gets really quickly into this feedback loop of going down some route that you're just not expecting at all. Instead, what we do with this piece is use the first part of the experience to ground the rest of the experience, and then we don't allow for a feedback loop to happen. Every time that it gives you some feedback, that bit of feedback is disconnected from the other pieces of feedback. It's not building on its own feedback until the very end, when it has a conclusion. Finding those balances, between where to have the feedback loops and where to have the rails and where to have the open world over an experience that's 25 minutes long, that's the real creative and technical challenge.
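
One plausible reading of that design, sketched below: every mid-session interjection is built only from the booth interview and the live transcript, and only the closing reflection gets to see the AI's own earlier remarks. The function and field names are hypothetical, not the piece's actual code.

```python
# Hypothetical sketch of the "rails vs. open world" balance Kyle
# describes: interjections never see earlier AI feedback, so no runaway
# loop builds up; only the final conclusion does. Names are invented.
def build_messages(onboarding_summary, recent_turns, past_feedback,
                   final=False):
    # Rails: the booth interview grounds every single call.
    messages = [{"role": "system", "content": onboarding_summary}]
    # Open world: the live conversation supplies the content.
    messages += [{"role": "user", "content": turn} for turn in recent_turns]
    if final:
        # Only the closing reflection builds on the AI's own feedback.
        messages.append({
            "role": "user",
            "content": "Conclude the session, drawing on your earlier "
                       "feedback: " + " / ".join(past_feedback),
        })
    return messages
```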

[00:31:14.378] Kent Bye: Yeah, I was really struck by actually hearing my own voice in my head. I mean, hearing a mimicked voice of my voice, a cloned voice, speaking to myself. Because I do a podcast and listen to my voice all the time, it sounded like me speaking on a phone; it felt like it was maybe slightly modulated. But when it had the vocal tics or vocal pauses that I normally edit out of my own recordings, I was like, oh wow, this really is how I speak, in a way that was able to be mimicked. So it was really powerful to hear that. At the same time, I felt like I was giving it maybe more trust than I would have if it had been another voice. And so I feel like there are a lot of other ethical implications of how people are using cloned voices and how that could also be used for nefarious purposes. So yeah, I don't know what your own experiences have been with listening to your own voices and thinking through all the different potential implications of that.

[00:32:09.588] Lauren Lee McCarthy: Yeah, I've made a series of pieces using my own voice. I found it was really interesting when I first made a clone that I could then type and hear myself say things and it gave me a new kind of freedom. Even though I could say anything I wanted in private, maybe there's certain phrases or ideas that I just couldn't quite bring myself to say. And so then having this interface where I could type really changed things. So I don't know, I think it's interesting to think about those dynamics.

[00:32:38.475] Kyle McDonald: Yeah, I mean, I've heard of, though I haven't engaged in any of this, examples of banks getting scammed out of money because someone copied the CEO's voice, and examples of parents getting scammed out of money because someone copied their kid's voice. You know, we do put a lot of trust in a familiar voice, and that is dangerous when someone wants to abuse that trust. I'm not personally so worried about these worlds where the AIs go off the rails and use that to manipulate us, but I am worried about what happens when humans are being normal humans. I think it would be good to give people more experiences with these systems so they gain more intuition for how they work. Maybe we can keep trust where it belongs and not let it spill over into something just because it's a little surprising. I also think about these apocryphal stories about the beginnings of cinema, when people ran out of the theater when the train was coming at them, and that kind of inability to see the real for what it is, confusing the real and the synthetic. I think we're in that moment again now. Maybe it's not the AI scaring us out of the theater, but we can't really tell the difference right now. And I think the difference is not something that comes from understanding the fidelity; it's more about understanding the application. Why do we trust someone's voice that's familiar to us? It's because we have a relationship with them and a dialogue with them. When there's an AI that copies a voice, that relationship and dialogue is obviously not there anymore. And I would hope that we can pivot away from just trusting something because it looks correct or sounds correct, and get back into why we're doing that in the first place, which is a lot about in-person relationships.

[00:34:30.103] Kent Bye: So my own disposition sometimes is to be very concerned about some of these ethical and moral implications, and that can become a paralysis, a reason not to go off and venture to create something like this. So I'm really appreciative of things like IDFA DocLab that provide a context for artists like yourselves to really push the edge of what the technology can do and what's even possible. And it certainly had me reflecting a lot on my own inner voice and my relationship to it. But also, as I was having other conversations with people who were going through the experience, some of them had very critical inner voices, and they were almost elated to hear an inner voice that was encouraging and giving them positive feedback; they felt like they were on cloud nine listening to their own voice speak to them that way. But I'd love to hear any other reactions you've had to this piece, and what kinds of things people have said.

[00:35:16.267] Kyle McDonald: Yeah. One of my favorite reactions was someone who came back saying, wow, it was so weird, it just kept trying to get me out of every single conversation. It would just be in the middle of a conversation and it would say, you know, maybe it's time to wrap this up. And I said, well, what did you ask it to do? And we looked back at the transcript together, and he said, oh yeah, I guess I asked it for solitude. And I thought that was hilarious. Yeah, some people have come back and said they wish they could keep it with them throughout the day. There was someone, a writer, who when they came back was surprised at how responsive it was to them. And they started going, wow, what else could I have asked for? If I could have asked for anything, could I have asked for it to speak to me in rhymes, or like a pirate, or as if I were a child still learning how to navigate the world? He had a really poetic perspective on how to use a tool like this, and that's one of the reasons I'm excited for these systems to be not just something that we're afraid of and trying to avoid, but something that we can engage with in safe spaces like art contexts, where we can be open and not have such a dangerous power dynamic as there is between a large tech company and the public.

[00:36:29.848] Lauren Lee McCarthy: Yeah, I think it's interesting, like having done it, then you start to have other ideas for how you might train it differently. So it seems like, at least for me, like something that's interesting to do more than once, because the first time you don't quite know what it will be. And then once you get a handle on it, it's like, oh, how would this be different? And I think the range of the things the voice can do or say is sort of one of the interesting points of it.

[00:36:56.164] Kent Bye: What's next for this project? Is there anything you're willing to share, either about this project or about future projects that may be catalyzed by this experience, to see where you could take it from here?

[00:37:06.265] Lauren Lee McCarthy: I think we're just interested in showing it in different contexts. And like you were saying, kind of thinking about the situation in which it's running. It's not always possible to have a situation like this where you're in a festival or something talking to other people. So how to prompt that interaction in different settings is sort of interesting. And like I mentioned, I'd be really interested in trying it in like everyday scenarios.

[00:37:29.644] Kyle McDonald: Yeah, I'd be curious to ship it to people where they can use it for longer periods of time. The timing for this piece, like I said, is about 25 minutes long, and about every minute it will give you something in your ear, right? Our original idea was that it was an hour long and interjected every couple of minutes, and we condensed it because we realized it worked better for this context. But I think another interesting version would be a full day, where it only speaks when certain especially relevant topics come up, and then you kind of dig into those. It would be a very different piece, and I think you'd leave feeling much more like it was a psychedelic experience. So I'd like to try some other contexts for this, and we're really open to chatting with people about what that looks like.

[00:38:19.051] Kent Bye: Great. And finally, what do you think is the ultimate potential of this type of immersive media and AI, where this is all going, and what it might be able to enable?

[00:38:29.469] Kyle McDonald: Yeah, I don't know. I mean, like I was saying a minute ago, I think these artistic spaces can be a safe place to explore some of the questions about the future that are hard to dig into otherwise. I mean, we have journalists who provide a good survey of what's happening right now and help us understand the specifics of what technologies are out there and how they're changing. There are researchers and engineers who are building those technologies or imagining what the technical possibilities of the future are. There are fiction writers who are brainstorming about what it looks like. But I think in these kinds of immersive spaces, you get something very different, which is not just a survey or R&D or imagination. It's a lived experience where you can actually step in and feel it for yourself, right? And I think when you have those feelings and share them with other people, you learn something different and you can build a better version of the future together.

[00:39:30.968] Lauren Lee McCarthy: I think it's a moment where the whole world is sort of reflecting on AI right now and a lot is changing. And I hope that as the tools continue to advance, that reflection can keep happening, that we don't just accept that, oh, this is just how everything is. I think there's a nice questioning that's happening amidst it all, and I hope that the art and media projects we see coming out of it continue to provoke that.

[00:40:01.313] Kent Bye: Well, of all the different pieces that are here, I feel like this is a new genre, a direction of where media is going to go. There are already movements of quantified self, but I feel like this is sort of the qualified self: being able to add more reflections about yourself in a more qualitative way using these large language models. So I think it's a really powerful experience and something that's going to really stick with me. So thanks again for taking the time to create it and to come on the podcast to help share some of your process and experiences and motivations and intentions for creating it. So thank you so much. Thank you.

[00:40:31.371] Lauren Lee McCarthy: Yeah, it's been a pleasure. Thank you so much.

[00:40:33.933] Kent Bye: So that was Kyle McDonald and Lauren Lee McCarthy, and they had a piece at IDFA DocLab called Voice in My Head. So I have a number of different takeaways about this interview. First of all, this is one of the pieces that could very easily and quickly become a product that I could imagine people using. Of course, there are all sorts of privacy implications with a piece like this, but in the context of this art festival, I can definitely see the allure of what it would mean to have this AI that is able to monitor you throughout the course of the day. Kyle said one of the provocations for this piece was: what if an AI is able to become a therapist that is just always there, providing feedback about what's happening in your life and kind of helping you reflect? It's kind of like this cognitive behavioral therapy where you're able to have this meta voice that is listening to what you're saying and then providing you specific feedback. And you can give very specific directions: if your inner voice is too critical, then maybe have something a little bit more supportive. So this was all designed as a half-hour experience, but they are very much interested in potentially expanding it out, either to have a longer context or to just give people access to something like this where they're able to play around with the idea. Certainly, within the context of IDFA DocLab, it's an art project where you're able to maybe surrender yourself a little bit more than if you were to just use this over the course of your life. But yeah, it's something that, as I've told a number of different people about it, they're like, yeah, I totally would use that, I would love to have something like that. So I feel like this is something where we're allowing AI to really intrude into the most intimate parts of our lives, but for the sake of trying to have it help us and improve our lives in different ways. Of course, there are lots of different limitations to large language models, and there's the question of to what degree you're going to allow it to subtly shift your thoughts or your behaviors in a way that may be going in a negative direction. But honestly, as Kyle and Lauren said, this whole piece is also making these systems much more explicit in the way that they're doing that. We already have all these different AI systems that are controlling our algorithmic feeds within Instagram or TikTok or YouTube or Facebook, whatever social media; there are lots of layers of AI that are already subtly shaping the way that we think about and see the world. This is just making it much more explicit, in a way that is literally trying to replace the voice in your head. Yeah, so it's such a provocative idea. And for me, it just really got me thinking about my own inner monologue, my own inner voice, and having it in an artistic context makes me reflect upon my own inner monologues. And yeah, I feel like there are ways that the technology feels inevitable, that we're going to be leaning in some of these directions.
Meta has already outlined that they want to create this quote-unquote contextually-aware AI, which is essentially this. They're already shipping things like the Meta Ray-Ban smart glasses; imagine having those cameras on all the time, or AI listening to everything that you say or do, and then trying to make contextually-aware judgments based upon either giving you information or having different commands or interfaces that would be contextually relevant. So there is this whole drive towards contextual AI, which, at least from my perspective, has a lot more problematic aspects when it comes to privacy. But this piece is this weird thing where it's half dystopic and half completely pragmatic, in a way that they're trying to create a tool that could really help people. It's quite an interesting blend, being in this liminal space of something that could feel like it's totally going down a very dark path, or going down a path that is very exalted for what the potentials of the technology even are, really leaning into this idea of AI being in support of what we want to create in our lives. So it's definitely a piece that, when I came back from DocLab, was one of the highlights when it comes to being really provocative and thought-provoking and intriguing and, you know, just challenging in a lot of ways: both in terms of what the boundaries are for where we want this technology to go, but also the potential for what you can do with it, really pushing those boundaries in all those different ways. So, yeah, that was Voice in My Head, and I had a chance to do Traversing the Mist and Voice in My Head. I have another 17 interviews from IDFA DocLab that I'm going to get to in the new year, but I just wanted to get these out, especially because I had talked to a couple of artists who were at DocLab this year that I had previous unpublished interviews with. Together with this conversation, and the one with Lance Weiler, I wanted to do this little mini-series going back in time, looking at some of these previously unpublished interviews, to give you a sense of the trajectory of some of these different artists: Lance Weiler, Ting-Yin Cho, and in this case Lauren Lee McCarthy, with this second piece in collaboration with Kyle McDonald. So, that's all I have for today, and I just wanted to thank you for listening to the Voices of VR podcast. And if you enjoyed the podcast, then please do spread the word, tell your friends, and consider becoming a member of the Patreon. This is a listener-supported podcast, and so I do rely upon donations from people like yourself in order to continue bringing you this coverage. So you can become a member and donate today at patreon.com slash voicesofvr. Thanks for listening.
