#1256: Using GPT for Conversational Interface for Escape Room VR Game “The Unclaimed Masterpiece”

The Unclaimed Masterpiece won the best student VR project at Laval Virtual as it has a novel integration of a conversational interface with a virtual assistant / character who is assisting you as you try to find the correct virtual painting to steal from a mult-floor gallery. The project was created by Alizée Calet, William Plessis and Maël Sellier, who are all students in the Master MTI 3D at Arts et Métiers Laval. I spoke with Calet and Sellier about their process of creating this escape room VR experience, and the range of different AI integrations that include Whisper, ChatGPT 3.5, and Stable Diffusion to create the paintings in the experience via generative AI.

This is a listener-supported podcast through the Voices of VR Patreon.

Music: Fatality

Rough Transcript

[00:00:05.452] Kent Bye: The Voices of VR Podcast. Hello, my name is Kent Bye, and welcome to the Voices of VR Podcast. This podcast looks at the future of spatial computing. You can support the podcast at patreon.com slash voicesofvr. So continuing on my series of looking at the intersection between XR and AI, today's episode is with the winner of the Best Student Project at Laval Virtual called The Unclaimed Masterpiece. So this was an immersive experience where you are chatting with a chatbot. You're actually using natural language to be able to communicate. It's using Whisper to be able to take what you're saying and then translate it and feed it into chat2BT 3.5 and then it returns back instructions. So you are a thief and you're trying to steal one of the different images. And it sometimes is giving you clues that are accurate. Other times it's hallucinating and giving you bad information. So you have to navigate the clues that you're given and find the right painting and then basically steal it and go down the elevator. All the images are also created using stable diffusion as well. So this is a student project that was created in a couple of weeks to be able to explore this intersection between generative AI and conversational interfaces and virtual reality. So that's what we're coming on today's episode of the Voices of VR podcast. So this interview with Elise and Mel happened on Friday, April 14th, 2023 at Laval Virtual in Laval, France. So with that, let's go ahead and dive right in.

[00:01:27.920] Alizée Calet: Alright, so I'm Alizée from MTI 3D and we are both students from the Arts et Métiers level. And we were asked to produce a VR application in two weeks. So we chose to make a VR game on the principle of who is it. So basically the player has to guess what painting he has to steal.

[00:01:50.744] Maël Sellier: Hello, so my name is Myles Sellier. I'm also a student in the first year of master in MTS 3D. And so Alize works in the project as a 3D artist and 2D artist. And I worked as a few as a 3D artist, but most of my time as a developer, gameplay developer. OK.

[00:02:13.594] Kent Bye: Great. And yeah, maybe you could talk a bit more about your backgrounds and your journey into working with virtual reality.

[00:02:20.920] Alizée Calet: So I grew up in Laval and I've always been captivated by virtual reality as it started very early with this event and I've studied in L3DI which is a video game school based in Laval. At the time I was in we didn't have VR courses and lessons so we all did basically only video games but that was a solid base to learn how to make a VR game.

[00:02:45.763] Maël Sellier: And for me, I have always been fascinated by video games in 3D, movies in 3D, but not by the virtual reality when I was younger. But last year, I discovered the virtual reality game and it was amazing for me because it's absolutely different than classical screen. and I tried the master and it's very interesting and now I know how to make a 3D video game and application in virtual reality.

[00:03:17.827] Kent Bye: Yeah, I'd love to hear how this project came about. Because the idea is that you are trying to find a painting, and then you're interacting with a robot. So you're using ChatTBT to translate your voice, and you're getting clues. And so you have an assistant with ChatTBT, and there's a conversational interface where you're trying to talk to the virtual assistant, this virtual robot. as you're trying to go through these different floors and find the right painting to steal. So that's the premise of the piece, but how did the idea to use artificial intelligence come about with virtual reality? So where did that start?

[00:03:52.922] Maël Sellier: I think our goal, we had a project to make a virtual reality, and our goal was to make something new, because when we started as ChalGPTRP, to make a video game with this, it was new, nobody make it, so we decided to try something new, and we didn't want just to make a functionality with AI, We wanted to make a real application where the AI has a personality, you have a real interaction with it, and yes, to play with you because she's not a simple robot that says yes or no. She also makes some jokes with you. Sometimes she taunts you, but it's a way to play with people.

[00:04:38.935] Kent Bye: And so yeah, as you're coming in on the project, what were you working on on this project?

[00:04:43.926] Alizée Calet: During this project I've worked on basically most of the graphics, like for the 3D modelling, the 3D texturing, some of the lighting and I've also worked on the 2D graphism for the UI in-game and the UI for the public, for the spectators, those who are not currently playing but watching the experience happening. And I've also worked on the communication part, meaning some of the posters and the tutorials that are displayed in our stand.

[00:05:11.402] Kent Bye: Yeah, and there's a lot of pieces of art and I see that you're using stable diffusion where you're a part of also creating some of the paintings by giving prompts and including that in the VR piece as well.

[00:05:21.408] Alizée Calet: Yeah, what we did was, most of the painting you see here in our environment has been generated by Stable Diffusion, indeed. And as you can see, there are some posters that we have made ourselves. For example, this poster has been hand-painted fully, and it kind of represents our game, which is the user versus the AI.

[00:05:43.920] Kent Bye: And yeah, maybe you could talk a bit about the interface between being able to speak, having that translated into the text, fed into ChatGBT. And did you use Whisper? Did you use Google? So maybe talk about translating the voice interactions into ChatGBT.

[00:05:59.538] Maël Sellier: So when you begin the game, as you saw, you can choose your language. So when a player is speaking, it's recorded, it's made as an audio file, it's translated by Whisper in a text file, it's sent to ChatGPT, then we get an answer of ChatGPT, we translate it in an audio file with Microsoft Translator. And then we get the audio file from Microsoft to Unity, and we played it. And it's made every time.

[00:06:30.478] Kent Bye: Yeah, I have to say it works pretty well in terms of communicating what I was saying and getting answer back. And there's a delay, and I think that's the delay that I saw also in the quantum bar, where you say something and there's a little bit of a processing that has to happen. But I feel like over time, That's probably going to get faster with real-time processing and whatnot, but there is a bit of very, very early days of this interfacing with virtual beings with artificial intelligence. There's an uncanniness of, it doesn't feel like a human because it's like a delay, but I feel like it's just a part of the, my early impression of saying something and then waiting for the response back. So, but yeah, overall, I felt like it was able to understand what I was trying to say.

[00:07:08.733] Maël Sellier: But honestly, we know it's now possible to get a direct answer, but the issue is when we start the project, the good version to automate it with a ChGPT wasn't very stable, so we used an old version. Well, it was pretty safe. But one of our next goal is to use this new technology and make it locally, because the issue is we are using the internet to communicate with ChatGPT. But now it's possible to just put ChatGPT on your computer and get the answer instantaneously. Is that with the llama? No, because Yamaha is Facebook, but with the IP of ChatGPT, it's now possible to use it locally.

[00:07:54.882] Kent Bye: And are you using 3.0 or 3.5?

[00:07:57.144] Maël Sellier: 3.5, but we want to use 4 better, because the issue is we can't use picture recognition, because we are just entering keywords like a color, an animal, a location, but it's not enough for the moment. and we want to give to the player a real interaction and get the possible answer to know what is in detail in the scene or in the painting for the player.

[00:08:27.798] Kent Bye: Congratulations for winning an award last night for the best student project. They called out the use of artificial intelligence because it is an integration that I think works quite well. What have been some of the reactions so far of the piece?

[00:08:40.492] Alizée Calet: So far, we've been extremely happy with all the reactions we've had. The people have been extremely welcoming of our project. People tend to be shy at first, speaking with the AI. Sometimes we see them discreetly looking at the mic and speaking really softly. But then after a few minutes of gameplay, we can see that the shyness goes away and people start to really have fun.

[00:09:04.082] Kent Bye: Can you talk about the process of training the personality of this AI bot? Because you have to use a prompt to say, this is a character you're going to be playing. And so maybe talk about how you created this character.

[00:09:14.832] Maël Sellier: Yes, that's it. When we are launching the game every time, we send a prompt to ChatJPT with a lot of questions, possible answers. And we are just giving a context to ChatGPT. And with that, you can understand what kind of comportment he has to get with the user, what is the limits of answer with people. DAVID MALANYIKIS Yeah.

[00:09:37.207] Kent Bye: And sometimes, ChatGPT gives wrong answers, or it sort of hallucinates in different ways. And so in the experience, you say that it may not be providing you accurate information. And so is that part of the limitations of ChatGPT, or is that part of the gameplay that

[00:09:53.415] Maël Sellier: That's a part of the gameplay, because at the beginning of the gameplay, we give the instruction to Chachibite to lie to you. Then he has to be honest, but it's progressive, it's not instantaneously. So firstly, if you are playing, don't take the word what he's saying. You just have to look at the painting, and don't speak too much with him, because he's going to make a lot of jokes about you, and not telling you every time the truth. But at the end, he's going to be OK.

[00:10:23.131] Kent Bye: TIMOTHY JORDAN-JONES. Nice. And finally, what do you each think is the ultimate potential of virtual reality, and what it might be able to enable?

[00:10:35.124] Alizée Calet: I think the full potential of virtual reality is doing something that you cannot do in reality. And I think this project shows it because you're playing a simple game, really. You're playing the who-is-it with somebody who's not there, with somebody who doesn't exist. And I think implementing AI in a virtual reality game is making full reality possible. It's like you're creating something that's not possible and making it fully possible, even with a social interaction.

[00:11:05.085] Maël Sellier: For me, one of my goals has been done with this game, I think, but I want to explore more about this, because for me, getting a real interaction for each people with what people say, and in VR, that's one of the better thing, because you've got a real immersion. That's what we are talking with people. The environment is very cool, but getting a real answer and real joke, and it's a very human compartment about it, and it make a real reality game, yes.

[00:11:36.136] Kent Bye: Again, congratulations on the win here at Laval Virtual and this integration between the chat GPT and VR. I think there's a lot of potential for where this might go in the future, especially when you start to have local versions and have that latency get reduced down. But yeah, really enjoyed the experience. And yeah, thanks again for joining me here on the podcast. So thank you.

[00:11:52.929] Alizée Calet: Thank you so much. Thank you.

[00:11:54.950] Kent Bye: So that was Alize Kalei, as well as Mel Salih, and they were the student creators of a project called The Unclaimed Masterpiece, which picked up the best student prize at the Laval Virtual of 2023. So a number of different takeaways about this interview is that, first of all, well, this is a pretty straightforward experience in terms of, you know, creating essentially a gallery space with these different floors, you have an elevator, And the big innovation here is this conversational interface to speak with this omniscient voice and you're getting feedback as to different tips and clues as to where to go as well as what image you're looking for. And there's a game component where they're kind of using the hallucination aspect of generative AI to play with getting false information or fake information. And the piece overall is also a bit of a commentary that you're going in and you're stealing the art and It's also using the generative AI, so they're casting it in what is the ethics around the data provenance and data theft, and you're literally embodying a thief within this project. So yeah, to be able to have this real-time conversation, it's also something that was featured in Quantum Bar. I was using Chat2BT 3.0, this was using Chat2BT 3.5, so the 4.0 had just come out. And yeah, I think this is a theme that's going to continue to expand and grow out when it comes to being in these different immersive worlds and how to use these conversational interfaces. They were using Whisper, which is an open source version from OpenAI. And after this, over the summer, I actually used a fork of Whisper called WhisperX to be able to create transcripts for all 1,200 plus episodes of the Voices of VR podcast. And so that's a way that I'm exploring the utility of these large language models that are able to understand spoken text and be able to convert it into a transcript. I found that it was at a level that was very satisfactory, something that was very consistent and very easy for me to be able to now generate these automatic transcripts that have time codes and everything. So that's been a huge boon for my own workflow and using these open source tools like Whisper, I imagine that I'm going to see a lot more of these types of conversational interfaces and You know, in this case, they're feeding that text that is from Whisper into chat to be able to have these conversation interfaces. They prompt it with a character and the scene and some certain knowledge. And then from there, you're able to engage with different conversations you can have with the virtual assistant. And they cited the fact that they were using the intersection between AI and virtuality as one of the reasons why it took home the top prize of the student projects at this year at Love All Virtual. So that's all I have for today. And I just wanted to thank you for listening to the Voices of VR podcast. And if you enjoyed the podcast, then please do spread the word, tell your friends and consider becoming a member of the Patreon. This is a support podcast. And so I do rely upon donations from people like yourself in order to continue bringing this coverage. So you can become a member and donate today at patreon.com slash Voices of VR. Thanks for listening.

More from this show