#1264: Inworld.ai for Dynamic NPC Characters with Knowledge, Memory, & Robust Narrative Controls

Matt Kim is the technical creative director at Inworld.ai, creating demos to show off their NPC platform, which was featured in my previous episode #1263 in the Meet Wol demo by Liquid City and Niantic. I was really impressed with how my interactions with Inworld.ai seemed to go beyond the limitations of ChatGPT and existing tech demos of large language models. Their website elaborates on how they're taking NPCs to the next level, saying that they add "configurable safety, knowledge, memory, narrative controls, multimodality, and more. Craft characters with distinct personalities and contextual awareness that stay in-world. Seamlessly integrate into real-time applications, with optimization for scale and performance built-in." The low-latency responses are one of the more noticeable features, which makes a huge difference. I had a chance to catch up with Kim at AWE to dig into some of the features that they've built for NPCs on their Inworld.ai platform.

This is a listener-supported podcast through the Voices of VR Patreon.

Music: Fatality

Rough Transcript

[00:00:05.452] Kent Bye: The Voices of VR Podcast. Hello, my name is Kent Bye, and welcome to the Voices of VR Podcast. It's a podcast that looks at the future of spatial computing. You can support the podcast at patreon.com slash voicesofvr. So this is episode 12 of 17, looking at the intersection of XR and artificial intelligence. And today's episode is with Matt Kim, who is the technical creative director at Inworld.ai. So this is a company that's doing some really cool NPC technology to be able to have conversational interfaces with bounded knowledge, which allows the NPCs to have unique and novel interactions while still operating within the bounds of knowledge that you're providing, and to be able to use this tool to do these speculative world-building exercises that allow you to generate your own terms, your own jargon, and also create really compelling character arcs that can change over time. And yeah, just a really robust character platform to start to take the potentials of something like large language models and add a lot of narrative tools that immersive creators and world builders and game designers can start to use to have the next level of non-player characters within these immersive and interactive experiences. So, that's what we're covering on today's episode of the Voices of VR podcast. So, this interview with Matt happened on Friday, June 2nd, 2023 at the Augmented World Expo in Santa Clara, California. So, with that, let's go ahead and dive right in.

[00:01:28.888] Matt Kim: Yes, so my name is Matt Kim. I work at Inworld as a technical creative director. I have a background in software and ML as well. So I work as basically a generalist who focuses on creating demos that both show off what Inworld is capable of doing and also help inform some of the product development as well.

[00:01:50.228] Kent Bye: So maybe you could give a bit more context as to your background and your journey into AI.

[00:01:54.708] Matt Kim: Yeah, so I studied computer science and worked in the field as an ML engineer for a while, working on different kinds of things, like anomaly detection for potholes, for example. I also worked on chatbots, doing NLP, similar to the background of our Inworld team, who worked at Google on Dialogflow and Google's Assistant as well. And I think something that we have a lot in common is that a lot of us had a passion for gaming, right? Personally, I was also passionate about gaming, but I spent time making films. So basically, post-grad, I would get money from a software contract, and then I would spend it on a film, and then make that, right? And then eventually, I was directed towards Inworld during COVID, when I taught myself some Unreal Engine in order to continue my creative projects, because I couldn't do films in real life, right, because people had COVID. And then I ended up here, kind of combining a bunch of these things at Inworld together.

[00:02:53.853] Kent Bye: Maybe you could give a bit more context on the origin story of Inworld, of how that came about as a company.

[00:02:59.341] Matt Kim: Yeah, so the origin story of Inworld is actually a pretty long one. So the founder, Ilya G., had a team of engineers who came out of Russia together, and they created this company called Dialogflow, right? And this was doing, you know, AI chatbots from a long time ago, before all of this hype, like centuries ago, right? And they got acquired eventually by Google, and they worked on their product Dialogflow, which was building out chatbots as a service for Google. And eventually, as Ilya G. tells me, and this is paraphrasing, right, they were getting bored. And the investors of Dialogflow and himself and his engineers were like, OK, what's next? What are we going to do? And so they decided, well, let's do something cool, right? Let's take all of our learnings, right, and apply them to AI characters for games, right? Because, like, Ilya G. himself has a passion for Diablo, right? And everybody else has a passion for certain kinds of games and experiences, and was thinking, like, oh, this would be awesome if we can revolutionize gaming in some way. And so they went ahead and they started the company.

[00:04:06.942] Kent Bye: Nice. And so, yeah, I just had a chance to do the Meet Wol demo. That was the Niantic demo, where I had a chance to interact with the owl. And in my interactions, I was given some directions for how to interact with this owl character, you know, to ask it to tell me a story or to tell a joke. And then it had some prompts that gave me some direction. But then in the open, sort of experimental part, I tried to break it. I asked it things like, what's the meaning of the universe? And it still tied it back into a relevant question. I asked it about getting scared, and it was able to always go back to something that was on topic. In my interactions with large language models like ChatGPT, it's very easy to break it or to push it past its bounds. But I found it quite difficult, actually, in the short time that I had in the demo, to really find a way that it was going to break the immersion. So it was able to preserve that character. And I found that each time it surprised me by keeping on topic, I felt more and more immersed into the social possibility that I'm actually engaging with this as a character. So, I'd love to hear a little bit about what the magic is that helps create these knowledge bases and tune it and redirect it into something that doesn't feel like you're getting the response of, "as a large language model, I cannot answer that," that kind of denial that you get from ChatGPT from the extra tuning that they have.

[00:05:21.607] Matt Kim: So most of it is done simply by you telling us what you want this character to know. And then at Inworld, we, as a service, make sure that they stick to that, right? So you as a user, you basically give it what you want it to know. And you create the world that you want, and we make sure it sticks to that, more or less.

[00:05:39.643] Kent Bye: And so what kind of additional interfaces do you have? Because you have people that are speaking. You have to interpret it. So are there large language models that are interpreting what people are saying? Or maybe talk about what is the interface between people speaking and then the magic that is having this bounded set of knowledge that's interfacing with. So what's some of the glue that you have in order to make that connection?

[00:05:59.965] Matt Kim: Yeah, so what you're talking about is basically a contextual mesh, which we take care of, right? So when a person asks a question, right, our systems, our models, right, figure out what context is important for this character to know of in order to answer that question, right? And there's a lot of ways to hook into that context. Either that's through knowledge, which you as a user give to this character when you're configuring the character, right, creating the character. You can basically upload a whole world's worth of knowledge, or in the experience of Niantic, knowledge about redwood forests. And the other thing is, within an experience, you can also give scene context as well. You can even create triggers so that when a certain action is taken in a game, like if a player shoots you or steals an item, for example, then that gives a trigger awareness, a scene awareness, to the character. Also, finally, there are the primary motivations of the character in the scene, right? What it is they're trying to do. That's configurable both by having a base motivation in the character, but also we have a goals and actions system that you can use to make transient motivations. So it's like, this motivation is finished, now move on to the next motivation, right? And you can also have actions that are triggered based on whether a certain motivation is activated. So we have intent recognition as well, to allow you more control of how this character thinks and decides what actions to do. And then when that action gets triggered, you can actually have that trigger a quote-unquote physical action of this character. For example, in our Innequin demo, if he understands that you want him to grab a certain box, he can actually go and grab it.
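The layering Kim describes, with designer-supplied knowledge, game-event triggers feeding scene awareness, and a base motivation all folded into the character's context, can be sketched in a few lines. To be clear, this is a hypothetical illustration of the general pattern, not Inworld's actual API: the class names, the `on_event` hook, and the naive keyword-overlap retrieval are all my own stand-ins.

```python
from dataclasses import dataclass, field

@dataclass
class Trigger:
    """Maps an in-game event to a piece of scene awareness for the character."""
    event: str    # e.g. "player_steals_item"
    context: str  # text folded into the character's context when the event fires

@dataclass
class Character:
    name: str
    base_motivation: str
    knowledge: list[str] = field(default_factory=list)   # designer-provided facts
    triggers: list[Trigger] = field(default_factory=list)
    scene_context: list[str] = field(default_factory=list)

    def on_event(self, event: str) -> None:
        # A fired game event (player shoots, steals an item...) becomes scene awareness.
        for t in self.triggers:
            if t.event == event:
                self.scene_context.append(t.context)

    def build_context(self, player_utterance: str) -> str:
        # A real system would pick relevant knowledge with a model; here we
        # naively keep entries that share a word with the player's question.
        words = set(player_utterance.lower().split())
        relevant = [k for k in self.knowledge if words & set(k.lower().split())]
        return "\n".join([f"Motivation: {self.base_motivation}", *relevant, *self.scene_context])

wol = Character(
    name="Wol",
    base_motivation="Teach the visitor about the redwood forest.",
    knowledge=["Redwood forests can exceed 90 meters in height.",
               "Owls hunt at night using their asymmetric ears."],
    triggers=[Trigger("player_shouts", "The player just startled you.")],
)
wol.on_event("player_shouts")
print(wol.build_context("How tall do redwood forests get?"))
```

The string assembled by `build_context` would then be handed to the language model, which is one plausible way a character stays within the bounds of what the designer gave it.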

[00:07:33.620] Kent Bye: Yeah, and so I feel like that mechanism of a motivation allows you to get beyond some of the problems that folks like Edward Saatchi of Fable Studio found with GPT-3 and GPT-3.5, which was that it was hard to define a character that would maintain character and not break character. And so you're able to preserve the character, but also have a dynamic evolution of a character arc. And because there are motivations and completions of those motivations, you're able to complete a full arc of a character, which I think is also unique, in the sense that most of the large language models just seem like they're repeating facts, but this seems like a mechanism where you're able to actually get a character dynamic that's going through a whole hero's journey or an arc in some way. So I'd love to hear if there were certain dimensions of embodied experiences from video games, or looking into deep narrative theory for character motivations. It's almost like you're taking aspects from cinema and literary theory for talking about these characters, but actually implementing them in code. So yeah, I'd just love to hear a little bit more about the evolution of the development of these concepts, like motivation and ephemeral actions that have different states that can then evolve over time like a character arc.

[00:08:37.857] Matt Kim: Yeah, I mean, definitely. We are constantly talking to creators and creatives all across the board, and we're very lucky to have been connecting with people who are experts in this, ranging from Disney, to of course Niantic, who we're working with, to a project we did with Neal Stephenson, an excellent writer, right, the writer of Snow Crash. And the way we go about it is they kind of inform us of what is important and the language in which they talk about it, right? And then we create these tools accordingly, so that there is a human-to-AI relationship here where it's the creatives... We've built this tool around creatives and how they think and work, in order to enable them to have control over these characters. So that's the method that we took. Also, a lot of us ourselves, as individuals on the team, are creative. We write. We have hobbies in the creative industries. So we have a bit of an intuition here as well.

[00:09:31.748] Kent Bye: So have you started to use some of the Inworld AI for some of your film projects?

[00:09:36.270] Matt Kim: Oh, well, since I work at Inworld, right, most of my own projects that I come up with creatively are things that I pitch within the company. So I do it directly at Inworld. And yeah, it's quite busy, so I don't get as much time to experiment. But I do do some experiments sometimes. But it's mostly the kind of stuff where I'm not sure if it would work, that I do in my free time.

Kent Bye: So when did Inworld launch publicly?

Matt Kim: Inworld, we started in August of 2021. And then in the following December is when we launched Studio.

[00:10:07.845] Kent Bye: And yeah, what are some of the other projects that are notable, if people want to get an experience of Inworld AI? What are some public projects, aside from Meet Wol, which I just saw from Niantic? Are there other things that you'd point to for folks to go check out?

[00:10:20.802] Matt Kim: Out of the projects that are currently out, there's a lot. I mean, there are many indie devs working on their own interesting projects, which, if you join our Discord, you can see on our showcase page. Slothtopia is a really cool one that kind of has an Animal Crossing type of vibe to it. There's also an experience released by NetEase. They integrated Inworld AI to control this drone that helps you out, and you can give it commands to heal you and defend you and stuff. Of course there's Niantic's Wol, which you already mentioned. We have our own internal Metropolis Origins, which is going to be released at Steam Next Fest.

[00:10:57.848] Kent Bye: Seeing some of the demos from Inworld, it brings up these sci-fi examples of things like Westworld, where you have these AI characters. And so you have the human interactors who are engaging with the AI. But sometimes you have the AIs interacting with themselves. So you have these characters that are dynamically interacting. Have you seen any examples where you have the characters interacting with each other, rather than just with the primary person that's interacting with them?

[00:11:20.045] Matt Kim: Yeah, so this is a very popular request, actually, by a lot of our users. I think it's just a very human thing to want to see the AIs talk to each other, right? And yeah, we've seen a lot of experimental demos and games that people have made. A very popular one is Twitch streaming. So similar to what I said before, there's a user on our Discord named Peter who did a series of these Twitch streams where he had the robots talk to each other. And then people in the Twitch comments could insert suggestions for conversation topics and see how it changes.

[00:11:49.712] Kent Bye: Interesting, yeah. And I guess when I was talking to Keiichi Matsuda of Meet Wol, he was saying that a lot of the input and output, both the speech-to-text as well as the text-to-speech coming back, is all done on your back end. And so I've played around a little bit with OpenAI's Whisper, which has really good reliability. So have you been able to pull from some of these projects that are out there to do that type of speech processing? And what kind of reliability do you see in terms of error rates or something like that?

[00:12:22.187] Matt Kim: So actually, we use our own models for that. And I think we plan to keep doing it, because we found that for our customers, there's a very specific problem set that they have around this, which is that we make characters for fictional worlds. And we are going to be rolling out a feature for people to be able to bias speech-to-text based on fictional terms that they have. And so this is a good way to position Inworld competitively for this specific use case: that tool to allow them to do specific biasing. And it's something that we kind of got as a need, working with these different creative partners, like Neal Stephenson, for example, where they would request, hey, they're not pronouncing this right. That's both on the speech-to-text side, with certain words not being mapped properly because they're fictional, and on the speech-generation side, with certain names not being pronounced in the way that they see them being pronounced.
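As a rough illustration of what biasing toward fictional terms can look like, here's a toy post-processing pass that snaps near-miss transcriptions back to a registered lexicon. The term list and function are hypothetical, and plain string similarity via Python's `difflib` is a crude stand-in for the phonetic biasing a production speech-to-text system would apply inside the decoder itself:

```python
import difflib

# Hypothetical lexicon of fictional terms a world builder might register.
FICTIONAL_TERMS = ["Wol", "Innequin", "Ravensgate"]

def bias_transcript(raw: str, terms: list[str], cutoff: float = 0.75) -> str:
    """Snap each transcribed word to the closest registered term, if close enough."""
    canonical = {t.lower(): t for t in terms}  # lowercase -> preferred spelling
    fixed = []
    for word in raw.split():
        match = difflib.get_close_matches(word.lower(), canonical, n=1, cutoff=cutoff)
        fixed.append(canonical[match[0]] if match else word)
    return " ".join(fixed)

print(bias_transcript("tell inequin about the woll statue", FICTIONAL_TERMS))
# → "tell Innequin about the Wol statue"
```

The `cutoff` trades precision against recall: too low and ordinary words get captured by the lexicon, too high and genuine mishearings slip through, which is one reason doing this inside the recognizer with phonetic information works better than post-hoc string matching.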

[00:13:18.506] Kent Bye: Gotcha. So because you're doing your own model for that translation, you can start to add a set of keywords, so it feels like you can create your own jargon and create this imaginary world, and not have to rely upon it being in the training set of large language models. In other words, yeah.

[00:13:32.139] Matt Kim: Yeah, exactly.

[00:13:33.620] Kent Bye: Great. So what's next? What are some of the big features that you're working on that we should expect here at some point soon?

[00:13:39.012] Matt Kim: Well, the biggest one is our goals and actions system, which we're going to be rolling out pretty soon, hopefully this month. What that is, actually, is a lot of stuff that you would have seen in some of our demos already, but we're going to be releasing it to the public, because we've been trying to figure out the best way to expose it and let people use it and control it. And so essentially what it does is it allows you to have what we talked about, a set of motivations, and to have those motivations of these characters chain or be triggered by intent detection of what people want. And we have this demo here of Innequin, which was built from a shopkeeper demo, essentially, that allows you to have a character with whatever actions you want. In this example, it's buy and sell, right? And have those actions, when recognized, mapped to entities. So in this example, buy and sell could apply to a potion or a sword of might, right? And for them to actually be able to get these actions recognized from dialogue and then trigger them in the game. And this goes fully from being designed in Studio, according to what your experience is, to receiving these triggers in Unreal and Unity and whatever integration, web, you know.
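In spirit, the buy/sell shopkeeper example maps a recognized intent to an action over an entity, then fires an engine-side trigger. Here's a minimal sketch of that flow. The regex matching, the action table, and the `game_hooks` callbacks are illustrative stand-ins of mine: Inworld's actual system does this with learned intent recognition configured in their Studio, not regexes.

```python
import re

# Hypothetical registry: each action pairs an intent pattern with the entities
# it can apply to, in the spirit of the buy/sell shopkeeper example.
ENTITIES = r"(potion|sword of might)"
ACTIONS = {
    "buy":  re.compile(rf"\b(?:buy|purchase)\b.*?\b{ENTITIES}\b", re.IGNORECASE),
    "sell": re.compile(rf"\bsell\b.*?\b{ENTITIES}\b", re.IGNORECASE),
}

def recognize(utterance: str):
    """Return (action, entity) when an intent is detected, else None."""
    for action, pattern in ACTIONS.items():
        m = pattern.search(utterance)
        if m:
            return action, m.group(1).lower()
    return None

def dispatch(utterance: str, game_hooks: dict) -> None:
    # On recognition, fire the engine-side trigger (an Unreal/Unity/web callback).
    result = recognize(utterance)
    if result:
        action, entity = result
        game_hooks[action](entity)

print(recognize("I'd like to buy a potion, please"))      # → ('buy', 'potion')
print(recognize("will you sell me the Sword of Might?"))  # → ('sell', 'sword of might')
```

The key design point, which survives even in this toy version, is the separation between recognition (dialogue in, structured intent out) and dispatch (structured intent in, engine trigger out), so the same character design works across Unreal, Unity, or web integrations.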

[00:14:48.523] Kent Bye: So I've noticed that with Meet Wol, this is an XR project, so it has this whole immersive component where you have a Meta Quest Pro headset that's being shown here, running WebXR. It can also be on the phone. But I'd love to hear some of your thoughts, because we're here at Augmented World Expo, on this intersection between the XR components and what you're doing with these AI characters, and how these AI characters are being integrated into these immersive worlds.

[00:15:12.405] Matt Kim: Yeah, I see a lot of potential for AR, and the whole experience already demonstrates that, and a lot of people are excited about this. Because the thing with AR, right, is that it gives you some kind of an immersion, right, by bringing 3D out into the real world. And when you do that with an AI character, it adds a layer on top of this, right, of these characters being alive and talkable too. So it's just very magical, right, and kind of natural. Similar to VR as well, which in a different way is immersive, where you're going into a virtual world, right? And I think that that's actually a very sweet spot. There's a soft spot for a lot of the early devs here at Inworld, because we used to actually build these AI characters in VR demos exclusively, because we thought this is the way, right? This is the best way to experience AI characters. It's in virtual reality, because it creates a more immersive experience of talking to NPCs. And doing that inside of an already more immersive 3D experience goes hand in hand very well. But obviously, there's a lot of hardware issues with VR that prevent it from being more popular. But that is definitely something that we're all excited about and see as being important and valuable.

[00:16:17.915] Kent Bye: And finally, what do you see as the ultimate potential of artificial intelligence and XR more broadly, and what it all might be able to enable?

[00:16:27.870] Matt Kim: The ultimate potential is basically the science fiction world, right? Where you have these AI characters walking and talking around, both in AR and VR. You can have your pal next to you. It's limitless. It doesn't even have to be a humanoid, right? The owl in the Niantic demo shows that there's a lot of value in talking to an owl, and in just enabling people to be able to create these personalities and characters for all sorts of use cases, both for entertainment, but also where you can have an assistant. You know, I saw an early Niantic hackathon where they had, like, the spirit of a building, right? It gives an area almost a sort of permanence, right? So yeah, I think it just brings a lot more life both to our world, like, physically with AR, and also to immersive VR worlds where you can have these AIs populating them.

[00:17:12.460] Kent Bye: Is there anything else that's left unsaid that you'd like to say to the broader immersive community?

[00:17:16.872] Matt Kim: I'm very excited. And I hope you guys are excited as well. And I highly encourage everybody to experiment as much as possible. Go sign up on Inworld. Make your own characters. Join our Discord. And give us feedback. Join the community. We'd love to talk to you and help support whatever your vision is for how to use AI characters.

[00:17:37.442] Kent Bye: Is there a free option for people to play around a little bit? Or do people have to sign up and become a subscriber in order to make a character?

[00:17:44.260] Matt Kim: So everybody who signs up gets a free amount of interaction time as soon as they sign up. I think it's about 300 credits, and that translates to a pretty good amount of conversation, I think like a day or two. So you can do that before we start charging. But honestly, the costs aren't too crazy. And me, myself, I don't know what the exact numbers are, but I know it's low enough for a bunch of indie devs to be experimenting with it on our Discord.

[00:18:07.925] Kent Bye: Awesome. Well, thanks so much for joining me. I'm excited to see where this all goes, because the Meet Wol demo was really quite impressive in the ways that it was able to capture the sense of a character, which I think is a unique new thing that I haven't seen before. And so to see how that's going to be integrated into these immersive experiences, I think it's really exciting to see how that starts to flesh out these complex worlds and differing character motivations in immersive experiences. So thanks a lot for joining me to help break it all down.

[00:18:33.435] Matt Kim: Yeah. Thank you, too, Kent. It was a pleasure.

[00:18:35.613] Kent Bye: So that was Matt Kim. He's a technical creative director at Inworld.ai, and they're creating all sorts of really amazing non-player character technologies to push the limits of what you can do with these conversational interfaces and large language models and artificial intelligence in general. So I've had a number of different takeaways about this interview. First of all, I've been super impressed with the different types of experiences that I've seen from Inworld.ai. And it's funny, because if you go back and listen to the panel discussion that I had earlier, on Wednesday, May 31st, 2023, I said that I hadn't seen any good examples of AI that has a coherent narrative and is able to really live into these different worlds. The technology has been progressing a lot faster than I was aware of at the time, so with Inworld.ai I have to kind of retract that a little bit, because I think they're actually making a lot of really amazing innovations on that front: the ability to set forth a knowledge base, but also to have these different components like character actions and different motivations, and to really design a character arc that allows different interactions and interactivity and knowledge to be slowly revealed, or to have different states and different actions. And yeah, they're collaborating with different game designers and storytellers like Neal Stephenson, the author of Snow Crash, who's been working with Inworld.ai on future stories that he's working on. So there's the ability to tune these large language models to add your own jargon, your own pronunciation of things, and to also have these character arcs where you have the different changes. Meet Wol is what the conversation that I had with Keiichi Matsuda in the previous episode was about, and you can go to meetwol.com to have your own interaction with Inworld.ai.
And yeah, you can see how you're able to have a whole arc, but there's still a spontaneity in how things are said slightly differently each time. It's not a script; it's more like a broad base of improv, where the technology makes the interaction unique and different every time, while still being bounded within a certain set of knowledge and able to slowly direct things back. Yeah, I was just really impressed by asking an open-ended question like, name a time when you were scared. And the story that was shared by Wol had enough of a connection where, okay, this sounds like it could have been a story where there was some fear in there. So just the way that large language models work is through these associative links that are somehow able to do these inferences on a prompt like, tell me a story where you were scared. And then you can hear the story and imagine that there would be fear within that story, even though there isn't any explicit mention of fear in the story or in what may have been in the knowledge base. So that's where a little bit of the magic comes in, where there's a bit of glue from these large language models that is able to close the gap between the ambiguity of the different inputs and how it responds. And yeah, I'm just super impressed with how, as a tool, it's able to steer and direct these story worlds.
And I think if you go back to episode number 293, AI and the Future of Interactive Drama, the conversation I did with Andrew Stern and Larry LeBron, they were working at the time on this project called Immerse, which is a gesture-based virtual reality training simulation for soldiers that had these modules of emotions stacked on top of each other to build different emergent behaviors for social dynamics. But going all the way back to Façade, the interactive game that Andrew Stern did with Michael Mateas that was released in 2005, they were talking about how you can push the limits of what AI is able to do if you put it in the context of a story world. Because in a story world, you know what the context is, so you can kind of cheat a little bit. We don't have artificial general intelligence that's able to respond to an interaction independent of whatever the context is. But if you're within the context of a story world, then that story world is actually bounding the possibilities in a way that can take the existing artificial intelligence, machine learning, all these different tools, and push them to the limits of plausibility, because you're already suspending your disbelief by interacting with these characters who have a certain amount of motivation. So yeah, it's fascinating to see how the literary theory is able to be put into this modular system that you're able to interact with and create these different inputs.
And at the end of the day, you have these very robust characters that have different motivations and arcs and dynamic interactions. And yeah, it's at the very beginning of where this is going to go in the future, and I expect to see a lot more of these types of dynamic interactive AI characters as we move forward. And so yeah, it was very interesting to hear that even the creators within Inworld.ai felt that the best experience of some of these interactions with AI agents was within the context of virtual reality. Being completely immersed within a virtual world, we're able to be transported into a completely different context. And when you're met with these different AI agents that are in the world of that context, then it helps you to suspend your disbelief and creates this dimension of social presence and plausibility that gets you even more immersed into the world. And there is this experience where you're trying to break the conversational AI, and when you can't, the more that it's responding in a way that's plausible, the deeper the presence and plausibility that you feel in the context of this world. This is something I experienced back in a piece called Starship Commander, where I've had three previous conversations, in episodes 503, 729, and 955. There's a conversational interface where you're speaking to an AI agent, and each time it responds to you in a way that you expect, it just deepens the level of plausibility and feels even more immersive. And so I feel like these NPC conversational agents have a huge capacity to cultivate that type of deep level of social presence and plausibility within these experiences.
So, yeah, I expect to see a lot more of this as we move forward, especially as they start to fine-tune some of the specific things that you don't see in a lot of other systems, with the real-time interaction and the streaming input, and also just being able to define the knowledge base, have different actions, and create a whole arc for the NPC. So yeah, super excited to see where this goes in the future, and I expect to see a lot more of it as we move forward. So that's all I have for today, and I just wanted to thank you for listening to the Voices of VR podcast. And if you enjoy the podcast, then please do spread the word, tell your friends, and consider becoming a member of the Patreon. This is a listener-supported podcast, and so I do rely upon donations from people like yourself in order to continue to bring you this coverage. So you can become a member and donate today at patreon.com slash voicesofvr. Thanks for listening.
