#503: Conversational Gameplay & Interactive Narrative in Human Interact’s ‘Starship Commander’

Human Interact announced this past week that they are collaborating with Microsoft's Cognitive Services to power the conversational interface behind their Choose-Your-Own-Adventure, interactive narrative title Starship Commander. They're using Microsoft's Custom Recognition Intelligent Service (CRIS) as the speech recognition engine, and then Microsoft's Language Understanding Intelligent Service (LUIS) to translate spoken phrases into a number of discrete intents that are fed back into Unreal Engine to drive the interactive narrative.

I caught up with Human Interact founder and creative director Alexander Mejia six months ago to talk about the early stages of creating an interactive narrative using a cloud-based, machine learning-powered natural language processing engine. We talk about the mechanics of using conversational interfaces as a gameplay element, accounting for gender differences and for racial and regional dialects, the funneling structure that accumulates a series of smaller decisions into larger forks in the story, the dynamics between multiple morally ambiguous characters, and the role of a character artist who sets the bounds of an AI character's personality, core belief system, and complex set of motivations.

LISTEN TO THE VOICES OF VR PODCAST

Here’s a Trailer for Starship Commander

Here’s Human Interact’s Developer Story as Told by Microsoft Research

Subscribe on iTunes

Donate to the Voices of VR Podcast Patreon

Music: Fatality & Summer Trip

Rough Transcript

[00:00:05.452] Kent Bye: The Voices of VR Podcast. My name is Kent Bye, and welcome to The Voices of VR Podcast. So about six months ago at VRLA, I had a chance to talk to Alexander Mejia of Human Interact, and he was still in the early phases of developing his interactive, AI-driven role-playing game called Starship Commander. So just this week, Human Interact announced their partnership with Microsoft Cognitive Services. They're using Microsoft's cloud-based speech recognition systems to allow a user to use natural language input to have a conversation with the video game. And using Microsoft's Language Understanding Intelligent Service, they're able to extract the intent of whatever is spoken and feed that into the Unreal game engine to essentially drive this interactive story. Now, at the time of this interview, Alexander was still in the process of exploring the different cloud-based services, and they hadn't announced anything yet. Now that it's been announced, I'm able to go back and listen to some of the early design principles and ideas that were driving this Starship Commander experience. So we'll be exploring the cross section of artificial intelligence, natural language input, and interactive storytelling on today's episode of the Voices of VR podcast. But first, a quick word from our sponsor. Today's episode is brought to you by the Silicon Valley Virtual Reality Conference and Expo. SVVR is the can't-miss virtual reality event of the year. It brings together the full diversity of the virtual reality ecosystem, and I often tell people, if they can only go to one VR conference, then be sure to make it SVVR. You'll just have a ton of networking opportunities and a huge expo floor that shows a wide range of all the different VR industries. SVVR 2017 is happening March 29th to 31st, so go to VRExpo.com to sign up today. So this interview with Alexander happened at VRLA, which took place in Los Angeles on August 5th and 6th, 2016. Now, like I said, this interview is about six months old, so there may have been some changes in the overall strategy and trajectory of Human Interact, but I still think it's an interesting time capsule for looking at the evolution of interactive storytelling. So with that, let's go ahead and dive right in.

[00:02:37.284] Alexander Mejia: Hi, I'm Alexander Mejia. I'm the founder and creative director of Human Interact. We are creating cinematic VR experiences where the audience talks directly to the characters to influence the story.

[00:02:51.208] Kent Bye: Great. So as I hear that, my first question is, well, are you doing some sort of natural language processing? And then how are you interpreting the meaning of that to be able to actually make a decision?

[00:03:01.487] Alexander Mejia: So there's actually a kind of complicated process that happens when we do this, but for the sake of brevity, what we do is speech-to-text, and then we take that text and attempt to determine its intent. And then once we have a certain percentage reliability that we know what the intent is, we send that over as structured data to the game engine, and then it executes what it's supposed to do. That's a really long and roundabout way of saying, I talk directly to the character, and then the appropriate line plays back to me.
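
To make that pipeline concrete, here is a minimal sketch of the flow Mejia describes. All names and signatures here are hypothetical stand-ins: the real title calls Microsoft's CRIS and LUIS cloud services, whose actual APIs are not reproduced here.

```python
import json

CONFIDENCE_THRESHOLD = 0.7  # "a certain percentage reliability"

def speech_to_text(audio: bytes) -> str:
    """Stand-in for the cloud speech-recognition call (CRIS-style)."""
    return "fire the missiles"  # pretend the service transcribed this audio

def text_to_intent(utterance: str):
    """Stand-in for the language-understanding call (LUIS-style)."""
    return ("FireMissiles", 0.92)  # pretend the service scored this utterance

def handle_player_speech(audio: bytes, send_to_engine) -> None:
    text = speech_to_text(audio)
    intent, score = text_to_intent(text)
    # Only forward the intent once we're confident enough in it.
    if score >= CONFIDENCE_THRESHOLD:
        # Hand the engine structured data it can execute directly.
        send_to_engine(json.dumps({"intent": intent, "confidence": score}))

handle_player_speech(b"", print)
```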

[00:03:33.030] Kent Bye: I see. And so how do you structure a narrative in this type of fashion? I mean, are you creating a fixed arc with subtle variations? Are you able to actually generate some sort of emergent story based upon whatever anybody says?

[00:03:47.272] Alexander Mejia: So for our current product, Starship Commander, we see the story as a funnel. At the top of that funnel, it is wide open and there are many choices that you can make. As the story goes along, we start to narrow that funnel down so that you can see the results of all of the actions that you've made. So in our story, we have three characters, and they're all kind of morally gray, and that creates a lot of interesting opportunities. As you talk to every character, they give you their side of the story, and their side of the story is always, well, what I'm doing is great, and what I'm doing is amazing. But as you talk to other characters about those characters' stories, and you try to question it, you start to find out, hmm, maybe things aren't exactly the way that they seem. So, in that essence, there is an emergent narrative, but it's something that the audience needs to find out on their own. But in terms of us writing lines and everything, it is baked in stone. We don't do any text-to-speech. We don't generate lines on the fly. We found that our audience doesn't really react well when you're talking to a robotic voice. There's something really ingrained inside ourselves as human beings that we really need to see the human's face. We need to understand that they are like us. Otherwise, we won't attempt to have a conversation. So in Starship Commander, we actually contrast this intentionally by putting in a ship computer. Starship Commander is a science fiction, choose-your-own-adventure narrative where you fly around the galaxy making decisions as the captain of a voice-automated starship. We kind of made the ship's computer Siri-dumb, Siri-stupid, as you might say. No offense to Siri, but people always say, like, oh, it's not working, it's not working the way that I want it to. And the characters in the story actually respond to you and talk about how stupid that computer is. And we find that that contrast between the two helps the audience realize that the characters are people that they can talk to, that they are people that they can relate to, because they have the same problems that you do with your voice-activated smartphone right now.

[00:06:14.153] Kent Bye: Hey, so this reminds me of this interview that I did with Andrew Stern and Larry LeBron. Andrew worked on Façade, and he talks about this drama manager, which is kind of like this overall director of the narrative. And they have this PDF called Behind the Façade, which, when you read it, kind of reads more like a computer program than a script, right? And so in Façade there's essentially the two characters, the husband and the wife, and the emotional intent in essence is kind of like, do you agree or disagree with them? It's basically positive or negative: are you trying to be contrary to whatever's being said, or are you trying to be agreeable? And there may be other slight variations, but in essence that's sort of the high level. And so when you talk about extracting the intent behind what you're saying, is it something similar, where there are competing allegiances amongst the many different characters and you're taking sides between them? Or maybe you could talk a bit more about how you're actually extracting that intent and then what you're doing with it.

[00:07:02.010] Alexander Mejia: So what we do is we offer the audience choices when they get to a point where there's, essentially, we try to keep it down to a binary choice: are you going to do A or are you going to do B? And a lot of times the story, the characters, they'll run into some drama themselves where you have to basically choose between one or the other. And as that happens, you make your choice and then we play out the story based off that. Now, to take that a little bit further, that sounds like the most simple part of it, but a lot of those choices start to add up, and that's why we call it the funnel effect, because at the bottom of the funnel you must come out with, you know, this is your ending that you receive from all the choices that you made. So, at the end of the story, any of the three characters can be alive, but because they're kind of at odds with each other, you could probably take a good guess that they are not going to like that, and they will take action in the story. So while you have agency to control yourself and what you tell the people who will listen to you, remember that these people are also real, that they're fully fleshed-out characters, that they have their own wants and desires too. So we try to make sure that whenever we create any characters in our narrative, none of them are flat characters. We have a scale from 0 to 10 when we create a character. At 0, their motivation is incredibly flat, and it can be summed up with one word. For example, at level 0, maybe it's like the characters from Team Fortress: I want to kill the other team, right? Their motivation is really simple. And at the other end of that, we have Joel from The Last of Us, who is a really complicated person, and I could go on for, you know, paragraphs and paragraphs about his motivation, and even at the end of it, it's not really clear what he would do if he were presented with a situation, because he has a lot of emotional weight behind him. And so on that range we like to have our characters at about a seven. And what that means is that they have a very simple-to-understand, surface-level motivation. For example, our wingman is a good soldier and she wants to kill the alien race, the Echnians, right? That's what she does. But underneath that, she has some trauma that she's experienced that will make her question those motivations later when they do come up. So you as the audience can actually play through the story and not learn about any of the backstory, because you just want to go through all the action moments. And you will still have a compelling story, because you'll say, wow, that was fun, there were a lot of fun things that I did, and at the end, you know, the good guys won and the bad guys lost.
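
As a rough illustration of that funnel, here is a hedged sketch in which small binary choices accumulate into state and the ending is derived from that state at the bottom. The state shape, the survivor-based ending rule, and any character name beyond the wingman and Derek are invented for illustration, not taken from Human Interact's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class StoryState:
    choices: list = field(default_factory=list)
    alive: set = field(default_factory=lambda: {"wingman", "derek", "third_character"})

    def choose(self, choice: str) -> None:
        """Record one of the small binary decisions ('A or B' style)."""
        self.choices.append(choice)

    def kill(self, who: str) -> None:
        self.alive.discard(who)

    def ending(self) -> str:
        # Bottom of the funnel: every playthrough must come out with an
        # ending derived from everything chosen along the way; here it
        # is crudely reduced to who survived.
        return "ending_" + "_".join(sorted(self.alive))

state = StoryState()
state.choose("negotiate_with_derek")
state.kill("derek")
print(state.ending())  # ending_third_character_wingman
```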

[00:09:52.249] Kent Bye: Yeah, the thing that that makes me think of is that, you know, in Façade there's like five different outcomes essentially at the end. And one of the things that Andrew Stern said to me is that he would rather play a 12-to-15-minute type of experience that has a good story, but that he could play over and over and over again, something that has a lot of replayability, meaning that there's a lot of variations where you can exert your agency and drive different types of interactions and outcomes. And so, in your experience, how do you think of that? Do you think of it as having this discrete number of outcomes? I'm just trying to figure out how you're measuring how replayable this type of experience might be.

[00:10:29.772] Alexander Mejia: So, you know, part of getting there is the journey, right? And part of the journey is experiencing what happens when you interject yourself into this. So keep in mind, when I tell the computer to fire the missiles, that intent is: okay, the computer's going to go ahead and shoot the missiles. But there's many ways that you can say that. You can also say, computer, attack, or you can say, computer, go pew pew at those guys. We actually added that because someone in playtesting said it, and it was like, you know, that was a lot of fun, that should have worked. So being able to add your own self into the story, even though you're running through some of the same aspects, our audience really, really enjoys that. They say the part that they loved the most about the story was the fact that they got to be themselves and not pick through a little dialogue wheel like you see in the BioWare games, or even in traditional adventure games where it's like, okay, I've got these five things to say. We actually find the enjoyment comes from the fact that you chose. There's like a selection bias in it. And, you know, it's something that you don't see in a traditional cinematic experience, where a writer puts on the table, here's the five choices you have. Even though we've done the same thing, we say, you know, here's the five choices you have, but because you kind of get to do it the way that you want to, there's gameplay associated with that. Which is kind of a strange thing, because we don't think of talking to each other as a gameplay element, but we've spent many months just breaking this down, learning a lot of psychology, learning a lot about how people speak to each other, in creating Starship Commander.
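
Here is a toy sketch of the idea that many phrasings collapse into one intent. The real system trains LUIS with sample utterances per intent; this word-overlap (Jaccard) matcher is an invented stand-in for illustration, including the playtester-inspired phrase.

```python
SAMPLE_UTTERANCES = {
    "FireMissiles": [
        "fire the missiles",
        "computer attack",
        "computer go pew pew at those guys",  # added after a playtester tried it
    ],
}

def match_intent(utterance: str):
    """Score the utterance against every sample by word overlap (Jaccard)."""
    words = set(utterance.lower().split())
    best_intent, best_score = "None", 0.0
    for intent, samples in SAMPLE_UTTERANCES.items():
        for sample in samples:
            sample_words = set(sample.split())
            score = len(words & sample_words) / len(words | sample_words)
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent, best_score

print(match_intent("go pew pew at those guys"))  # ('FireMissiles', 0.83...)
```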

[00:12:05.140] Kent Bye: Yeah, what that reminds me of is Ernest Cline's Ready Player One, which, for the audience, if you haven't gone and read it, go read it now, and then maybe skip ahead a few seconds because this might be a spoiler. But there's a part in Ready Player One where they're actually kind of acting out a movie, and they're trying to recreate the dialogue from specific '80s classics, you know, so they're able to actually step into the movie and start to interact with the other characters. And it kind of feels like that a little bit, what I hear you saying: the participants in these experiences, instead of following fixed dialogue, are able to actually say it how they would actually say it, given that they were in that context and situation.

[00:12:43.722] Alexander Mejia: Yeah, so I want to clear this up really, really well here. I want to make this really clear. People say, oh, it's like Ready Player One. No, because you have choices, and I think that's what's going to really separate us. But it is similar to that, right, in that you can actually say it the way you want to. So there's choices. For example, do you want to be friendly with the wingman? You can do that. You can follow everything that she says, and she's going to treat you a certain way. Or you can be a little more brash and abrasive with her, and she's going to also treat you appropriately. She's a person with character, you know; she's not a pushover. So as we develop the story and as we test this, we always find ourselves adding new lines, new sequences, new areas of, oh, we didn't think about that. And that really speaks to the importance of having a diverse writing group. What I want to do inside of the story is not the same as what a female player wants to do, and we see this all the time in playtesting. And different racial groups do different things, and we have to add different words based off of colloquialisms and just how people speak in different regions. So being able to add all these things together gives us a really compelling and really rich experience. Even though the storyline may not last as long as, say, a Netflix series, we are planning to have a movie-length type of experience where you can play from the beginning to the end. I'm not quite sure exactly how long that experience is going to be; it really depends on scope and playtesting. But that's what we think, that people can sit in VR, have a compelling experience, take the headset off, and just be like, wow, I completed something.

[00:14:28.800] Kent Bye: Can you expand a little bit more on what you mean in terms of the special considerations when it comes to gender or race, and how that actually plays out in your experience?

[00:14:36.933] Alexander Mejia: So I'll give you one example. In the story, there is a scene where you come up against the big bad guy. His name is Derek the Terrible, or Derek Hao Xing. We make sure that everybody has two names, so that there's kind of the easy-to-understand name and then their personal name. We find that a lot of male players, when they come up to him, want to act very aggressive. So immediately they see him and it's like, fire the missiles, you know, attack him, let's kill this guy, right? They use these really aggressive words against him. And yet when we see a lot of female players play, not all, but a lot, we find that they attempt to be more diplomatic; they attempt to negotiate. So we make sure that we have those two routes inside of the story, and, you know, that all just comes down to player satisfaction at the end of the day. Did you get to affect the story in a way that you'd like to? Or, at least, was it acknowledged that you attempted to affect the story in the way that you wanted to, but another character said, no, I'm going to prevent you from doing that?

[00:15:34.692] Kent Bye: So yeah, just a couple weeks ago, I went to the International Joint Conference on Artificial Intelligence and did 60 interviews, nearly 24 hours of content, to bootstrap the Voices of AI podcast, which should be coming out soon. In going there, what I learned is that natural language processing is something that requires quite a lot of machine learning techniques. The big companies like Google or Apple or Facebook, especially, have been able to get a vast amount of actual user input, in terms of recorded voice, to be able to train these machine learning algorithms to detect what people are actually saying. Is this something where you're using cloud platforms like IBM Watson or the Google Voice API? Or is this something where you've actually kind of created your own natural language processing machine learning engine?

[00:16:23.958] Alexander Mejia: So right now we're currently exploring all different avenues. We're only at a proof-of-concept stage at the time of this recording, but we are going to settle on cloud-based processing systems. There are some, like you said, from Google, Watson, and Microsoft. And what we found is that if you can stream the audio data from a PC, which is our primary platform, the latency from when you stop talking to when you get that intent back is less than 500 milliseconds on average. It's actually closer to about 100 to 200 milliseconds. And what we find is that the audience actually accepts that delay between when they stop talking and when they get the response back, because that's a natural pause that people actually do leave when they're talking. In terms of, hey, should we roll our own? Why, yes, I would love to roll my own system, but I always take a look at the cost-benefit of that. You know, we look at some of these systems being made by Google. Have you looked at the resumes of some of the guys that are doing it? We're talking about millions of dollars poured into the development of these speech-to-text systems. And we see ourselves as a content company, as well as, you know, being able to implement this in a smart way and teach people how to use it, because we don't want to be the only title doing this. We see in the future that this is a genre of storytelling in VR. We're so much on the cutting edge right now that we don't even know what words we should use to describe this. We have to use this big, old mess of words, like cinematic, choose-your-own-adventure, VR, storytelling. Someone's going to come up with a word and it's just going to stick. We actually tried to come up with a word, and then we just thought, hey, we're spending too much time trying to figure out what we're going to call this and not enough time figuring out what this thing actually is. We're in development. So, to answer your question, we don't develop it on our own, just because there are other guys that have done it, and they've been doing it for, frankly, 20 years. And the accuracy on these systems is amazing. It's just a matter of finding out what works well for us, and being able to submit a custom language model, because we have words that are not in the English dictionary, like Echnion, the race of our aliens. We want to make sure that those words get transcribed accurately and properly. And we actually find that if we can submit a custom language model, our accuracy goes way up, because you only need to have, realistically, so many of the words translated, but they need to be the right words. They need to be the right proper nouns that get in there. So you want to make sure that the words that you know your audience is going to say are weighted more heavily than, say, other common words that your audience may never say, and we use telemetry data to make sure that that works. So because we are doing the processing in the cloud, it's also important to state that this game requires an internet connection to use. Not because we're tracking you or trying to record what you're saying, but just because it actually delivers the best user experience. We actually tried to do it all locally first, but the accuracy was just not there to have solid gameplay of you talking to another person and having it understand you in a reliable fashion.
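
As an illustration of why that custom language model matters, here is a hedged sketch of domain-weighted rescoring: in-fiction proper nouns get boosted so the recognizer prefers them over acoustically similar English. The boost scheme, numbers, and mis-hearing are all invented; the actual cloud service accepts a trained custom language model rather than exposing weights like this.

```python
CUSTOM_VOCABULARY = {
    # word: weight boost relative to the general language model
    "echnion": 5.0,   # the alien race, not in any English dictionary
    "missiles": 2.0,  # common word, but very likely said in this game
}

def rescore(hypotheses):
    """Pick the transcription hypothesis after applying domain boosts."""
    def boosted(item):
        text, score = item
        boost = sum(CUSTOM_VOCABULARY.get(w, 1.0) for w in text.lower().split())
        return score * boost
    return max(hypotheses, key=boosted)[0]

# The general model alone slightly prefers the mis-hearing; the domain
# boost flips the decision toward the in-fiction proper noun.
print(rescore([("attack the etnean ship", 0.52), ("attack the echnion ship", 0.48)]))
```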

[00:19:52.790] Kent Bye: Well, I think one of the big challenges moving forward with these types of games that interact with these cloud-based services is that every time you make a call to the cloud, it is presumably charging somebody. So I'm just curious if people would have to set up their own account and have it billed to them, or if that's something you would take care of as a company, as an overhead, and charge some sort of fee to cover the cost. It's just sort of this problem where you're creating an experience where, if people use it all the time, you could actually lose money on it.

[00:20:21.975] Alexander Mejia: So I'm not going to be allowed to actually go into the specifics of the licensing details, but what I will tell you is that we are not going to charge customers a monthly fee for this. We see this as an experience that you download, that you play, like a game title, and you're going to pay a fixed price for it, and that's it.

[00:20:40.317] Kent Bye: And are you kind of focusing on building tools to license to other people or is this something that you're thinking about eventually just being a content producer specializing in this kind of either interactive fiction or interactive drama?

[00:20:53.333] Alexander Mejia: So I am actually building tools right now to facilitate our voice system. We also have other people on the team doing that as well. In terms of getting it to a point where we can sell it as an SDK, I think it's really important to show everybody, you, the audience, that this is a viable product first. Imagine if I were to sell you tools to build a house, but you have never actually stepped inside of a house before. You just lived outside under the stars with your little roll-up bed, and that was it. And you live in Southern California, where you can actually do that. If I gave you tools to build a house, you wouldn't really know how to build that house properly, or how to furnish it, or, oh, you need rooms organized in a certain way, oh, the bathroom should be connected to where you sleep in the master bedroom. You know, all these things that we take for granted because we've seen them over and over again. We feel that with our tools, if we were to just release them out in the open right now and say, here you go, implement it into Unreal Engine, pay your licenses through your cloud provider or whoever you want your speech-to-text provider to be, people would use it but not create an experience that would look good for the system. We've been working on this for about a year now, as of this recording, just trying to get it right, iterating on it and iterating on it, until we can say, yes, this is an experience that customers can go through, not just us, right? Because we have to playtest this against so many different people. And as we do it, we actually find that we can't just playtest in one region. We used to just hit all the VR conventions in California. And once we started going to different places, we were like, okay, yes, our demo's tight, yes, our demo's solid, now let's go ahead and test it against some other people on the East Coast. And then we start running into problems that we didn't think of, when we thought, oh, we had this. Because there are cultural norms that we ourselves are learning about. And that's, again, one of the reasons why we say that we need to have a diverse writing team. It's not because it's some words that are going to sound nice when I say it. It is a real, absolute need for our product to have that diversity. Without it, I don't think we could be successful in the marketplace.

[00:23:13.173] Kent Bye: Yeah, and for me, back in May of 2014, a lot of people asked how I started the Voices of VR podcast. And, you know, part of it was just tuning in and looking for revolutionary technologies based upon the work of Rick Tarnas, looking for the cycles of history, and just being attuned to the fact that there was a big tidal wave of something revolutionary on the way. I saw virtual reality as that technology, and I started doing the Voices of VR podcast at the first Silicon Valley Virtual Reality Conference. And here now, in July of 2016, well, it's now August, but in July I actually started recording the Voices of AI podcast, because I feel like there's a similar tidal wave of revolutionary technology that is coming, but it's probably going to be a lot bigger than virtual reality will end up being, just because I think there are so many more all-pervasive applications for artificial intelligence and a lot of these new machine learning techniques. And so I see this confluence and this trajectory where virtual reality and AI are on a collision path. And since you've been working on this for so long, I'm just curious to hear some of your thoughts on how these two technologies are going to come together.

[00:24:22.787] Alexander Mejia: So artificial intelligence through machine learning, God, that's a mouthful, isn't it, is going to change our world in ways that we cannot even imagine yet. I will try to at least give my understanding of it, where my brain is right now. I mean, people will probably listen to this in five years and laugh, but to me, where I think AI is going to really help us is to have a conversational future. And what I mean by a conversational future is, you remember hanging out with your friends when you were a teenager, and maybe you were playing some video games, or you were sitting around playing some board games, or, I don't know, shooting hoops, or whatever you did as a kid. And you could just yell over to somebody and say, hey man, let's get a pizza, I'm hungry. And they knew exactly what to do, and they knew exactly who to call. And, right, that's a really complicated process if you're thinking about it from a computer's point of view. It's like, pick up the phone, call, oh, what do my friends like? Oh, let's not get this. Oh, I have to order a cheese pizza because one of the guys is vegetarian, right? And all of that was implicitly understood over multiple sessions of, hey, we play the game on Friday, we always order the pizza, we always get these pizzas, and they were able to fire it off with one command, one string or statement, right? I'm thinking like a designer here. But those are the types of problems that AI will start to solve. There are other problems, too, that I'm seeing AI solve that might make people a little scared. Like, for example, I'm trying to remember, but there is this computer that can do impressionistic art, you know, based off of its algorithm, where you could paint, like, trees, water, sky, and, you know, it would make a pretty compelling piece of art 90% of the time. And this was just a research project. You know, imagine what happens to artists in this world. I can even imagine, in the future, Adobe somehow figuring out, okay, we're going to give away Photoshop for free, and part of the license agreement will be: we're watching you over telemetry, we're recording your data, and we're going to use that for future purposes. So the free version of Photoshop will be all manual, right? You're doing all the tools just as you do now in 2016. Imagine in five years that they've collected telemetry for those five years. So now it's 2021, and now there's a pay version of Photoshop that they launch. And they say, by the way, this thing will do perfect cutouts of everything for you, absolutely, every time. And it's been trained by thousands of the best artists to do cutouts, right? That's a really common thing that you do, you know, doing a mask for a person, cutting them out of a picture and then compositing them later. That's just one aspect of it. Imagine all the different aspects it could learn from all those thousands of sessions. And this is just one application that we know has a use case right now. We can't even begin to imagine the future applications, the future use cases, that we haven't even dreamt of or don't even think we need yet. And there may even be new applications that have to come up just to manage the AIs in the first place. It's like an AI for the AI; it's like AI inception.

[00:27:35.097] Kent Bye: Yeah, the thing that you were referring to, I think, is style transfer with a convolutional neural network, where you basically give it the seed image and then it's able to do edge detection and basically transfer that style. And there's actually an app called Prisma on iPhone right now that's essentially doing that same type of style transfer. I see a lot of people using it, and it's kind of creating this almost augmented reality type of thing: you take a picture and you're altering it, so you're taking scenes from reality and augmenting them, just like Snapchat in a lot of ways. But in terms of privacy, I'm not sure if they're going to be able to get away with collecting data in that way. What you're saying is theoretically correct about the training, but, you know, in some ways we own our data, and I think there's a bit of a battle that's going to ensue in terms of what we actually own of the information that we're putting into these systems, especially when you start to talk about biometric data within VR. Because it's going to be able to detect our emotional states, and potentially, eventually, take all the different biometric data and eye tracking, look at what we're paying attention to, and then, with all these machine learning algorithms, be able to actually direct attention in some ways. But I think with any of these things, you get new capabilities and there are things that you're giving up. You're signing up to these services, you're signing a terms of service, and there are going to be things that people agree to do in VR, like, okay, you can track my heart rate, because it's going to create this interaction within this narrative that's going to be responsive to my emotional state. And given that emotional state, it's going to be able to change how the arc of the story unfolds based upon what I'm feeling.

[00:29:16.526] Alexander Mejia: Yeah, we're kind of thinking about the future of our technology when we go, okay, what do we see in five years? What do we see in ten years? I personally see a totally emergent storytelling system, where we have a text-to-speech algorithm that actually gives you the human inflection, gives you the performance you want, that is good enough that it will trick human beings into thinking that they're talking to a real person. I mean, that's one of the reasons why we actually record actual human voices for our story, because we needed that to get you to interact with it. So you've got that. We've got photogrammetry scans of people, with animations that will look at, here's the speech inflection and everything, okay, now here's the facial animations that I should bring, here's the performance. You know, how many hours of footage do I need to record of a person talking for machine learning to essentially assimilate that data and then say, okay, now we've got the inflections of the person, the human being, right? So we have this virtual human's voice that's being completely synthesized, and now we have their emotions that are completely being synthesized, and we've scanned them off of a person. Or maybe we're creating new people, like an artist is touching up and creating a new person. So now we've got a character in a story, and it's also driven by AI, right? Like, that character has the ability to reach out to the internet and see what's happening in the news or what's happening on Wikipedia. Also, through all the thousands of sessions it's talked to other audience members, right, it'll start to pick up and learn. And everybody always says, well, what about Tay? Tay was corrupted in 24 hours on Twitter. How will you avoid that? And, you know, we'll set boundaries up as character artists. We actually created this new role inside of our company called a character artist, and they essentially set the boundaries of the AI-ness of the character: how are they going to react to you, what do they do when they get angry, what do they do when they are in a neutral state, can you intimidate them, can you do these things? So character designers are going to create a boundary to say, okay, this character lives inside this space. It can reach out and talk about those things, but its core belief system is this box that we've set, and it won't change. And I think that's incredibly important for storytelling. And I think that's also going to be important for virtual assistants, or things that a mass market tries to connect to and talk to, because as a brand or a corporation, you don't want to have the embarrassment that Tay was, right? So when we think of storytelling in the future, all those things combined are going to allow us to have these crazy emergent narratives that are really, truly responsive to the audience. Right now, we say, Starship Commander, you fly through space, you get into trouble, you're doing missions, and that's fine. But what if the audience wanted it to be a romance? What if the audience wanted to leave the ship and go onto a planet, right? Like, all of these things that you want to do inside the world, we need to be able to generate that content in a smart way, but be able to be responsive to everybody. And that's where we see machine learning really taking us in the future.
We don't see ourselves at that point making stories, even though we might set a story inside of it, just to say, okay, here is a story that could happen, you know, that we demo off, and it demos off really well. But we see ourselves selling more environments. So you might look through and say, oh, I want to have a story in Egypt today. And so artists go through and create all those environments, and create characters that make sense, that live in that environment. The character designers set the boundaries that those characters will stay within. It's like creating someone's values and morals that they will always believe in. And then you allow the audience just to run wild and have an emergent story.
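
A hedged sketch of what that character-artist boundary-setting might look like as data follows. The field names and the wingman's values are invented for illustration and are not Human Interact's actual schema; the point is that the core belief "box" is fixed while moods vary within it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the core belief "box" won't change at runtime
class CharacterBounds:
    name: str
    core_beliefs: tuple        # the box the character always lives inside
    can_be_intimidated: bool
    reactions: dict            # mood -> behaviors the AI may choose from

wingman = CharacterBounds(
    name="Wingman",
    core_beliefs=("good soldier", "wants to defeat the Echnians"),
    can_be_intimidated=False,
    reactions={
        "angry": ["curt replies", "refuse small talk"],
        "neutral": ["banter", "mission briefing"],
    },
)
print(wingman.core_beliefs)
```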

[00:33:07.896] Kent Bye: Yeah, it reminds me of working in an office and understanding the office politics and the personalities, and how to interact with each of the different people in particular ways. And I can see how you're kind of creating the framework to be able to do that.

[00:33:19.841] Alexander Mejia: Well, I'm not quite sure if people are going to want to play an office politics emergent narrative story, unless you're like the CEO or the boss and you get to just be a meanie. I mean, it could be. I mean, maybe some people would want that, but you never know.

[00:33:35.547] Kent Bye: I'm not saying specifically that it's in an office but I'm just making the point that when you work within an office environment you have to understand how to interact with each of the different people and you may tune your message based upon who's receiving it. So just more understanding the social dynamics within group situations and kind of mimicking that within VR but in the context of a story.

[00:33:57.565] Alexander Mejia: Yeah, I agree with that, but maybe I'm wrong about the office simulator. I mean, people do like playing that if there's kind of a goofy, playschool vibe to it. I don't know. It's all about packaging and theme, right? And people just say, I want to have an experience there. I mean, people don't know, when they travel the world, like, I'm going to go visit Mexico, you don't know who you're going to run into, you don't know what you're going to do. You just know from the pictures on the brochure, it's like, I'm going to visit this pyramid, and I'm going to go buy silver from Taxco. And you go and you do those things, but there are so many things that happen that are emergent, and that's why we enjoy vacations so much. I don't think the tourism industry has it wrong, or is actually that much different from what we might be doing in the future. And, you know, it's possible that we might merge into the virtual tourism industry without even knowing it, because the types of experiences people want, you know, are ones that are not extreme, ones that don't make you feel sick in VR. Like, that's something that we've attempted to do the best we can in Starship Commander: make sure that it's as comfortable an experience as possible. And we've actually had a lot of really positive feedback saying, wow, I played that and I didn't get sick. Like, that shouldn't be the thing, right? People should not be making experiences that make you sick, unless, you know, you're making like a stomach sickness simulator. I don't know. I think someone has tried that.

[00:35:21.911] Kent Bye: Great. And finally, what do you see as kind of the ultimate potential of virtual reality and what it might be able to enable?

[00:35:29.736] Alexander Mejia: Wow, that's a really, really big question. Well, I'm just going to toot my own horn and say I think the potential of VR, because it does block off your vision, because we do control your senses, is about giving a full sensory storytelling. It's a storytelling medium. You know, you can look back in time and say, okay, books, they came along, they created fictional stories, those still sell well today. And then movies came along, and that was another storytelling medium, and it's still very successful today. Video games on consoles: you have the action-adventure game. It took them a while to get there, but they finally figured out a way to say, this is how you tell a story in a video game. To me, virtual reality is just another medium in which stories can occur. And in all those other mediums I talked about, you know, games and storytelling are not the only things that are there. But we really want to hone in and crack that nut, figure out what storytelling in VR should be. And we're pretty positive that talking to characters in the story is one aspect of it. You know, moving forward, there are all kinds of amazing things that will come out of this industry. I mean, if you look at how fast every technology has grown, how long it takes for it to get to maturity, that time keeps getting shorter and shorter and shorter. You know, how long will it take for VR storytelling to get to its maturity point, where we say, okay, from here on out, we have a pretty good idea of what VR narratives are supposed to be? That could be as early as five years. I mean, don't quote me on that. But, you know, looking at how fast these technologies get adopted, and how fast people are developing, and the fact that our information is shared at a rate that is way faster than we've ever had before. Like, I can make a breakthrough in Starship Commander today and write about it, and the whole world knows about it. And as people listen, as we talk to each other, we will learn how to create better stories together. And I would love it, I would love, love, love it, if other people also made narratives and said, oh, here's our take on how we interpret you talking to the characters in the story. And someone's going to come in from another diverse background, and they're going to do it in a completely different way than what we thought of, and we're going to be like, oh, that is so amazing, we should have thought of that. And, you know, we just all learn from each other.

[00:37:55.200] Kent Bye: Awesome. Well, thank you so much. Thank you. So that was Alexander Mejia. He's the founder and creative director of Human Interact, which is creating a role-playing interactive drama that's driven by natural language input and artificial intelligence using Microsoft's Cognitive Services. So I have a number of different takeaways from this interview. First of all, when you look at the cross section of artificial intelligence and virtual reality, I think it's going to be enabling all sorts of new conversational interfaces, and I think this is one of the first games that I've seen that is really using the process of a conversation as one of the primary gameplay elements. Now, in terms of an emergent narrative, this seems like it's more along the lines of a choose-your-own-adventure with different branching points. And I haven't had a chance to try out the demo or play the full experience yet, but it sounds like there are going to be a number of different bounded branches. The way that Alexander explained it is that there are a number of different decisions, and they accumulate into different branches at some point. So you're making a lot of small local-agency interactions, and then at some point there's going to be some branching that determines different forks of the narrative. It sounds like there are going to be three different morally ambiguous characters that you're interacting with, and none of them can you fully trust. And so there's going to be a little bit of going back and forth, trying to cross-check each character's story with the other characters, which I think is kind of an interesting way of trying to suss out each individual character's motivations. So on the scale of 0 to 10, it sounds like, if someone is at zero in the complexity of their motivation, it can essentially be described with one primary verb, which might be kill the enemy. If someone's at a complexity of 10, then there are all sorts of morally ambiguous motivations and emotions, and it might be a little bit hard both to understand and to predict the future behavior of that character. And on that scale of 0 to 10, Alexander is saying that they're trying to create their characters at about a 7. Now, at the time of the recording of this interview, there were a number of different major cloud services available, ranging from Google to IBM Watson to Microsoft. And Microsoft hadn't really announced a lot of their public offerings yet, but as part of the launch of Microsoft's new custom speech services this week, Human Interact was a part of Microsoft Research's overall announcements. So there are a couple of new things coming out from Microsoft: they have this CRIS and LUIS. CRIS is the Custom Recognition Intelligent Service; it sounds like that's essentially the process of being able to take an audio stream into the cloud and translate that from speech into text. And then LUIS, the Language Understanding Intelligent Service, that's the process of being able to have a wide range of different phrases that then are boiled down into what is essentially an intent.
So if it's open the hatch, you may have like 15 different ways of saying open the hatch, and there could be different verbs and different nouns that are able to trigger that. And that's kind of the process of allowing the creator, which is Human Interact, to create a number of those different phrases and intentions, and, given each of those intention buckets, send that back into the game and have those intentions be the primary driver of the interaction of the experience. So when I hear this, it makes me think of the game Façade, where there are different ways of interacting with natural language input, and that input is being boiled down into whether or not you're showing affinity to the two characters. And because these two characters are fighting and in conflict with each other, you're either showing affinity towards one or the other, or you could actually be neutral, which, if you play the neutral game, doesn't actually get you to the win state. So you have to kind of choose between different sides in Façade. Now, in this game, it sounds like it's a little bit similar, where you're kind of balancing between these three different relationships and having to navigate trust between these three different characters. So this is a role-playing game, and you do have the freedom to get into the character and infuse your personality into the experience, but it is ultimately going to come down to these different intentions and these different choice points. So Alexander said that they wanted to have a live performance from an actor, so you can see their face and see all the emotional delivery of the different lines. And because of that, they've had to create a very authored experience that has different branches, but it's more like a choose-your-own-adventure rather than a completely emergent conversation. But in the long term, he sees that eventually we may be able to get to more of an emergent conversation, where you're getting into a little bit more open-ended role-playing with these different characters. One of the technological barriers to doing that right now, though, is being able to take text and convert it into speech that has the inflection of a human. And since this interview was recorded in August, just a month later, in September, Google announced WaveNet, which is this really sophisticated generative model for creating raw audio that has really convincing-sounding human inflections. Now, one thing that I don't think is possible yet is to add a layer of emotional performance within these different deliveries of the lines; it's still kind of a flat and static delivery. And so to really get the emotional tone, I think we're still going to be using actors for a long, long time. But what that means is that, on the spectrum of authored versus emergent, it's going to be more weighted towards the authored side. And it's going to be a while before we get to the completely emergent experiences, where you're able to have conversations with an AI, and it just feels like you have complete local and global agency at any moment, so that any of your small actions at any moment in the conversation could completely change the course of the entire experience.
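
To make the intention-bucket idea concrete, here is a minimal, hypothetical sketch of the final step: once the cloud service returns an intent, the game simply dispatches on it. The handler names are invented; in Starship Commander this step happens inside Unreal Engine rather than in Python.

```python
# Handlers the engine would actually implement; print() stands in here.
def open_hatch():
    print("hatch opens")

def fire_missiles():
    print("missiles away")

INTENT_HANDLERS = {
    "OpenHatch": open_hatch,        # triggered by any of the ~15 phrasings
    "FireMissiles": fire_missiles,  # "attack", "go pew pew at those guys", ...
}

def dispatch(intent: str) -> None:
    # Unrecognized intents fall back to a clarifying line from the ship
    # computer instead of breaking the scene.
    INTENT_HANDLERS.get(intent, lambda: print("computer: say again?"))()

dispatch("OpenHatch")  # prints: hatch opens
```
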
So, I think that the ultimate expression of that is Dungeons & Dragons, and I talked to Chris Perkins, who is a longtime Dungeon Master at Wizards of the Coast, in episode 441, and he talks about D&D being a theater of the mind, where any of the role-player actors could do anything that they want at any moment, with a little bit of bounding by fate. So if you think about what Microsoft is doing in terms of taking a string of natural language text and determining the intent of that, that's what Microsoft is using their LUIS system for. A Dungeon Master essentially has to do the same thing: he converts an intention down into whether or not he's just going to let you do that, or, if he wants to leave it up to the fates, he does this translation of your intention into one of the different types of skills and abilities that your character may have. And so he may say, okay, I want you to roll for an attack. And then if the attack is successful, then you roll for damage. So there are all these different things that you roll for, and that whole taxonomy is kind of like boiling down whatever you're saying into some sort of specific intention. And so Dungeons & Dragons, I'd say, has really created this systematized way of translating natural language input into intent, with a system that allows both the constraints of fate, when you roll the dice, and the discretion of the Dungeon Master, who can simply let you do what you want. So that is just an example of how to take the unbounded amount of human imagination and start to boil it down into discrete actions, intentions, and behaviors, and whether or not you're able to carry through the wishes of your imagination. And I think that that's the overall trajectory of where VR is going: these types of role-playing games where you're giving this open-ended natural language input, and then you have to translate that and convert it into an intention, and then that feeds into the overall gameplay. So I really do see that the future of interactive narrative in virtual reality is going to have a lot of artificial intelligence-driven technologies, which is part of the reason why I've started to record a number of different Voices of AI podcasts. I've done about 90 up to this point. I'm still in the process of finishing up my Ultimate Potential of VR book, but after that, I want to launch the Voices of AI, because I've had a chance to talk to a lot of these cutting-edge AI and interactive narrative researchers, and the overall field of AI is going to start to infuse into virtual reality a lot more, I think, in 2017. So, that's all that I have for today. I just wanted to thank you for listening to the Voices of VR podcast. And if you enjoy the podcast, then please do tell your friends, spread the word, and become a donor. Just a few dollars a month makes a huge difference. So, donate today at patreon.com/voicesofvr. Thanks for listening.
