Jessica Brillhart is the principal filmmaker for virtual reality at Google, and she has been exploring the intersection of artificial intelligence and storytelling in VR. I had a chance to catch up with her at Sundance again this year, where we did a deep dive into my Elemental Theory of Presence, which correlates the four elements with four different types of presence: embodied (earth), active (fire), mental & social (air), and emotional (water).
Artificial intelligence will enable VR experiences to more fully listen and respond to you within an experience, and it will be the vital technology that will bridge the gap between a story of a film and the interaction of a game. I expand upon my discussion with Brillhart in an essay below exploring the differences between 360 video and fully-immersive and interactive VR through the lens of an Elemental Theory of Presence, and make some comments about the future of AI-driven interactive narratives in virtual reality.
AN ELEMENTAL THEORY OF PRESENCE
Many VR researchers cite Mel Slater’s theory of presence as being one of the authoritative academic theories of presence. Richard Skarbez gave me an eloquent explanation of the two main components of presence being the “place illusion” and “plausibility illusion,” but I was discovering more nuances in the types of presence after experiencing hundreds of contemporary consumer VR experiences.
The level of social presence in VR was something that I felt was powerful and distinct enough, but yet not fully encapsulated in place illusion or plausibility illusion. I got a chance to ask presence researchers like Anthony Steed and Andrew Robb about how they reconciled social presence with Slater’s theory. This led me to believe that social presence was just one smaller dimension of what makes an experience plausible, and I felt like there were other distinct dimensions of plausibility as well.
I turned to the four elements of Natural Philosophy of earth, fire, air, and water for a philosophical framework and inspiration in describing different levels of plausibility in VR. I came up with four different types of elemental presence including embodied, active, social & mental, and emotional presence that I first started to talk about in a comprehensive way in my last interview with Owlchemy Labs’ Alex Schwartz.
The earth element is about embodied presence where you feel like your body has been transported into another realm and that it’s your body that’s there. The fire element is about your active and willful presence and how you’re able to express your agency and will in an interactive way. The air element is about words and ideas and so it’s about the mental & cognitive presence of stimulating your mind, but it’s also about communicating with other people and cultivating a sense of social presence. Finally, the water element is about emotional engagement, and so it’s about the amount of emotional presence that an experience generates for you.
After sharing my Elemental Theory with Skarbez, he pointed me to Dustin Chertoff’s research in experiential design where he co-wrote a paper titled “Virtual Experience Test: A Virtual Environment Evaluation Questionnaire.” The paper is a “survey instrument used to measure holistic virtual environment experiences based upon the five dimensions of experiential design: sensory, cognitive, affective, active, and relational.”
Chertoff’s five levels of experiential design can be mapped to the four levels of my Elemental Theory of Presence where earth is sensory (embodied), fire is active (active), air is both cognitive (mental) and relational (social), and water is affective (emotional).
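To make that mapping concrete, here is a minimal sketch of it as a pair of lookup tables. The names `ELEMENT_TO_PRESENCE`, `CHERTOFF_TO_ELEMENT`, and `element_for` are made up for illustration and do not come from Chertoff's paper; only the pairings themselves are taken from the paragraph above.

```python
# Illustrative lookup tables; every identifier here is hypothetical.

# The four elements and the type(s) of presence each one stands for.
ELEMENT_TO_PRESENCE = {
    "earth": ["embodied"],
    "fire": ["active"],
    "air": ["mental", "social"],
    "water": ["emotional"],
}

# Chertoff's five dimensions of experiential design, mapped onto the elements.
CHERTOFF_TO_ELEMENT = {
    "sensory": "earth",
    "active": "fire",
    "cognitive": "air",
    "relational": "air",
    "affective": "water",
}


def element_for(dimension: str) -> str:
    """Return the element that a Chertoff dimension falls under."""
    return CHERTOFF_TO_ELEMENT[dimension.lower()]
```

Note that air is the only element that absorbs two of Chertoff's dimensions, which is why the five-part model collapses cleanly into four.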
I started to share the Natural Philosophy origins of my Elemental Theory of Presence with dozens of different VR creators, and they found it to be a useful metaphor and mnemonic device for describing the qualitative elements of an experience. I would argue that the more that a VR experience is able to achieve these four different levels of presence, then it’s going to feel like more of a direct lived experience that mimics what any other “erlebnis” experience feels like in reality.
Achieving the state of presence is an internal subjective experience, and so it’s going to be different for every person. But I believe that this Elemental Theory of Presence can help us understand a lot about virtual reality including being able to describe the different qualitative dimensions of an individual VR experience, describe the differences between mobile VR/360 video and room-scale VR, help elucidate the unique affordances of VR as a storytelling medium, and provide some insight for how AI will play a part in the future of VR narratives.
360 VIDEO CONSTRAINS EMBODIMENT & AGENCY
Brillhart begins her Filmmaker Magazine article about VR storytelling with a quote from Dziga Vertov about how the film camera could be thought of as a disembodied mechanical eye that can “show you a world the way only I can see it.” She says that “VR isn’t a disembodied medium at all. It’s quite the opposite, because its whole end-goal is embodiment.”
Watching a VR experience and being able to look around a 360-degree space starts to more closely mimic the experience of being in a specific place, and it takes away the control of the creator of being able to focus attention on specific things. From a storytelling perspective it means that “what I have to be as a VR creator is a story enabler, not the story dictator.”
Film and 360 video at this point have limited amounts of embodied presence and active presence. Because you can’t fully move your body around and there’s not a plausible way to interact or express your agency within the experience, we could say that the earth and fire elements are constrained. You can still turn your head, which mimics what it feels like to be standing still and looking left, right, up, or down without leaning too much, and you can express your agency by choosing what to look at and pay attention to. But it’s difficult to achieve the full potential of embodied and active presence given the current 3DOF tracking constraints and limited interactivity with live captured footage.
Having three degrees of freedom in a mobile VR headset at this point matches the capabilities of 360 video, but anyone who has experienced full room-scale VR with 6DOF hand tracking knows that the sense of embodiment is drastically increased. If you have good enough hand and elbow tracking and inverse kinematics, then it’s possible to invoke the virtual body ownership illusion where you start to identify your virtual body as your own body.
Adding in the feet gives you an even deeper sense of embodied presence, and haptic feedback is also one of the fastest ways of invoking the virtual body ownership illusion. My experience with The VOID still stands as the deepest sense of embodied presence I’ve experienced, because I was able to explore a space untethered, seemingly forever, with my mind tricked by the process of redirected walking. I also was getting passive haptic feedback every time I reached out to touch a wall. Moving around a room-scale environment can partially mimic this feeling, but the level of embodied presence is taken to the next level when you remove the wire tether and allow intuitive, beyond room-scale movements without any presence-breaking chaperone boundaries trying to keep you safe.
The fire element is also constrained in 360 video. You are able to look anywhere that you want to across all levels of 3DOF and 6DOF virtual reality, but mobile VR limits the full expression of your agency. Without the natural and intuitive movement that comes from tracking your hands and body on all six degrees of freedom, any expression of agency is going to be abstracted through buttons on a gamepad, gaze detection triggers, or the trackpad on a Gear VR. These abstracted expressions of agency can only take your level of active & willful presence so far, because at a primal brain level I believe that active presence is cultivated through eliminating abstractions.
This means that 360 videos are not able to really cultivate the same depth of presence that a fully volumetric, interactive experience with 6DOF tracking in a room-scale environment is able to. This is the crux for why some hardcore VR enthusiasts insist that 360 video isn’t VR, and it’s also why 360 video will be trending towards positionally-tracked volumetric video whether it’s stitched together with photogrammetry techniques like 8i or HypeVR, using depth sensors like DepthKit or Mimesys, or using digital light field cameras from companies like Lytro.
I believe that the trend towards live action capture with volumetric video or digital lightfields will increase the feeling of embodied presence, but I have doubts that it will be able to achieve a satisfying level of active and willful presence. Without the ability to fully participate within a scene, it’s going to be difficult for any live-action captured VR to create a plausible sense of presence for the fire element. It’ll certainly enable “story dictators” to have complete control over the authored narrative that’s being presented, but any level of interactivity and expression of active and willful presence will be constrained.
Conversational interfaces with dynamically branching pre-recorded performances will perhaps offer a way for you to express your agency within an experience. There are some narrative experiences starting to explore interaction with pre-filmed performances, like Kevin Cornish’s Believe VR, which is triggered by gaze detection, as well as Human Interact’s Starship Commander, which is triggered by natural language input (more on this below). But the dominant mindset for most narrative storytellers coming from the film world is to not provide any level of interactivity to their authored stories.
360 VIDEO AMPLIFIES MENTAL & EMOTIONAL PRESENCE
Whenever you reduce the capability of one dimension of presence, you can amplify the other elements. If mobile VR and 360 video have constrained embodied and active presence, then they can actually cultivate a deeper sense of mental/social and emotional presence. There’s a reason why the major empathy VR pieces have been 360 videos, and I’d argue that social VR experiences with constrained movement like BigScreen VR can actually provide a deeper sense of social presence with deeper and longer conversations.
360 video can also capture microexpressions and raw emotions in a much more visceral way. Our brains have evolved to be able to discern so much information from a human face, and so 360 video has a huge role to play in capturing human faces within a documentary or memory capture context. Live performances from actors can also be extremely powerful in the VR medium, and there is something that can be lost when it’s converted into an avatar.
The uncanny valley ends up driving avatars towards stylization. This is a double-edged sword for 360 video. On the one hand, live capture video can capture a transmission of raw emotional presence when you have full access to someone’s facial expressions, body language, and eye contact. On the other hand, the uncanny valley is all about expectations, which means that 360 video almost always violates the fidelity contract for presence. When you get a photorealistic visual signal from VR, but it’s not matched by the audio, haptics, smell, and touch, then your brain will send a presence breaker signal to your primal brain that keeps you from feeling fully present. So CGI experiences can create a surrealistic world that transcends your expectations, and therefore can actually cultivate a deeper sense of presence.
That said, there are still so many compelling use cases for 360 video and volumetric capture that I’m confident it’s not going to go away, even though its constraints clearly limit the level of presence that you can achieve. I’d still push back against anyone who argues that 360 video is not VR, especially once you understand how the power of embodied cognition can be triggered whether it’s in a 360 video or a fully volumetric VR experience.
There are also a lot of storytelling advantages in having limited embodiment and agency that can amplify the sense of emotional, social, and mental presence in an experience. It will get cheaper and easier for anyone to create a 360-video experience, and the emerging grammar and language of storytelling in VR is continuing to evolve. So I see a healthy content ecosystem for 360 video that will keep growing.
The level of social interaction that you can have on a 3DOF mobile VR headset is also surprisingly adequate. There is still a large gap in body language expressiveness when you’re not able to do hand gestures and body movements, but there’s still quite a lot of body language fidelity that you can transmit with the combination of your head gaze and voice. This gap will also close once 6DOF head and hand tracking comes to mobile VR, possibly as soon as 2017 or 2018.
STORYTELLING IN VR IS ABOUT GIVING & RECEIVING
When it comes to narrative and storytelling in VR, there’s a continuum between a passive film and an interactive game where I feel that a virtual reality experience sits in the middle of this spectrum. I’d say that films and storytelling are more about mental & emotional engagement while games are more about active, social, and embodied engagement. Storytelling tends to work best when you’re in a receiving mode, and games work best when you’re exerting your will through the embodiment of a virtual character.
From a Chinese philosophy perspective, films are more about the yin principle and games are more about the yang principle. The challenge of interactive storytelling is to be able to balance the yin and the yang principles to provide a plausible experience of giving and receiving. At this point, games resort to doing explicit context switches that move into discrete story-mode cinematics to tell the story, or have explicit gameplay and puzzles that are designed for you to express your agency and will. This is because interactive games haven’t been able to really listen to you as a participant in any meaningful way, but that will start changing with the introduction of artificial intelligence and machine learning technologies.
ARTIFICIAL INTELLIGENCE WILL ENABLE EXPERIENCES TO LISTEN TO YOU
Machine learning has shown amazing promise in computer vision and natural language processing, which means that games will soon be able to understand more about you through watching your movements and listening to what you say. When you’re embodied within a 6DOF VR experience, you are expressing subtle body language cues that machine learning models will eventually be able to be trained upon. As I covered in my episode on privacy in VR, Oculus is already recording and storing the physical movements you make in VR, which will enable them to train machine learning neural nets to potentially identify body language cues.
Right now most machine learning neural networks are trained with supervised learning, which would require a human body language expert to watch different movements and classify them into different objective categories. Body language experts are already codifying body language behaviors within NPCs to create more convincing social interactions, and it’s a matter of time before AI-driven NPCs will be able to identify the same types of non-verbal cues.
When you speak to AI characters in VR experiences, natural language processing AI will be able to translate your words into different discrete buckets of intent, which can then trigger behaviors in a much more interactive and plausible way than ever before. This is the approach that Human Interact’s Starship Commander is taking. They announced today that they’re using natural language input with Microsoft’s new Language Understanding Intelligent Service (LUIS) and Custom Recognition Intelligent Service (CRIS), which are part of Microsoft’s Cognitive Services APIs. Starship Commander’s primary gameplay mechanic is natural language input as you play through their interactive narrative giving verbal commands to the Hal-like computer. I have an interview with Human Interact’s Alexander Mejia that will air in my next episode, but here’s a trailer for their experience.
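As a rough illustration of what “discrete buckets of intent” means, here is a toy sketch that sorts a spoken command into an intent and then triggers a scripted response. It uses simple keyword matching rather than a trained language model, and it is not how LUIS or Starship Commander actually work; the intent names, keywords, and responses are all invented for this example.

```python
import re

# Hypothetical intents and trigger words. A real service would learn these
# from labeled example utterances instead of a hand-written keyword list.
INTENT_KEYWORDS = {
    "fire_weapons": {"fire", "shoot", "attack"},
    "raise_shields": {"shield", "shields", "defend"},
    "set_course": {"course", "navigate", "fly"},
}

# Hypothetical scripted behaviors that each intent triggers.
BEHAVIORS = {
    "fire_weapons": "Weapons hot. Firing.",
    "raise_shields": "Shields are up.",
    "set_course": "Course laid in.",
    "unknown": "I did not understand that command.",
}


def classify_intent(utterance: str) -> str:
    """Pick the intent whose keywords overlap most with the utterance."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    best_intent, best_hits = "unknown", 0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_intent, best_hits = intent, hits
    return best_intent


def respond(utterance: str) -> str:
    """Map free-form speech to one scripted behavior via its intent."""
    return BEHAVIORS[classify_intent(utterance)]
```

The point of the sketch is the shape of the pipeline: many different phrasings collapse into a small set of intents, and the experience only has to author one response per intent.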
I believe that with the help of AI, a VR storytelling experience is what is going to sit in the middle between a yin-biased film and yang-biased game. What makes something an “experience”? I’d say that an experience is any time that we feel like we’re crossing the threshold to achieve a deep level of presence in any of the four dimensions whether it’s embodied, active, mental/social, or emotional presence. If it’s hitting all four levels, then it’s more likely to reach that level of direct experience with a deep sense of presence. With the embodiment and natural expression of agency that is provided by VR, then the virtual reality medium is uniquely suited to be able to push the limits of what embodied storytelling is able to achieve.
WHAT THE FUTURE OF AI-DRIVEN NARRATIVES WILL LOOK LIKE
There are a couple of AI-driven games that I think show some of the foundational principles of where VR games will be going in the future.
Sleep No More is an immersive theater experience where the characters are running through a hundred different warehouse rooms interacting with each other through interpretive dance in order to tell the story of Shakespeare’s Macbeth. As it stands now, the audience is a passive ghost, and you can’t really directly interact or engage with the narrative at all. You can decide what room to go into and which actors to follow or watch, and it’s a looping narrative so that you have a chance to see the end of a scene and later see the beginning.
Imagine what it would be like if the characters were able to listen to you, and you’d be able to actually engage and interact with them. The characters would have the freedom to ignore you just as in real life a stranger may blow you off if they had something important to do, but there could be other ways that you could change the outcome of the story.
One AI-driven simulation game that explores the idea of changing the fate of a tragic story is Elsinore, in which you have the ability to change the outcome of Shakespeare’s Hamlet. You play Ophelia, a minor female character whose movements and interactions would plausibly have been ignored within the context of the original story. You can go around and try to intervene and stop the set of actions that leads to a number of different murders.
Elsinore is a time-looping game that uses some sophisticated AI constraints and planning algorithms that determine the fate of the story based upon each of your incremental interventions. The story dynamically changes and spins off into alternative branching Hamlet fan fiction timelines based upon your successful interventions in multiple tragedies. You have four iterations of the course of a single day, Groundhog Day style, and you prevent the multiple murders through social engineering. Once natural language input and VR embodiment are added to an experience like this, this type of emergent storytelling / live theater genre is going to be perfectly suited for the virtual reality medium.
Bad News is an amazing AI-driven game that I had the privilege of playing at the Artificial Intelligence and Interactive Digital Entertainment conference last October. There’s a deep simulation that creates over 100 years of history in a small imaginary town complete with characters, families, relationships, work history, residencies, and a social graph. You’re thrown into this world at the scene of a death of a character and it’s your job to notify the next of kin of the death, but you don’t know anyone in the town and you can’t tell anyone why you’re looking for the next of kin. You can only notify the next of kin of the death; otherwise you lose the game. Your job is to explore this imaginary world, talking to residents and trying to find the family members of the deceased.
You have two interfaces with this simulated Bad News world. One is an iPad telling you locations, addresses, and then descriptions of people who are in each location that you go to. After you pick a location and choose someone to talk with, you explore this world through conversations with an improv actor. These improvised conversations are your primary interface with this imaginary world, and you have to use your detective skills to find the right person and your social engineering skills to come up with a convincing cover story without being too nosy or suspicious.
The improv actor has a list of information about each character that he’s embodying so that he can accurately represent the personality and openness as determined by the deep simulation. He also has access to a Wizard of Oz orchestrator who can query the database of the deep simulation asking about information on the location of other town residents. Because this world is based upon an actual deep simulation that’s running live, the interactor has to find a moving target, and so the Wizard of Oz can provide subtle hints to keep the run time of the game at around one hour.
It was an amazingly rich experience to engage in a series of conversations with strangers played by the improv actor in this imaginary town. You’re fishing for information and clues as to who is related to the person who has passed and where to find them while trying to maintain your cover story. Bad News encourages the type of narrative-driven, role-playing interactions that only a human can do at this point, but this type of conversational interface is what AI-driven natural language processing is going to enable in the future. This is the trajectory of where storytelling and gaming in virtual reality are headed, and this possible future is freaking out traditional storytellers who like to maintain control.
THE BATTLE BETWEEN AUTHORED & EMERGENT STORIES
There are some people from the Hollywood ecosystem who see the combination of gaming and film as “dystopian.” Vanity Fair’s Nick Bilton says, “There are other, more dystopian theories, which predict that film and video games will merge, and we will become actors in a movie, reading lines or being told to ‘look out!’ as an exploding car comes hurtling in our direction, not too dissimilar from Mildred Montag’s evening rituals in Fahrenheit 451.”
This vision of films and games merging is already happening, and I certainly wouldn’t call it “dystopian.” Why is there so much fear about this combination? I think some storytellers see it as dystopian because open world, sandbox experiences have not had very strong or compelling narratives integrated into them. Being able to express the full potential of your agency often completely disrupts the time-tested formula of an authored narrative.
Hollywood storytellers want to have complete control over the timing of the narrative that’s unfolding. Hollywood writer and Baobab Studios co-founder Eric Darnell says that storytelling is a time-based art form where a series of chemicals is actually released within our bodies following the dramatic arc of a story. He says that you can’t let people interact and engage forever, and that you have to keep the story moving forward; otherwise you can lose the emotional momentum of a story.
Darnell was actually very skeptical of VR’s power of interactivity when I spoke to him in January 2016, where he said that there’s a tradeoff between empathy and interactivity. The more that you’re engaging with an experience, the more it becomes about finding the limits of your agency and control. Stories are about receiving, and if you’re completely focused on transmitting your will into the experience, then you’re not open to listening or receiving that narrative.
This is what Darnell believed last year, but over the past year he’s been humbled by observing the power and drive of VR to offer the possibility of more interactivity. He watched so many people who wanted to more directly interact with the bunny character in Invasion!, and he kept hearing that feedback. So at Sundance 2017, Darnell explored how empathy could be combined with interactivity to facilitate compassionate acts in Asteroids!.
He makes you a low-level sidekick, and you get to watch the protagonists play out their story independent of anything that you do. But I wouldn’t classify Asteroids! as a successful interactive narrative, because your agency is constrained to the point where you can’t really cultivate a meaningful sense of willful presence. Your interactions amount to local agency without any meaningful global impact on the story. There’s still a powerful story dictator with a very specific set of story beats that will unfold independent of your actions. While there are some interesting emotional branching relationships that are explored that give variation for how the characters relate to you based upon your decisions, you’re still ultimately a sidekick to the protagonists whose stories are fated beyond your control. This made me feel like a ghost trapped within an environment where it didn’t matter if I was embodied in the story or not.
Having explicit context switches with artificially constrained agency breaks my level of active presence. One of the keys to Job Simulator’s success of grossing over 3 million dollars is that Owlchemy Labs wanted to make everything completely interactive and dynamic. They applied this high-agency engine to stories with Rick and Morty’s Simulator, and they allow you to interrupt and intervene within a story at any moment. If you throw a shoe in the face of the main character, then he will react to that and then move on with the story. It’s this commitment to interruption that could be one key towards achieving a deep sense of active and willful presence.
But most storytellers are not taking a high-agency approach to narrative like Owlchemy Labs. The leadership of Oculus Story Studio has a strong bias towards tightly-controlled narratives with ghost-like characters without much meaningful agency. In my conversation with three of the Oculus Story Studio co-founders, they expressed their preference for the time-based telling of an authored narrative. Saschka Unseld went as far as to say that if you have a branching narrative, then that’s an indication that the creator doesn’t really know what they want to say. Oculus Story Studio is exploring interactivity in their next piece, but given their preference for a strong authored story, any interactivity is likely going to be some lightweight local agency that doesn’t change the outcome of the story.
There is an undeniable magic to a well told story, and Pearl and Oculus Story Studio’s latest Dear Angelica are the cream of the crop of the potential of narrative virtual reality. But without the meaningful ability to express your agency, these types of tightly-controlled, authored narratives are destined to maintain the ghost-like status quo of constrained active presence that Oculus coined as “The Swayze Effect.”
FUTURE OF AI-DRIVEN INTERACTIVE NARRATIVES
In my interview with Façade co-creator Andrew Stern he said that there really hasn’t been a video game experience since 2005 that has provided a player with meaningful local and global agency where your small, highly dynamic action in every moment could dramatically alter the outcome of the story. In Façade, you’re at a dinner party with a husband and wife who are fighting, and you use natural language input to interact with them. The AI determines if you’re showing affinity towards the husband or wife, and a backend algorithm keeps track of your allegiances as you try to balance your relationships to push each character towards revealing a deep truth.
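The allegiance bookkeeping described above can be pictured as a single running score. This is a deliberately tiny sketch and not Façade's actual algorithm; the class name `AffinityTracker`, the weights, and the threshold are assumptions made up for illustration, and it presumes some earlier language-understanding step has already decided which spouse a given line of dialog sides with.

```python
from dataclasses import dataclass


@dataclass
class AffinityTracker:
    """Hypothetical running tally of whose side the player is taking."""

    # Positive values lean towards the wife, negative towards the husband.
    score: float = 0.0

    def record(self, sides_with: str, weight: float = 1.0) -> None:
        """Shift the score after one player utterance has been classified."""
        if sides_with == "wife":
            self.score += weight
        elif sides_with == "husband":
            self.score -= weight
        else:
            raise ValueError(f"unknown character: {sides_with!r}")

    def allegiance(self, threshold: float = 1.0) -> str:
        """Summarize the current lean as 'wife', 'husband', or 'neutral'."""
        if self.score >= threshold:
            return "wife"
        if self.score <= -threshold:
            return "husband"
        return "neutral"
```

Because every utterance nudges the score, small moment-to-moment choices (local agency) accumulate into a state that the story can branch on later (global agency).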
Here is Stern explaining his vision of the future of interactive drama driven by artificial intelligence:
There’s a spectrum of authored vs emergent story, and Façade uses a drama manager to manage input from the player but also maintain the dramatic arc of the story. If you read their “Behind the Façade” Guide, it reads more like a computer program than a film script, but it’s the most authoritative blueprint for how to architect an interactive narrative that is fully listening to the player and providing meaningful global agency. Here’s a visualization of Façade’s drama manager that balances user input with the unfolding of a traditional three-act dramatic structure of the overall story.
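In code terms, one very simplified way to picture a drama manager is as a beat selector: at each step it picks the unused story beat that best fits both the tension the arc calls for and the topic the player just raised. The sketch below is a toy under stated assumptions, not Façade's implementation; the `Beat` fields, the shape of `target_tension`, and the 0.3 off-topic penalty are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beat:
    """A hypothetical unit of authored story content."""

    name: str
    tension: float      # 0.0 (calm) to 1.0 (climax)
    topics: frozenset   # player topics this beat can respond to


def target_tension(progress: float) -> float:
    """Assumed dramatic arc: rise to a climax at 75% of the story, then fall."""
    if progress <= 0.75:
        return progress / 0.75
    return (1.0 - progress) / 0.25


def choose_beat(beats, used, progress, player_topic):
    """Pick the unused beat closest to the arc, preferring on-topic beats."""
    candidates = [b for b in beats if b.name not in used]
    if not candidates:
        return None
    goal = target_tension(progress)

    def cost(beat):
        # Off-topic beats are still allowed, but pay a fixed penalty.
        off_topic_penalty = 0.0 if player_topic in beat.topics else 0.3
        return abs(beat.tension - goal) + off_topic_penalty

    return min(candidates, key=cost)
```

The two terms in `cost` are the authored-versus-emergent tradeoff in miniature: the tension term keeps the story on its arc, while the topic term lets the player's input steer which beat plays next.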
Human Interact’s Starship Commander is an important step towards allowing natural language integration within a VR experience, but it’s still a highly authored experience given that it relies on recorded performances from actors. Façade also features recorded performances, and so this hybrid approach limits how much truly emergent behavior you can achieve.
Looking to the future, Stern’s Playabl.ai is focusing on creating AI-driven interactive characters where trust and rapport can be built up over time. They’re leveraging research for modeling human behavior from a DARPA-funded AI program called IMMERSE, and they’re hoping that interactions with these types of AI characters could start to mimic what it feels like to have an emergent conversation.
Stern is collaborating with his Façade co-creator Michael Mateas, who founded the Expressive Intelligence Studio at University of California, Santa Cruz. Mateas and his students are creating some of the most cutting-edge AI-driven, interactive narratives out there (I have an interview with Mateas and a number of his students that will be released with the upcoming Voices of AI podcast, including the creators of Bad News).
AI is going to be a critical part of the future of interactive narrative in VR, and 2017 promises to have many of the advances of machine learning start to be made available through cloud-based AI services from Microsoft, Google, and IBM. It’s already starting with natural language processing being integrated, but the ultimate affordances of AI will go much deeper.
THE FUTURE OF AI & VR IS “UNCOMFORTABLY EXCITING”
At the end of Brillhart’s Filmmaker Magazine article, she says, “Blaise Agüera y Arcas, is a principal scientist working on artificial intelligence at Google. He had this to say about the current state of AI, which I think also describes the current state of VR better than anything else I’ve heard: ‘We live in uncomfortably exciting times.'”
There’s so much potential in VR and AI that it’s hard to predict where it’ll eventually all go, but there are some early indications of what some of the initial intersections will be. Brillhart talks about trippy Google DeepDream VR experiments that she’s done that start to create such an unexpected experience that it feels like you’re on a psychedelic trip.
Style transfer is another area that is likely to be an early win for VR. This is where the primary features of an artist’s style can be extracted and applied to new images. This is likely to start to be used in procedurally-generated textures and eventually 3D models within VR. Brillhart imagines that auteurs will train neural nets on image sets in order to use AI as a creative collaborator.
In terms of what else will be on the horizon, we can look at the principles of how AI is able to objectify subjective judgments and eventually be able to make the types of qualitative decisions that we ascribe to intelligent beings. So AI will have the ability to quantify the qualitative aspects of life, and then express it back to us within an immersive environment.
Neural nets have the ability to come up with averages of qualitative data like faces or create “interlingua” intermediary translation languages that amount to every language, but no language at the same time. Eventually, companies like Facebook, Google, or Twitter may be able to translate their vast repository of Big Data to create composite, AI-driven, chatbot NPCs that are embodied within a VR experience. We may be able to interact with a comprehensive representation of the beliefs and thinking of an average 25-year-old white male or a mid-40s black lesbian. This will be a long process and there are tons of open questions around creating a representative and unbiased set of training data to achieve this, but Brillhart thinks that it represents a future potential of where AI could go.
AI will also be involved in the procedural generation of both environmental and narrative content. Ross Goodwin explored using AI to automatically write a surrealist sci-fi movie script that was produced as part of a 48-hour film competition. The script for Sunspring was automatically generated by feeding dozens of sci-fi scripts to a Long Short-Term Memory recurrent neural network, and the AI then generated dialog and staging notes that the film crew used to produce the piece.
The resulting dialog is largely nonsensical and filled with confusion, but what makes Sunspring so compelling is the actors, who are able to take the semantically correct but otherwise gibberish dialog and imbue it with meaning through their acting and staging. You can feel what they mean through their body language and emotions, but the actual words are meaningless, which gives it an other-worldly feel.
At a baseline, AI will provide creative constraints and inspiration for human interactions and collaborators that will allow for more freedom in role-playing and perhaps eventually provide the arc of a satisfying story. The ultimate realization of this type of collaborative storytelling is done by skilled Dungeons and Dragons dungeon masters who are able to guide the arc of an adventure, but at the same time allow each of the participants to do anything that they want at any time. The “theater of the mind” is currently the only medium that realizes the full potential of the human imagination.
Each participant has a fragmented picture of a scene in their mind, and the reality is dynamically and collaboratively constructed through each question asked and each line spoken. There’s a set of rules and constraints that allow the dungeon master or the roll of the dice to serve as the fates of these intentions, but any visualization of a virtual space by a VR artist or developer could serve as a limitation to the collective will and imagination of the DnD characters who want to be in full control of their destiny.
Will AI eventually be able to serve in the dungeon master’s dual role as master storyteller and group facilitator? I asked long-time dungeon master Chris Perkins, and he was skeptical that AI and VR will be able to achieve the same level of emergent story that a DM can facilitate any time soon. Yet he’s also convinced that it will eventually happen and that it’s almost inevitable when looking at the overall trajectory of storytelling and technology. He says that life-long friendships are forged through playing DnD, and the collaborative storytelling experiences that are created are so powerful that there’s enough intrinsic motivation for people to solve the technological roadblocks and enable this type of collaboratively emergent form of storytelling. Currently Mindshow VR is on the leading edge of creating this type of collaborative storytelling platform.
VR Chat is empowering the most sophisticated level of collaborative storytelling with their open world explorations of interconnected metaverse worlds. But the stories that are being told here are largely generated from interpersonal relationships and the communication that happens with the other people you happen to be going through the experience with. Having meaning be co-created and constructed by a group of people shows that collaborative and social storytelling experiences are something that is unique to the VR medium.
But there’s still a lot of power in being able to use an environment to tell a story.
Rand Miller’s Obduction and The Gallery: Call of the Starseed are two stand-out examples of environmental storytelling in VR, but there’s still a lot of room for where social and environmental storytelling will go in the future.
Right now a lot of the open metaverse worlds are largely empty aside from what are clearly non-verbal NPC characters aimlessly roaming around or the occasional other person, but they have the potential to be filled with either live immersive theater actors or AI-driven characters to create what Charlie Melcher calls “Living Stories.” Living stories engage the participation of everyone involved, and they feel like an emergent construction of meaning, but have a clear narrative trajectory led by the storyteller. Alex McDowell says that we’re moving back to the oral history storytelling traditions, and there are unique ways that VR can overcome the vulnerability of the first-person perspective and emphasize the importance of many different perspectives with different positions of power and privilege.
Having AI chatbots within open worlds will be the next logical step, but Rand Miller says that architecting a non-linear story through an environment is not an easy task. Otherwise, there would be a lot more people doing it. Stern says that part of the open world storytelling dilemma is that there’s a tradeoff between being able to explore an open world without limits and still being able to communicate a story with an engaging dramatic arc without it feeling fractured and unsatisfying.
This is where emergent conversations with AI characters powered by a global drama manager start to come in. Stern envisions a future of interactive drama where you can have complete and meaningful expression of your agency through conversational interfaces with AI, where there is giving and receiving with meaningful participation and active listening. If there is a drama manager in the background that’s planning the logistics and driving a simulation, then the future of AI and VR has a lot of promise of not just being able to watch and observe a meaningful story, but to fully participate in the co-creation of a living story, which is a process that Stern prefers to call “story making.”