Within premiered their first real-time rendered, interactive experience at Sundance New Frontier this year with Life of Us, which is the story of life on the planet as told through embodying a series of characters who are evolving into humans. The experience is somewhere between a film and a game, but it’s more like a theme park ride. There’s an on-rails narrative story being told, but there are also opportunities to throw objects, swim or fly around, control a fire-breathing dragon, and interact with another person who has joined you on the experience. You learn about which new character you’re embodying by watching the other person embody that creature with you, and the modulation of your voice also changes with each new character, deepening your sense of embodiment and presence.
I had a chance to catch up with Within CTO and co-founder Aaron Koblin at Sundance to talk about their design process, overcoming the uncanny valley of voice modulation delays, how the environment is a primary feature of VR experiences, and how their background in large-scale museum installations inspires their work in virtual reality.
Koblin also talks quite a bit about finding that balance between the storytelling of a film and the interaction of a game, and how Life of Us is their first serious investigation into that hybrid form that VR provides. He compares this type of VR storytelling to the experience of going to a baseball game with a friend, in that this type of sports experience is amplified by the shared stories that are told by your friends. This is similar to the collaborative storytelling of group explorations in VRChat, but with an environment that is a lot more opinionated in how it tells a story.
LISTEN TO THE VOICES OF VR PODCAST
Life of Us is a compelling way to connect and get to know someone. The structure of the story is open enough to allow each individual to explore and express themselves, but it also gives a more satisfying narrative arc than a completely open world that can have a fractured story. Life of Us has a deeper message about our relationship to each other and the environment that it’s asking us to contemplate. Overall, Koblin says that our relationships with each other essentially amount to the sum total of our shared experiences, and so Within sees an opportunity to create the types of social & narrative-driven, embodied stories that we can go through to connect and express our humanity to each other.
Here’s a trailer for Life of Us.
The Life of Us experience should be released sometime in 2017, and you can find more information on Within’s website (which links to all of their platform-specific apps), or on their newly launched WebVR portal at VR.With.in.
I caught up with Human Interact founder and creative director Alexander Mejia six months ago to talk about the early stages of creating an interactive narrative using a cloud-based and machine learning powered natural language processing engine. We talk about the mechanics of using conversational interfaces as a gameplay element, accounting for gender, racial, and regional dialects, the funneling structure of accumulating a series of smaller decisions into a larger fork in the story, the dynamics between multiple morally ambiguous characters, and the role of a character artist who sets the bounds of the AI characters along with their personality, core belief system, and complex set of motivations.
LISTEN TO THE VOICES OF VR PODCAST
Here’s a Trailer for Starship Commander
Here’s Human Interact’s Developer Story as Told by Microsoft Research
Jessica Brillhart is the principal filmmaker for virtual reality at Google, and she has been exploring the cross section of artificial intelligence and storytelling in VR. I had a chance to catch up with her at Sundance again this year where we did a deep dive into my Elemental Theory of Presence that correlates the four elements with four different types of presence, including embodied (earth), active (fire), mental & social (air), and emotional (water) presence.
Artificial intelligence will enable VR experiences to more fully listen and respond to you within an experience, and it will be the vital technology that will bridge the gap between a story of a film and the interaction of a game. I expand upon my discussion with Brillhart in an essay below exploring the differences between 360 video and fully-immersive and interactive VR through the lens of an Elemental Theory of Presence, and make some comments about the future of AI-driven interactive narratives in virtual reality.
LISTEN TO THE VOICES OF VR PODCAST
AN ELEMENTAL THEORY OF PRESENCE
Many VR researchers cite Mel Slater’s theory of presence as being one of the authoritative academic theories of presence. Richard Skarbez gave me an eloquent explanation of the two main components of presence being the “place illusion” and “plausibility illusion,” but I was discovering more nuances in the types of presence after experiencing hundreds of contemporary consumer VR experiences.
The level of social presence in VR was something that I felt was powerful and distinct enough, but yet not fully encapsulated in place illusion or plausibility illusion. I got a chance to ask presence researchers like Anthony Steed and Andrew Robb about how they reconciled social presence with Slater’s theory. This led me to believe that social presence was just one smaller dimension of what makes an experience plausible, and I felt like there were other distinct dimensions of plausibility as well.
I turned to the four elements of Natural Philosophy of earth, fire, air, and water for a philosophical framework and inspiration in describing different levels of plausibility in VR. I came up with four different types of elemental presence including embodied, active, social & mental, and emotional presence that I first started to talk about in a comprehensive way in my last interview with Owlchemy Labs’ Alex Schwartz.
The earth element is about embodied presence where you feel like your body has been transported into another realm and that it’s your body that’s there. The fire element is about your active and willful presence and how you’re able to express your agency and will in an interactive way. The air element is about words and ideas and so it’s about the mental & cognitive presence of stimulating your mind, but it’s also about communicating with other people and cultivating a sense of social presence. Finally, the water element is about emotional engagement, and so it’s about the amount of emotional presence that an experience generates for you.
After sharing my Elemental Theory with Skarbez, he pointed me to Dustin Chertoff’s research in experiential design where he co-wrote a paper titled “Virtual Experience Test: A Virtual Environment Evaluation Questionnaire.” The paper is a “survey instrument used to measure holistic virtual environment experiences based upon the five dimensions of experiential design: sensory, cognitive, affective, active, and relational.”
Chertoff’s five levels of experiential design can be mapped to the four levels of my Elemental Theory of Presence where earth is sensory (embodied), fire is active (active), air is both cognitive (mental) and relational (social), and water is affective (emotional).
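The mapping above can be written down as a small lookup table. The sketch below is only an illustration of that correspondence: the dictionary names, the `presence_profile` helper, and the idea of 0-to-1 ratings per dimension are assumptions made for the example, not part of Chertoff’s questionnaire or of the Elemental Theory itself.

```python
# Mapping from Chertoff's five experiential-design dimensions to the four
# elements, as described in the text. All names here are illustrative.
CHERTOFF_TO_ELEMENT = {
    "sensory": "earth",
    "active": "fire",
    "cognitive": "air",
    "relational": "air",
    "affective": "water",
}

# Each element and the type of presence it stands for.
ELEMENT_TO_PRESENCE = {
    "earth": "embodied",
    "fire": "active",
    "air": "mental & social",
    "water": "emotional",
}


def presence_profile(scores):
    """Average hypothetical per-dimension ratings into one value per presence type."""
    totals, counts = {}, {}
    for dimension, score in scores.items():
        element = CHERTOFF_TO_ELEMENT[dimension]
        totals[element] = totals.get(element, 0.0) + score
        counts[element] = counts.get(element, 0) + 1
    return {
        ELEMENT_TO_PRESENCE[element]: totals[element] / counts[element]
        for element in totals
    }
```

Because air covers both the cognitive and relational dimensions, those two ratings are averaged into a single “mental & social” value, which is one simple way (among many) of collapsing five dimensions into four.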
I started to share the Natural Philosophy origins of my Elemental Theory of Presence with dozens of different VR creators, and they found it to be a useful metaphor and mnemonic device for describing the qualitative elements of an experience. I would argue that the more that a VR experience is able to achieve these four different levels of presence, then it’s going to feel like more of a direct lived experience that mimics what any other “erlebnis” experience feels like in reality.
Achieving the state of presence is an internal subjective experience, and so it’s going to be different for every person. But I believe that this Elemental Theory of Presence can help us understand a lot about virtual reality including being able to describe the different qualitative dimensions of an individual VR experience, describe the differences between mobile VR/360 video and room-scale VR, help elucidate the unique affordances of VR as a storytelling medium, and provide some insight for how AI will play a part in the future of VR narratives.
360 VIDEO CONSTRAINS EMBODIMENT & AGENCY
Brillhart begins her Filmmaker Magazine article about VR storytelling with a quote from Dziga Vertov about how the film camera could be thought of as a disembodied mechanical eye that can “show you a world the way only I can see it.” She says that “VR isn’t a disembodied medium at all. It’s quite the opposite, because its whole end-goal is embodiment.”
Watching a VR experience and being able to look around a 360-degree space starts to more closely mimic the experience of being in a specific place, and it takes away the control of the creator of being able to focus attention on specific things. From a storytelling perspective it means that “what I have to be as a VR creator is a story enabler, not the story dictator.”
Film and 360 video at this point have limited amounts of embodied presence and active presence. Because you can’t fully move your body around and there’s not a plausible way to interact or express your agency within the experience, we could say that the earth and fire elements are constrained. You can still turn your head around, which mimics what it feels like to be standing still and looking left, right, up, or down without leaning too much, and you can express your agency by choosing what to look at and pay attention to. But it’s difficult to achieve the full potential of embodied and active presence given the current 3DOF tracking constraints and limited interactivity with live captured footage.
Having three degrees of freedom in a mobile VR headset at this point matches the capabilities of 360 video, but anyone who has experienced full room-scale VR with 6DOF hand tracking knows that the sense of embodiment is drastically increased. If you have good enough hand and elbow tracking and inverse kinematics, then it’s possible to invoke the virtual body ownership illusion where you start to identify your virtual body as your own body.
Adding in the feet gives you an even deeper sense of embodied presence, and haptic feedback is also one of the fastest ways of invoking the virtual body ownership illusion. My experience with The VOID still stands as the deepest sense of embodied presence I’ve experienced because I was able to explore around a space untethered, seemingly forever, because my mind was tricked by the process of redirected walking. I also was getting passive haptic feedback every time I reached out to touch a wall. Moving around a room-scale environment can partially mimic this feeling, but the level of embodied presence is taken to the next level when you remove the wire tether and allow intuitive, beyond room-scale movements without any presence-breaking chaperone boundaries trying to keep you safe.
The fire element is also constrained in 360 video. You are able to look anywhere that you want to across all levels of 3DOF and 6DOF virtual reality, but mobile VR limits the full expression of your agency. Without the natural and intuitive movement that comes from tracking your hands and body on all six degrees of freedom, any expression of agency is going to be abstracted through buttons on a gamepad, gaze detection triggers, or the trackpad on a Gear VR. These abstracted expressions of agency can only take your level of active & willful presence so far, because at a primal brain level I believe that active presence is cultivated through eliminating abstractions.
This means that 360 videos are not able to really cultivate the same depth of presence that a fully volumetric, interactive experience with 6DOF tracking in a room-scale environment is able to. This is the crux for why some hardcore VR enthusiasts insist that 360 video isn’t VR, and it’s also why 360 video will be trending towards positionally-tracked volumetric video whether it’s stitched together with photogrammetry techniques like 8i or HypeVR, using depth sensors like DepthKit or Mimesys, or using digital light field cameras from companies like Lytro.
I believe that the trend towards live action capture with volumetric video or digital lightfields will increase the feeling of embodied presence, but I have doubts that it will be able to achieve a satisfying level of active and willful presence. Without the ability to fully participate within a scene, it’s going to be difficult for any live-action captured VR to create a plausible sense of presence for the fire element. It’ll certainly enable “story dictators” to have complete control over the authored narrative that’s being presented, but any level of interactivity and expression of active and willful presence will be constrained.
Conversational interfaces with dynamically branching pre-recorded performances will perhaps offer a way for you to express your agency within an experience. There are some narrative experiences starting to explore interaction with pre-filmed performances, like Kevin Cornish’s Believe VR, which is triggered by gaze detection, as well as Human Interact’s Starship Commander, which is triggered by natural language input (more on this down below). But the dominant mindset for most narrative storytellers coming from the film world is to not provide any level of interactivity to their authored stories.
360 VIDEO AMPLIFIES MENTAL & EMOTIONAL PRESENCE
Whenever you reduce the capability of one dimension of presence, you can amplify the other elements. If mobile VR and 360 video have constrained embodied and active presence, then they can actually cultivate a deeper sense of mental/social and emotional presence. There’s a reason why the major empathy VR pieces have been 360 videos, and I’d argue that social VR experiences with constrained movement like BigScreen VR can actually provide a deeper sense of social presence with deeper and longer conversations.
360 video can also capture microexpressions and raw emotions in a much more visceral way. Our brains have evolved to be able to discern so much information from a human face, and so 360 video has a huge role to play in capturing human faces within a documentary or memory capture context. Live performances from actors can also be extremely powerful in the VR medium, and there is something that can be lost when it’s converted into an avatar.
The uncanny valley ends up driving avatars towards stylization. This is a double-edged sword for 360 video. On the one hand, live capture video can capture a transmission of raw emotional presence when you have full access to someone’s facial expressions, body language, and eye contact. On the other hand, the uncanny valley is all about expectations, which means that 360 video almost always violates the fidelity contract for presence. When you get a photorealistic visual signal from VR, but it’s not matched by the audio, haptics, smell, and touch, then your brain will send a presence breaker signal to your primal brain that keeps you from feeling fully present. So CGI experiences can create a surrealistic world that transcends your expectations, and therefore can actually cultivate a deeper sense of presence.
That said, there are still so many compelling use cases for 360 video and volumetric capture that I’m confident that it’s not going to go away, even though there are clear downsides to the level of presence that you can achieve with 360 video given its constraints. But I’d still argue against anyone who claims that 360 video is not VR, especially once you understand how the power of embodied cognition can be triggered whether it’s in a 360 video or a fully volumetric VR experience.
There are also a lot of storytelling advantages in having limited embodiment and agency that can amplify the sense of emotional, social, and mental presence in an experience. It will get cheaper and easier for anyone to create a 360-video experience, and the emerging grammar and language of storytelling in VR continues to evolve. So I see a healthy content ecosystem for 360 video that will keep growing.
The level of social interactions that you can have on a 3DOF mobile VR headset is also surprisingly adequate. There is still a large gap of body language expressiveness when you’re not able to do hand gestures and body movements, but there’s still quite a lot of body language fidelity that you can transmit with the combination of your head gaze and voice. This gap will also be closed once 6DOF head and hand tracking comes to mobile VR, possibly as soon as 2017 or 2018.
STORYTELLING IN VR IS ABOUT GIVING & RECEIVING
When it comes to narrative and storytelling in VR, there’s a continuum between a passive film and an interactive game where I feel that a virtual reality experience sits in the middle of this spectrum. I’d say that films and storytelling are more about mental & emotional engagement while games are more about active, social, and embodied engagement. Storytelling tends to work best when you’re in a receiving mode, and games work best when you’re exerting your will through the embodiment of a virtual character.
From a Chinese philosophy perspective, films are more about the yin principle and games are more about the yang principle. The challenge of interactive storytelling is to be able to balance the yin and the yang principles to provide a plausible experience of giving and receiving. At this point, games resort to doing explicit context switches that move into discrete story-mode cinematics to tell the story or have explicit gameplay and puzzles that are designed for you to express your agency and will. This is because interactive games haven’t been able to really listen to you as a participant in any meaningful way, but that will start changing with the introduction of artificial intelligence and machine learning technologies.
ARTIFICIAL INTELLIGENCE WILL ENABLE EXPERIENCES TO LISTEN TO YOU
Machine Learning has shown amazing promise in computer vision and natural language processing, which means that games will soon be able to understand more about you through watching your movements and listening to what you say. When you’re embodied within a 6DOF VR experience, you are expressing subtle body language cues that machine learning will eventually be able to be trained upon. As I covered on my episode on privacy in VR, Oculus is already recording and storing the physical movements you make in VR, which will enable them to train machine learning neural nets to potentially identify body language cues.
Right now most machine learning neural networks are trained with supervised learning, which would require a human body language expert to watch different movements and classify them into different objective categories. Body language experts are already codifying body language behaviors within NPCs to create more convincing social interactions, and it’s a matter of time before AI-driven NPCs will be able to identify the same types of non-verbal cues.
When you speak to AI characters in VR experiences, then natural language processing AI will be able to translate your words into different discrete buckets of intent, which can then trigger behaviors in a much more interactive and plausible way than ever before. This is the approach that Human Interact’s Starship Commander is taking. They announced today that they’re using natural language input with Microsoft’s new Language Understanding Intelligent Service (LUIS) and Custom Recognition Intelligent Service (CRIS) that’s a part of their Cognitive Service APIs. Starship Commander’s primary gameplay mechanic is natural language input as you play through their interactive narrative giving verbal commands to the Hal-like computer. I have an interview with Human Interact’s Alexander Mejia that will air in my next episode, but here’s a trailer for their experience.
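To make the idea of “discrete buckets of intent” more concrete, here is a minimal sketch of the pattern. It is not how LUIS or Starship Commander works internally: a trained language-understanding service is replaced by a naive keyword-overlap stand-in, and the intent names, keyword sets, and `BEHAVIORS` table are all hypothetical.

```python
import re

# Hypothetical intent buckets and trigger words for a starship-style game.
# A real system would use a trained language-understanding model instead.
INTENT_KEYWORDS = {
    "fire_weapons": {"fire", "shoot", "attack"},
    "raise_shields": {"shield", "shields", "defend"},
    "set_course": {"course", "navigate", "warp", "fly"},
}

# Hypothetical behaviors triggered by each bucket.
BEHAVIORS = {
    "fire_weapons": "Weapons firing.",
    "raise_shields": "Shields are up.",
    "set_course": "Course laid in.",
    "unknown": "I did not understand that command.",
}


def classify_intent(utterance, min_hits=1):
    """Return the intent whose keywords overlap most with the utterance."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    best_intent, best_hits = "unknown", 0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_intent, best_hits = intent, hits
    return best_intent if best_hits >= min_hits else "unknown"


def respond(utterance):
    """Map free-form speech to a bucket, then to a scripted behavior."""
    return BEHAVIORS[classify_intent(utterance)]
```

The point of the sketch is the shape of the pipeline: many different phrasings collapse into a small set of intents, and the authored content only has to respond to those intents rather than to every possible sentence.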
I believe that with the help of AI, a VR storytelling experience is what is going to sit in the middle between a yin-biased film and yang-biased game. What makes something an “experience”? I’d say that an experience is any time that we feel like we’re crossing the threshold to achieve a deep level of presence in any of the four dimensions whether it’s embodied, active, mental/social, or emotional presence. If it’s hitting all four levels, then it’s more likely to reach that level of direct experience with a deep sense of presence. With the embodiment and natural expression of agency that is provided by VR, then the virtual reality medium is uniquely suited to be able to push the limits of what embodied storytelling is able to achieve.
WHAT THE FUTURE OF AI-DRIVEN NARRATIVES WILL LOOK LIKE
There’s a couple of AI-driven games that I think show some of the foundational principles of where VR games will be going in the future.
Sleep No More is an immersive theater experience where the characters are running through a hundred different warehouse rooms interacting with each other through interpretive dance in order to tell the story of Shakespeare’s Macbeth. As it stands now, the audience is a passive ghost, and you can’t really directly interact or engage with the narrative at all. You can decide what room to go into and which actors to follow or watch, and it’s a looping narrative so that you have a chance to see the end of a scene and later see the beginning.
Imagine what it would be like if the characters were able to listen to you, and you’d be able to actually engage and interact with them. The characters would have the freedom to ignore you just as in real life a stranger may blow you off if they had something important to do, but there could be other ways that you could change the outcome of the story.
One AI-driven simulation game that explores the idea of changing the fate of a tragic story is Elsinore, in which you have the ability to change the outcome of Shakespeare’s Hamlet. You play Ophelia, a female character whose movements and interactions would plausibly have been ignored within the context of the original story. You can go around and try to intervene and stop the set of actions that leads to a number of different murders.
Elsinore is a time-looping game that uses some sophisticated AI constraints and planning algorithms that determine the fate of the story based upon each of your incremental interventions. The story dynamically changes and spins off into alternative branching Hamlet fan fiction timelines based upon your successful interventions in multiple tragedies. You have four iterations of the course of a single day, Groundhog Day style, and you prevent the multiple murders through social engineering. Once natural language input and VR embodiment are added to an experience like this, this type of emergent storytelling / live theater genre is going to be perfectly suited for the virtual reality medium.
Bad News is an amazing AI-driven game that I had the privilege of playing at the Artificial Intelligence and Interactive Digital Entertainment conference last October. There’s a deep simulation that creates over 100 years of history in a small imaginary town complete with characters, families, relationships, work history, residencies, and a social graph. You’re thrown into this world at the scene of a death of a character and it’s your job to notify the next of kin of the death, but you don’t know anyone in the town and you can’t tell anyone why you’re looking for the next of kin. You can only notify the next of kin of the death, otherwise you lose the game. Your job is to explore this imaginary world, talking to residents while trying to find the family members of the deceased.
You have two interfaces with this simulated Bad News world. One is an iPad telling you locations, addresses, and then descriptions of people who are in each location that you go to. After you pick a location and choose someone to talk with, you explore this world through conversations with an improv actor. These improvised conversations are your primary interface with this imaginary world, and you have to use your detective skills to find the right person and your social engineering skills to come up with a convincing cover story without being too nosy or suspicious.
The improv actor has a list of information about each character that he’s embodying so that he can accurately represent the personality and openness as determined by the deep simulation. He also has access to a Wizard of Oz orchestrator who can query the database of the deep simulation asking about information on the location of other town residents. Because this world is based upon an actual deep simulation that’s running live, the interactor has to find a moving target, and so the Wizard of Oz can provide subtle hints to keep the run time of the game at around one hour.
It was an amazingly rich experience to engage in a series of conversations with strangers played by the improv actor in this imaginary town. You’re fishing for information and clues as to who is related to the person who has passed and where to find them while trying to maintain your cover story. Bad News encourages the type of narrative-driven, role-playing interactions that only a human can do at this point, but this type of conversational interface is what AI-driven natural language processing is going to enable in the future. This is the trajectory of where storytelling and gaming in virtual reality are headed, and this possible future is freaking out traditional storytellers who like to maintain control.
THE BATTLE BETWEEN AUTHORED & EMERGENT STORIES
There are some people from the Hollywood ecosystem who see the combination of gaming and film as “dystopian.” Vanity Fair’s Nick Bilton says, “There are other, more dystopian theories, which predict that film and video games will merge, and we will become actors in a movie, reading lines or being told to ‘look out!’ as an exploding car comes hurtling in our direction, not too dissimilar from Mildred Montag’s evening rituals in Fahrenheit 451.”
This vision of films and games merging is already happening, and I certainly wouldn’t call it “dystopian.” Why is there so much fear about this combination? I think some storytellers see it as dystopian because open world, sandbox experiences have not had very strong or compelling narratives integrated into them. Being able to express the full potential of your agency often completely disrupts the time-tested formula of an authored narrative.
Baobab Studios’ Eric Darnell was actually very skeptical of VR’s power of interactivity when I spoke to him in January 2016, when he said that there’s a tradeoff between empathy and interactivity. The more that you’re engaging with an experience, the more it becomes about finding the limits of your agency and control. Stories are about receiving, and if you’re completely focused on transmitting your will into the experience, then you’re not open to listening or receiving that narrative.
This is what Darnell believed last year, but over the past year he’s been humbled by observing the power and drive of VR to offer the possibility of more interactivity. He watched so many people who wanted to more directly interact with the bunny character in Invasion!, and he kept hearing that feedback. So at Sundance 2017, Darnell explored how empathy could be combined with interactivity to facilitate compassionate acts in Asteroids!.
He makes you a low-level sidekick, and you get to watch the protagonists play out their story independent of anything that you do. But I wouldn’t classify Asteroids! as a successful interactive narrative, because your agency is constrained to the point where you can’t really cultivate a meaningful sense of willful presence. Your interactions amount to local agency without any meaningful global impact on the story. There’s still a powerful story dictator with a very specific set of story beats that will unfold independent of your actions. While there are some interesting emotional branching relationships that are explored that give variation for how the characters relate to you based upon your decisions, you’re still ultimately a sidekick to the protagonists whose stories are fated beyond your control. This made me feel like a ghost trapped within an environment where it didn’t matter if I was embodied in the story or not.
Having explicit context switches with artificially constrained agency breaks my level of active presence. One of the keys to Job Simulator’s success of grossing over 3 million dollars is that Owlchemy Labs wanted to make everything completely interactive and dynamic. They applied this high-agency engine to stories with Rick and Morty’s Simulator, and they allow you to interrupt and intervene within a story at any moment. If you throw a shoe in the face of the main character, then he will react to that and then move on with the story. It’s this commitment to interruption that could be one key towards achieving a deep sense of active and willful presence.
But most storytellers are not taking a high-agency, inspired approach to narrative like Owlchemy Labs. The leadership of Oculus Story Studio has a strong bias towards tightly-controlled narratives with ghost-like characters without much meaningful agency. In my conversation with three of the Oculus Story Studio co-founders they expressed their preference towards the time-based telling of an authored narrative. Saschka Unseld went as far as to say that if you have a branching narrative, then that’s an indication that the creator doesn’t really know what they want to say. Oculus Story Studio is exploring interactivity in their next piece, but given their preference for a strong authored story, any interactivity is likely going to be some lightweight local agency that doesn’t change the outcome of the story.
There is an undeniable magic to a well told story, and Pearl and Oculus Story Studio’s latest Dear Angelica are the cream of the crop of the potential of narrative virtual reality. But without the meaningful ability to express your agency, these types of tightly-controlled, authored narratives are destined to maintain the ghost-like status quo of constrained active presence that Oculus coined as “The Swayze Effect.”
FUTURE OF AI-DRIVEN INTERACTIVE NARRATIVES
In my interview with Façade co-creator Andrew Stern he said that there really hasn’t been a video game experience since 2005 that has provided a player with meaningful local and global agency where your small, highly dynamic action in every moment could dramatically alter the outcome of the story. In Façade, you’re at a dinner party with a husband and wife who are fighting, and you use natural language input to interact with them. The AI determines if you’re showing affinity towards the husband or wife, and a backend algorithm keeps track of your allegiances as you try to balance your relationships to push each character towards revealing a deep truth.
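The allegiance tracking described above can be pictured as a running score per character. The sketch below is only an assumption about the general shape of such bookkeeping, not Façade’s actual backend: the `AffinityTracker` class, the integer deltas, and the `margin` threshold are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AffinityTracker:
    """Running affinity score per character (character keys are placeholders)."""

    scores: dict = field(default_factory=lambda: {"husband": 0, "wife": 0})

    def record(self, character, delta):
        """Add a positive or negative affinity signal inferred from player input."""
        self.scores[character] += delta

    def leaning(self, margin=2):
        """Report which side the player currently favors, if either."""
        diff = self.scores["husband"] - self.scores["wife"]
        if diff >= margin:
            return "husband"
        if diff <= -margin:
            return "wife"
        return "balanced"
```

In a sketch like this, each utterance the language layer interprets as siding with one spouse would call `record`, and the story logic would consult `leaning` when deciding how each character reacts next.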
Here is Stern explaining his vision of the future of interactive drama driven by artificial intelligence:
There’s a spectrum of authored vs emergent story, and Façade uses a drama manager to manage input from the but also maintain the dramatic arc of the story. If you read their “Behind the Façade” Guide, then it reads more like a computer program than a film script, but the most authoritative blue-print for how to architect an interactive narrative that is fully listening to player and providing meaningful global agency. Here’s a visualization of Façade’s drama manager that balances user input with the unfolding of a traditional three-act dramatic structure of the overall story.
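As a rough companion to the idea of a drama manager, the following sketch shows one way that balancing act could be expressed in code. It is an illustrative toy under stated assumptions, not Façade’s algorithm: the `Beat` structure, the piecewise `target_tension` arc (rising until 75% of the story, then resolving), and the `topic_bonus` weight are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Beat:
    name: str
    tension: float  # 0.0 (calm) to 1.0 (climax)
    topic: str      # player topic this beat responds to


def target_tension(progress):
    """Assumed arc: tension climbs until 75% of the story, then resolves."""
    if progress <= 0.75:
        return progress / 0.75
    return 1.0 - (progress - 0.75) / 0.25 * 0.7


def next_beat(beats, played, progress, player_topic, topic_bonus=0.3):
    """Pick the unplayed beat that best fits the arc, favoring the player's topic."""
    candidates = [b for b in beats if b.name not in played]
    if not candidates:
        return None

    def cost(beat):
        # Distance from the desired tension, reduced if the beat answers the player.
        distance = abs(beat.tension - target_tension(progress))
        return distance - (topic_bonus if beat.topic == player_topic else 0.0)

    return min(candidates, key=cost)
```

With these made-up weights, a player who raises a climactic topic too early is still steered to a mid-tension beat, while the same input later in the story is honored. That tradeoff between responsiveness and arc is the design tension the paragraph above describes.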
Human Interact’s Starship Commander is an important step towards allowing a natural language integration within a VR experience, but it’s still a highly authored experience given that they really provide performances from actors. Façade also features recorded performances, and so this hybrid approach provides a limit on the extent of how much true emergent behavior that you can achieve.
Looking to the future, Stern’s Playabl.ai is focusing on creating AI-driven interactive characters where trust and rapport can be built up over time. They’re leveraging research for modeling human behavior from a DARPA-funded AI program called IMMERSE, and they’re hoping that interactions with these types of AI characters could start to mimic what it feels like to have an emergent conversation.
Stern is collaborating with his Façade co-creator Michael Mateas, who founded the Expressive Intelligence Studio at University of California, Santa Cruz. Mateas and his students are creating some of the most cutting-edge AI-driven, interactive narratives out there (I have an interview with Mateas and a number of his students that will be released with the upcoming Voices of AI podcast, including the creators of Bad News).
AI is going to be a critical part of the future of interactive narrative in VR, and 2017 promises to have many of the advances of machine learning start to be made available through cloud-based AI services from Microsoft, Google, and IBM. It’s already starting with natural language processing being integrated, but the ultimate affordances of AI will go much deeper.
THE FUTURE OF AI & VR IS “UNCOMFORTABLY EXCITING”
At the end of Brillhart’s Filmmaker Magazine article, she says, “Blaise Agüera y Arcas, is a principal scientist working on artificial intelligence at Google. He had this to say about the current state of AI, which I think also describes the current state of VR better than anything else I’ve heard: ‘We live in uncomfortably exciting times.’”
There’s so much potential in VR and AI that it’s hard to predict where it’ll all eventually go, but there are some early indications of what some of the initial intersections will be. Brillhart talks about trippy Google DeepDream VR experiments that she’s done that start to create such an unexpected experience that it feels like you’re on a psychedelic trip.
Style transfer is another area that is likely to be an early win for VR. This is where the primary features of an artist’s style can be extracted and applied to new images. This is likely to start to be used in procedurally-generated textures and eventually 3D models within VR. Brillhart imagines that auteurs will train neural nets on image sets in order to use AI as a creative collaborator.
In terms of what else will be on the horizon, we can look at the principles of how AI is able to objectify subjective judgments and eventually be able to make the types of qualitative decisions that we ascribe to intelligent beings. So AI will have the ability to quantify the qualitative aspects of life, and then express it back to us within an immersive environment.
Neural nets have the ability to come up with averages of qualitative data like faces or create “interlingua” intermediary translation languages that amount to every language, but no language at the same time. Eventually, companies like Facebook, Google, or Twitter may be able to translate their vast repository of Big Data to create composite, AI-driven, chatbot NPCs that are embodied within a VR experience. We may be able to interact with a comprehensive representation of the beliefs and thinking of an average 25-year-old white male or a mid-40s black lesbian. This will be a long process and there are tons of open questions around creating a representative and unbiased set of training data to achieve this, but Brillhart thinks that it represents a future potential of where AI could go.
AI will also be involved in the procedural generation of both environmental and narrative content. Ross Goodwin explored using AI to automatically write a surrealist sci-fi movie script that was produced as a part of a 48-hour film competition. The script for Sunspring was automatically generated by feeding a Long Short-Term Memory recurrent neural network dozens of sci-fi scripts, and then the AI generated dialog and staging notes that the film crew used to produce the piece.
The resulting dialog is largely nonsensical and filled with confusion, but what makes Sunspring so compelling is the actors who are able to take the semantically-correct but otherwise gibberish dialog and imbue it with meaning through their acting and staging. You can feel what they mean with their body language and emotions but the actual words are meaningless, which gives it an other-worldly feel.
At a baseline, AI will provide creative constraints and inspiration for human interactions and collaborators that will allow for more freedom in role-playing and perhaps eventually provide the arc of a satisfying story. The ultimate realization of this type of collaborative storytelling is done by skilled Dungeons and Dragons dungeon masters who are able to guide the arc of an adventure, but at the same time allow each of the participants to do anything that they want at any time. The “theater of the mind” is currently the only medium that fully realizes the full potential of the human imagination.
Each participant has a fragmented picture of a scene in their minds, and the reality is dynamically and collaboratively constructed through each question asked and each line spoken. There’s a set of rules and constraints that allow the dungeon master or the roll of the dice to serve as the fates of these intentions, but any visualization of a virtual space by a VR artist or developer could serve as a limitation to the collective will and imagination of the DnD characters who want to be in full control of their destiny.
Will AI be able to eventually serve as the dungeon master’s dual role as master storyteller and group facilitator? I asked long-time dungeon master Chris Perkins, and he was skeptical that AI and VR will be able to achieve the same level of emergent story that a DM can facilitate any time soon. But yet he’s also convinced that it will eventually happen and that it’s almost inevitable when looking at the overall trajectory of storytelling and technology. He says that life-long friendships are forged through playing DnD, and so the collaborative storytelling experiences that are created are so powerful that there’s enough intrinsic motivation for people to solve the technological roadblocks that will enable this type of collaboratively emergent form of storytelling. Currently Mindshow VR is on the leading edge of creating this type of collaborative storytelling platform.
VRChat is empowering the most sophisticated level of collaborative storytelling with their open world explorations of interconnected metaverse worlds. But the stories that are being told here are largely generated from interpersonal relationships and the communication that happens with other people you happen to be going through the experience with. Having meaning be co-created and constructed by a group of people shows that collaborative and social storytelling experiences are something that are unique to the VR medium.
But there’s still a lot of power in being able to use an environment to tell a story. Rand Miller’s Obduction and The Gallery: Call of the Starseed are two stand-out examples of environmental storytelling in VR, but there’s still a lot of room for where social and environmental storytelling will go in the future.
Having AI chatbots within open worlds will be the next logical step, but Rand Miller says that architecting a non-linear story through an environment is not an easy task. Otherwise, there would be a lot more people doing it. Stern says that part of the open world storytelling dilemma is that there’s a tradeoff in being able to explore an open world without limits and still be able to communicate a story with an engaging dramatic arc without it feeling fractured and unsatisfying.
This is where emergent conversations with AI characters powered by a global drama manager start to come in. Stern envisions a future of interactive drama where you can have complete and meaningful expression of your agency through conversational interfaces with AI where there is giving and receiving with meaningful participation and active listening. If there is a drama manager in the background that’s planning the logistics system driving a simulation, then the future of AI and VR has a lot of promise of not just being able to watch and observe a meaningful story, but to fully participate in the co-creation of a living story, which is a process that Stern prefers to call “story making.”
Last year, Baobab Studios’ Eric Darnell was skeptical about adding interactivity to virtual reality stories because he felt like there was a tradeoff between empathy and interactivity. But after watching people experience their first VR short Invasion!, he saw that people were much more engaged with the story and wanted to get more involved. He came to the realization that it is possible to combine empathy and interactivity in the form of compassionate acts, and so he started to construct Baobab’s next VR experience Asteroids! around the idea of allowing the user to participate in an act of compassion.
I had a chance to catch up with Darnell at Sundance where we talked about his latest thoughts about storytelling in VR, and explored his insights from their first explorations of what he calls “emotional branching.” Darnell says that one of the key ingredients of a story is “character being revealed by the choices that they make under pressure.” Rather than make you the central protagonist as a video game might, in Asteroids! you’re more of a sidekick who can choose whether or not to help out the main characters. This allows an authored story to be told through the main characters that are ultimately independent of your actions, but your “local agency” choices still flavor your experience in the sense that there are different “emotional branches” of the story for how the main protagonists react to you based upon your decisions.
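One way to picture emotional branching is as a story whose plot beats are fixed while only the characters’ reactions vary with your choice. The following sketch is purely illustrative and is not how Asteroids! is built: the beat names and reaction strings are placeholders, and the `play` function is a hypothetical helper.

```python
# Placeholder beat names; the authored plot is the same for every viewer.
PLOT_BEATS = ["setup", "crisis", "resolution"]

# Hypothetical reactions keyed by whether the viewer chose to help.
REACTIONS = {
    True: "the protagonists acknowledge you warmly",
    False: "the protagonists keep their distance",
}


def play(helped_at_crisis: bool) -> list:
    """Return the (beat, reaction) pairs for one viewing of the story."""
    timeline = []
    for beat in PLOT_BEATS:
        reaction = None
        if beat == "resolution":
            # The plot does not fork; only the emotional coloring changes.
            reaction = REACTIONS[helped_at_crisis]
        timeline.append((beat, reaction))
    return timeline


for choice in (True, False):
    print(play(choice))
```

Both viewings pass through the same beats in the same order, which is what lets the authored dramatic arc survive while the viewer’s local agency still changes how the ending feels.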
Unpacking the nuances of these emotional branches showed me that Asteroids! was doing some of the most interesting explorations of interactive narrative at Sundance this year, and I would’ve completely missed them had I not had this conversation with him. We explore some of the more subtle nuances of the story, and so I’d recommend holding off on this interview if you don’t want to get too many spoilers (it should be released sometime in the first half of 2017). But Darnell is a master storyteller, and he’s got a lot of really fascinating thoughts about how stories might work in VR that are worth sharing out to the storytellers in the wider VR community.
They’re also doing some interesting experiments of adding in body language mirroring behaviors into the other sidekick characters that are based upon social science research in order to create subtle cues of connecting to the characters and story. There is another dog-like robot in the experience that is in the same sidekick class as you where you can play fetch with it and interact with it in subtle ways.
Storytelling is a time-based art form that has a physical impact of releasing chemicals in our bodies including cortisol at moments of dramatic tension, oxytocin with character interactions, and dopamine at the resolution of that dramatic tension. Given these chemical reactions, Darnell believes that the classic 3-act structure of a story taps into something that is encoded within our DNA. Storytelling is something that has helped humans evolve, and it’s part of what makes us human. He cites Kenneth Burke saying that “Stories are equipment for living.” Stories help us learn about the world by watching other people making choices under pressure.
There’s still a long way to go before we achieve the Holy Grail of completely plausible interactive stories that provide full global agency while preserving the integrity of a good dramatic arc. It’s likely that artificial intelligence will eventually have a much larger role in accomplishing this, but Asteroids! is making some small and important steps with Darnell’s sidekick insights and “emotional branching” concept. It was one of the more significant interactive narrative experiments at Sundance this year, and showed that it’s possible to combine empathy and interactivity to make a compassionate act.
There were a number of immersive storytelling innovations at Sundance 2017 in a number of experiences including Dear Angelica, Zero Day VR, Miyubi, and Life of Us, but Mindshow VR’s collaborative storytelling platform was the most significant long-term contribution to the future of storytelling in VR. I first saw Mindshow at its public launch at VRLA, and it’s still a really compelling experience to record myself playing multiple characters within a virtual space. It starts to leverage some of virtual reality’s unique affordances when it comes to adding a more spatial and embodied dimension to collaboratively telling stories.
I had a chance to catch up with Visionary VR’s CEO Gil Baron and Chief Creative Officer Jonnie Ross where we talk about how Mindshow is unlocking collaborative creative expression that allows you to explore a shared imagination space within their platform. We talk about character embodiment, and the magic of watching recordings of yourself within VR, how they’re working towards enabling more multiplayer and real-time improv interactions, and they announced at Sundance that they’re launching Mindshow as a closed alpha.
This is also episode #500 of the Voices of VR podcast, and Jonnie and Gil turn the tables on me for what I think the ultimate potential of VR is. My full answer to this question that I’ve asked over 500 people will be fully covered in my forthcoming book The Ultimate Potential of VR. But briefly, I think that VR has the power to connect us more to ourselves, to other people, and to the larger cosmos. Mindshow VR is starting to live into that potential today of providing a way to express your inner life through the embodiment of virtual characters that you can then witness, reflect upon, and share with others, and Google Earth VR shows the power of using VR to connect more to the earth as well as the wider cosmos.
I had a chance to catch up with Arora at Oculus’ VR for Good premiere party at Sundance where we talked about directing Clouds Over Sidra, his new social enterprise The LightShed Collective, and the importance of storytelling in creating VR empathy experiences.
Arora’s work has been at the intersection of storytelling and technology, and diplomacy and humanitarian efforts. He studied film in college, but was unable to launch a successful film career in Hollywood, and instead turned towards humanitarian work with NGOs after 9/11 and eventually with the United Nations in 2009. He used his creative sensibilities to move beyond written text reports, and look to the power of new media to tell humanitarian stories. He had some success collaborating with social media sensation Humans of New York photographer Brandon Stanton by coordinating a 50-day global trip with him in 2014 in order to raise awareness of the Millennium Development Goals. He proved the power of using emerging technology to promote humanitarian goals.
After he was introduced to Chris Milk in 2014, he gathered enough support to create a virtual reality lab at the UN, starting with creating an experience about the Syrian refugee crisis. Clouds Over Sidra was shot in two days in December 2014 at the Za’atari Refugee Camp, which had over 80,000 Syrian refugees. Arora wanted to focus on a day in the life of a 12-year-old refugee, and collaborated with his UN contacts to find the young female protagonist named Sidra. Arora said that a big key to cultivating empathy in virtual reality is to focus on the common ordinary aspects of day-to-day living whether that’s eating a meal or preparing for school. While some of these scenes would seem like non-sequiturs in a 2D film, the sense of presence that’s cultivated in VR gives the feeling of being transported into their world and a feeling of being more connected to the place and story.
Arora acknowledges that merely showing suffering of others can have the opposite effect of cultivating empathy. He cites Susan Sontag’s Regarding the Pain of Others as a book that helped provide some guidelines for how to represent the pain of others. He’s aware that we can have a lustful relationship towards violence, and that there are risks that normalizing suffering can create an overwhelming sensory overload. He addresses some of Paul Bloom’s arguments in Against Empathy in that there’s a bias towards empathizing with people who look or act like you. If there’s too much of a difference, then it can be difficult to connect on any common ground. This is a big reason why Arora has typically focused on finding ways of representing the moments of common humanity within the larger context of fleeing from war or coping with a spreading disease like Ebola.
Arora was able to show that Clouds Over Sidra helped the United Nations beat their projected fundraising goal of $2.3 billion by raising over $3.8 billion, but he’s much more confident in citing UNICEF’s numbers, which show face-to-face donations doubling from 1 in 12 without VR to 1 in 6 with VR, with an increase of 10% per donation. With these types of numbers, there’s been a bit of a gold rush for NGOs to start making VR experiences for a wide range of causes, but Arora cautions that not all have been successful because not all of them have had an emphasis on good storytelling or the technical expertise that he’s enjoyed with his collaborations with Within.
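To see what those figures imply when combined, here is a quick back-of-the-envelope calculation. It rests on two assumptions that the source does not spell out: that the 10% increase refers to the average size of each donation, and that it compounds independently with the conversion rate. The variable names are just for illustration.

```python
from fractions import Fraction

# Face-to-face conversion rates quoted above.
rate_without_vr = Fraction(1, 12)
rate_with_vr = Fraction(1, 6)

# Assumption: "an increase of 10% per donation" means the average gift is 10% larger.
donation_uplift = Fraction(11, 10)

conversion_multiplier = rate_with_vr / rate_without_vr
revenue_per_pitch_multiplier = conversion_multiplier * donation_uplift

# Fundraising goal versus the amount raised, in billions of dollars.
goal_billion = Fraction(23, 10)
raised_billion = Fraction(38, 10)
overshoot = raised_billion / goal_billion - 1

print(f"Conversion multiplier: {float(conversion_multiplier):.1f}x")
print(f"Revenue per pitch multiplier: {float(revenue_per_pitch_multiplier):.1f}x")
print(f"Amount raised over goal: {float(overshoot):.0%}")
```

Under those assumptions, each face-to-face pitch with VR would bring in roughly 2.2 times as much as one without, and the $3.8 billion raised is about 65% above the $2.3 billion goal.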
Hamlet on the Holodeck author Janet H. Murray recently echoed the importance of good storytelling in VR experiences by saying that “empathy in great literature or journalism comes from well-chosen and highly specific stories, insightful interpretation, and strong compositional skills within a mature medium of communication. A VR headset is not a mature medium — it is only a platform, and an unstable and uncomfortable one at that.” The storytelling conventions of VR are still emerging, and the early VR empathy pieces have been largely relying upon conventions of traditional filmmaking.
Arora admits that there’s a certain formulaic structure that most of these early VR empathy pieces have taken that rely upon voice over narration, but he says that he started to dial back the voice overs in his most recent piece The Ground Beneath Her. He says that his recent collaboration with Milk & Here Be Dragons on the U2 Song for Someone music video showed him that there’s a lot that can be communicated without resorting to voice overs.
Murray argues that “VR is not a film to be watched but a virtual space to be visited and navigated through,” and she actually recommends “no voice-overs, no text overlays, no background music.” I’ve independently come to the same conclusion, and generally agree with this sentiment because most voice over narrations or translations feel scripted and stilted. They are also often recorded within a studio that doesn’t match the direct and reflected sounds of the physical locations that are shown, which creates a fidelity mismatch that can break presence and prevent me from feeling completely immersed within the soundscapes of another place.
I’ve found that the cinéma vérité approach of having authentic dialog spoken directly within a scene works really well, or that it works best if the audio is directing me to pay attention to specific aspects of the physical locations that are being shown. After watching all ten of the Oculus VR for Good pieces at Sundance, one of the most common things that I saw is not having the physical location match whatever is being talked about. Sometimes they’re interesting locations to look at, but it ends up putting the majority of storytelling responsibility within the audio. If the audio were to be taken away, then the visual storytelling isn’t strong enough to stand on its own.
6×9’s Francesca Panetta used audio tour guides as an inspiration for how to use audio in order to cultivate a deeper sense of presence within the physical location being shown. One live-action VR piece that does this really well was a cinéma vérité piece by Condition One called Fierce Compassion, which features an animal rights activist speaking on camera taking you on a guided tour through an open rescue as it’s happening. The live delivery of narration feels much more dynamic when it’s spoken within the moment, and feels much more satisfying than a scripted narration that’s written and recorded after the fact.
A challenging limitation to many NGO empathy pieces is that they often feature non-English speakers who need to be translated later by a translator who doesn’t always match the emotional authenticity and dynamic speaking style of the original speaker. Emotional authenticity and capturing a live performance are some key elements of what I’ve found makes a live-action VR experience so captivating, but it’s been rare to find that in VR productions so far. There are often big constraints of limited time and budgets, which means that most of them end up featuring voice over narratives after the fact since this is the easiest way of telling a more sophisticated story. This formula has proven to be successful for Arora’s empathy pieces so far, but it still feels like a hybrid between traditional filmmaking techniques and what virtual reality experiences will eventually move towards, which I think Murray quite presciently lays out in her piece about emerging immersive storyforms.
Arora’s work with the UN in collaboration with Within has inspired everyone from the New York Times VR to Oculus’s VR for Good program and HTC’s VR for Impact. It also inspired Chris Milk’s TED talk about VR as the “ultimate empathy machine”, which is a meme that has been cited on the Voices of VR podcast dozens of times.
But the film medium is also a powerful empathy machine as Arora cites Moonlight as a particularly powerful empathy piece that was released in 2016. Roger Ebert actually cited movies as the “most powerful empathy machine” during his Walk of Fame speech in 2005. He said:
We are born into a box of space and time. We are who and when and what we are and we’re going to be that person until we die. But if we remain only that person, we will never grow and we will never change and things will never get better.
Movies are the most powerful empathy machine in all the arts. When I go to a great movie I can live somebody else’s life for a while. I can walk in somebody else’s shoes. I can see what it feels like to be a member of a different gender, a different race, a different economic class, to live in a different time, to have a different belief.
This is a liberalizing influence on me. It gives me a broader mind. It helps me to join my family of men and women on this planet. It helps me to identify with them, so I’m not just stuck being myself, day after day.
The great movies enlarge us, they civilize us, they make us more decent people.
Ebert’s words about film as a powerful empathy machine are just as true today as when he said them in 2005. I do believe that virtual reality has the power to create an even deeper sense of embodied presence that can trigger mirror neurons, and may eventually prove to become the “ultimate empathy machine.” VR may also eventually allow us to virtually walk in someone else’s shoes to the point where our brains may not be able to tell the difference between what’s reality and what’s a simulation. But as Murray warns, “empathy is not something that automatically happens when a user puts on a headset.” It’s something that is accomplished through evolving narrative techniques to take full advantage of the unique affordances of VR, and at the end of the day will come down to good storytelling just like any other medium.
Owlchemy Labs recently announced that Job Simulator has grossed over $3 million, and so it’s worth reflecting on some of the design principles of agency and plausibility that have proven to be some of the key affordances of the virtual reality medium. I had a chance to talk to Owlchemy Labs’ Cy Wise at PAX West where she shared with me some guiding principles for Job Simulator as well as some of the more existential reactions from users questioning the nature of reality.
Wise says that one of the key design principles of Job Simulator was to make sure that everything was interactive. Their goal was to not make it feel like a game, but rather that people would get so lost in the plausible interactions that they’d be able to achieve a deep sense of presence. She cites the example of making tea in that they had to account for the dozens of different ways that people make their tea in order to maintain that level of plausibility that they’ve created in their virtual world. If an interaction isn’t intuitive, then the rules and constraints of the simulation make it feel like a game, rather than like simply carrying out a task in an environment whose affordances match your expectations of how it should behave.
Owlchemy Labs was able to do such a good job at creating a sense of presence in people that Wise said that it would often create a bit of an existential crisis since it blurred their boundaries of reality. VR developers talk about this as the sense of presence in VR, but there isn’t a common language for people who are having a direct experience of VR presence for the first time.
Wise asks, “How do you talk about the ‘not real’ real? Or how do you talk about the imaginary real life?” If people are able to have a direct lived experience within a virtual simulation, and it feels completely real, then it raises the question of whether or not we’re already living in a simulation. The Atlantic did a profile on people who experienced a post-VR existential crisis that made them question whether actual reality is real or not.
Hassan Karaouni recently told me that if we’re not already in a simulation, then we’re most certainly going to create virtual realities that are indistinguishable from reality that will have intelligent agents within these simulations who will be asking these exact same questions.
VR is starting to give us more and more experiences that are impossible to have in reality, and our memories of these experiences can be just as vivid as “real life” experiences, which further blurs the line between the “virtual” and “real.” The long-term implications of this are still unclear, but what is clear is that Owlchemy Labs has been focused on the principles of Plausibility and Agency, which mirrors what OSSIC CEO Jason Riggs recently declared that the future is going to be Immersive and Interactive.
If we are in a simulation, then it’s possible that we may never be able to reach base reality. As we continue to experience simulations that are more and more indistinguishable from reality, then perhaps the best that we can do is to strive to reach the deepest sense of presence at each layer of inception that we discover.
HTC announced the Vive Tracker at CES this year, which will enable a range of VR peripherals targeted at everyone from consumers to high-end virtual reality arcades. One of the higher-end peripherals that debuted was VRsenal’s VR-15, which has built-in haptics and the same weight distribution as an M-15 and AR-15. I had a chance to catch up with VRsenal CEO Ben Davenport who talked about targeting the digital out-of-home entertainment and VR arcade market with their integrated solutions of commercial-off-the-shelf VR hardware, VR backpacks and haptic vests with customizations and top-of-the-line gun peripherals with an integrated Vive Tracker.
While VR hardware is expected to continually improve over each successive generation, Davenport makes the claim that limited real estate within homes will drive consumers to VR arcades that will be able to provide more compelling experiences given the extra space. He says that competitive VR games are limited by teleportation and locomotion constraints, and that being able to physically move around large spaces will open up the types of social interactions that are possible with laser tag or paintball.
He expects to see a return to the golden era of arcades when they could provide a more compelling and visceral experience than what’s possible with consumer VR within a home. High-end haptic devices will also likely be a differentiating factor as the passive haptic feedback from the VR-15 peripheral combined with embodied gameplay is able to deliver a compelling experience that people will be willing to pay for. He also expects to see people eventually going through non-gaming and non-entertainment virtual and augmented experiences while they are co-located in the same physical environment.
SynTouch has created a system that can quantify the sense of touch on fifteen different dimensions called the SynTouch Standard, and they’re one of the most impressive haptic start-ups that I’ve seen so far. SynTouch isn’t creating haptic displays per se, but they are capturing the data that will be vital for other VR haptic companies to work towards creating a display that’s capable of simulating a wide variety of different textures. SynTouch lists Oculus as one of their partners, and they’re also providing their data to a number of other unannounced haptic companies.
I had a chance to talk with Matt Borzage, head of development and one of the co-founders of SynTouch, at CES where we talked about the 15 different dimensions of their SynTouch Standard across the five major areas of Texture, Compliance, Friction, Thermal, and Adhesive. This research was originally funded by DARPA in order to add the feeling of touch to prosthetics, and the founders have backgrounds in biomedical engineering. But their mechanical process of objectively measuring the different dimensions of textures has a lot of applications in virtual reality because it creates a baseline of input data for haptic displays.
Here’s a comparison of denim and a sponge across the 15 dimensions of the SynTouch Standard:
SynTouch has found a great niche in the haptics space in being able to already provide a lot of insight and value to a number of different companies looking at the ergonomics of industrial design, and they’re a company to watch in the VR space as more and more different haptics companies try to solve some of the hardest engineering problems around creating a generalized haptic device for VR.
Deep in the basement of the Sands Expo Hall at CES was an area of emerging technologies called Eureka Park, which had a number of VR start-ups hoping to connect with suppliers, manufacturers, investors, or media in order to launch a product or idea. There was an early-stage haptic start-up called Go Touch VR showing off a haptic ring that simulated the type of pressure your finger might feel when pressing a button. I’d say that their demo was still firmly within the uncanny valley of awkwardness, but CEO Eric Vezzoli has a Ph.D. in haptics and was able to articulate an ambitious vision and technical roadmap towards a low-cost and low-fidelity haptics solution.
Vezzoli quoted haptics guru Vincent Hayward as claiming that haptics is an infinite degree of freedom problem that can never be 100% solved, and that the best we can hope for is to trick the brain. Go Touch VR is aiming to provide a minimum viable way to trick the brain starting with simulating user interactions like button interactions.
I had a chance to catch up with Vezzoli at CES where we talked about the future challenges of haptics in VR including the 400-800 Hz frequency response of fingers, the mechanical limits of nanometer-accuracy of skin displacement, the ergonomic limitations of haptic suits, and the possibility of fusing touch and vibrational feedback with force feedback haptic exoskeletons.