Keiichi Matsuda is one of the deepest thinkers in the field of XR. He is trained as an architect, and he's currently the director of Liquid City, a small design studio based in London that is engaged in developing cutting-edge AR applications and creating speculative design films and written essays that give us a sneak peek at the future of immersive and spatial computing technologies. Matsuda created one of the most influential cautionary tales of augmented reality, HYPER-REALITY, in 2016, which I did a deep dive on with him back in 2018 in episode #639. He's currently working with Niantic on some more speculative film essays, which should be launching soon.
Matsuda was at AWE 2023 showing off a couple of recent projects, including Wol, an AI agent created in collaboration with Niantic's 8th Wall that explores the idea of personalized education. He was also showing off Overbeast, a highly original social AR game that combines farming with giant boss battles of AR beasts that fill the entire sky. We had a chance to unpack all of his latest projects and do a deep dive into how they used inworld.ai to create Wol, whose large language model technology acts as a magical glue that keeps interactions novel and fresh, but also bounded by the customized set of knowledge that you enter. I'll be diving a bit deeper into inworld.ai in the next episode.
Also be sure to check out Matsuda's provocative essay on KamiOS, which uses pagan animism as a metaphor to describe how AI + XR will be infusing our lives. “KamiOS channels the spirit world using AR. When you put on your headset, you will be introduced to many different gods who will guide you through your virtual and physical life. Gods of navigation, communication, commerce. Gods who teach you, gods who learn from you. Gods who make their home in particular objects or places, and gods who accompany you on your journey.”
This is a listener-supported podcast through the Voices of VR Patreon.
Music: Fatality
Rough Transcript
[00:00:05.412] Kent Bye: The Voices of VR Podcast. Hello, my name is Kent Bye, and welcome to the Voices of VR podcast. It's a podcast that looks at the future of spatial computing. You can support the podcast at patreon.com/voicesofvr. So this is episode 11 of 17 of looking at the intersection between XR and artificial intelligence. And today's episode is with Keiichi Matsuda, who's the director of Liquid City, and he's been doing all sorts of really interesting augmented reality experiences that have been at the intersection of speculative design and architecture, and also now artificial intelligence. So he's created this conversational agent called Wol, W-O-L, and it's at a website called meetwol.com. So this is an AI agent that's exploring the idea of personalized education. On the back end it's using inworld.ai, and he's working in collaboration with Niantic with the Lightship APIs. It's using 8th Wall, which was a WebAR company that was acquired by Niantic, and so it's using WebXR technology to be able to have an augmented reality overlay, using either a phone, or a Meta Quest Pro or Meta Quest 2 to have augmented reality pass-through. So it's creating this portal into this other world: you have this owl that flies up, and you have this conversational interface with this AI agent within the context of augmented reality. And so, yeah, it's just showing the potential of having a full arc of a conversation, and a lot of that is in the back end of inworld.ai, which I'll dive deep into in the next episode, but just exploring that as a concept and an idea. So it's really showing the power of when you're immersed within these different environments, especially with pass-through augmented reality, and the ways that you start to interact with these NPCs and these AI agents. So it's something that I expect to see a lot more of as we start to move forward. And Keiichi has also worked on other things like the Overbeast AR game that we talked a little bit about, as well as the Kami OS essay that he put out about how there's going to be artificial intelligence that's integrated into all these different objects that we've traditionally seen as inert, but they're going to be coming alive with this animistic spirit using artificial intelligence. So he talks a little bit about that in terms of this spirit world that's being infused with these artificial intelligences, moving towards these ideas of pagan animism. So that's what we're covering on today's episode of the Voices of VR podcast. So this interview with Keiichi happened on Friday, June 2nd, 2023 at the Augmented World Expo in Santa Clara, California. So with that, let's go ahead and dive right in.
[00:02:38.183] Keiichi Matsuda: My name is Keiichi Matsuda. I'm the director of Liquid City. We're a small design studio based in London. I've actually been on the Voices of VR podcast before, when I was VP of Design at Leap Motion, and I've also been talking a little bit about my concept films Hyper-Reality and Merger, where I try to look at the future of all these technologies if we don't do something to steer their course towards something more positive. But since then I've been working at Microsoft leading experience design for consumer headsets, and then I started my own company, Liquid City, in London, and we're working on a bunch of different projects. We have a long-standing collaboration with Niantic thinking about the platform, the ecosystem, and also building out use cases and trying to imagine what is going to be the high value in this kind of world. So we're here today to exhibit our newest project called Wol. It's an AI agent that you can talk to and learn about the redwood forests from. Wol is a little owl. It's the first AI agent that we've made, and we're very interested in the idea of personalized education. So you can be a complete novice or a total expert, and it doesn't matter. Wol will be able to adapt how it talks to suit your learning style.
[00:03:38.749] Kent Bye: Great. Maybe you could give a bit more context as to your background and your journey into this space.
[00:03:44.560] Keiichi Matsuda: Yeah, so I think for me, around 2009 I started to hear about augmented reality, and I was in architecture at the time, so I was very interested in the possibility of creating new types of spaces that were kind of hybrids between the physical and virtual. But I quickly started to see the issues that happen when you think about media unpacking from your screen and starting to fill every space, and being able to also get so much contextual input from you and your environment. I started to get very concerned about how the existing business models of surveillance capitalism are going to transform and become much more powerful when you're able to both monitor everybody's attention as well as directly influence what they're seeing. So I started to see these technologies as incredibly powerful and wanted to really do something to expose those issues, but also to try and tackle them. At the end of the day I'm a designer, so I want to be able to try and propose better futures for that as well. So yeah, I've had a kind of interesting relationship with the industry, first coming in from the outside as a critic and then trying to really build this kind of holistic positive vision through the work that I've done at Leap Motion, Microsoft, and now with this collaboration with Niantic.
[00:04:49.205] Kent Bye: Yeah, I wanted to ask a quick follow-up since our last conversation, where we had a chance to really dive deep into Hyper-Reality and your process of creating that. Since we last chatted, which must have been back in 2018, I've seen someone reference Hyper-Reality as an image probably every month or two, either in a dystopic way or, in some ways, almost unironically using it as a design inspiration and recreating it. I saw a couple of folks from China taking that piece and just redoing it for a Chinese context, unironically imagining that this is a positive future that we want to live into. So I'd love to hear some of your reflections on the influence of Hyper-Reality as both a cautionary tale, but also the flip side of it, with some people taking it as a deliberate design inspiration to start actually trying to imagine how we can start to live into this future. So yeah, I'd love to hear some reflections on that, since it's been a really powerful piece of work that's had a pretty big influence on the industry.
[00:05:45.206] Keiichi Matsuda: Yeah, it's incredible. I mean, I've been so lucky that it's been such a touch point for so many different people in the industry. It's obviously a critical piece of work. I'm not trying to say that this is the future we should be in, but it isn't entirely critical. I think there are, you know, obviously the kind of issues around how everything that we can do can be tracked and monitored and, you know, commodified essentially. But there are also real use cases in those films that try to prove out what is going to be the value of having this kind of persistent layer of reality that we can all access and we can all address. I think some of the films in that series try to look at the possibility for people to create spaces themselves as well, and to try and think more about user-generated content in these kinds of layers of reality. But overall, I think for me, everybody in the space wants a vision, right? They want something that they can get behind and feel like, okay, this is what we're building. This is my part of what we're trying to build. And I don't think we've really seen something yet which is really this kind of positive, holistic vision that we've been waiting for. And in that kind of vacuum, people are looking for, okay, well, who's got the most complete idea of what it looks like? And that's, I think, why Hyper-Reality has had this kind of enduring relevance in the space, because it tries to address all different sides of it, right? You can understand it from a developer perspective, from a hardware perspective, from an input perspective. But I sort of feel a little bit of responsibility now, that if people are really taking this seriously and trying to build that thing, then we have to really provide them with a better vision than that. And I think that's entirely possible. It's kind of scary to me that after that film came out in 2016, we still don't really have a really solid idea of what that kind of user experience of the future is going to look like on a day-to-day basis. So for us, Wol is an example of something that could be quite different to that, fundamentally different. Wol is an intelligent agent who can help you and be your guide in learning. But I think it's possible to have these kinds of agents for any different thing that we do, right? We can have agents for education. We can have virtual pets, but we can also have very functional agents that do specific things. For example, we could replace our Google Maps app with an agent that can guide you where you want to go, and it can know little bits of things about you that you tell it, and it can take you on a route that you want to go. You can have dating agents, you can have an agent for your bank, you can have an agent for anything, and this allows you to be able to communicate in a much more natural way, like the way that we're communicating right now. Obviously at the moment we have screens and we're okay to be able to look at these screens and press the tiny buttons on them, but this doesn't really translate in an XR environment. We don't want to be hitting buttons in the air, we don't want to be tapping keyboards or things like that. So the possibility of having more of a kind of human relationship with intelligent agents that can allow us to harness the potential of AI is a space that I'm very excited about.
So thinking about that as an ecosystem, not made by one huge company but made by many, I think that's a way that we can make something that feels like a very different vision.
[00:08:29.069] Kent Bye: Yeah, before we dive more into Wol, I want to ask a couple of questions around some of the other pieces that I saw you work on with Niantic. There was a developer summit that happened downtown in San Francisco, and at that venue you had created a whole virtual layer of architecture on top of what was happening at the actual summit. It was using some of the Lightship APIs to be able to be very specific about location, and you were able to create this architectural extension of a space and add a digital layer to a physical architecture. So I'd love to hear you talk about that, and then also the Overbeast project that you've done after that. But let's first start with the piece that you did. I think I was watching a live stream of the summit and was really wanting to actually be at the place to experience it myself. So I didn't get a chance to see it, but I saw photos and images of it and saw your talk there, where you elaborated on it a bit. But I'd love to hear a little bit of elaboration of some of those early explorations of the Lightship API with some of those experiments in architecture.
[00:09:23.341] Keiichi Matsuda: Yeah, sure. So John Hanke, the CEO of Niantic, wrote this article in late 2021, which was sort of exploring the idea of what he calls a real-world metaverse. He was pitching it as a kind of reaction to Zuckerberg's vision. And in John's idea, you know, the whole world becomes imbued with this kind of intelligence and beauty that can be created through augmented reality. And for me, it really resonated, because this is something that I've been thinking about for quite a long time as well. So we did a project to try and bring that vision to life, right? Of course, he's out there on stage trying to propose these technologies that make it possible, but we wanted to create a slice of that future. You know, we can't do it everywhere, but we can do it in this one particular place. So we created an installation that spanned the entire venue in San Francisco, a big conference center, 100 meters by 40 meters, with lots of different conference rooms, and we created a continuous experience that goes across the entire space, so you can walk around and be immersed in this other dimension. We actually made five different, we called them, reality channels that you could switch between, and some of them were more aesthetic, just trying to show different beautiful moments and having these creatures that inhabited the space, something very nice and pleasant to look at. We also had a little mini exhibition, a virtual exhibition of some of the stories that we wrote to think about the future UX in that space, but also things that were very related to the conference itself. So we had the conference schedule showing which speakers were up next in which rooms, so it was very live and contextual. The venue where we were has a beautiful view over the San Francisco skyline, so we also placed markers on top of buildings to show what they are, and you could hover over them to see what was going on. And we even put those virtual markers on top of the food and beverage stands to see the provenance of that food as well. And we even had a UGC layer, which just allowed you to place stickers like hearts in space, and anyone was able to see them. So we really went all out in trying to compress all of the possibilities that we've been thinking about into a real installation that would exist in a physical space. So that was the Reality Channels thing. And you can see online there's some videos of it.
[00:11:19.172] Kent Bye: Was that using something that was a little bit more precise than GPS, but actually using a scan and using the Lightship API in any specific way?
[00:11:26.999] Keiichi Matsuda: Yeah, exactly. At that event, Niantic were officially launching VPS, their visual positioning system, which allows for very, very tight localization that is far, far more accurate than GPS, allowing for precision even within 5 or 10 centimeters. And that really allowed us to have these virtual elements feel like they're part of the existing architecture. We weren't really interested in just slapping a virtual experience on top of a space. We wanted to create something that extended the existing space and fitted with the architecture of the venue itself. And I think our background at Liquid City, coming from architecture, really helped us to be able to think about the space as a whole and your experience of it.
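To make the localization point concrete, here is a minimal sketch, in TypeScript, of why centimeter-level VPS localization matters for architectural AR: once the device knows an anchor's pose in the world, content authored relative to that anchor is placed with a single rigid transform, and the placement error is whatever the localization error is. The types and function below are hypothetical illustrations for this article, not the Lightship API.

```typescript
// Minimal sketch (hypothetical types, not the Lightship API): once VPS has
// localized the device against a known anchor, content authored relative to
// that anchor can be placed in the device's world frame with one transform.

type Vec3 = { x: number; y: number; z: number };

// A rigid pose as a 4x4 column-major matrix, e.g. what a localization
// system might report for "anchor pose in world coordinates".
type Mat4 = number[]; // length 16, column-major

// Transform a point from anchor-local coordinates into world coordinates.
function anchorToWorld(anchorPoseWorld: Mat4, local: Vec3): Vec3 {
  const m = anchorPoseWorld;
  return {
    x: m[0] * local.x + m[4] * local.y + m[8] * local.z + m[12],
    y: m[1] * local.x + m[5] * local.y + m[9] * local.z + m[13],
    z: m[2] * local.x + m[6] * local.y + m[10] * local.z + m[14],
  };
}

// Example: a virtual marker authored 2 m above and 5 m in front of the anchor.
// With GPS-level error (meters) this placement visibly drifts off the building;
// with roughly 5-10 cm VPS error it stays locked to the architecture.
const identityAnchor: Mat4 = [
  1, 0, 0, 0,
  0, 1, 0, 0,
  0, 0, 1, 0,
  0, 0, 0, 1,
];
console.log(anchorToWorld(identityAnchor, { x: 0, y: 2, z: -5 }));
```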
[00:12:03.048] Kent Bye: Yeah, that was the first time that I'd heard of Liquid City. Was that where you launched it? And I'd love to hear a little bit of an elaboration of what Liquid City means to you in terms of the name and what you were trying to achieve there.
[00:12:12.936] Keiichi Matsuda: Yeah, I mean, I guess at my heart, I'm an urbanist. I'm very interested in the urban condition and thinking about what happens when all these different things sit next to each other and rub against each other, and you get this kind of amazing thing in cities where they feel so alive and emergent. But in the future, I think cities are going to start to change quite a lot. We're going to have virtual layers of content. It's not just going to be landowners or architects or planners who are defining our cities; it's also going to be the people who live in them, because we'll have the ability to add virtual content into the space, and we can start to think of space itself as a kind of media. We're creating this kind of software that you can go inside, and you can walk around, and you can interact with other people. So to me, these kinds of technologies are a fundamentally new way of thinking about spatial design and spatial experience. So the Liquid City, for me, is the future of the way that we interact with technology in the world. It's something which combines our physical environments with technology, but also includes us, and increasingly the possibility of intelligent agents as well.
[00:13:14.336] Kent Bye: And yesterday here at AWE, I had a bit of a chance to see some demo of the augmented reality application that you have called Overbeast. Maybe you can talk a bit about that and what you're trying to do with that application.
[00:13:25.431] Keiichi Matsuda: Yeah, so we had a chance to build a game. You know, we're not traditionally a game studio. We're mostly working on platform strategy and prototyping and building out use cases and those kinds of things. But we were speaking to Verizon and they expressed an interest in working with us. So we thought this was an incredible opportunity, and we didn't want to let it slip. At that time we'd been working very closely with the Lightship team, and we knew that they had this capability for sky segmentation, which allows you to see which parts of your camera feed are sky and which are not. And so we had this idea that we could place these giant beasts in the sky and you could see them in AR. But we also liked the idea that this was a persistent layer of reality and these beasts would be visible to everybody around. So we came up with a game concept where there is one Overbeast in each state in the US, and everybody who lives in that state works together to feed and train them and nurture them. And then they go out to battle against other beasts from different states. So it's kind of a national sports league of these beasts fighting, but the power comes from the spirit of the people who live in that space. The actual game mechanic, though, is much closer to a kind of multiplayer location-based farming simulator. Your goal is to restore the ecosystem and the habitat of the Overbeasts by planting trees and collecting pollen, which is the currency of the game. It's the energy source of the Overbeasts. So you have to work together with people in your neighborhood and your community to build this beautiful forest, which you can plant on a grid that spans the entire earth and is all visible in AR as well. So we really tried to make something that was truly, unashamedly next generation, trying to think forward to the future of glasses and the possibility of these persistent worlds being all around us and for everybody to be able to work together in interacting with it. I think we're also very interested in the idea of trying to build something that encourages people locally to work together to take care of the community. So there are these environmental themes throughout the game as well, which we tried not to be too heavy-handed with, but hopefully it shifts things from thinking of a zero-sum game to something which is much more collaborative.
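As a rough illustration of the sky-segmentation idea described above, the sketch below composites a pre-rendered beast layer only into the pixels that a segmentation mask labels as sky, which is what makes real buildings and trees naturally occlude the beast. The inputs are hypothetical stand-ins for this example; Lightship's actual sky segmentation works differently under the hood.

```typescript
// Minimal sketch of sky-segmentation compositing (hypothetical inputs, not
// Lightship's actual API): the beast layer is only written where the
// per-pixel mask says "sky", so real buildings and trees occlude it.

function compositeBeastOverSky(
  cameraFrame: ImageData,   // live camera image, RGBA
  skyMask: Uint8Array,      // one byte per pixel: 1 = sky, 0 = not sky
  beastLayer: ImageData,    // pre-rendered beast, RGBA with alpha
): ImageData {
  const out = new ImageData(
    new Uint8ClampedArray(cameraFrame.data), // start from the camera image
    cameraFrame.width,
    cameraFrame.height,
  );
  for (let p = 0; p < skyMask.length; p++) {
    if (skyMask[p] !== 1) continue;          // keep the camera pixel over non-sky
    const i = p * 4;
    const a = beastLayer.data[i + 3] / 255;  // beast alpha
    // Standard "over" blend of the beast onto the sky pixel.
    out.data[i]     = beastLayer.data[i]     * a + cameraFrame.data[i]     * (1 - a);
    out.data[i + 1] = beastLayer.data[i + 1] * a + cameraFrame.data[i + 1] * (1 - a);
    out.data[i + 2] = beastLayer.data[i + 2] * a + cameraFrame.data[i + 2] * (1 - a);
  }
  return out;
}
```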
[00:15:21.045] Kent Bye: Yeah, I was really impressed by the sky segmentation. Thinking about stereoscopic effects in VR: in the immediate near field you have a lot of stereoscopic effects, in the middle field you can still see a stereo effect far enough away, and then there's the far field, where it's really far away and you kind of lose that stereoscopic effect, but the segmentation allows you to get a sense of scale. And just the fact that it was really far away and it had the sky segmentation and these beautifully rendered art pieces of these beasts, which are really quite beautiful pieces of art in themselves, to see them in that contrast, it really gave me this sense of, wow, this feels plausible, even though it's mediated through a phone. Because it's so far away, you wouldn't be seeing any stereoscopic effects anyway, but the scale was just so vast. And yeah, it was really quite impressive to see how you could use some of those techniques to create this otherworldly type of dimension that creates these other game mechanics. So yeah, how's it been going so far?
[00:16:18.139] Keiichi Matsuda: Yeah, it's great. It was our first game, and as we developed it, we had to face so many different challenges, especially because we didn't do the thing where you take an existing genre and tweak it and update the aesthetics or whatever. We just went back to first principles and tried to design something which was a completely new game mechanic that combined so many different genres, and also had this very ambitious art style, which is incredibly beautiful, but it's not like anything else that is out there. So we were trying to innovate everywhere at once, which meant that development was like pushing this big boulder up a hill. And I'm actually so proud of the team and amazed that we were able to really deliver on that vision. But now we're at the top of the hill, the clouds are parting, and I can see another even bigger mountain, which is getting it out in front of people, building the community, developing more. But it's been great. We've had a really positive reception. We have an issue that only people with a good phone can play the game, obviously, because it's focusing on this next-generation stuff. And from those people, we've had a really positive response. There's actually a bunch of hardcore players here at AWE who've been planting trees around here. So I've been noticing my map has been filling up with all these different trees, especially around the main stage, if you go and have a look. So that's been really, really nice to see. And we had the first season final at the end of last month, on the 22nd, where it was Minnesota versus Ohio. And Minnesota took away the victory from that. It was a very dramatic battle. Over 3 million pollen was contributed from the Minnesota team. So yeah, it's just really great to see people playing it and to see people rooting for their beasts. Some people are getting very competitive with state-versus-state rivalries. And yeah, it's really good to see it happening across social media and our Discord, where a lot of the players are talking about it.
[00:17:54.395] Kent Bye: Yeah, if I would have had to have guessed which would have been the biggest states, I might have thought, like, California, New York. But Minnesota versus Ohio is a bit surprising for me. So it's interesting to see how in these more, I guess, rural Midwest landscapes, it maybe has quite a dramatic effect. But it's really cool to have this statewide effect. But yeah, it's a bit surprising for me. So I don't know if you were surprised, too.
[00:18:14.512] Keiichi Matsuda: Well, California has the most players. But they're lightweights. The real hardcore players are coming from, as you said, the Midwest, and then actually New England. There are lots of really hardcore players up there; I think Maine especially had a really strong early presence within the first league. A few hardcore players can really push up the ranking of that beast. And I think that was intentional on our part. We didn't want it to just be a numbers game. We wanted to create game dynamics that allow for that kind of balancing. So, yeah, having people who are really committed to the game can definitely win you the league.
[00:18:47.015] Kent Bye: Plus, it's really beautiful in places like Maine for people to go out and explore around. So I can definitely see that. Well, that's really cool to hear. And maybe let's come back to what's here now, what I just saw with Wol. So virtual reality and AI is a huge, huge topic. It's something that I'm seeing more and more. It's probably the hottest topic right now that I'm seeing in the industry. But first, before we get into the AI, let's get into the augmented reality aspect, because you're using 8th Wall, so you have an ability to set a plane and you have this portal into this other world. In the scene you're seeing into a forest, and you have an owl that is more in the near field, so you have the forest kind of spilling out into the room that I'm in, and I have these trees and I see this owl that's speaking to me really close. So maybe let's start with some of the technological foundations for what you're building upon, and why 8th Wall, you know, because this is WebAR, but you're also able to have this dynamic that could be in a native application, but also on a mobile phone. So yeah, maybe you could start with 8th Wall and being able to use the AR features of that.
[00:19:47.852] Keiichi Matsuda: Yeah, so the whole experience of Wol is centered around your conversation with an owl. And it's about trying to be able to learn things in your own way about the redwood forest. And to enable that experience, we needed to think about lots of different parts of it. Obviously, there's the AI brain that powers it. There's the body and the animation and character of Wol itself. And then there's also, as you mentioned, the fact that it transforms your living room, or wherever you happen to be, into this kind of perfect learning environment, where you have this whole visual simulation of these enormous trees, the ground covered with mushrooms and other bits of flora that you can ask Wol about. So we try to give people a lot of interesting things to see. The whole experience is built on top of 8th Wall, which, as you said, is WebAR, which allows us to access this experience without having to download anything. You can just go to meetwol.com and immediately, whether you're on your phone or on a headset, you'll be able to go into this forest and meet with Wol and have a conversation about anything you want. So, yeah, 8th Wall has a new feature called Metaversal Deployment, which allows you to build once and then be able to deploy to various different places.
[00:20:50.842] Kent Bye: So, just to clarify there, you have the web version, but does it also bundle into a native application that you can open up? Like, I'm seeing it here on the Meta Quest Pro, so was it able to produce an application that was showing it on the Quest Pro?
[00:21:05.208] Keiichi Matsuda: Well, actually there is no application. On the Quest Pro you can just use the native browser that's built in, go to meetwol.com, and with a couple of clicks you're already there talking to the owl. So one of the things that's so great about that technology is it just allows you to get straight in with minimal friction. And we know that that can always be an issue with XR applications, just the setup involved in getting to that. So by the time you get the headset on, we wanted to get you into the experience as quickly as possible. For experiences like Wol, which you might want to try once or twice, but which are more of a snackable experience, you probably don't want to download a whole application. So this is kind of really perfect. And it allows people to access it, as I said, from different devices. So whatever you're using at that time, you'll be able to go in there and have your conversation.
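The install-free flow Matsuda describes rests on the web platform itself: a headset or phone browser can open the same URL and request an AR session through the standard WebXR API. The sketch below shows only that underlying web standard, not 8th Wall's SDK or tooling, and it leaves the rendering callback as a stub.

```typescript
// Minimal sketch of install-free AR on the web: the same page can request an
// immersive AR session from the browser via the standard WebXR API. (8th Wall
// layers its own SDK on top of the web platform; this is not their API.)

async function enterAR(canvas: HTMLCanvasElement): Promise<void> {
  const xr = (navigator as any).xr;
  if (!xr || !(await xr.isSessionSupported('immersive-ar'))) {
    console.warn('immersive-ar not supported; fall back to a flat 3D preview');
    return;
  }

  // Pass-through AR session: hit-test lets us place the forest portal on a
  // detected surface, dom-overlay keeps HTML prompts visible (both optional).
  const session = await xr.requestSession('immersive-ar', {
    requiredFeatures: ['hit-test'],
    optionalFeatures: ['dom-overlay'],
    domOverlay: { root: document.body },
  });

  const gl = canvas.getContext('webgl', { xrCompatible: true })!;
  session.updateRenderState({
    baseLayer: new (window as any).XRWebGLLayer(session, gl),
  });

  const refSpace = await session.requestReferenceSpace('local');
  session.requestAnimationFrame(function onFrame(_time: number, frame: any) {
    const viewerPose = frame.getViewerPose(refSpace);
    if (viewerPose) {
      // ...render the portal, trees, and the owl for each viewerPose.views entry...
    }
    session.requestAnimationFrame(onFrame);
  });
}
```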
[00:21:47.857] Kent Bye: And let's talk a little bit about the AI, first starting with the speech-to-text. So I'm speaking, and it was able to understand pretty much everything I was saying and interpret it. And then on the other side, you have the text-to-speech, so it's synthesized and speaking back to you. So what are you using to understand? Is it an API from another company? Are you using OpenAI's Whisper? Or what's the mechanism that you're using to interpret what people are saying?
[00:22:10.653] Keiichi Matsuda: Yeah, that's great. So the brain of Wol is built on a platform called Inworld.ai. So Inworld has features not only for directing what Wol's knowledge is, but it also includes the speech-to-text and text-to-speech engine within it. We did actually experiment with a bunch of other ones for listening and also for production, because Inworld will give you a list of voices, and we wanted to have a look at what other things might be out there. But one of the really good things about Inworld's voice platform is that it allows you to transmit emotion from what the character is saying and have that represented in the voice. It's kind of a subtle thing, but it makes a huge difference in your interaction, and we were really trying to make that conversation feel as natural as possible. So yeah, we found that that was a really good one to use, and it also has much lower latency than anything else that I've tried to use in the past. With this kind of voice-to-voice communication, latency is a huge issue. You've probably experienced that yourself if you've been on a Zoom call and there's a two-second delay. Suddenly you can't have a conversation. You're kind of speaking over each other and saying, oh, oh, sorry, you first. And this can happen very easily with these kinds of voice-to-voice interactions as well. So getting that latency down to the smallest possible degree was really essential for the project. And yeah, Inworld's voice platform was the place for us to do that.
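For a sense of the shape of that voice round trip, here is a minimal sketch that uses the browser's built-in Web Speech API as a stand-in for Inworld's speech stack (which it is not), with a placeholder endpoint standing in for the agent brain. The structure is the point: transcribe the finished utterance, send it to the brain, speak the reply, and keep every hop as short as possible.

```typescript
// Minimal sketch of the voice round trip, using the browser's Web Speech API
// as a stand-in for Inworld's speech stack (this is NOT Inworld's API). The
// shape is the same: speech-to-text -> agent brain -> text-to-speech.

// Hypothetical brain call; '/api/agent' is a placeholder endpoint.
async function askAgent(userText: string): Promise<string> {
  const res = await fetch('/api/agent', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: userText }),
  });
  return (await res.json()).reply;
}

function startVoiceLoop(): void {
  const SR =
    (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
  const recognizer = new SR();
  recognizer.continuous = true;
  recognizer.interimResults = true; // partial results stream in while you speak

  recognizer.onresult = async (event: any) => {
    const result = event.results[event.results.length - 1];
    if (!result.isFinal) return;    // wait for a finalized utterance
    const reply = await askAgent(result[0].transcript);

    const utterance = new SpeechSynthesisUtterance(reply);
    utterance.rate = 0.95;          // slightly slower pacing sounds more natural
    speechSynthesis.speak(utterance);
  };

  recognizer.start();
}
```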
[00:23:20.629] Kent Bye: Yeah, the speaking and the interpretation, it was really quick, with fast responses. And I did notice that when it was speaking back to me, the pace of speaking was a little high. I felt like with a natural cadence the speech would maybe have a little bit more pause. It was almost like a direct transmission without any of the natural breaks that I would expect when people are speaking. So I feel like there's some room for improvement on Inworld.ai's side, just in terms of producing speech that's a little bit more natural in that way. But other than that, I felt like it was able to understand me. But also, I did try to break it. And I don't know how far the bounds go. I asked it to tell me the meaning of the universe. And I'm not sure if you had fed in what you thought people were going to ask and directed it back. But how do you put those guardrails onto something that seems so boundless, but always bring it back to some sort of grounding of what the knowledge is, or what the story that you're trying to transmit is?
[00:24:11.352] Keiichi Matsuda: Yeah, I mean, that's a huge challenge with any of these kinds of LLM technologies, but Inworld's platform allows you to define the common knowledge of what the agent knows about. You can define motivations, and you can put in sample dialogue to tell it how you want it to respond to different things. So it's about both the knowledge content, but also about the style of talking as well, like how we express Wol's personality. So we did fill Wol's brain with tons and tons of facts and interesting information, all told from an owl's perspective in Wol's voice, and we put that into the brain. We also put in a lot of stories about its childhood and lots of other things that people might want to ask about. But actually, Wol doesn't say those things. Wol says things around those things, you know what I mean? This is a truly generative platform, and even the intro where Wol says hello is different every time. So it's a very replayable experience that you can go back into, and it never feels like it's on rails. We actually had an earlier version where we tried to apply a much more strict structure to the whole thing, and give specific lessons, and try to guide the student through a particular lesson course, but we always found that it was very obvious as soon as you put it on rails: as soon as you had a canned reply, it was just obvious and it broke the magic. And in our testing, the things that resonated with people the most were when you try to throw Wol off, or you try to ask about something different, or you try to get Wol to do something strange, like tell you some jokes. And once you ask Wol to start telling you jokes, Wol does not stop telling you jokes. And the jokes are very bad, but they're bad in a different way every time. So it's very exciting to be dealing with it, and for us to put Wol out into the world, because we've been talking with them for so long, it kind of feels like you're putting your child out into the world and they're starting to talk to other people and you're hoping they don't say anything really stupid. But so far Wol has been very charming and funny, and everyone's been enjoying meeting with them.
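As a hypothetical illustration of the kind of character definition being described, the sketch below expresses knowledge, motivation, speaking style, and sample dialogue as plain data. The field names and structure are invented for this example and are not Inworld's actual schema; the forest facts and the spider-web story echo details mentioned elsewhere in this interview.

```typescript
// Hypothetical illustration of the kind of character definition described
// above: knowledge, motivation, personality, and sample dialogue as data.
// Field names and structure are invented for this sketch, not Inworld's schema.

interface AgentDefinition {
  name: string;
  motivation: string;          // what the agent is always steering toward
  speakingStyle: string[];     // tone and personality constraints
  commonKnowledge: string[];   // facts written in the character's own voice
  sampleDialogue: { user: string; agent: string }[]; // style exemplars
}

const wolLikeAgent: AgentDefinition = {
  name: 'Wol',
  motivation:
    'Gently guide every conversation back to teaching the visitor something about the redwood forest.',
  speakingStyle: ['curious', 'warm', 'playful', 'tells bad jokes when asked'],
  commonKnowledge: [
    'Coast redwoods are among the tallest trees on Earth.',
    'I once flew straight into a spider web; it was the most scared I have ever been.',
  ],
  sampleDialogue: [
    {
      user: 'What is the meaning of the universe?',
      agent:
        'Hoo, that is a big one! Up here in the canopy I mostly ponder smaller mysteries, like how a redwood drinks fog. Want to hear about that?',
    },
  ],
};
```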
[00:25:57.884] Kent Bye: Yeah, I did try to break it. And I wasn't able to. And that was really impressive. And I guess part of my onboarding was that I was told, if you get stuck, try asking Wol to tell you a story or a joke. So I got some fact-based information, and then I was like, okay, tell me a story, and then tell me a joke. And then by the time I was getting to the point where I wasn't sure what I might ask next, a bunch of prompts came up, and I was following those different prompts. And so I felt like the biggest challenge for experiences like this is: where do you take the conversation if you don't know where it's going to go? And I feel like those text prompts in the middle help to kind of guide it. Then they went away, and I forgot where they were, and then I started to try to break it, and I couldn't. So I felt like the challenge there is sort of balancing, because you do have progression between day and night, so the scene is changing. And so you have the risk of something that feels like a sandbox experience, that feels infinite to the point where there's no change, or you get bored and then you jump out. But I feel like there's enough hooks from the environmental change there that gives you this arc that makes you invest in it and say, okay, this feels like this is going somewhere. But I think it's also nice to have a beginning, middle, and end, in a way, rather than just an open sandbox. Because it feels like, okay, now I want to go back. Otherwise, I may have quit earlier because there was nothing changing. Yeah, I'd love to hear a bit about that: how do you balance this sense of giving somebody the feeling of a full, complete experience while it's being an open sandbox of talking to an open generative AI?
[00:27:21.205] Keiichi Matsuda: Yeah, it's a really good question. And actually, the way that we engineered the flow of the experience is all about trying to address the main challenge that we had in the project, which was about getting the user from the perspective of sitting down and being like, okay, impress me, to then starting to provoke their curiosity, so that by the end of the conversation, they're actually driving that conversation, right? And they're asking about things that they're genuinely interested in and having a real learning experience. So the thing that I like to ask people once they come out of the experience is, you know, did you learn something? Because I think a lot of people go in with this attitude of trying to break it somehow, but hopefully through the flow of the experience we're able to actually transition you mentally into being much more active and engaged and guiding your own learning and directing that conversation yourself. But, you know, to get there is very difficult, because people don't know the rules of engagement, they don't know what they can ask, they don't know what they can say. So we have a few different things that are working behind the scenes to try and engineer that arc. We start with Wol flying in and asking your name and where you're from, and that shows that this is a personalized learning experience, so people can start to get into the idea of, oh, okay, it's going to be different depending on what I say. We also have a system of icebreakers. If there's a little pause in the conversation and you don't say anything for a little while, Wol has a lot of things that it can say to try and suggest different topics of conversation and try and teach you things that it's interested in. And we space out how quickly that icebreaker will trigger, and we make it longer and longer throughout the experience to leave more pause towards the end. And then, as you say, partway through the experience, we also have Wol say that it's tired and it's nearly ready to go, but it's enjoying the conversation, so you can stay as long as you want. And then these prompts fade in. And so, dotted around the space, in the environment, you'll see sample questions of things that you might want to ask Wol. And this is really to try and make it so that people who are experiencing this kind of prompt fatigue are able to get some suggestions. But if you're already having a great conversation about books or wandering salamanders, then you can talk about whatever you want and we're not going to interrupt you. There is an end to the experience, and that's when you don't say anything for a little while: after a certain amount of time has passed, Wol will make excuses and fly off, and then you're free to restart the experience if you want. But yeah, I think trying to make it feel really open and allow people to do what they want to do, without cutting them off or trapping them into a structure, while having enough structure to help people who don't know what to do, that was a real challenge. But I feel like Wol's been pretty successful, and yeah, through a lot of collaboration with Niantic and the 8th Wall team, doing lots of testing and iteration, we got to a result that I'm personally very happy with.
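Here is a rough sketch of that pacing logic: a silence window that grows over the session before an icebreaker fires, sample prompts that surface partway through, and a graceful goodbye after sustained silence. The specific thresholds and callback names are invented for illustration; only the overall shape comes from what Matsuda describes.

```typescript
// Rough sketch of the conversation pacing described above. Thresholds and
// callback names are invented; only the overall shape (a growing silence
// window, mid-session prompts, a graceful exit) comes from the interview.

interface PacingCallbacks {
  sayIcebreaker(): void;  // Wol suggests a topic it is interested in
  showPrompts(): void;    // sample questions fade into the environment
  sayGoodbye(): void;     // Wol makes excuses and flies off
}

function createPacer(cb: PacingCallbacks) {
  const sessionStart = Date.now();
  let lastActivity = Date.now();   // last time the visitor or Wol spoke
  let unanswered = 0;              // icebreakers since the visitor last spoke
  let promptsShown = false;

  // The silence Wol tolerates grows from ~8s toward ~25s, leaving more and
  // more room for the visitor to lead the conversation as time goes on.
  const silenceWindowMs = () =>
    Math.min(8_000 + ((Date.now() - sessionStart) / 60_000) * 4_000, 25_000);

  const timer = setInterval(() => {
    const minutesIn = (Date.now() - sessionStart) / 60_000;
    if (!promptsShown && minutesIn > 4) {
      promptsShown = true;         // partway through, surface sample questions
      cb.showPrompts();
    }
    if (Date.now() - lastActivity < silenceWindowMs()) return;

    if (unanswered >= 2) {         // sustained silence: end of the visit
      cb.sayGoodbye();
      clearInterval(timer);
      return;
    }
    unanswered += 1;
    lastActivity = Date.now();     // restart the window after Wol speaks
    cb.sayIcebreaker();
  }, 1_000);

  // Call this whenever speech-to-text hears the visitor.
  return {
    onUserSpoke(): void {
      lastActivity = Date.now();
      unanswered = 0;
    },
  };
}
```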
[00:29:56.510] Kent Bye: Yeah, I know that there's a lot of people that have been playing around with ChatGPT and these chatbots, and sort of exploring this dialectic and Socratic method of interacting and asking questions and engaging with these AIs. And there's a range of different large language models that are able to have a variety of information, or you kind of learn the character of that language model and ways to break it, ways to push the edge, or ways to kind of explore or get knowledge or information. And I felt like this is the closest thing that I've seen that feels like an actual developed character that has a personality. But also, I asked a question: when was the time that you were the most scared? And that's a pretty open-ended question, but it was able to draw that into a very specific story of flying into a spider web. And I was like, wow, how do you go from this sort of open-endedness to a very specific example like that? So how do you architect something like that? Is that something that's happening in Inworld, where it's able to make this associative link between open-endedness and something more discrete that's already in the knowledge base? How did it make the connection between being scared and this story about flying into a spider web? Is that something that you had to include in there? Or is that just something on the back end, in how the large language model is making these associative links?
[00:31:04.914] Keiichi Matsuda: Yeah, I mean, we've been super inspired by all of the different projects that are out there that make this kind of character AI, where you can have these AI agents that come to life. We started developing the experience, I think our first prototypes were just around January. And I think around that time, or a little bit after, there were lots of other experiences that came out. I actually just met Uncle Rabbit, one of the OG AI agents, over at the Looking Glass stand. It was really nice to meet them. But with Wol, we wanted something that was really purposeful. We wanted to have a character that had a specific role and a relationship to you that could conceivably be a use case that we have in the future. So although I think those kinds of agents that are more like a pet or a friend are very cute and fun, we wanted people to actually learn something. So all of the knowledge that we gave Wol directs back to learning things about the forest. No matter what you ask, Wol will always try and guide you back to a piece of educational content. A lot of this is like total magic to me. You know, the LLM backend that powers it is incredible, but from a user perspective on the Inworld platform, you can just go in and fill out a bunch of fields in a form, tell it what it knows, tell it this kind of thing, and then you just have to work to talk to it more, to try and understand what it's saying, and to sort of debug Wol, which is a very interesting experience because you're doing that through conversation. Liquid City was involved in the concept, and we directed and produced the experience in collaboration with Niantic, and the director of the project, Jasper Stephens from Liquid City, likened the process of working with Wol's brain to directing an actor. It's about giving motivations, giving a style of talking, trying to think about what the content of the conversation is. But from that point, Wol is going to improvise. So we have to talk, we have to figure out what's wrong, what we want to change. Then we go back into the brain, we change the parameters a little bit, come back out, test it again. It's a kind of slow process, but it's very interesting. And it's been amazing to see Wol's personality really emerge as well. And that was really a credit to Jules Pride, who is a writer that we worked with on the project, who really did a lot of work to develop Wol as a character.
[00:33:07.164] Kent Bye: Well, I know that for tuning large language models, there's a process called reinforcement learning from human feedback, RLHF, to have more of that ethical boundary, to say, here are the things that are off limits. And then even with stuff like Stable Diffusion, there's a way to do positive prompts and negative prompts, to say, give me something like this, but not something like this. And so is there a similar type of tuning process? I mean, you mentioned the fields on the back end. Is that kind of what you mean by setting the bounds? Yeah, maybe you could just elaborate on that tuning process: as you're debugging it and giving feedback to that actor, what are the types of input that you can give to direct it a little bit more?
[00:33:44.447] Keiichi Matsuda: Yeah, our first prototypes were built on a GPT-3.5 backend, and we were able to create a similar kind of experience with that. And because it was using GPT, we were able to really ask anything, go in any different direction. And we found it was very difficult to actually keep Wol in character and on topic. And sometimes it would just say, as a large language model, I don't know anything about eating mice. So one of the things that was really good about Inworld is it does a lot of that kind of background work for you and provides a quite simple front end that allows you to input into that. But there are controls, there are things you're able to adjust, and I think the Inworld team are adding new things all the time. One of the things that we would have really liked to have, and I think it's a problem across lots of these things, is around the length of the response. You'll probably see it with ChatGPT: no matter what you say, it will always give you a big block of text and will just basically keep going. Inworld does also have a tendency to go on a little bit as well. So, you know, trying to find ways of tuning that, we were able to find methods to get around it by just giving it shorter sample dialogue and shorter common knowledge. Actually, one of the other nice things that Inworld does is allow for more progressive development of the character. So if you go to their site and look at some of the examples in their playground, where they have some example agents that you can talk to, some of them are kind of like a dungeon master or a private detective, and you can go on this kind of murder mystery where things are progressively revealed to you. And the way that it does that is by having these things called scene triggers, which allow the motivation and the knowledge of the agent to change based on actions by the user. So in Wol, what we do is we have that triggering over time. We have an event system that we built with 8th Wall to be able to trigger certain motivations at certain times. So, for example, when the scene transitions into the evening and we start to see the fireflies come out and the moonbeam falls on a decaying log, a fallen tree in the scene, then Wol's brain knows that that's happening and will start to be able to push the conversation towards it. Within the icebreaker system that I mentioned, we can also start to add in sound triggers and things like that, so that, for example, there's a noise that an animal makes, and then Wol will notice that sound and be able to say that must be the sound of my friend the chipmunk, or whatever. So yeah, there's a bunch of different things working together to make the experience happen. And I think for us as well, fusing that AI brain with this incredible body that was created by the team at Niantic, this incredibly beautifully animated, high-quality asset, I want to say, but for us it's just Wol, as well as the environment, the sound design, all of that scene transition, having them all together at the same time and at a similar level of fidelity allowed us to create an experience that hopefully feels very natural and magical.
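The scene-trigger idea can be sketched as a small event system that forwards scene events, like the evening transition or an animal call, to the agent as fresh motivations. The AgentBrain interface and event names below are hypothetical stand-ins for whatever the character platform and the renderer actually expose; they are not Inworld's or 8th Wall's APIs.

```typescript
// Rough sketch of the scene-trigger idea described above: scene events (the
// evening transition, a moonbeam hitting the fallen log, an animal call) are
// forwarded to the agent so its motivation can shift. The AgentBrain interface
// is a hypothetical stand-in for whatever the character platform exposes.

interface AgentBrain {
  // Nudge the agent toward a topic; how this maps onto the platform's own
  // scene-trigger or motivation mechanism is up to the integration.
  pushMotivation(motivation: string): void;
}

type SceneEvent =
  | { kind: 'time-of-day'; phase: 'day' | 'dusk' | 'night' }
  | { kind: 'spotlight'; target: 'fallen-log' | 'mushrooms' }
  | { kind: 'ambient-sound'; source: 'chipmunk' | 'woodpecker' };

function wireSceneTriggers(brain: AgentBrain) {
  return (event: SceneEvent): void => {
    switch (event.kind) {
      case 'time-of-day':
        if (event.phase === 'dusk') {
          brain.pushMotivation('The fireflies are coming out; talk about how the forest changes at night.');
        }
        break;
      case 'spotlight':
        if (event.target === 'fallen-log') {
          brain.pushMotivation('A moonbeam is on the decaying log; explain how fallen trees feed the forest.');
        }
        break;
      case 'ambient-sound':
        brain.pushMotivation(`You just heard your friend the ${event.source}; react to the sound.`);
        break;
    }
  };
}

// Usage: the renderer's own event system calls this when the scene changes.
// const onSceneEvent = wireSceneTriggers(brain);
// onSceneEvent({ kind: 'time-of-day', phase: 'dusk' });
```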
[00:36:26.148] Kent Bye: Yeah, definitely. I think with all these things coming together, this intersection is coming at the point where it feels like, you know, we've been dealing with a lot of technical limitations and misunderstandings and not being able to actually interpret what people are saying. But now I think it's at this threshold where you're able to open up these new vistas of possibilities. And so, yeah, I think this is a really interesting exploration. And I know people like Yann LeCun have talked about the nature of autoregressive large language models, that they don't actually know where they're going to end up. They can only predict what the next word is going to be, and so they don't know the full arc of where they're going to go, and it's what some critics and ethicists have called the stochastic parrot nature of large language models, that they're statistically just repeating these words. But even with those limitations, you're able to get this sense of coherence, that there's knowledge being presented in a way that's using language that makes sense. And so how do you deal with hallucinations or giving false information? Because that's a huge problem. And so with the unboundedness of large language models, is adding the knowledge graph constraining it, so that these are the only facts to be shared? Because the biggest problem right now is that it just starts to make stuff up. So how do you prevent that? Is that something that's happening in Inworld.ai?
[00:37:32.482] Keiichi Matsuda: Yeah, I mean, we can do that by giving Wol a lot of knowledge that it knows. So if you look inside Wol's brain, it's pages and pages and pages of dialogue and lines that are written in Wol's voice. So it means that, you know, whatever you ask, hopefully there'll be something kind of close that Wol can link it back to. But, you know, Wol is not an LLM. Wol is a character in a world with a body and an environment. And it's all those things working together that allow for that experience. So this idea about LLMs just predicting the next thing, yeah, of course, that's true. But we make an arc. We do have a transition. We do have a direction to the conversation, because we have our own event system that we've built that allows for those motivations to change, and therefore for the conversation to feel like it has a direction and it has a conclusion as well.
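As a toy illustration of that grounding strategy, the sketch below retrieves the hand-written knowledge lines closest to a visitor's question and packs them into the prompt alongside it, so the reply stays tethered to curated facts. It uses plain keyword overlap to stay self-contained; production platforms do this far more robustly, and the knowledge lines here are illustrative examples written for this sketch, not content from the actual project.

```typescript
// Toy illustration of the grounding idea above: pick the hand-written
// knowledge lines closest to the visitor's question and pass them to the
// model with the question, so the reply stays tethered to curated facts.

const wolKnowledge: string[] = [
  'Coast redwoods can live for more than two thousand years.',
  'I drink fog! Well, the trees do; their needles capture moisture from the coastal fog.',
  'Fallen logs become nurse logs, feeding mushrooms, ferns, and new seedlings.',
];

function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().match(/[a-z]+/g) ?? []);
}

// Score each knowledge line by word overlap with the question; keep the top k.
function retrieveRelevant(question: string, knowledge: string[], k = 2): string[] {
  const q = tokenize(question);
  return knowledge
    .map(line => {
      const words = tokenize(line);
      let overlap = 0;
      for (const w of q) if (words.has(w)) overlap++;
      return { line, overlap };
    })
    .sort((a, b) => b.overlap - a.overlap)
    .slice(0, k)
    .map(entry => entry.line);
}

function buildPrompt(question: string): string {
  const facts = retrieveRelevant(question, wolKnowledge);
  return [
    'You are Wol, a small owl who teaches visitors about the redwood forest.',
    'Ground your answer in these notes, and steer back to the forest:',
    ...facts.map(f => `- ${f}`),
    `Visitor: ${question}`,
  ].join('\n');
}

console.log(buildPrompt('What happens to a tree after it falls?'));
```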
[00:38:17.117] Kent Bye: Awesome. So what has been some of the reactions so far of the piece?
[00:38:20.479] Keiichi Matsuda: It's been kind of overwhelming, actually. Really, really positive. Everybody's having different conversations and learning different things, and everyone comes out with a big smile on their face as well. We've been so busy demoing it that I haven't really had a chance to look at any of the reaction online. But I think overall people are really impressed with the possibility of the technology, but hopefully also inspired about what can be possible in the future. For me, I'm very interested in the idea that the way that we interact with AI in the future doesn't have to be one agent from a big tech giant that is going to be our answer to everything and have this kind of god-like presence. I much prefer that we have thousands or hundreds of thousands of little gods, like these little creatures or spirits like Wol, that can inhabit our world. So we're really interested in exploring the idea of agents for different functionalities, right? We might have ones for education, like Wol, or entertainment, but we might also have ones for more functional purposes as well, you know, productivity agents and agents that can manage other agents and work together to form this whole ecosystem of these different creatures that live around us and among us. And for me, this is kind of an interesting way of thinking about the possibilities of the future.
[00:39:26.329] Kent Bye: Yeah, and you're able to also use technologies like this to tell the individual strands of stories that draw out these relational dynamics of ecosystems, and to be more in relationship to the world around us. And I think there's a whole set of ecological implications of this as well. So I'd love to hear any thoughts that you have in terms of how this type of experience can help people be in more right relationship to the world around us.
[00:39:49.838] Keiichi Matsuda: Yeah, I mean, I think a lot of Niantic's mission is around trying to get people out to engage with their environment and engage with each other. So this is really interesting to me as well. And the idea of having agents that can live in, or haunt, a certain location and be able to give you information or context about it, whether that's a restaurant or a shop. Or, you know, we had one concept around a shrine in Japan: as you know, there's a very specific etiquette involved there that you wouldn't necessarily want to download an app for. But if you can walk into that space and meet an agent who can take you through how to engage with those things, it's very interesting. You could have one that lives on your street, and when you walk out your door, if you have any complaints about your road or your neighborhood or whatever, you can tell that agent, and maybe that agent is going to report to your local government or something like that. So it's very interesting to think about when we go beyond the idea of screen-based interactions, where we basically have like a console in front of us where we can push buttons and do things, and when we start to transition that into something which feels much more animate and alive. To me, it's not only more powerful, but it's also much more intuitive to use. It's kind of as old as religion itself.
[00:40:54.306] Kent Bye: Awesome. And finally, what do you think the ultimate potential of these types of mixed reality experiences, VR, AR, and integrations with AI, might be, and what might they be able to enable?
[00:41:07.335] Keiichi Matsuda: I think it's this spirit world. It's a world where we can live our lives as a society of humans and agents together to be able to enable all kinds of different things and emergent opportunities. People are thinking about is it AI, is it XR, what's going to be hot? To me, XR is the front end for AI. It's a front end for AI. And if we want to be able to transition into this new world of computing, we can't say, oh yeah, no, it's just XR. We have to be thinking about the ways in which we can harness the power of AI in a way that can be beneficial and delightful for us to use.
[00:41:43.154] Kent Bye: You wrote a whole essay about the spirit world, didn't you? Maybe you could elaborate on what you mean by that.
[00:41:47.864] Keiichi Matsuda: Yeah, so it was around 2017, I think, that I wrote a little two-page PDF, and it's pinned on top of my Twitter, where I think about the possibilities of this world. It kind of started off thinking about existing virtual agents like Alexa and Siri, and what our relationship with them is. And I was thinking that, in a way, they're kind of like a prophet or a god from a big monotheistic religion, in that they're trying to encompass everything in your life, right? They're trying to be able to answer any different thing that you might throw at them. And in the end, it's actually a very negative experience, because your perception of what they should be able to do doesn't match with the reality of what they actually can do. And it's also kind of a trust and privacy nightmare, because you're just feeding all of your data to this one company that's collecting everything and then giving you a single answer, which hopefully will fit for everybody. To me, that's not the future that I want to live in, but I think that there's a much more interesting alternative: rather than using this kind of monotheistic structure, we could use more of an animist structure and the idea that there are agents everywhere, and there are agents within things as well. And that's very compatible with the idea of smart environments or IoT, in that, you know, if I've got my IoT doorbell or whatever, I can have an agent that I can interface with to control that thing. It kind of lives inside it. I also like the idea that agents can be made by anybody as well. And I think platforms like Inworld are showing the possibility for these to be very, very simple to create. So yeah, that essay, we call it Kami OS. Kami is Japanese for god, so we try to think of it like a god OS, and try to sort of paint the picture of what that future could look like. We actually have a film that we developed in collaboration with Niantic that will be coming out next month. It's kind of a fiction film with two characters in it, one of whom is a children's book illustrator, but she makes agents to help her create these beautiful tabletop 3D scenes of these fairy tale worlds. And she engages with many, many agents for every different part of her life, from getting around, to dating, to ordering food, to looking after her own well-being. She meets another character who believes that you should just give all your data to one company, and then you're going to get the most powerful and productive agents. The film is trying to explore the tension between those two different worldviews, and hopefully it will be something that will resonate with people, and maybe we can help to guide the direction of our future relationship with XR and AI.
[00:44:04.857] Kent Bye: Yeah, well, if your success with hyperreality is any indication, I think these types of stories and fiction and sci-fi help to elaborate on these different dynamics. And so I'm actually really excited to see how that comes out. And yeah, this kind of pluralistic animism or just a multitude of different perspectives rather than just a singular one to try to give these multiple diverse perspectives and insights. And yeah, I'm totally on board with all that as well. So I'm excited to see where that all goes. And yeah, I guess, is there anything else that's left unsaid that you'd like to say to the broader immersive community?
[00:44:34.773] Keiichi Matsuda: You can meet Wol at meetwol.com. You can go there on your headset or on your phone browser and meet Wol right now. You can also download our game, Overbeast, if you're in the US, overbeast.world. And if you're not in the US, you can join our beta testing program also at overbeast.world. We're going to have a bunch of new projects coming out, including this film. So you can follow Liquid City on Twitter or on Instagram or TikTok and find out about our new projects there. Awesome.
[00:45:01.451] Kent Bye: Always appreciate talking to people who have this architectural lens into the future of spatial computing, because I think there's a lot of deep insights that you have from both your background and training and pulling in all these influences of these different perspectives. And yeah, like I said, a really compelling experience here with Meet Wol, and excited to see where it all goes in the future. And thanks for taking the time to help break it all down. So thank you.

So that was Keiichi Matsuda. He's the director of Liquid City, which is a small design firm based in London. He's an architect and a speculative designer who's looking at the future of technologies, and he created an AI agent named Wol to explore the idea of personalized education.

So I've had a number of different takeaways from this interview. First of all, Wol is a super compelling demo, and I think that's in large part due to the technology platform of inworld.ai, which I'll be diving into in a little bit more detail in the next episode. But I really can't say enough about how this conversational interface reduces the latency when you speak. In other demos of virtual agents that I've seen, there's quite a long gap between when you speak and when you get a response, and it actually makes a huge difference to do that type of pre-processing in real time, to start to understand what you're saying while you're still talking and then deliver a very low-latency response.
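To make that real-time pre-processing idea a bit more concrete, here is a minimal sketch of how a WebAR page could stream partial speech transcripts to a conversational backend so a reply can start forming before the user finishes talking. It uses the browser's Web Speech API purely for illustration; the `agent` object and its `warmUp` and `reply` methods are hypothetical stand-ins, not the actual Wol, inworld.ai, or 8th Wall implementation.

```typescript
// Minimal sketch: stream interim (partial) speech results to the agent so it
// can begin working on a response before the user has finished speaking.
// The Web Speech API below is a real browser API (vendor-prefixed in
// Chromium); the `agent` object is a hypothetical stand-in for whatever
// conversational backend the experience actually uses.

const agent = {
  warmUp(partialText: string): void {
    // Hypothetical: forward the partial transcript so the backend can start
    // retrieval / generation early and hide most of the round-trip latency.
    console.log("warming up with partial:", partialText);
  },
  async reply(finalText: string): Promise<string> {
    // Hypothetical: request the actual spoken reply for the final utterance.
    return `(stubbed reply to: "${finalText}")`;
  },
};

const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

const recognition = new SpeechRecognitionImpl();
recognition.continuous = true;     // keep listening across pauses
recognition.interimResults = true; // emit partial transcripts as the user talks

recognition.onresult = async (event: any) => {
  const result = event.results[event.results.length - 1];
  const text: string = result[0].transcript;

  if (!result.isFinal) {
    agent.warmUp(text);            // pre-process while speech is still coming in
  } else {
    const answer = await agent.reply(text);
    console.log("Agent says:", answer);
  }
};

recognition.start();
```

Whatever the real pipeline looks like, the principle is the same: the expensive understanding and generation work starts on the interim transcript, so by the time the user stops talking, only the last few words still need to be processed.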
So this is a website that you can go check out at meetwol.com. I'd highly recommend checking it out on a phone, or if you have a Meta Quest Pro, that's great; that's how I saw it at Augmented World Expo. But I think it should also work on the Quest as well, it just will have black-and-white rather than color pass-through.

Also, the Overbeast AR game was super cool, just to see the segmentation of how that augmented reality looks in the far field. Having it occlude behind the different buildings and trees actually does an amazing job of convincing you that there's actually something there. A lot of the more near-field stuff is much more complicated, with stereoscopic effects and lighting, but when it's far away and you just have that simple occlusion behind the skyline, and you see these huge beasts that are going to battle, it's such a beautifully designed piece. And like Keiichi said, it's unlike any other existing game that's out there. So they're really trying to forge out into some of these unique genres of augmented reality. So definitely check out the Overbeast game.
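For a rough sense of how that kind of far-field occlusion can be composited, here is a minimal sketch that draws a distant creature only where a sky-segmentation mask says there is sky, so foreground buildings and trees naturally hide it. The camera frame, sky mask, and beast sprite are passed in as placeholder inputs; how the mask is actually produced (for example by the platform's semantic segmentation) is an assumption and not shown here.

```typescript
// Minimal sketch: composite a far-field AR creature behind the skyline.
// The creature is drawn only where the sky mask is opaque, so buildings and
// trees in the camera image appear to occlude it. All three image inputs are
// placeholders supplied by the caller.
function renderFarFieldBeast(
  ctx: CanvasRenderingContext2D,
  cameraFrame: CanvasImageSource, // live camera image
  skyMask: CanvasImageSource,     // opaque where the pixel is sky, transparent elsewhere
  beastSprite: CanvasImageSource, // the giant creature to place in the sky
  width: number,
  height: number
): void {
  // 1. Camera image as the background.
  ctx.drawImage(cameraFrame, 0, 0, width, height);

  // 2. Draw the beast into an offscreen layer, roughly centered in the sky.
  const layer = document.createElement("canvas");
  layer.width = width;
  layer.height = height;
  const layerCtx = layer.getContext("2d")!;
  layerCtx.drawImage(beastSprite, width * 0.25, height * 0.05, width * 0.5, height * 0.5);

  // 3. Erase every beast pixel that is not over sky. 'destination-in' keeps
  //    existing pixels only where the newly drawn mask is opaque.
  layerCtx.globalCompositeOperation = "destination-in";
  layerCtx.drawImage(skyMask, 0, 0, width, height);

  // 4. Composite the masked beast over the camera image.
  ctx.drawImage(layer, 0, 0);
}
```

For something that far away, a single "is this pixel sky?" mask goes a long way, since stereo and lighting cues matter much less than they do for near-field content.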
And yeah, Kami OS. I just wanted to read a couple of paragraphs from the essay that he put out: "Kami OS channels the spirit world using AR. When you put on your headset, you will be introduced to many different gods who will guide you through your virtual and physical life. Gods of navigation, communication, commerce. Gods who teach you, gods who learn from you. Gods who make their home in particular objects or places, and gods who accompany you on your journey." And then: "Kami OS is different. It is based in pagan animism, where there are many thousands of gods who live in everything. You will form tight and protective relationships with some, but if a malevolent spirit living in your refrigerator proves untrustworthy or inefficient, you can banish it and replace it with another. Some gods serve corporate interests, some can be purchased from developers, others you might train yourself. Over time, you will choose a trusted circle of gods who you may also allow to communicate with and manage one another."

So yeah, just this idea of having these artificially intelligent agents embedded into all the different objects of our lives, and to kind of enchant them with this pagan animism spirit so that they're interactive and conversational, that you can actually engage with them, and that they're kind of living in a way that's inspired by this animistic perspective, with sort of a panpsychic twist where every object has a certain degree of consciousness. So as AI is being embedded into all these different entities, that's an interesting perspective that I just wanted to call in as well.

He's starting to think about that, and also starting to not only build out some of these different speculative sci-fi stories that are exploring this, but also literally build out some of these experiences, including the one that was showing at the Niantic developers conference that had the reality channels overlaid on top of the existing architecture. Unfortunately, I haven't had a chance to check that out, but that's one of the different projects he's been working on. From the perspective of architecture, I think architects in general just really understand intuitively what the nature of spatial design is all about, and they're very natural AR designers.

So super excited to see Keiichi push forward with Liquid City and some of the different projects that he's been putting out, from Overbeast to these different demos and speculative fiction pieces. And if you haven't seen Hyper-Reality, it's certainly a classic, and you probably have seen it without really realizing that it was created by Keiichi. But yeah, go back and listen to episode number 639 with Keiichi Matsuda talking about the democratization of architecture, where we talked a bit about the Hyper-Reality film, which he released back on May 16th of 2016, and some of his other speculative design architecture essays that he created.

Yeah, big fan of a lot of his work, and I'm looking forward to seeing this latest video that he's been collaborating on with Niantic, which explores another take on the real-world metaverse that Niantic is emphasizing, meaning that Niantic really wants you to go out and engage with physical reality, as opposed to the vision that's coming from Meta, which is much more about the virtualized aspects of virtual reality. So yeah, really contrasting this idea of the quote-unquote real-world metaverse, which I prefer to call the physical-world metaverse, just because I think there's a false bifurcation that happens between the virtual and the real, per the conversation that I had with David Chalmers, where he was saying that virtual reality experiences can actually be genuine reality. To contrast the virtual versus the real feeds into saying that the experiences we have in virtual worlds are somehow not real, even though I think they are real; they're just as genuine as any other experience, per Chalmers' book called Reality+, which I had a deep dive conversation about with Chalmers back in episode 1043, if you want to check that out.

So, that's all I have for today, and I just wanted to thank you for listening to the Voices of VR podcast. And if you enjoyed the podcast, then please do spread the word, tell your friends, and consider becoming a member of the Patreon. This is a listener-supported podcast, and so I do rely upon donations from people like yourself in order to continue to bring you this coverage. So you can become a member and donate today at patreon.com slash voicesofvr. Thanks for listening.