#1260: Using ChatGPT for XR Education and Persistent Virtual Assistant via AR Headsets

Javier Fadul is Director of Innovation at HTX Labs, and I ran into him on the expo floor of Augmented World Expo 2023. We talked about how he's been using ChatGPT via AR glasses as an impromptu research assistant, curriculum planner, and children's book co-author. We also talk about how he sees AI continuing to be integrated into virtual worlds within an educational context. Fadul has gone all in on casual, conversational, and educational dialogues with large language models via AR, and he shares some of his initial insights and aspirations for where this could all go in the future.

This is a listener-supported podcast through the Voices of VR Patreon.

Music: Fatality

Rough Transcript

[00:00:05.452] Kent Bye: The Voices of VR Podcast. Hello, my name is Kent Bye, and welcome to the Voices of VR Podcast. It's a podcast that looks at the future of spatial computing. You can support the podcast at patreon.com slash Voices of VR. So this is the eighth of 17 of my series looking at the intersection between XR and artificial intelligence. And today's episode is with Javier Fadul, who's the Director of Innovation at HTX Labs. So I ran into Javier on the Augmented World Expo floor, where he had a pair of augmented reality glasses with ChatGPT integrations, and he's starting to use these conversational interfaces as a virtual assistant, but also to make different lesson plans, to create children's books, and to research things on the fly as he's out and about in the world in a hands-free environment. He's also looking at education in the context of virtual reality and thinking about how these types of artificial intelligence are going to be starting to integrate with XR technologies in the context of education. So that's what we're covering on today's episode of the Voices of VR podcast. So this interview with Javier happened on Thursday, June 1st, 2023 at the Augmented World Expo in Santa Clara, California. So with that, let's go ahead and dive right in.

[00:01:21.817] Javier Fadul: Yeah, Kent, it's so good to see you. Here we are at AWE. And yeah, my name is Javier Fadul, Director of Innovation at HTX Labs. You know, the first time we connected, you were down in Houston, and it was the day my son was born. So unfortunately, we weren't able to have this conversation back then. But you know, 2019 was a really good event. And it's really nice to be here in 2023, sort of in a post-COVID world, during the most exciting time in the industry so far. Yeah.

[00:01:48.882] Kent Bye: So maybe give a bit more context for what HTX Labs is doing.

[00:01:52.443] Javier Fadul: Sure. We have a training platform called Impact that's very focused on procedural training and learning. And we have had the fortune to work within the Air Force and Department of Defense the last several years, helping keep the maintainers safe who help keep the pilots safe. And safety training has been a big part of our narrative since the beginning. As you remember from back in 2017, the survival mindset and everything we were doing around fire safety. A lot of lessons learned over the last six years that are now being applied to help scale these solutions to large enterprise. Yeah. And it's really exciting to see the community here and all the ways that they're applying it to every vertical, right? So it's exciting times, man. Absolutely.

[00:02:33.640] Kent Bye: Maybe you could give a bit more context as to your background and your journey into the XR space.

[00:02:38.442] Javier Fadul: Yeah. So I would say the summary would be a cognitive background, like very interested in the processes and behaviors and how people work, right? That's always been a passion of mine. A particular focus on perception and community, I would say that's also a big part of my journey. In prior years, we organized an event around visualization of data and storytelling. It was called Visualized, and it was in cooperation with a team called CulturePilot and some of the TED community, if you're familiar with their groups, right? We brought people together to have discussions around using data visualization and storytelling and design to help solve really big problems. And at the end of those six years of working in that space, we organized the event in New York City, brought people from all over the world to talk about this stuff. The conclusion was essentially that these immersive displays were going to help us solve the really big challenges, because as spatial beings, using these tools that help us reason spatially would help us really be sort of our full selves in the digital abstraction worlds that we use to reason, right? And then, yeah, since then, for the last seven years now coming up, we've been working in the space of bringing this to the enterprise with HTX Labs and the team there, who are doing amazing work. So shout out to everyone over there. Yeah.

[00:03:50.471] Kent Bye: Mm-hmm. Yeah, I remember doing an interview with a firefighter that I think was using some of your platform. And so, yeah, maybe you could talk about some of the specific applications and use cases that you see with HTX Labs.

[00:04:01.038] Javier Fadul: Yeah, that's great, man. Thanks for remembering that. I think one of the biggest lessons learned then is the importance of collaborating with the subject matter experts, right? So yeah, being able to bring them into the virtual worlds, have those discussions, have their insights presented in a context that allows them to be their full selves, really allows for a deeper context, solution, and connection between an instructor and a student, right? And that's something we've been really focused on since the beginning, and it's given us a lot of really great opportunity, for sure.

[00:04:31.640] Kent Bye: Well, you asked me a question about what I was interested in. And one of the things that I said was that I think a really hot topic right now is the intersection between artificial intelligence and XR. And so maybe you could share a little bit about what you've been tinkering around with for the last couple of years.

[00:04:45.664] Javier Fadul: Yeah, absolutely. It's been an amazing thing to see how quickly it has become the zeitgeist. Everyone's talking about it, and we were fortunate to get early access already a few years back, and it's been incredible, the journey that I've been on personally, as I've been really trying to understand and reason through what these new kinds of reasoning tools are enabling. And we have, at the core of our product, the idea that the subject matter expertise is the most important thing and central to the process. So we're thinking a lot about how to make sure that the humans in the loop that are part of our training system are empowered by these AI models, rather than taking the intelligence and insight of the AI models on a blank slate, if you will, right? Like, we're using them again as reasoning tools, rather than just information sources, if you know what I mean.

[00:05:36.630] Kent Bye: Yeah, there was an AI researcher named Simone Plant who was talking to Diana Polotska, who's a religious studies scholar. And one of the things that she had said in terms of AI was thinking about AI as like entering into a dialectic or using the Socratic method of being able to interface and have these conversations. And I think there's a key component of ways that you can engage and dialogue with these large language models, which I see, at least, as these stochastic parrots that don't have a deep understanding. And so I'm sort of skeptical about what they are able to provide in terms of legitimate information, or if they're hallucinating. But at least they're a provocation, something to push back on when you're interacting with them. And so I'd love to hear how you think of them as these cognitive tools when you have all these potential limitations for what they are and what they can and cannot do. So yeah, how do you make sense of them as cognitive tools relative to the positives and negatives of all that?

[00:06:32.929] Javier Fadul: Yeah, absolutely. That's such a great question, Kent. And also, I love that you reference the Socratic method as well as contemporary AI within the same breath. As you and I have talked about in the past, my background also is in philosophy. I ended up studying a lot in that space. Philosophy of logic, of language, how is it that brains combine information? This is part of the reason I've been so enthralled, really, by these new kinds of systems. But like you said, this sort of dialectic, having what you could consider an adversarial network that's helping optimize your own neural states, is a really powerful insight. The reason why deep learning in some cases has worked so well is the adversarial architectures, where they have one AI that's sort of generating insight and another one that's a discriminator trying to shape it, if you will. And having that at a next level, such that your own reasoning is being sort of honed with an external, objective, relatively speaking, system is a very powerful idea that we're thinking a lot about in many ways. And I'm personally thinking a lot about it in relation to my kids. I have a five-year-old now, Kent, and a one-and-a-half-year-old at this point in time. And I'm thinking a lot about the future they're going to be living in and how to introduce and leverage these systems to give them the best possible future. Yeah, absolutely.

[00:07:48.167] Kent Bye: Yeah, well, you had mentioned to me that you've been doing your own personal explorations with the large language models for the last couple of years and have pulled out some sort of augmented reality device. And so what kind of tinkerings have you been doing and exploring with interfacing these ChatGPT-like entities with XR and augmented reality?

[00:08:05.042] Javier Fadul: Yeah, absolutely, man. I mean, I'll tell you quickly, since I mentioned my kids, more like as a father first, right? Early on, it was just an amazing opportunity to help create the kinds of reference material and insights that I want my kids to have access to. So on a Saturday, while taking care of what at the time was a three-year-old and a new baby, we were able to synthesize a children's book based on, essentially, the evolution of plants. And so within half a Saturday, while I'm taking care of the kids, I was leveraging both the illustration systems (it was DALL-E at the time, and I think it was GPT-3 at the time), and within just a couple of hours had a full-on designed curriculum around how to teach my kids the evolution of plants and continue to drive their love for nature, right? And so it's a good example, I think, of this alignment idea where, you know, if we as responsible users of the systems try to ensure that the things they create lead to possible better futures, it's going to lead to really good things. Since then, I've thought a lot about the different form factors in which these systems will help us process information and think through things. But most recently, yeah, you're right. I have been using a wearable device that allows me to communicate with GPT-4 at a very rapid pace on demand. And I'm not wearing it all the time. But it has become a really useful tool to reason through a lot of different insights, if you will. Yeah.
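[Editor's note: Fadul doesn't detail his exact toolchain, but the text-plus-illustration workflow he describes can be sketched against OpenAI's current Python SDK. This is a minimal sketch under stated assumptions; the model names, prompts, and five-page structure are illustrative, not a record of what he actually built.]

```python
# Hypothetical sketch of a children's book pipeline: a chat model writes each
# page, an image model illustrates it. Assumes the openai package (v1+) and
# an OPENAI_API_KEY environment variable; all prompts and models are assumptions.
import urllib.request
from openai import OpenAI

client = OpenAI()

def write_page(topic: str, page: int, total: int) -> str:
    """Generate two short, toddler-friendly sentences for one page."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You write warm, simple children's book text for ages 3 to 5."},
            {"role": "user",
             "content": f"Page {page} of {total} of a book about {topic}. Two short sentences."},
        ],
    )
    return response.choices[0].message.content

def illustrate(page_text: str, out_path: str) -> None:
    """Generate a matching illustration and download it locally."""
    image = client.images.generate(
        model="dall-e-3",  # DALL-E was the contemporary system in the story
        prompt=f"Gentle watercolor children's book illustration: {page_text}",
        size="1024x1024",
        n=1,
    )
    urllib.request.urlretrieve(image.data[0].url, out_path)

for i in range(1, 6):
    text = write_page("the evolution of plants", i, 5)
    illustrate(text, f"page_{i}.png")
    print(f"Page {i}: {text}")
```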

[00:09:30.156] Kent Bye: Can you give a specific example of how you're able to do that in a given context?

[00:09:34.972] Javier Fadul: Yeah, absolutely. It's one mundane kind of example, but I think it's kind of exciting. I left a fish store with my kids, and since it's a wearable, hands-free, I was able to, as I was walking to the vehicle and loading my kids and buckling them up, which, you know, is a process, have the AI generate a small curriculum, like five-point insights into how having a fish tank would help my kid understand biomes. And in terms of offloading your cognitive processes to a cloud-based, artificial, general, relatively speaking, intelligence, it was really delightful to finally be back in the front seat of my car and hear the chime that gave me exactly what I needed to hear, right? And as I was driving, I started having this conversation with my kid. And it was totally seamless, having access to that information without having to interrupt the flow of my life. So it's an exciting future when you can imagine these systems being embedded into the environment and helping us in different ways, on demand or otherwise.

[00:10:33.039] Kent Bye: Yeah, maybe we could break that down a little bit, in terms of I know there are things like Whisper that allow you to speak and have that translated into text. And so are you using some of these speech-to-text systems to be able to just speak in a conversational interface? And then how are you getting the result back? Are you reading it, or are you having it read back to you with text-to-speech synthesis on the other end?

[00:10:54.545] Javier Fadul: I mean, I think a lot of the audience will be interested in the technical specificity of the system I'm leveraging, but I think I'd rather point to the idea that these systems are not just language models, but they're multimodal in themselves, right? Which is, to your point, I can say something and it transcribes it, and then it understands it, and it can say something back. That's sort of an audio-to-text example. But very soon, they're going to be available and make inferences based on images and context, right? So it's going to be really exciting to see, as they understand in more ways the real world that we're facing, how they'll be able to provide insights into the real world that we're shaping. And that piece of the shaping is a collective process at this point. It's humanity and these kinds of systems, which is to say, maybe broadly, technology itself, trying to figure out how to help the biome, essentially. That's at the core of where I think the long term of this evolution goes.
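[Editor's note: Fadul deliberately stays away from the specifics of his wearable's stack, but the loop Kent describes (speech-to-text, a language model call, then text-to-speech) is straightforward to sketch against OpenAI's Python SDK. The model choices, voice, and the pre-recorded audio file below are illustrative assumptions; a real wearable would capture and play audio continuously.]

```python
# Minimal sketch of the hands-free loop discussed above: transcribe speech
# with Whisper, query a chat model, and synthesize the reply as audio.
# Assumes the openai package (v1+) and OPENAI_API_KEY; models are assumptions.
from openai import OpenAI

client = OpenAI()

def listen(audio_path: str) -> str:
    """Speech-to-text via Whisper (here from a pre-recorded file)."""
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)
    return transcript.text

def think(question: str) -> str:
    """One-shot query to the language model."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a concise, hands-free assistant."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

def speak(text: str, out_path: str = "reply.mp3") -> None:
    """Text-to-speech; a wearable would stream this to its speaker."""
    audio = client.audio.speech.create(model="tts-1", voice="alloy", input=text)
    audio.write_to_file(out_path)

if __name__ == "__main__":
    speak(think(listen("mic_capture.wav")))
```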

[00:11:49.544] Kent Bye: The thing that I get concerned about when I hear something like that is, what if ChatGPT gets it wrong and is giving instructions that are actually destructive to the biome rather than helping to sustain it? So how do you know the veracity of what you're being told in any given moment? Part of the challenge is that, first of all, a lot of these OpenAI models don't have a provenance for what data they're being trained on or what their architecture and models are. And then there's often no follow-up to a direct citation. But I know that Bing, their search has had more deliberate citations where you can go look at the original sources. And I really appreciate that. But right now, it's all collapsing it down into a single response, with no way of really being able to vet it or interrogate it a little bit more. So how do you deal with that aspect?

[00:12:31.350] Javier Fadul: Yeah, that's a great point, Kent. Yeah, the problem around both the hallucinations generally, right, and the misrepresentation of sources, right, this is a very important piece of the puzzle that the people who are building these foundation models and productizing them have to address. And the way we're thinking about some of these ideas, really, is that essentially it enhances our ability to do critical thinking. Ultimately, like that adversarial model example, unless it's a primary source, everything should be questioned, let's say. And even the primary source, as you understand it and consume it, you have to question it against your own understanding and beliefs. So I think this is sort of an augmentation of that process, which is important. That being said, not everyone in the world is as willing to challenge and question and process and think, which is why these safety guardrails are super essential for the broader distribution of these systems, generally speaking. And we're definitely thinking a lot about that when it comes to everything we're doing within our current platforms and systems as a product offering, let's say. Again, that idea of the subject matter expertise that needs to mediate these insights, which are essentially multipliers, if we do it right, to help empower the people that are helping others.

[00:13:48.583] Kent Bye: Yeah, it really emphasizes for me the point that we already live in a media landscape where we have to cultivate a certain amount of literacy and media literacy for understanding how to digest information. And as AI is entering into the mix, now we have a whole other dimension of that media literacy that actually, for me at least, starts to point to other models that go beyond this monolithic, here-is-the-perspective approach. And for me, I turn to feminist perspectives, like situated knowledges, which say that you can't always necessarily dissociate the information from someone's relational context and their background and their life experiences, and even, you know, to some extent, their deeper philosophical, metaphysical commitments to how they are speaking about these things. And I think that's one of my critiques, I guess, with ChatGPT, is that it has no ability to navigate the pluralism of differing perspectives. It just collapses everything into one singular perspective without preserving that perspectival aspect of those situated knowledges and being able to explore more of the dialectical debates and conversations. And I'm sure that we will eventually get to models that are able to preserve that a little bit more. But as of right now, it seems like this violation of Gödel's incompleteness, which is trying to take the entirety of the corpus of the internet and reduce it down into one single formal system, which is obviously going to be incomplete. Everything is going to be incomplete. But with those inconsistencies, it's not preserving that pluralistic perspective-taking of these different situated knowledges. So when I think about these new aspects of media literacy, there's not only stuff that has to happen on the technological architecture aspects of the AI, but even once that happens, then there's an additional component of people having to triangulate and navigate these multiple perspectives as we move forward.

[00:15:32.486] Javier Fadul: Yeah, absolutely, man. And I think this is why the convergence of the XR space and the AI space is probably the most exciting field, period, right now. It is the opportunity to create more context for these assistants, if you want to generalize it, or co-pilots, the way Microsoft has been branding some of it. These assistants need to have the right context in order to really be able to hone in their broad language abstraction ocean, if you will, towards a very specific use case, right? That being said, that context, you know, it's super important for us to really consider how much of the information about who we are we want to share with the system itself, and who owns the different rights to the data. And I mean, the hearings that were just happening in Congress, you know, just a couple of weeks back. It's really fascinating to actually see so much alignment on both sides of the aisle, if you will, and the general agreement that, yes, there needs to be some regulation. This stuff is really powerful, especially as it comes to the larger-scale models, like what GPT-4 has done and what potential future models will do when it comes to multimodality, right? So, I mean, you and I could riff probably for hours about the ways the behavior of people, their voices, their identity, right, is potentially going to be virtualized, and then who really owns that right, you know? And it's an important part of the conversation, because ultimately that's the kind of insight that will make these systems truly enlightened and helpful and precise to address very specific needs that we have, right? One of the insights I've gathered from my more recent interactions with these models is that when you do personify yourself in relation to your interaction, they're able to better hone in on exactly the things that will help you, yeah.

[00:17:14.259] Kent Bye: I'd love to hear a little bit more about the multimodal interactions that you've been having. Maybe talk about the device that you're using and then what kind of output you're getting, both from a text perspective and an audio perspective, and if you have images or video that you've been playing around with at all with using a wearable and being able to interface with these AI entities on the fly.

[00:17:33.151] Javier Fadul: Sure. Yeah, there's a broad range of applications on the multimodality of it. And I think the main example that I'm thinking about here is really that the reason why ChatGPT became so powerful is that language is the most core of the ways that our brains map information. The interesting insight I learned recently that I was thinking about talking to you about, actually, which is funny, is the idea that some people don't have an internal monologue. You know about this, Kent?

[00:17:57.773] Kent Bye: I've heard about it. Some people, yeah, it's a mystery because I talk to myself all the time.

[00:18:01.998] Javier Fadul: Right. And I think the majority of people tend to, but there are some people who have, I guess, what you could consider a more, I don't know if it's necessarily objective, but more visual way of thinking, and maybe a less linguistic way of thinking, right? And so that kind of speaks to the multimodality approach. You know, the fact is that the GPT-4 visual model is able to reason through images, and it's able to reason through even visualizations in some cases, right? To go back to the Visualized years, data visualizations, and, you know, actually synthesize and generate visual reasoning as well as language reasoning. It has this broad range of capabilities to help us compress information in a faster way and to help us reason through context in a more powerful way as well. It's really interesting exploring all of the ways that you can interact with these AIs and what they already know just in the language context. But if you give them a little bit more insight into what you're seeing at the time, where you are at the time, what you're doing at the time, they can really help shape the message.

[00:19:01.170] Kent Bye: Well, you've been doing a lot of personal exploration with using these things. And how do you foresee some of these AI systems being integrated into what you're doing with HTX Labs?

[00:19:11.253] Javier Fadul: Yeah, that's a really good question. I think at the core it's this idea of empowering the subject matter expert. That is the core of the mission that we've been on since the beginning. And so it's going to be really interesting how we introduce them to the new methods for generating content and educational experiences. The onboarding of these processes is, as you and I have talked about over many years, one of the most important things. Putting on a headset is a big ask, to have a wearable that is able to not only immerse you in these possible worlds, but also remove you from your current context. So I think it's going to be a similar process and a similar transition, where we have to help people understand what are the better ways to have conversations with these models, what are the better ways to use them, not just as knowledge retrieval tools like I mentioned earlier, but really more as a way to reason through possible outcomes. So yeah, we're thinking a lot about that. We're also thinking about how it's going to help in terms of content generation, which is one of the hardest processes to scale. And some of our partners are doing some really great work in that area that we'll be making some announcements on in the near future as well. Yeah.

[00:20:12.992] Kent Bye: Awesome. And finally, what do you think the ultimate potential of virtual reality might be, and what it might be able to enable?

[00:20:20.671] Javier Fadul: Yeah, I love to hear that, Kent. First, I'll apologize to you, because I have used that question in conversations with friends and family. You know, I always attribute it. I definitely do, you know, and my references are accurate. I do point a lot of people to your podcast. But yeah, in terms of my own response to that question, I think it's a combination of answers. My key one, though, is that this helps us envision possible futures that we want to build. And so we can build the future we want to live in, help people understand what it'll take us to get there, and help us address some of the challenges that we could be facing in our possible futures. So to tie back to the AI systems, for example, at some point, humanoid robots will be hanging out with us. And that's going to be a very different world. But it'll be a lot safer if we experience it and explore it in a virtual world to help us, and the us, I do think of as the broader bucket of us and the systems, understand how to work together to, again, build better futures. So I'm very optimistic that this, in some ways, is the end of the beginning. We're at the stage where the XR dependencies have been pretty much checked off, infrastructure is really rolling, and the hardware is at the price point where it's really becoming accessible, and the AI systems are now possible to reason through and solve the big challenges that we're facing as a society. So I'm excited to see where it can go. And ultimately, I just want to be with my favorite people at any time to be able to collaborate and build a greater future.

[00:21:59.970] Kent Bye: Yeah, before we started recording, you had said that it's the beginning of the end?

[00:22:05.015] Javier Fadul: The end of the beginning, right? Yeah, so the idea there is, yeah, the past many years we've been in the beginning phase, and it's about to end, the beginning of it is about to end, but this next phase is going to be the middle of it, right? So we're at the beginning of the middle, and it's going to be the most exciting time, I think.

[00:22:21.670] Kent Bye: Yeah, and with the impending Apple announcement that's coming on Monday, I think it's going to be a new epoch as well. So it's definitely a liminal, transitional point that we're all here at and reflecting on. But yeah, I'd love to hear if there's anything else that's left unsaid that you'd like to say to the broader immersive community.

[00:22:37.230] Javier Fadul: I think I would like to thank you, Kent, for all the work that you've done, you know, in all these years, like having the conversations, the important conversations with the community to help us really understand what it is we're doing and what it is we're building and how we solve the problems that are still definitely relevant in terms of how do we get this stuff to scale and how do we use it in an ethical and powerful way. So thanks for being a voice in that and it's the most exciting time to be part of this space ever. So here we go, Kent. Here we go. Yeah.

[00:23:04.793] Kent Bye: Awesome. Well, Javier, it was great to be able to catch up with you. And yeah, I look forward to seeing where some of your cognitive tinkering goes in the future and how it gets fed into HTX Labs. And yeah, this intersection between XR and AI is something that I feel like is coming up to this culmination and concrescence, as Whitehead would say. The many become one and then are increased by one. So I feel like we're in this mode of integration with all these things. And yeah, like you said, it's a very exciting time. So thanks for taking the time to help break down your journey and your thoughts and where you think it might be going in the future. So, thank you.

[00:23:36.309] Javier Fadul: Thanks so much, Kent. And thanks, everybody.

[00:23:38.671] Kent Bye: So that was Javier Fadul. He's the Director of Innovation at HTX Labs. And we were talking about different ways that he's using these large language models and artificial intelligence and machine learning in the context of augmented reality devices and in the context of education. So I have a number of different takeaways about this interview. First of all, it was really quite striking to see how much Javier was just diving deep into these large language models, creating this sort of Socratic dialogue in conversation, creating lesson plans, creating children's books, and, on the fly and on the road, being able to query these large language models as a sort of virtual assistant to get real-time feedback and to have these conversational interfaces. And I think that we're certainly going to see a lot more of this. I'm a big fan of Simon Wardley, who talks about how there are these different evolutionary phases of these technologies. So you start with the academic idea of what's possible, and then you start to see these custom, bespoke, handcrafted implementations, especially in the context of these enterprise applications. And then from there, you start to see how they ripple out into these mainstream consumer technologies. And then all of a sudden, they're in this state of mass ubiquity. So that's a big reason why I wanted to have this series looking at XR and AI, because I'm seeing a lot of the very early phases of these artists and these creators and these developers who are starting to tinker with these new technologies and to integrate them in these novel ways. And so for Javier, he was the first person that I've come across who was actively wearing these augmented reality glasses that had a microphone that he was able to query and get information back from in a multimodal fashion, getting both audio and text and images. At some point, he's going to be able to take a photo and get additional context on things. And so, yeah, just the way that you can start to, I mean, already with Google Lens, you have the capability of taking a photo and getting this interface with machine learning and artificial intelligence. And, you know, people have been using AI on their phones for a long, long time. But just to see it in the hands-free context, and more of these conversational interfaces with a head-worn device, I think is going to be something that we see a lot more of. I guess my hesitation is some of the different inherent limitations of some of these large language models and to what degree you're able to verify some of the different information that you're getting. I know that with Bing's integration with OpenAI's ChatGPT, it at least is citing some of the different sources, so you can go and check some of those sources.
But I personally would be a little bit more hesitant with these large language models, just because of the propensity to produce these types of hallucinations. If it's a low-stakes thing, then I guess it's okay. But, you know, how do you know the information that you're getting is really true? Just as a test of ChatGPT, I asked it about me, to see who I am. And, you know, the first paragraph was spot-on, but then as it went on, it just said that, you know, I had done this and I'd done that and I wrote this book, and it was like, no, it was just completely hallucinated and completely made up. And so I guess I have a little bit more skepticism when it comes to the types of information that I'm getting from ChatGPT and these large language models. But for Javier, he's doing a full deep dive and going all in and thinking about how it is going to start to be integrated into these immersive experiences and educational contexts. And yeah, certainly, I can see the way that it's going to lead to these personalized AI agents that are maybe tuned to the way that you like to learn or the way that you like to consume information. And yeah, I guess I'm waiting for a little bit more reliable cognitive architectures beyond the large language models before I start to dive headfirst into using them in a more regular fashion. So, that's all I have for today, and I just wanted to thank you for listening to the Voices of VR podcast. And if you enjoy the podcast, then please do spread the word, tell your friends, and consider becoming a member of the Patreon. This is a listener-supported podcast, and so I do rely upon donations from people like yourself in order to continue to bring you this coverage. So you can become a member and donate today at patreon.com slash voicesofvr. Thanks for listening.
