#1149: Adding Interactivity to glTF via Node Graph-Based Extension as a Step Towards an Open Metaverse

The Khronos Group published a blog announcement on November 3rd titled, “Deliver Interactive Experiences with glTF: A Node Graph-Based Approach,” describing work on an interactivity extension, tentatively named KHR_behaviors, for glTF (often called the JPEG of 3D objects). The extension adds interactive capabilities to glTF objects similar to Unreal Engine's Blueprint system, Unity's Visual Scripting, and NVIDIA Omniverse's OmniGraph.

I had a chance to sit down with Threekit's Ben Houston, who has been collaborating with the Khronos Group on this interactive behavior extension, to get a rundown of how it fits into the overall development of an open and interoperable Metaverse (see reference implementation here).

Layered Approach of glTF interactions as described in this webinar

The Metaverse Standards Forum is an independent and separate process that started after this specific effort. Its role is to foster collaboration among standards development organizations and to make recommendations for standards in service of creating an open and interoperable Metaverse; it does not have any power to enforce adoption or to promote any one standard at the expense of others.

glTF with interactivity extensions may eventually be one building block in a complex of open standards that replicates much of the functionality of a game engine. Moving from static objects to interactive experiences that are dynamic, unfold over time, and respond to their environment is certainly a key feature of an immersive virtual world, and abstracting that interactivity into a visual scripting language is the approach this extension takes. Tune in for more details, and then check out the behave-graph reference implementation and the introductory webinar for more context.

Rough Transcript

[00:00:05.452] Kent Bye: The Voices of VR Podcast. Hello, my name is Kent Bye, and welcome to The Voices of VR Podcast. So today's episode is a bit of a deep dive into some of the progress of what's happening with the overall effort to make these open standards within the context of spatial computing. So the Khronos Group has been pushing forth a number of different standards, including glTF. And then there's a separate effort of the Metaverse Standards Forum, whose whole purpose is to try to gather all these different standards development organizations and to come up with some recommendations for how to essentially recreate a game engine with a complex of all these different open standards. And they're trying to create a way to both have standards within the context of Unity, Unreal Engine, or NVIDIA Omniverse to have interoperability between them, but also to create a whole other strand of the open metaverse on the web and these whole other tool sets within the context of the open metaverse. And so there's a number of different paths that are happening there, and they're independent efforts. And so this conversation today is around how to make these objects, these 3D objects, interactive. It's glTF, which is commonly referred to as the JPEG of 3D objects, and they're adding interactivity onto these 3D objects. So they're taking something that's normally static and adding some way to dynamically interact and engage with it. And so they have to decide, okay, what is the programming language? What is the approach to make that interactivity? And they've settled upon this approach of a node graph-based system that you could see within, say, Unreal Engine's Blueprints or Unity's Visual Scripting, or NVIDIA Omniverse's OmniGraph. It's a way of taking abstractions of programming languages and making them more of a visual language, but it gives them the ability to, say, exclude all these different for loops and have different layers of security on top of that. So they're settling upon this node graph system. And I just had a chance to talk to Ben Houston, who's been working with the Khronos Group on helping develop this standard. He works for Threekit, doing his own process of taking 3D objects into the context of e-commerce. Back on November 3rd, 2022, there was a blog post from the Khronos Group around delivering interactive experiences with glTF and a node graph-based approach. And there was a previous webinar from September 27th where they start to really elaborate all of this. But I had a chance to sit down with Ben to walk through all these different aspects of glTF and trying to abstract interactivity onto these 3D objects. And so as we move forward into the future of the metaverse, I think the open and interoperable metaverse is going to try to take what is already happening in these game engines, abstract it into all these component parts, and then reconstruct it into the future of a more open and interoperable metaverse. So that's what we're covering on today's episode of the Voices of VR Podcast. So this interview with Ben happened on Tuesday, November 8th, 2022. So with that, let's go ahead and dive right in.

[00:02:57.310] Ben Houston: Hi, I'm Ben Houston. I'm the founder and CTO of Threekit, a 3D e-commerce company. With Threekit we visualize people's products, and we do that with a number of companies; IKEA is one of the cases, another one. And what comes out of this is we create basically interactive product experiences. And so a lot of the work that I've been doing with glTF recently is sort of taking the experience that we gained doing this within our company and then trying to standardize a subset of that, or encode it in a way that could be shared more widely with the broader industry. I've been involved with standards processes for quite some time. I worked a little bit on PBR standardization back in like 2015. I did some stuff with the OBJ MTL format on adding PBR to that. That's recently been added to Blender. And before that, I was doing some stuff with Alembic, which was another interchange format for mesh data. So I've done sort of the mesh data interchange work. I've done the material interchange work, and a lot more of that with the glTF group. And then this is sort of like the next level. After you've got the models and other forms of geometry, and you're converting materials, then you want to start getting to the behavior. And then you can start having reusable elements in different game engines and maybe different metaverses. It's sort of the holy grail. If you paid at all attention to Twitter, like, I don't know, last year, there's been actually a lot of derision thrown at the idea of NFTs. Like, what, you can buy an NFT that works in multiple games? How is that even going to work? Nobody knows how that's going to work. This is actually part of the solution to that type of problem. It may allow for people to have building blocks within games that are more interchangeable rather than just a static mesh. glTF has been very successful in making a high-quality transferable object that can be used in real time. So there's glTF importers for Unreal Engine. There's glTF importers for Unity, Three.js, Babylon.js. A large number of game engines can now import these in real time, and they look great. So we've solved the static mesh, even the animated mesh, along with materials. But what's next is that behavioral aspect. I want a door that can open. I want a gun that can shoot. I want a car that can drive. There's different levels of complexity in this, but this is where we're sort of driving to next: that commoditization of objects, so you don't have to recreate them every time in every game.

[00:05:12.819] Kent Bye: Okay. Yeah. So the Khronos Group is pushing forth a lot of different types of standards. I know there are scene graph standards like USD as an example, but it seems like in some ways, if you just create glTF behaviors and then put in nested glTFs, it almost is like, what's the point of USD at that point? And so maybe you could talk a bit about the relationship between USD and glTF as we start to move forward, and if you imagine what exists right now with Unity, Unreal Engine, and Omniverse from NVIDIA. These are all game engines that are able to take these assets and have all these other layers of programming on top of them. And if you imagine that there's these other layers of programming that are included in something like USD, or if you feel like something like glTF would be able to self-contain each of those, so that glTF functionally does all the same things a game engine does, but all contained within the context of nested glTFs. So that's the first question I had when I was looking at this: how does this glTF with behaviors and these behavior graphs fit into something like USD?

[00:06:16.866] Ben Houston: Okay. So I'm going to separate it out. The behavior graphs are independent of the glTF nesting. There are sort of two separate groups actually working on those. I'm mostly working on the node-based behavior graphs. The nesting is interesting, but I'm not an expert to talk about that. I can talk about the difference between USD and glTF. My background is visual effects and film, like a long time ago. And so USD came out of the film world, the VFX world, and it's an excellent format. It's designed to store everything you need for rendering and interacting with the scene in the VFX world perfectly, absolutely perfectly. So it will store everything, and it will store it sort of in a format that you can still edit it in. So with regards to meshes, it can store them as polygons. It can actually store some of the modifiers on top of that. It's not guaranteed to be real-time, but it can store all of that, and it can store incredibly complex materials and many different types of materials. It can store V-Ray materials, it can store Arnold materials, it can store RenderMan materials. So it is an incredibly flexible thing to store just about anything you need for VFX. And so you're right, it can store everything for a game engine as well. It basically can store anything. But that's sort of where the challenge comes. If you've used USD in production, you actually have to make sure that every tool in your pipeline that writes USD can actually read that data back. The problem is it does so much that you're actually not guaranteed that this other tool can actually read the data in the USD you wrote over here, because it can basically do a lot. And so glTF comes from a different source. It came from, this is going to be the JPEG of 3D, and it was purposely constricted in what it's going to support. It doesn't support polygons. It supports triangle data. It supports data that could be uploaded immediately to the GPU. Its material system is designed so there's only a couple of materials, and everyone says you should actually just use this one called PBR Next, which is a single PBR material, and you should use that one. It supports animations, but it has limitations on how many bones can influence each skinned vertex, so you can ensure that you can do that on the GPU. Whereas in USD, you can have the most complex meshes, but then you can't actually simulate those easily on a GPU. So glTF has been very prescriptive and limited in its functionality, but what it's done is it has made it easy for most tools to actually support all of glTF. So if you have a glTF, you can then load it. USD, though, can store everything, and so these are different needs. If you want a runtime application and you want to be able to have people create content for just random people on the internet, glTF is a great way to go, because you know you can load it. But if you're running a VFX studio and you have a huge assortment of different tools and you want them to start communicating, instead USD is the way to communicate between them. You want a format to communicate between these different programs, and you want to standardize on USD because it has that rich, high-fidelity representation. And if you make sure all your tools are actually compatible, which you have to actually test out, then it's going to work great for you.
Basically, USD is used to sort of allow the rigging department to create a rig, your modeling department to make the model, your texturing group to add textures, and all the different USDs you can layer together. Then you have an animator who creates another USD. This is a brilliant, absolutely brilliant workflow. It changed visual effects. But it's not prescriptive and designed for only real time, and therefore it does everything.

[00:09:45.200] Kent Bye: Yeah, that's a really helpful breakdown of the different use cases, as well as the functionality, and how glTF as the JPEG of 3D needs to necessarily have a little bit more constraints. But I think what you're talking about here with this latest update is adding the behaviors and the interactivity into the glTF, which gets into these questions of, like, what is the programming language? Because if you're using these on the web, it's JavaScript. But yet if you're importing it into Unreal Engine, then they have C++, or if it's in Unity, then it's C#, and Omniverse has their own language and system, I'm sure. So maybe you could talk a bit about this node graph system that is maybe inspired by the Blueprint system from Unreal, and then there's a visual scripting language within Unity, and then Omniverse has their own visual scripting. And so there seems to be a standardization of creating these Turing-complete node graph systems, and the decision for glTF to implement that. And then how do you author that? And once you ingest something like a glTF into one of these programs, how do you imagine that these programs are going to interpret and display those in the context of the existing visual scripting system?

[00:10:51.917] Ben Houston: Yes, we had a choice. We could have embedded a programming language. We could have said it was a WASM blob. But we decided to go with a node-based system. The reason why is this doesn't assume what engine you're running. So it doesn't require you to have a JavaScript engine. It doesn't require you to have a WASM sandbox. By picking nodes, a node-graph-based system, sort of inspired by the three you mentioned, OmniGraph, Unity Visual Scripting, and Unreal Engine Blueprints, we're basically trying to standardize a subset of what has already become the norm. Everyone sort of landed upon this as the way to create content for high-end experiences. So we're basically taking the common set of denominators from those and encoding that into a standard. What's great about using those as the common denominators is that it should be relatively easy to import these behavior graphs into those systems, so into Blueprints and have it work, into Unity Visual Scripting and have it work. So there doesn't have to be a separate runtime just for interacting with glTFs; it fits within their existing systems. They can still sandbox it if they want, but they don't have to. It should just work. Those systems are incredibly similar. In many ways, it's sort of like the process of how PBR Next in glTF was created. PBR Next is sort of a PBR standardization. They looked at all the different PBR implementations in a number of game engines because that was sort of the state of the art. They tried to distill that to a common denominator that was sort of best practice. And then they codified it incredibly rigidly so that everyone could implement it. Then MaterialX, have you heard of that? It's this great... So one of the problems with PBR is you get a flat material. Okay, so you get a bunch of texture maps you can plug in. But the next level is to actually have a material graph. So MaterialX is sort of trying to encode the material graph using best practices, sort of looking at the state of the art and then figuring out the common nodes. This is sort of following that same pattern. Look at the state of the art where there's no standard, come up with that common denominator, and encode that in a standard. And then it can work really well with all those existing systems. So yeah, I think it's designed to work with them. It mimics those systems. It should be relatively straightforward to implement. We also have a reference JavaScript implementation. So if you're on the web, like Three.js or Babylon.js, it can just work with it.
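To make the common-denominator idea concrete, here is a minimal TypeScript sketch of how such a shared node vocabulary could be described: typed nodes with value sockets plus flow sockets that order execution, which is the shape that Blueprints, Unity Visual Scripting, and OmniGraph all share. Every name and field below is a hypothetical illustration, not the actual extension schema.

```typescript
// Hypothetical sketch of a "common denominator" node description. Names and fields
// are illustrative only, not the KHR_behaviors draft.

type ValueType = "flow" | "boolean" | "integer" | "float" | "float3" | "string";

interface SocketSpec {
  name: string;
  type: ValueType;
}

interface NodeSpec {
  type: string; // e.g. "event/onInteract", "flow/branch", "action/playAnimation"
  category: "event" | "flow" | "query" | "logic" | "action";
  inputs: SocketSpec[];
  outputs: SocketSpec[];
}

// A tiny registry covering the node categories discussed in the interview.
const nodeSpecs: NodeSpec[] = [
  { type: "event/onInteract", category: "event",
    inputs: [], outputs: [{ name: "flow", type: "flow" }] },
  { type: "flow/branch", category: "flow",
    inputs: [{ name: "flow", type: "flow" }, { name: "condition", type: "boolean" }],
    outputs: [{ name: "true", type: "flow" }, { name: "false", type: "flow" }] },
  { type: "query/getTranslation", category: "query",
    inputs: [{ name: "node", type: "integer" }],
    outputs: [{ name: "translation", type: "float3" }] },
  { type: "action/playAnimation", category: "action",
    inputs: [{ name: "flow", type: "flow" }, { name: "animation", type: "integer" }],
    outputs: [{ name: "flow", type: "flow" }] },
];
```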

[00:13:05.974] Kent Bye: And so maybe you could explain what happens if you imported a glTF into Unreal Engine, as an example, because they already have a Blueprint system that's native to Unreal Engine. When you import an object that has its own behavioral system, how do you imagine that being displayed and integrated into the existing Blueprint system?

[00:13:24.301] Ben Houston: There's two different ways that it could happen. It's still in the formulation process. If our system actually does map exactly onto Unreal Engine Blueprint nodes, or they make a few modifications to their Unreal Engine Blueprint nodes, it is likely that when you import a glTF with behaviors, it would actually create an actor with an attached Blueprint for its behaviors, because we have variables and events that would be mapped onto variables and events in the Blueprint system. Now, that's one option. That may not happen. That would allow you to actually edit that glTF after it's imported. A second option that could happen: let's say that we're very close to Unreal Engine nodes, but not exactly. If you want to actually simulate this correctly, you can't represent it fully as a Blueprint. They might still have it expose the variables and the events on that actor, but not let you see the graph. And now it's a bit more of a black box, but it still has that behavior. I think either of those are feasible, and it's sort of almost a choice on the Unreal Engine side on how they want to do it. The best would be if it actually imported as a Blueprint. The problem is, between Unity Visual Scripting, Unreal Engine Blueprints, and OmniGraph, there still are some minor differences between the graph systems. So if we choose one over the other, we probably have to make some choices. It may not map perfectly to each one of those systems. Does that make sense?

[00:14:41.760] Kent Bye: Yeah, I imagine if I were wanting to actually use this, I would want to see it. I wouldn't want to have it just be a black box, especially if it is doing interactions that I can't necessarily have control over. But maybe that's a good segue into the different security layers that you have, like the zeroth layer, where what you're imagining is that if you upload a glTF into a sandboxed web environment, you don't want it to basically have any of those interactions, so you have more predictability. So you're limiting the data processing and the dynamic control flow. And then in terms of the security, you have different layers of allocation, iteration, data access, and web access. And so you have the most locked-down version of the glTF and how it's interpreted, and then the other layers of layer one, two, and up to layer n, where it has the most expressivity and you're able to use the full extent of all the different behavior graph nodes. So maybe you could sort of explain this layering system and how that is being embedded into the glTF to discern what is at the zeroth layer and what is at the nth layer of the most expressivity.

[00:15:44.788] Ben Houston: This sort of derives itself from the idea that some of these things would be Turing complete, and that some people on the web, when they import a glTF, may want a little bit of interactivity, but they want to know that it's still safe and it can't use up arbitrary amounts of processing time. So the idea with the zeroth layer is that, yes, it may do a little bit of interactivity, such as a hover or a click, and then it runs an animation. So maybe you can tie a click of an object together with running an animation. Then maybe there's a UI you can click again, maybe some limited stuff. It's basically a subset of all possible nodes, a subset of nodes and a limited execution amount, so that you can't really do too much, but you can have something simple. And then the person consuming these glTFs doesn't have to worry that these glTFs are going to misbehave or become some type of security risk. We're already looking at a layer zero, and we're also looking at a layer one. Layer one does allow you to have for loops, or have events where you can call your own event, and there you can possibly get into a non-halting situation. That's really easy to deal with, though. The example engine we have will actually limit the amount of processing power in each time slice. So if you were giving this glTF a time slice on a per-frame basis, you'd say you have 10 milliseconds or 5 milliseconds. And then if it doesn't do anything useful in that time, well, it didn't do anything, but it won't halt your system. Does that make sense? So you have a sandbox and you give limited resources to it on a frame basis, and therefore it's still safe. But you have to do a bit more work now in that sandbox to keep it safe, whereas the idea with layer zero is that there aren't enough nodes that it can do anything unsafe. Does that make sense?
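The time-slicing Ben describes can be sketched in a few lines of TypeScript: the host hands the graph a budget each frame and stops evaluating nodes once the budget is spent. The GraphNode interface and class below are hypothetical, not the reference implementation's API.

```typescript
// Minimal sketch of per-frame time slicing for a behavior graph sandbox.
// Hypothetical interfaces, not the reference implementation.

interface GraphNode {
  // Executes one node and returns the nodes whose flow inputs it triggered next.
  execute(): GraphNode[];
}

class TimeSlicedExecutor {
  private workQueue: GraphNode[] = [];

  trigger(eventNode: GraphNode): void {
    this.workQueue.push(eventNode);
  }

  // Called once per frame by the host application with a budget in milliseconds.
  tick(budgetMs: number): void {
    const start = performance.now();
    while (this.workQueue.length > 0) {
      if (performance.now() - start > budgetMs) {
        // Budget exhausted: leave the rest of the queue for the next frame,
        // so a runaway graph can never halt the host's render loop.
        return;
      }
      const node = this.workQueue.shift()!;
      this.workQueue.push(...node.execute());
    }
  }
}
```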

[00:17:22.825] Kent Bye: Yeah. So I guess there's going to be kind of a tiered system, where when you're authoring it there are going to be, like you said, the different types of if or for loops that are maybe a little bit more of a security risk. So if you have something uploaded and it has logic that you don't want executed, then in a locked-down web environment it just wouldn't even read that code, but it may be able to have access to other interactions or behaviors. I guess what I start to think about as we move forward is the authoring process and this idea of an IDE, or integrated development environment, because there's the authoring for the glTF, which may be happening in something like Blender. But I don't know, when you start to author some of these interactive nodes, if you imagine people piecing it together within an authoring tool like Blender. Because for these different behaviors you might want to have it actually within the context of the game engine, like Unreal Engine or Unity or even in Omniverse. Do you imagine people importing a glTF without behaviors and then adding the behaviors and then being able to export it out? Like, how do you imagine people authoring it?

[00:18:26.369] Ben Houston: So I know Adobe probably views Aero as an authoring environment for this. As part of this runtime that we've made in JavaScript, we've also made a web-based, graph-based editor. So you can edit graphs there as well. So we're sort of making a prototype web-based one, showing how it would work. And then we can make some graphs there to test it out. I think that over time, more and more tools will probably create ways to export to this. I do think that even if we don't match Unreal Engine's or Unity's nodes exactly, there can be a distillation process where, during export, they might change a few nodes around to achieve the same behavior. It's sort of like right now, sometimes when you create a model, you might use a Blinn-Phong model for the material. But when it exports, it exports to PBR, because there's just a bit of conversion there. There's a lot of possibilities for many of these tools to be able to create behaviors and then export. Now, what you create in Unreal Engine may not easily fit into layer zero. By default you may get, like, layer one or two, because they don't care about those security models very much. But yeah, I think there's a lot of opportunities for new editing tools. As well, Threekit, my company, we're probably going to support this format too.

[00:19:36.675] Kent Bye: So I've seen that glTF as a standard is going to be moving into, like, an ISO standard. And then I've seen the phrase glXF. Is glXF what we're talking about here with behavior graphs, or is everything we're talking about with this node-based system a part of the default glTF standard?

[00:19:52.501] Ben Houston: Currently, glXF has been very much focused on multiple glTFs referencing each other, or multiple different glTFs. So behaviors is sort of orthogonal to that, which means you could probably use it in glXF. You can also just use it in glTF. So what is glXF? It's mostly the container format for multiple different glTFs. And so it's more about referencing multiple files, maybe orchestration of multiple files. That's glXF. Whereas with glTF, it's a single file; you sort of know what you're getting. And so behavior graphs can be in either. They don't necessarily have to be in one or the other. People could choose to only use them in glXF, but there's no reason they'd have to. Like, if I had a single object, I could have a top and it could spin. I don't have to reference multiple glTFs for that type of scenario.

[00:20:41.539] Kent Bye: Okay, so I'm seeing here in the chat that glXF is a proposed format for glTF scene building. So is glXF like the equivalent of USD then?

[00:20:50.906] Ben Houston: USD is both glTF and glXF. So a USD can contain a material, it can contain geometry, it can contain both, it can reference multiple USDs. USD can do anything. That's the great thing about USD.

[00:21:08.009] Kent Bye: Okay, so with everything we're talking about here with glTF as the spec, are you imagining that all these behaviors will be a part of the eventual finalized glTF spec, or is there a separate spec for all these node graphs?

[00:21:19.151] Ben Houston: So all of glTF has always been done as a series of extensions. There's the base glTF spec, currently at 2.0, and then there's a series of extensions that specify additional material properties. So a lot of the materials that I say glTF does great are specified by extensions. Same with this. Right now, I think the tentative name is KHR underscore behaviors.
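For readers unfamiliar with glTF's extension mechanism, here is a rough sketch of how a behavior extension could ride alongside material extensions in a glTF 2.0 asset. The top-level extensionsUsed and extensions structure is standard glTF; the payload under the tentative KHR_behaviors name is purely illustrative and not an actual schema.

```typescript
// Sketch of a glTF 2.0 document carrying a behavior extension. The extension
// payload shown here is hypothetical.
const gltfAsset = {
  asset: { version: "2.0" },
  extensionsUsed: ["KHR_materials_transmission", "KHR_behaviors"],
  nodes: [{ name: "Door", mesh: 0 }],
  animations: [{ name: "openDoor" /* channels and samplers omitted */ }],
  extensions: {
    KHR_behaviors: {
      layer: 0, // hypothetical: declare the most restrictive capability layer
      graphs: [
        {
          nodes: [
            { type: "event/onSelect", flow: { next: 1 } },
            { type: "action/playAnimation", parameters: { animation: 0 } },
          ],
        },
      ],
    },
  },
};
```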

[00:21:42.481] Kent Bye: What's that stand for, KHR? Khronos. Oh, okay. Okay, so Khronos behaviors. So as I watched the presentation that you gave kind of introducing this, there was a detailed graphic that talked about different aspects of the core functionality, the scene graph functionality, the animation functionality, the input method functionality, the flow control functionality. So there's a bunch of different things; like in the core functionality, there's things like the basic flow control, the events, the logic, the actions, and the variables. So maybe you could talk about how each of these different components is building towards this Turing completeness, or maybe if this was directly inspired by, say, the Unreal Engine Blueprint system, how they're able to have a taxonomy of these different types of behaviors as they're interacting with each other. So you're creating these boxes and they're interacting with each other in this node-based system, but in the scene graph system there's like events, logic, variables, and actions. Maybe you could elaborate on that a little bit just to explain.

[00:22:42.423] Ben Houston: These are actually all in Unity Visual Scripting, OmniGraph, and Blueprints. They just have different node types. So these are different node types. Usually in Unreal Engine, you'll see a red node. That's actually an event that starts execution. It might be on update or tick. It might be a keyboard event. It might be a mouse click event. It might be a proximity event. That starts execution. And then you connect other nodes via these flow links, like the little arrows in Unreal Engine. And then you can have a bunch of flow control nodes. That can be like a branch, a sequence, a delay node, those types of things, a for loop. Those are your flow control nodes, and they can change your path of execution. Then you might have some queries. Queries will basically get you information about the scene or the larger context. You can ask where a player is, what the transform of an object is, what the current key press is, or whether a key is down or not. Those are queries. The logic is sort of taking different values, either from variables or from a query, and then doing mathematics with it. So I could take this transform and then offset its translation by, like, five units in the X dimension. That sort of logic includes the math nodes. So I can do a bunch of math operations to come up with a new position I want to have. And then actions are sort of affecting something. So all setters are a subset of actions. So am I going to set a transform, am I going to have a player take damage? Those are examples of actions. Now, we are not expecting to have damage as a built-in system as part of glTF. That's going a bit too far, at least initially, but yes. So those are the main types of nodes. Now you also have events and variables. So variables give you state. So I could have a variable, I could call it "is door open," and it's a Boolean. And now you would know if the door is open. So every time someone comes up to this object and the door opens or closes, I could set that variable, and then you could read that externally to that object, that actor. For events, we're envisioning that you can have custom events. With custom events, you basically name your own event. You can maybe call it "open door," and then you can have another event called "close door." And then this becomes the external interface to this object. So with this door, when you're in other actors, they can actually see that they can cause things to happen on this door actor. I can cause it to open, I can cause it to close. Then I can also ask what its current state is by asking the variable "is door open." That would be a query, right? And causing these events to occur on this actor, that would be done by actions, which would then start execution within this actor.
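Pulling those node categories together, a door's behavior might be serialized roughly like the sketch below: a custom event as the external interface, a branch on a Boolean variable, an animation-playing action, and a setter that records the new state. The node types, field names, and indices are hypothetical, not the draft specification.

```typescript
// Illustrative serialization of the door described above. All names are hypothetical.
const doorBehavior = {
  variables: [{ name: "is door open", type: "boolean", initialValue: false }],
  customEvents: [{ name: "open door" }, { name: "close door" }],
  nodes: [
    // 0: external interface; another actor fires the "open door" custom event
    { type: "event/onCustomEvent", parameters: { event: "open door" }, flow: { next: 1 } },
    // 1: branch on the current state (a query of the variable feeds the condition)
    { type: "flow/branch",
      inputs: { condition: { variable: "is door open" } },
      flow: { "false": 2 } }, // only proceed if the door is currently closed
    // 2: action; play the artist-authored "open door" animation stored in the glTF
    { type: "action/playAnimation", parameters: { animation: "open door" }, flow: { next: 3 } },
    // 3: action; record the new state so other actors can query it
    { type: "action/setVariable", parameters: { variable: "is door open", value: true } },
  ],
};
```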

[00:25:14.580] Kent Bye: Make sense? So it sounds like with each of these different types of nodes combined together, because it's Turing complete, you can replicate any algorithm you want with all these different node types combined. Is this sort of the baseline of the minimum types of nodes you need in order to achieve that Turing completeness?

[00:25:34.482] Ben Houston: Uh, Turing complete, uh, you can actually make very simple things Turing complete. I think the reason people mention Turing complete, and some people get scared, is because it means you don't know when things will stop. Maybe it's going to use up all my CPU resources. It's just going to keep executing something. You can get that pretty easily. Just make a really big for loop. Or you can have an event just call itself. It keeps calling the same event over and over again. Then you've got an infinite loop. So that's sort of what they mean by Turing complete. It just means that you don't know when it's going to stop. You don't need all these events and all these different node types for Turing completeness, but you need them to have a useful behavioral system. So these are mostly there to be useful and to map onto existing known concepts.

[00:26:15.023] Kent Bye: So I guess in some ways that implies that as you start to implement a glTF object into a scene, whether it's in a game engine or something like a nested version that's more of an open standard, you have your existing programming that you're creating and you have your own process of optimization. And then as you're adding these glTF objects, they may be taking up resources as they're being processed at runtime. So you have what I see as a potential optimization nightmare, where you have all these what could be black boxes that are off kind of running their programs and you have to somehow dial down in. And so I'd imagine as you're importing these glTFs, you'd want them to be translated into your existing Blueprint system so that you could have more of a stack trace and be able to determine where there may be some optimization bottlenecks. Otherwise these glTFs are introducing a lot of unknowns when you're trying to really dial down and optimize things, you know, especially in terms of XR, where you have to have things at a certain frame rate. So I'd love to hear a little bit about how that starts to play into your design decisions. When you are adding these behaviors, it's great to have that interactivity, but the trade-off is maybe having a little bit more of an occluded optimization process.

[00:27:25.418] Ben Houston: I think as long as it can import into, say, a Blueprint graph, which, as we talked about, would be very, very nice, because then you can modify the behavior and see it, I think that's fine. I'm not too worried that people might make inefficient node graphs, for the same reason that, like, when you find a random 3D model on the web and you import it into your game engine, then you find out, oh, wait, it used 4 million polygons to make a box. And you're like, well, that sucks. And now my game is slow. You're going to have the same issue. You cannot standardize making good content. The standard allows you to get the content in, but to get good and efficient content you still have to make sure that it has a good rating from the store you buy it from; maybe the store or the source has done some analysis to make sure that it doesn't contain ten thousand nodes just to open a door. And then hopefully the tool that you imported it into also gives you some stats on whether this is an efficient object, or tells you that when I'm running my game, forty percent of my CPU time is running on this door that hasn't even opened. Because I don't know if someone decided to put a huge for loop in there for no reason. So I think the standards process cannot make good content, but hopefully our tools can allow us to identify when content is bad. And we should have ways to surface good content.

[00:28:37.534] Kent Bye: Does that make sense? Yeah. Yeah. And certainly anybody who's downloaded an object from, like, Sketchfab and tried to put it into a game engine has run into that, where it has to then be optimized or decimated, or there's the process where maybe you have the baseline there, but you may have to go in and edit something if it does have something that's unreasonable.

[00:28:57.194] Ben Houston: Yeah. And so I think it's going to be the same as pulling in some random mesh. And I think that you're going to find some people in some stores have just amazing meshes. I've run into certain stores where they make most of their meshes themselves rather than just reselling others. They often have a very high level of quality and consistency that you might not find in a more open marketplace. And I think that'll happen. You'll have people who are very good at making reusable game objects. And when you get it from them, you're like, yep, that was from Joe, and Joe knows what he's doing.

[00:29:28.045] Kent Bye: I've had some experience with playing around with Mozilla Hubs, which uses A-Frame on top of Three.js, pulling in more of a scene graph that has its own way of composing these different scenes with the declarative systems that they have. And the challenge that I found is that I wanted to have objects that had more dynamic movement. Even animations were, at the time at least, more difficult; it was hard to get an object into Mozilla Hubs and have it move around. And being able to be interactable is a whole other layer, where you'd have to at that point have a whole layer of JavaScript on the outside, from the game engine, which would require you to write JavaScript for the interactivity, and figure out how to do that. So I would love to hear you expand a little bit on animation as an example, where you're able to animate dynamically how the glTF is moving. Do you imagine that there are other existing extensions for animations that are being generated from other methods of animating and moving glTFs, or how is that different than, say, the node-based animation functionality that you may be implementing into glTF?

[00:30:31.703] Ben Houston: Node-based systems generally do not replace existing animation systems; they often act as coordinators. So if I have a door (it's not a great example, the animation is very simple), you're probably going to have two existing animations in that system called open door and close door. Like maybe it has a handle that first turns and then the door opens, and it may not have a consistent velocity as it's opening. But when the person comes up to that door and interacts with it, that's where the node graph comes in. It's determining, is the door locked? And then if the person does the right thing at the door, it runs the animation to open it. And then if the person walks away and maybe the proximity between the player and the door is far enough, it automatically closes, and then it'll run the close-door animation. So the node graph system generally is a coordinator of existing functionality in the glTF. It may switch between existing materials. It may run animations. There's a sound extension for glTF; it could trigger sounds as well, but it doesn't have its own sound system. It's merely using the existing capabilities within the glTF. So this node graph system mostly is about modifying the glTF and playing its existing capabilities, but doing so via triggers and proximity or other ways the user can interact with it. Now, it is possible to manually set translation and scale and position. So if you really wanted to, say you wanted an object to just follow a player everywhere he went, that should be possible. You're manually setting the position. Now it's not running an animation; it's doing something more manual. That kind of stuff should be possible. It's more work than an animation, and now you have a coder doing it. Generally, animation quality is better if you have the artist make it in a true artist tool rather than having a programmer do it. But sometimes you need very specific functionality.
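The two patterns Ben contrasts, coordinating an authored animation versus manually driving a transform, can be written out imperatively for clarity. In the extension both would be expressed as graph nodes; the helper functions here are hypothetical host hooks, not part of any specification.

```typescript
// Hypothetical hooks a host runtime might expose to the graph evaluator.
const playAnimation = (clip: string): void => { console.log(`playing clip: ${clip}`); };
const getPlayerPosition = (): [number, number, number] => [0, 0, 0];
const setTranslation = (node: string, p: [number, number, number]): void => {
  console.log(`moving ${node} to`, p);
};

// Pattern 1: coordination. React to a trigger and hand off to an artist-authored
// animation clip that already lives in the glTF.
function onPlayerNearDoor(isLocked: boolean): void {
  if (!isLocked) {
    playAnimation("open door");
  }
}

// Pattern 2: manual control. Set the transform directly every tick; more work and
// more code-like, but needed for behavior that cannot be pre-authored, such as an
// object that follows the player.
function onTick(): void {
  const [px, py, pz] = getPlayerPosition();
  setTranslation("companion", [px + 1, py, pz]); // stay one unit beside the player
}

onPlayerNearDoor(false);
onTick();
```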

[00:32:20.697] Kent Bye: Yeah, I guess in the case of Mozilla Hubs, it uses A-Frame, which has its own way of dealing with extensions and locking things down. So it's a declarative system that is essentially the framework that allows these scene graphs to be defined in this declarative code-based system. But I guess as we're moving forward and we have these new standards, each of these systems is going to have to adapt. And I guess there's this dialectic between people who prefer to just write all the code out versus the visual scripting approach. And I'm wondering how Unreal Engine has dealt with this: if people prefer to just write out the programming language, is there a way to write it out that way and then do a translation, so that it translates that code into a node-based system? Does that already exist within the context of Unreal Engine? Or do you imagine that people are just going to have to get used to authoring these behaviors in a node graph way, because that's just easier to do than trying to write the code and then translate that code into this node graph standard?

[00:33:21.776] Ben Houston: Yeah. Okay. So let's see. Um, yes, a number of really good questions there. I think one of them was about who authors node graphs, or that some people like to code and some people don't. What I found is that if you require everyone to code, you have to have pretty good programmers on staff. Programmers tend to be overloaded, expensive, and actually they don't have the artistic talents that your artists do. Whereas node graphs generally are much more accessible to artists and to TDs. And they can actually be much more productive in these systems, because they often avoid the need to compile everything. They're sort of like what-you-see-is-what-you-get tools. So now you have your artists who have a very good visual sense of what they want, or game designers who may not be hardcore coders, or TDs, all being able to create content in your game. And it's really hard to shoot yourself in the foot with a node graph. It is super easy to screw up code. So these make it more accessible to a wider audience to create interactive behaviors. Sure, you can code everything, but that's a rarefied person who can code really, really good stuff all the time and also be good at the art and the interactivity. So I view this as democratizing content creation. And I think that's actually been very much proven out with Unreal. While you have always been able to code in C++ to create behaviors in Unreal, almost all the behaviors for most major games are done in Blueprints, because then anyone who's working on the game can actually get in there and start modifying things. So yeah, I think it's phenomenal. Now, it is possible to take most code and convert it from, say, a programming language to events, but it's like a lot of work. I'm not sure; theoretically it's possible. Whether it's going to happen, I'm not sure. I think for the most part, it's probably going to come from various GUI-type tools like Adobe Aero or Threekit or Unity Visual Scripting, OmniGraph, Blueprints. That's sort of where you're going to be authoring this content. That is probably more likely. In the future, maybe if you want to have code that just runs, it's more like you want a WASM blob or something, like embedded JavaScript. But that has a lot of security risks. And that's sort of the issue. When you have arbitrary code there, you can't audit it as easily. Whereas when you have a node graph, you can basically say, you're only allowed to use these nodes, and you can audit it super easily. And it's not presupposing the runtime environment. And you don't need a compiler.

[00:35:45.688] Kent Bye: I guess when I'm imagining this as it moves forward, I would want to have the analog of hitting the play button in the Unity game engine, where it then throws you into the simulation and you're able to go in there and actually play around. So when you talk about these glTFs that are interactive, how do you imagine people testing individual glTFs, and what do you do when glTFs are actually interacting with each other? So you have two objects that are sending data back and forth, and you'd want to just see how they interact with each other. So you'd really want to have, like, this real-time environment for that, to kind of see if it's working within your sandbox environment before you start to implement it or import it into the game engine. And so do you imagine that you'd be able to simulate this interactivity, either as one glTF that you're able to play around with, or as these glTF objects interacting with each other?

[00:36:36.740] Ben Houston: We're probably not going to create a full game engine just to create a new standard. We are creating the ability to create behavior graphs using a web-based GUI tool. I've shared some screenshots of that in prior presentations, and you'll be able to sort of view how one interacts. I think that in other tools, various GUI tools, you can see how your object is going to interact, and then you export it to glTF. You don't have to see it in glTF to begin with. Like, I can create really good glTFs in Blender because it has the same material model. So I don't have to preview the glTF in real time while I'm editing in Blender. I just have to make it look correct in Blender, and then when I export it, I know it's going to work. There is an ability to export glTFs from Unity, I believe, so if they could do that with behaviors, that would be phenomenal. I think that's the best-case scenario. You can import and export behaviors from all of these different tools, and it just works. You might have to use a subset of all of the tools because we won't support everything, but if you're using that subset, then it just works. That would be fun.

[00:37:41.808] Kent Bye: Yeah, and I'm curious to hear your thoughts in terms of where we're at with creating the open metaverse, and how adding these behaviors as an extension to glTF is one step towards that. We have these proprietary game engines like Unity, Unreal Engine, and Omniverse, and they're all great at what they do, but if we want to replicate the core essence of what they're doing, how do you see this latest innovation of looking at what those game engines are doing and trying to abstract it out into this common standard, and how does that fit into this vision of moving towards more of an open metaverse?

[00:38:17.914] Ben Houston: I think that this adds basic interactivity. This is just the first step. So we're making layer zero and layer one. We have ideas about what could be in the other layers, but we're not there yet. I think that this will achieve basic interactivity. You can click on things, things react. It's not always clear what the needs of the metaverse are, because the metaverses are so diverse right now. Like, am I making a skin for Fortnite? That's very specific to the Fortnite skeleton. Am I making a gun for Fortnite? Then I have to interact with, like, the Fortnite skeleton standards. So making interoperable objects for the metaverse requires some understanding of the context of how that object is going to fit in with the rest of the game. And that's not always very clear or even transferable at this point. Like, getting an object into Fortnite is very different from getting an object into Roblox or into The Sandbox. They have different sorts of game engines and points to the game. So it's hard to know. I think we'll at least be able to have, like, a chest for rewards: you can come up, click on it, it opens up, and then maybe something can pop out, or you can put something into it, so you can create those games quicker. But making a sword that can kill people in Fortnite but also in Roblox, that's hard, because we're presupposing a lot of contextual knowledge about how that game operates, and that's like a few steps ahead, maybe like layer seven. For now, this is a missing gap, and we're going to start filling the gap with the simple stuff. And then if that works, we're going to go to the next layer and the next layer. Some of the layers are like, let's add sound. Maybe we can do some network traffic or something like that, or maybe it can be aware of geospatial for, like, AR applications. That's very interesting stuff, but it's super hard to fit into the existing game logic. Yeah, so I think it's mostly for authoring to start with. It'll be like, hey, if you use these, I can bring these interactive objects into my authoring environment, so I don't have to make doors and chests that open and stuff like that, and then that gets you most of the way there.

[00:40:19.765] Kent Bye: Yeah, I think there's a delicate balance of trying to add this more complex interactivity while also not over-complicating some of these basic standards to the point where no one wants to implement or work with them just because it's too difficult to integrate with or to author for, because then you've basically created something that's too complex for anybody to actually pragmatically use.

[00:40:43.120] Ben Houston: That makes sense. Yeah, that's part of the problem with USD in some ways. It can do everything. And it's so much work to implement everything that nobody supports all of it.

[00:40:56.194] Kent Bye: Yeah, that's a good point, that taking a little bit more of a constrained approach with glTF, like you're saying, it's deliberately constrained to fit a use case, like JPEG is. For me, at least, that's a metaphor where, you know, sometimes you want a PNG or a PSD to have more layers, but the JPEG is the most minimal way of expressing an image. And in just the same way, this is maybe the minimal way of expressing some of these functionalities. And I know that the Khronos Group has the Metaverse Standards Forum, which is maybe a separate or parallel effort, but I'm curious how these behaviors for glTF fit into the Metaverse Standards Forum, and if this was independently created with its own efforts within the context of glTF.

[00:41:37.531] Ben Houston: It wasn't independently created. It started before the formal announcement of the Metaverse Standards Forum. And right now, this interview is about the Khronos Group work on behaviors for glTF. I will be talking with Guido, who developed USD, later today for the Real Time Conference. We're going to be doing a session sponsored by Adobe where we're talking more about interoperability in the metaverse. We're going to cover some of that there. There is interest in standardizing this in a wider fashion, but in many ways, the Metaverse Standards Forum is not really set up to create new standards; it's more to pick winners, at least initially. Because it's sort of like, hey, why don't we all use this and have everyone do that? But there isn't an existing standard for node-based systems. There's just Blueprints and Unity and that. So this is trying to be one that can then be picked to standardize on. Does that make sense?

[00:42:32.493] Kent Bye: Yeah, I guess a metaphor that I would have is that at the end of the Metaverse Standards Forum, maybe you have the complex of all the different standards required to replicate the functionality of something like Unreal Engine, Unity, or Omniverse. But this glTF with behaviors is maybe one of the component standards that may be a part of this larger complex, rather than, like you were saying, USD implementing everything. So in order to have the complexity of a game-like engine, you would need maybe a combination of other types of standards to really reach that point where you're able to create these immersive and interactive environments with the combination of all these different standards. So the Forum is more about coming up with those standards, while this is more narrowly focused on glTF and creating behaviors that are able to be implemented in both game engines and the web, but just focused on these objects. Like JPEG is for images, this is like the JPEG of 3D objects: glTF with interactivity.

[00:43:26.054] Ben Houston: Yeah. And this is a standard. If at some point USD goes, hey, we think that's great, you've done a good job encoding behaviors in an interchangeable way, maybe they could also add it to USD. That would actually be cool. I think that kind of stuff is possible, because then you can author it and save it to USD, and then when you want to use it on the web, you just distill it to glTF and it just works, and vice versa. We don't want to have another geo. The worst case is we make one node behavior system that's trying to be a standard, and then someone else creates a different one that's just ever so slightly different, and then everyone's got to sort of... Well, that usually happens, though. But it'd be nice if there was only one and we did a good job. Right.

[00:44:07.385] Kent Bye: Yeah, especially if you have buy-in from each of the different game engine developers who are willing to implement the full specification of all these things. Because I think that's the thing: you can generate it, but if no one can import it and use it, then I think that's where it starts to maybe not be fully adopted, because it doesn't have the wide adoption. You want to be able to author it and use it in all these different places. Yeah.

[00:44:29.684] Ben Houston: And that is the goal. Unreal Engine has been participating in this as well. But I do think, even before that, the game engines tend to be laggards in adoption. glTF was out for years before they started importing and exporting it. What happened first, though, was that it was embraced by the web community. The web community tends to move pretty quickly. And so I would expect that this gets embraced by the web community first and then later on gets adopted in other areas.

[00:44:56.203] Kent Bye: So how does this interact with something like Three.js or Babylon.js? Because I think those are the JavaScript frameworks people use to work with these. Would you imagine that there would be an implementation where you could, say, import something into Three.js and then, in some sort of editor, be able to either see it, or when you actually execute it, it's able to just interact? Would you imagine that frameworks like that are able to make use of some of these things? How would you see that playing out?

[00:45:25.863] Ben Houston: Yeah, I love Three.js. I've been a significant contributor for almost a decade now. So we've created an open source behavior graph library written in JavaScript. And one of the frameworks we've integrated it with is Three.js. So that's how we're proving it out.

[00:45:43.815] Kent Bye: Tell me again, because there's Three.js, but there's also Threekit. Like, what does Threekit do again?

[00:45:48.098] Ben Houston: Threekit is a company that sells product visualization solutions for e-commerce. It is distinct from Three.js. We do use Three.js, but there's no formal affiliation.

[00:46:00.364] Kent Bye: Gotcha. Okay. So you're using it as a part of the display of these objects.

[00:46:05.606] Ben Houston: Yeah, but it is distinct. Yeah. So the behavior graph library is this one. I'll just link to it.

[00:46:12.169] Kent Bye: Okay, great. I'll add that in the show notes description so people can check that out.

[00:46:17.851] Ben Houston: This is a realization of the specification. So the idea is we're creating this library so that you can see the behavior graph systems run. You can see how well they perform. We can test out ideas as we're trying to come up with the standards. Because it's always a challenge to come up with a standard without actually having a working version; you can make a lot of mistakes that way. We're just not smart enough. You have to simulate it in your head, right? So here we are. We've got a working version. We're integrating it with Three.js. It's designed to also be easy to integrate into Babylon. And there's a screenshot of the behave-flow library, which is a GUI for creating these graphs. So we have another open source project, and that can be tied in. So if you want to make a creation environment... I wouldn't be surprised if something in the future, like Mozilla Hubs, includes that type of library for creating your behaviors, because it makes it so much more accessible. Remember I was saying that once you went to Blueprints... Like, I remember Unreal Engine 3. It was much harder to create content for it, and Blueprints was a revolution. People make fun of Blueprints now. They're like, oh my god, you can make horrible graphs in it that are a mess. Yes, but that's better than not making content at all, which was the case before. It's sort of the curse of success, but Blueprints was revolutionary.
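As a rough illustration of how such a runtime could sit inside a Three.js render loop, here is a sketch that loads a glTF, pulls out the behavior extension data, and ticks a graph engine with a per-frame budget. The Three.js calls are standard; the BehaviorGraphEngine class is a hypothetical stand-in, not behave-graph's actual API, and the way the extension data is read is an assumption.

```typescript
import * as THREE from "three";
import { GLTFLoader } from "three/examples/jsm/loaders/GLTFLoader.js";

// Hypothetical stand-in for a behavior-graph runtime such as behave-graph (not its real API).
class BehaviorGraphEngine {
  constructor(private graphJson: unknown, private scene: THREE.Scene) {}
  tick(deltaSeconds: number, budgetMs: number): void {
    // A real runtime would evaluate pending event and flow nodes here, within budgetMs.
  }
}

const renderer = new THREE.WebGLRenderer();
const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(50, 1, 0.1, 100);
const clock = new THREE.Clock();

new GLTFLoader().load("door.gltf", (gltf) => {
  scene.add(gltf.scene);
  // Assumption: the loader exposes the raw glTF JSON (and its extensions) via gltf.parser.json.
  const behaviorJson = (gltf.parser.json.extensions ?? {}).KHR_behaviors;
  const engine = new BehaviorGraphEngine(behaviorJson, scene);

  renderer.setAnimationLoop(() => {
    engine.tick(clock.getDelta(), 5); // give the behavior graph a ~5 ms budget per frame
    renderer.render(scene, camera);
  });
});
```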

[00:47:30.957] Kent Bye: Awesome. And finally, what do you think the ultimate potential of all of these spatial computing technologies and tools, virtual reality, and 3D on the web might be, and what might they be able to enable?

[00:47:46.583] Ben Houston: I don't know. Everyone's seen the sci-fi films where we're wearing AR glasses that somehow don't take up our whole head, walking down the street, and you see your friends wearing AR clothes that don't exist. That would be cool. I think we're a little bit far away from that, so the question is, what are the near-term successes that will drive us forward? And usually you have to have an economic reason to be doing them. Like with Threekit, we've embraced the virtualization of products to be able to create unlimited product imagery; that is a huge game changer for a ton of companies. And then, because we did that, we can do AR, we can do room planners, we can do try-on, virtual try-on. So to me, that's sort of taking all this interactivity, machine learning, and these standards processes and turning that into something that has e-commerce value. And then that sort of pushes the technology forward, because it funds my ability to work on this. So I guess I'm a practical person in the end, but I've seen all the sci-fi films. It's very practical until you get to the sci-fi outcome.

[00:48:51.982] Kent Bye: What's interesting to me is just how so much of the discourse around the metaverse is centered in gaming and things like Roblox, Fortnite, VRChat, Rec Room, Minecraft. You have all these games, and people are trying to imagine how that gets translated into these larger contexts. But what I find interesting about the Khronos Group and all these efforts is that it's the development of these standards that is going to really dictate how the metaverse unfolds. And so to see not only what standards are being created, but also to look at the companies and how they want to use them, and how there is an economic use case for many of these things that may not be intuitive when people think about the metaverse, that there really is this business enterprise context for some of these things that is really driving the standards forward, and how this is going to maybe eventually get back to the more exotic sci-fi versions that we see in the movies?

[00:49:42.594] Ben Houston: Yeah, glTF is incredibly valuable, especially for these e-commerce things, because when companies digitize their products, they make sure it's compatible with glTF, and that means it can now be interchanged. You can take that glTF and make an Unreal experience, you can make a Unity experience, you can make a Threekit experience. Maybe I can combine glTFs from multiple different vendors to make a room planning experience. It's phenomenal what is possible there. And then maybe in the future, if you want to create a game, you can actually just go to some furniture vendors and pull down their furniture and start using it. They might license it in that fashion, and then you don't have to create content. I think people are not going quite that far yet, but people are standardizing on glTF for digital goods, basically virtual versions of their products. And maybe they even come with configurators: when I pick this couch from Crate & Barrel, maybe it comes with a configurator so I can say I want the three-seat version, or I want the loveseat, the single-seat version, and it just changes. That would be a neat object. Yeah, I think it isn't driven by the game engines. One of the problems with the game engines is that they often don't view interoperability as the top priority, because when you're already successful, people are already making content for you, so you don't need that interoperability. But in these non-game-engine situations, they don't have everyone working in one game engine. They don't have 200 people working on a single game. They have different vendors working with various tools. Having a standard like glTF allows all these tools to work together. It's a different world. It's just a completely different world.
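As a concrete illustration of that interchange, here is a minimal sketch of loading a vendor's glTF asset with three.js and applying a crude configurator-style material swap. The product URL, fabric names, and colors are made-up placeholders, and the GLTFLoader import path can vary with your three.js version and bundler setup; a configurator built on the behavior extension would encode choices like this in the graph rather than in ad hoc application code.

```typescript
import * as THREE from 'three';
import { GLTFLoader } from 'three/addons/loaders/GLTFLoader.js';

const scene = new THREE.Scene();
const loader = new GLTFLoader();

// Hypothetical product asset; any spec-conformant glTF from any vendor would work.
const PRODUCT_URL = 'https://example.com/assets/sofa-three-seat.glb';

// Placeholder "fabric" options for the toy configurator.
const FABRIC_COLORS = {
  charcoal: '#3b3b3b',
  oat: '#d8cdb8',
  navy: '#1f2a44',
} as const;
type FabricVariant = keyof typeof FABRIC_COLORS;

loader.load(PRODUCT_URL, (gltf) => {
  scene.add(gltf.scene);
  applyFabric(gltf.scene, 'navy');
});

// Walk the loaded hierarchy and recolor every standard-material mesh. A real
// configurator would target named materials or the asset's variant metadata
// instead of recoloring everything.
function applyFabric(root: THREE.Object3D, variant: FabricVariant): void {
  root.traverse((child) => {
    const mesh = child as THREE.Mesh;
    if (mesh.isMesh && mesh.material instanceof THREE.MeshStandardMaterial) {
      mesh.material.color.set(FABRIC_COLORS[variant]);
    }
  });
}
```

The point of the sketch is the interchange story: the same .glb file could just as easily be dropped into Unreal, Unity, or a Threekit experience, and the behavior extension aims to make the interactive part of the asset travel with it in the same way.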

[00:51:12.168] Kent Bye: Yeah. And in the demo, you have an object that you're able to interact with and change, and you can see how you have different configuration options to change the color or whatnot. So it's just this node-based system for interactivity, showing how you can have different versions of the same product with different skins or different colors or different materials. And so, yeah, there seems to be a strong e-commerce use case for some of these things. As we wrap up, are there any other final thoughts or last words that you'd like to communicate to the wider immersive community?

[00:51:40.356] Ben Houston: I think it's been a long time coming for a standardization of node graphs. This happened with PBR for the PBR material model. It's happening with material node graphs through MaterialX. And now it's happening for the behavior systems. It's very exciting. For me, it's actually an honor to be able to be contributing to this. It's really cool. Hopefully, it can have a lasting impact on the community.

[00:52:02.905] Kent Bye: Awesome. Well, I'm excited to see where this goes, because I think it's a key part of the overall complex of standards we talked about, with these different standards development organizations moving towards the Metaverse Standards Forum. But this is one component part that, like you said, is a separate effort, and it's exciting to see development on it and to see where it goes: how it starts to create yet another tool that learns the lessons of the game engines and what has been developed there, but standardizes them for other contexts, so they can flourish online and in the future of these open and interoperable metaverse realms. So thanks again for coming on and helping break it all down.

[00:52:38.874] Ben Houston: Yeah, well, thanks, Kent. You're incredibly knowledgeable about this space. It's easy to interview with you.

[00:52:44.067] Kent Bye: So that was Ben Houston. He's the founder of Threekit, and he's also been working with the Khronos Group on this node graph-based approach towards making interactive, interoperable glTF objects. So I have a number of different takeaways from this interview. First of all, this is the process of trying to take all the different component parts of what we already have with these game engines like Unity, Unreal Engine, and NVIDIA's Omniverse, which are closed, proprietary platforms. If you're trying to make a game, they work perfectly well in that context. But one of the things I find interesting about what's happening right now, in this vision of the open metaverse, is the attempt to abstract each of the component parts of something like a game engine into all these different pieces, and then, from the more top-down direction of the Metaverse Standards Forum, to find the different standards needed for interoperable aspects of the metaverse. In essence, the way I think of it is that we're trying to, in some ways, recreate the core functionality of those game engines as a set of open standards. The Metaverse Standards Forum is something that is different. Ben said something along the lines that the Metaverse Standards Forum was trying to pick winners, and there was a follow-up email I got clarifying that that's not exactly right. I just want to read it. It says: the Metaverse Standards Forum exists to foster collaboration among standards development organizations. It doesn't have any power to enforce adoption or promote any one standard at the expense of others. I think that's the process of all standards: they're recommendations, and a standard will either live or die based upon the larger community's decision to adopt it or not, and the utility it is serving. The point is that the Metaverse Standards Forum is trying to come up with the complex of all these different standards, and glTF with these interactive extensions is just one of them. What was interesting and striking to me was that Ben was saying the game engines were real laggards in adopting these standards, because they already have their existing workflows and pipelines and ways of ingesting 3D objects. For them, it wasn't a high priority, because their existing stuff already works. But in terms of trying to create similar functionality on the open web, projects like Three.js and Babylon.js quickly adopted a lot of these open standards. So he said that a lot of the fastest innovation in terms of adopting these emerging standards has been on the web, where they're trying to bootstrap similar functionality. As they go from a static mindset to a more dynamic, interactive mindset, they're building up from the bottom of this open web stack towards the vision of having the same capabilities as a game engine. So yeah, it's really interesting to see how the node graph-based system is able to abstract all these different aspects of the algorithms, and to see how there are existing systems within Unreal Engine and Unity, and in Omniverse as the OmniGraph system. With all these things, they have their existing ways, and this open standard is trying to replicate that. They have a JavaScript reference implementation to develop the standard, and then the question becomes, okay:
How is each of these different game engines going to ingest this? How do they interpret that information in the context of their existing Blueprint, Visual Scripting, or OmniGraph systems? But there's also the whole opportunity to do more of a greenfield implementation within the context of the open web, where you can start to build additional tools to implement it as well. So, yeah, it's still very early as we start to implement all these various aspects, and I'll be keeping track of each of these announcements as things progress and move forward. The Khronos Group is really on the frontiers of defining all these different things and what the open metaverse is going to be. And for me, as I look at the future of the metaverse, it's going to be really defined by these standards and how they all come together. We have the capability and the experience of embodied experiences within, say, Fortnite or Roblox or Minecraft or VRChat or Rec Room or Meta Horizon Worlds, whatever your favorite social VR application or virtual world application may be, even if it doesn't have a VR component, and these immersive experiences are giving us some flavor of the metaverse. But the metaverse is also going to be defined by all these other enterprise applications. If we look at NVIDIA's GTC keynotes, where they talk about implementations of spatial computing across all these enterprise contexts, and what Microsoft is doing in its enterprise contexts, and also what the Khronos Group is doing with all these different collaborators, there are all these cases where there's a real economic outcome on the other side of using these types of spatial computing technologies, and that's what's going to build up a complex of open standards towards interoperability and, eventually, towards this vision of the open metaverse. Anyway, if you take a look at the Khronos Group and all their standards efforts, as well as some of these enterprise applications of the metaverse, I think that gives a better sense, for me at least, of the economic driver that's going to develop some of these core functionalities, which will eventually get back into the consumer space and the more experiential aspects. But for now, I think we can watch these core building blocks being added one step at a time. And like Ben said, if they're able to abstract this node graph-based system for implementing interactive behaviors, that may be split off as a separate extension and implemented as an open standard in something like USD. So yeah, they're trying to come up with all these different component parts as they move towards this vision of trying to recreate an open metaverse. So, that's all that I have for today, and I just wanted to thank you for listening to the Voices of VR podcast. And if you enjoyed the podcast, then please do spread the word, tell your friends, and consider becoming a member of the Patreon. This is a listener-supported podcast, and so I do rely upon donations from people like yourself in order to continue to bring you this coverage. So you can become a member and donate today at patreon.com/voicesofvr. Thanks for listening.
