#421: Khronos Group President Neil Trevett on glTF: The JPEG for 3D Objects

Before SIGGRAPH this year, the Khronos Group announced that the open glTF standard was gaining momentum among some of the key players in the graphics industry. glTF provides a standardized baseline and interchange format for delivering 3D meshes to different tools and services, and it's been described as analogous to the JPEG format for images. The traction behind an open glTF standard means that one of the fundamental building blocks for the metaverse is falling into place.

I had a chance to sit down with Khronos Group President Neil Trevett at SIGGRAPH, where he explained the significance of the emerging consensus around the glTF standard. He expands upon what glTF includes and what it doesn't. For example, there are not (yet) any point clouds or light fields within glTF, but glTF is extensible. He also emphasized that previous efforts at open formats such as VRML and X3D included definitions of run-time behavior, while glTF is meant to be simply a general-purpose, lightweight container for 3D objects and textures. The code and logic for what to do with these assets is left to the application, which can be written in any language, such as JavaScript, C#, C++, or other emerging languages.

LISTEN TO THE VOICES OF VR PODCAST

Neil said that many major companies had been working independently on proprietary formats for transmitting 3D asset data, so agreeing on a common open standard prevents fragmentation and siloed content that can only be understood by a single application. glTF is solving a different problem than authoring formats such as COLLADA, which enable the exchange of 3D objects between all of the major authoring programs; glTF instead focuses on the efficient transmission of 3D assets to a run-time application, a much simpler problem. The glTF spec was released by Khronos in December 2015, and the feedback from a growing number of companies such as Oculus and OTOY has been positive.

There are extensions being developed for glTF, such as physically-based rendering to compactly describe realistic material properties. But Neil emphasized that they want to keep the initial glTF specification lean in order to make it simple to implement and to maximize adoption. They'll be paying attention to industry adoption, and popular extensions can be rolled into future versions of the official glTF specification.

There's a glTF validator that's already available, and for more information, be sure to check out the glTF resource page on the Khronos Group's website.

UPDATE: I’ve incorporated a number of clarifications from Neil into this article.

Subscribe on iTunes

Donate to the Voices of VR Podcast Patreon

Music: Fatality & Summer Trip


Support Voices of VR


Rough Transcript

[00:00:05.452] Kent Bye: The Voices of VR Podcast. My name is Kent Bye, and welcome to the Voices of VR Podcast. So right before SIGGRAPH this year, there was an announcement that came out from the Khronos Group talking about GLTF, which was this new interchange format for 3D objects. And John Carmack was quoted as saying that this is kind of like the JPEG for 3D objects. So at SIGGRAPH, I was talking to a number of different people to try to figure out how GLTF was going to fit into the overall ecosystem of virtual reality, and I eventually went to the Khronos Group party and had a chance to sit down with Neil Trevett, who's the president of the Khronos Group, and really gave me an overview of how this glTF format is serving a lot of the open problems that are out there within virtual reality and will kind of become one of the foundational building blocks of the metaverse. So we'll be talking about GLTF, what it is and what it isn't on today's episode of the Voices of VR podcast. But first, a quick word from our sponsor. This is a paid sponsored ad by the Intel Core i7 processor. VR really forced me to buy my first high-end gaming PC, and so Intel asked me to come talk about my process. So, my philosophy was to get the absolute best parts on everything. Because I really don't want to have to worry about replacing components once the 2nd gen headsets come out, and the VR min specs will inevitably go up at some point. So I did rigorous research online, looked at all the benchmarks, online reviews, and what I found was that the best CPU was the Intel Core i7 processor. But don't take my word for it, go do your own research, and I think what you'll find is that the i7 really is the best option that's out there. So this interview with Neil happened at the Khronos Group party that was happening at SIGGRAPH in Anaheim, California from July 24th to 28th. So with that, let's go ahead and dive right in.

[00:02:03.501] Neil Trevett: My name is Neil Trevett. I work for NVIDIA. I'm also the president of the Khronos Group, and I'm chair of the GLTF working group inside Khronos, and we've been working on GLTF to define an asset format. It's a file format, not an API. Most of the standards that Khronos does are APIs. So this is like Collada or some of the other file formats we've done, but it has a very specific purpose, and it's because we've identified, we think, a gap in the ecosystem that needs to be filled, and that is a low-cost, low-complexity, straightforward way of transmitting scene data, 3D scene data, across a network. And the most obvious use case, but certainly not the only one, is transmitting 3D assets to a WebGL application, which is by definition on a network, so that's why it's first in line to get this problem solved. And we're kind of calling it the equivalent of JPEG for 3D, and it's a bit tongue-in-cheek, but actually it's a surprisingly accurate analogy, because if you look at the different data types out there, like music has MP3, images have JPEG or PNG, videos have H.264 or VP8, VP9. 3D doesn't really have an equivalent. There are authoring formats like FBX or Collada, or kind of CAD formats like OBJ, but surprisingly, this far into 3D, there isn't a format that is both effective and simple to use, compact because it has a binary representation, and has the full representation for a scene: geometry, textures, shaders, animations, and skins. So what we try to do with glTF is define this file format, and it's not rocket science. Pretty well everyone who's doing this has done their own version of it. The value of glTF is that we can all agree on one. And that means we can avoid siloed content. So the example I like to use is imagine the dystopian existence we'd have if every web browser and every application used its own format for an image.
You'd have to Either the web browser wouldn't work, you'd have to use a different web browser if you want to access someone's content, pictures, or you'd have to do expensive translations between formats and it would be a disaster. And that's exactly where we are today in 3D, before we have GLTF. And that's why I think the last few months we've seen a real surge in momentum. Everyone from John Carmack, released last week, expressing support. Oculus were very strongly in favor. Other companies have expressed support, Autodesk, Adobe. everyone realizing that this is not a rocket science problem and this is something we can solve and everyone seems to agree that what they were doing was pretty much like GLTF anyway so hey why not just use GLTF and then looking forward I think there is the opportunity to apply a little bit of rocket science So, for example, we've been talking to the MPEG consortium, and it is interesting. MP3 is MPEG, JPEG is from MPEG consortium, H.264 is from MPEG consortium. These are the guys that have done the streaming and compressed versions of other media type file formats. And it turns out that they've had a mesh compression technology called 3DGDC, it's a snappy name, that's been waiting for this kind of opportunity. You get much better compression of a mesh than using GZIP because you know you're compressing a mesh, so a priori knowledge nets you. So they've offered that technology for us to use in glTF in a future version royalty-free, which is important and companies like Fraunhofer are doing quite cool stuff with streaming 3d and there's lots of talk about physically based rendering in the future version of glTF and So, GLTF 1.0 is pretty simple and straightforward, which is its strength. But in the future versions, there are some interesting ways that we could take the standard for compression and better quality representation of materials and surfaces.

[00:06:07.396] Kent Bye: So right now, I'm just trying to wrap my mind around a way of thinking about this. If you create a 3D scene within Unity and export a binary, it sort of, in some ways, can contain all the different assets that it needs. And it's kind of like a package that contains all the data that it needs to render that scene. So is this essentially like an open format to be able to recreate a scene that can be rendered in virtual reality then?

[00:06:31.558] Neil Trevett: Yes. So without getting too detailed, glTF is very simple. It uses web standards as far as possible. So the scene is described by JSON. So that's a JSON file. But the JSON refers to some binary payloads, like JPEG textures. We use JPEG for textures. The binary representation of the mesh is in typed arrays, which is the native format that WebGL uses. So you can just take that mesh and just drop it in. I call it shovel ready. So it's a very compact format, and it does have animations and skins, so exactly as you say, you can describe a scene. But the critical thing is it doesn't dictate at all what the application does with those assets. We call it runtime neutral. And this is the key difference between GLTF and some of the prior formats like VRML or X3D, which definitely have their place and definitely have their uses, but not as general universal asset formats. They have a file format too, but they also define the runtime. So, you get a VRML file or an X3D file, there's one thing you can do with it: process it in the way that X3D defines. So, it's good for some verticals, but it's not universal. glTF is just a bucket of polygons, which means different applications can do anything they want with these assets. So, to answer your question, if you have a virtual reality application, you can use those assets however you wish. So we're not prescribing at all or limiting how people innovate with virtual reality applications. It doesn't matter what platform it is. It doesn't matter how you're going to display it. GLTF isn't trying to solve any of those problems. It's just trying to get assets from one place to another. And I think it's particularly relevant to virtual reality because just like the web, you don't want to have to load a different web browser or a different app every time you go to a different place on the internet. I mean, that would be crazy, right?
The whole point is a universal browser can access any content on the web. And we're about to find, I think, in VR, and particularly even perhaps AR, when you're out and about in the world, you're going to find content in the world that you want to access and use. If every place you go to, if you go to Nordstrom and their 3D offer or advert to you is in a different asset format than Macy's, and you have to load up a different app, again we fall back into this dystopian siloed-content future, which is going to be a huge limiter on how fast VR and AR can grow. We want to have a universal format so anyone can serve up content that any app or browser can understand. With non-VR apps, if you download Doom, it's a gigabyte download, and it's worth that big download because it's a big entertainment experience. I think lots and lots and lots, perhaps even most AR and VR experiences, are going to be short, interstitial experiences, and you're not going to want to download something special. You're going to want to grab that data and know how to process it in the application without having to do anything special. And the content providers are going to want to be able to get their content out there everywhere. So it is just like JPEG, but for 3D. It's just a way of representing those assets.
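The JSON-plus-binary-payload layout Neil describes can be sketched in a few lines of Python. This is only an illustration of the general shape of a glTF-style asset, not a spec-exact file; the key names and file names (`cube.bin`, `cube_diffuse.jpg`, and so on) are hypothetical stand-ins, not copied from the specification.

```python
import json

# Sketch of the split Neil describes: the JSON holds the scene graph,
# while the heavy data (mesh bytes, JPEG textures) lives in binary
# payloads that the JSON merely points to by URI.
asset = {
    "scene": "mainScene",
    "scenes": {"mainScene": {"nodes": ["rootNode"]}},
    "nodes": {"rootNode": {"meshes": ["cubeMesh"]}},
    "meshes": {
        "cubeMesh": {
            "primitives": [{"attributes": {"POSITION": "positionAccessor"}}]
        }
    },
    # The buffer is raw typed-array data WebGL can consume directly
    # ("shovel ready"); the image is an ordinary JPEG file.
    "buffers": {"meshBuffer": {"uri": "cube.bin", "byteLength": 288}},
    "images": {"diffuseImage": {"uri": "cube_diffuse.jpg"}},
}

encoded = json.dumps(asset)    # the text part of the asset
decoded = json.loads(encoded)  # any runtime can parse it back
print(decoded["buffers"]["meshBuffer"]["uri"])  # → cube.bin
```

Because the JSON carries no behavior, a runtime is free to walk this structure and do whatever it likes with the referenced payloads, which is exactly the "runtime neutral" property Neil emphasizes.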

[00:09:41.668] Kent Bye: I get the sense that this is a way to kind of bundle together all the 3D assets that you may need to generate a scene, but I still don't get a sense if there's any logic or code that's included within GLTF that is going to enable any sort of interactivity, or if that's going to be some separate sort of format to be able to do something with the GLTF.

[00:10:01.080] Neil Trevett: No. GLTF, and I think this is the main strength of GLTF, deliberately does not define anything about logic, about how you use the assets. We want to enable a huge diversity of applications and runtimes to use those assets however you wish. And again, I think that's kind of where X3D and VRML limited their adoption. And it was understandable at the time, because they were trying to bring 3D to the web when everything was declarative. It was a markup language for 3D, which fitted into the web model at the time. But now we have JavaScript that's running almost at native speeds, and we have a lot more flexibility, which is why the time is right for having a much more runtime-neutral, or completely runtime-neutral, format in the web, in WebGL. Your JavaScript can interpret the JSON scene graph however it wishes.

[00:10:49.394] Kent Bye: In talking to Jules from OTOY, he mentioned that he sees a future of being able to actually also include digital light field objects within GLTF. And in talking to Bob Pette from NVIDIA, I asked him whether or not he thought Iray from NVIDIA was going to be able to export into the GLTF format as well. And he said, well, we have to kind of wait and see whether or not this is going to be a viable format, because there may be some additional innovation that we need to do in order to do that. So it seems like the same sort of open-versus-proprietary tension may be happening there. But I'm just curious to hear your perspective on whether or not you see that GLTF is going to be robust enough to also include digital light field objects, or if that's going to be something separate.

[00:11:33.926] Neil Trevett: So, from a technical point of view, the answer is yes, with some explanation. So, GLTF doesn't have light fields, it doesn't have point clouds, it doesn't have a bunch of stuff. It has meshes and textures and animations today. That's what it does. But there are definitely potentially markets that would need those alternative representations of 3D data. In the short term, GLTF is extensible, just like OpenGL or Vulkan. So anyone, without permission from anybody, including Khronos, can come along and say, this is how I want to represent a point cloud or a light field as a GLTF payload. In my JSON, I will point to the light field data. Anyone is free to define that for their immediate customer and market needs. And I think over time, if that gets picked up and used, then sure, it's certainly a possibility that that could find its way into the core spec. I think you have to be careful, because if you make the core spec too big and tricky to read and to generate, then you begin to defeat the purpose. Some things I think will make it into the core. My favorite is physically based rendering, and that seems like such low-hanging fruit because it's a compact description of surface properties, and you can unpack it in very flexible ways on the client. That's kind of a no-brainer, I think. Obviously point clouds and light fields are much more specialized, so they would probably stay as extensions either for longer or perhaps even forever. There's no problem having an extension for a particular market if the market needs it. It's actually very interesting. I mean, Jules at OTOY gave us a supportive quote, too. He has been very supportive of GLTF, which is very interesting and it's awesome, you know, because OTOY do such awesome work. And it's interesting because it's quite a long way from the original design goal of GLTF. GLTF was not designed to be an authoring format. It was designed to be this WebGL transmission format.
But the feedback that Jules is giving us is that he does need a couple of things. But with those couple of things, it kind of gives him everything he needs. So it's interesting that simplicity of design is a virtue. That's the lesson I'm extracting from this. Keep it simple. It does have advantages. Collada, for good reasons, had a lot of complexity. It was trying to be the lingua franca between all authoring systems. And it is complex. It was so complex that people found it difficult to export and import. We want to avoid that trap with GLTF.
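The extension mechanism Neil outlines, where anyone can attach vendor data without touching the core, can be sketched as follows. The extension name `ACME_point_cloud` and its payload fields are entirely invented for illustration; only the general pattern of an `extensionsUsed` list plus per-node `extensions` objects reflects how glTF-style extensibility works.

```python
import json

# A hypothetical vendor extension attached to a node: the core JSON stays
# valid, and the extension payload rides alongside it.
asset = {
    "extensionsUsed": ["ACME_point_cloud"],  # hypothetical name
    "nodes": {
        "scanNode": {
            "extensions": {
                "ACME_point_cloud": {"uri": "scan.pcd", "pointCount": 120000}
            }
        }
    },
}

def extensions_in(asset):
    """Collect the extension names a loader would need to understand."""
    return set(asset.get("extensionsUsed", []))

# A loader that doesn't recognize the extension can still parse the core
# JSON and simply skip the unknown payload.
print(sorted(extensions_in(asset)))  # → ['ACME_point_cloud']
```

This is the "try it as an extension first" workflow Neil compares to OpenGL: widely adopted extensions become candidates for the core spec, while niche ones can stay extensions indefinitely.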

[00:14:09.045] Kent Bye: By calling GLTF the JPEG for 3D, I immediately think of the compression loss that can happen within JPEG, where you can dial down the quality that you want. And so I'm curious if there is a lossless version of GLTF, or if it's kind of inherent that there is going to be some compression and some loss, and if there needs to be sort of an equivalent of a PNG of GLTF.

[00:14:30.062] Neil Trevett: Well, right now, just to be clear, so there's no confusion, GLTF 1.0 has no compression. It's binary, so it's efficient and compact, but we're not compressing. So by definition, it's lossless. But if we do begin to adopt compression technologies like 3DGC from MPEG, that has a dial. You can say, oh, less quality, higher quality. But I don't think there's going to be one compression standard that fits all. Just like in images and video, there are different codecs for different uses. It's an art form, flexibility versus fragmentation. It's always the art in defining an effective standard to get that balance right. But there's nothing in the design of GLTF that limits us. We can have as many defined payload formats as the industry ends up needing and wanting. And I think the design process of OpenGL, where you try stuff out in extensions, and the extensions that get wide adoption, well, they're good candidates, potentially, for bringing into the core.
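Neil's distinction between an uncompressed-but-compact binary payload and optional compression on top can be illustrated with generic byte-level compression. This sketch only shows what GZIP does to a regular vertex buffer; the mesh-aware codec he mentions would exploit connectivity and quantization that a generic compressor knows nothing about. All names here are local to the example.

```python
import gzip
import struct

# Pack a small, highly regular vertex buffer (a 32x32 grid of positions)
# the way a binary payload might store it: tightly packed little-endian
# 32-bit floats, three per vertex.
vertices = [(x * 0.1, y * 0.1, 0.0) for x in range(32) for y in range(32)]
raw = b"".join(struct.pack("<fff", *v) for v in vertices)

# Generic, lossless byte-level compression: it has no idea this is a mesh,
# yet the repetitive structure still compresses well.
compressed = gzip.compress(raw)

print(len(raw), len(compressed))
assert len(compressed) < len(raw)  # regular geometry shrinks even generically
```

The gap a mesh-aware codec closes is the part GZIP cannot see: it can quantize positions, delta-encode the grid structure, and encode triangle connectivity compactly, which is why, as Neil puts it, a priori knowledge of the data type nets you a win.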

[00:15:30.512] Kent Bye: And so why now? Why was this announced just this weekend? And why is there so much momentum? You know, this is a specification that was announced a while ago. So what were the contributing factors that you think were kind of leading up to this moment where this really needed to be taken to the next level?

[00:15:46.360] Neil Trevett: Now, we've been working on it for a couple of years. We ratified the spec and released it in December, and I think in the six months that have passed since finalizing the spec, we're now demonstrating strong support from luminaries out there like John Carmack and Jules Urbach from OTOY. It just took six months to get the word out there and to demonstrate that this is good. And the other company that's really strongly supporting, and they were at the BOF that we had here today at SIGGRAPH, is Oculus, who, with Carmack's support, really came out strongly today. And they actually showed the format that they were developing in-house, and they were kind of like, it's almost identical. And that was the story we hear over and over again. We were doing something almost like this, but, oh wow, you guys have done more work than we did, and everyone else is going to agree with us? This is awesome. So we'll just switch over to using the standard. I think it's just taken six months to kind of educate people what we were trying to do, the reasons behind it, and to get people to agree that this is a good thing for the industry.

[00:16:46.404] Kent Bye: So what do you think this is going to enable then? What does GLTF as a platform, what does come out of that then from here? Well, it's not a platform.

[00:16:56.125] Neil Trevett: I guess a format, yeah. Right. But that's important because there's nothing as pretentious as a platform. It really is not rocket science. It's going to enable people that create content to be able to, without friction, get it out there to many different platforms and devices. And it's going to enable them to create client applications, including web browsers with WebGL. That's just one example, though. Examples of applications that want to bring down 3D assets and do interesting things with them. It's going to enable them to reach out into a huge diversity of servers and content providers and, again, without friction, access the content they need and use it in the way that they want to. It's just removing friction in the industry.

[00:17:40.863] Kent Bye: And finally, what do you see as kind of the ultimate potential of virtual reality and what it might be able to enable?

[00:17:47.327] Neil Trevett: Well, virtual reality and augmented reality? Well, I mean, virtual reality is going to be a way to be transported into other realities and other experiences that we couldn't experience in any other way. So I think we're just beginning to imagine a number of different ways that virtual reality could be used. But again, I think it's going to be a real spectrum, from the blockbuster, big virtual reality experience down to these much smaller interstitial ones, like it'll be as casual as going to a web page. Now you'll load up this brief virtual reality experience and you might only visit it once. It's that end of the spectrum that GLTF is really going to enable, because you won't have to download something special to interpret the special data that's coming. You'll be able to roam the landscape of VR experiences just like you roam the web today in 2D. And AR, I think, is going to be even more critical because you're going to be out there in the world. As you wander through the real world, there are going to be thousands and thousands of sources of content that are location-specific, activity-specific, that you're going to want to overlay, augment your reality with. And again, if you have to download a different app for every one of those, the whole thing just collapses in a heap. You want to be wandering down the street, getting the data that's available in the metaverse out there, without friction, without having to do anything special. It's kind of obvious, it's not rocket science.

[00:19:22.090] Kent Bye: Yeah, it sounds like the GLTF is going to be like a building block to the metaverse, is what I'm hearing you saying there.

[00:19:26.255] Neil Trevett: And only one. I mean, the metaverse is going to take thousands of things to come together, and this is just one small piece. But it's an important piece, and it's kind of a fundamental piece. And it's not that hard to solve, so it's good we're making progress. Awesome. Well, thank you so much. OK. No, you're welcome. It's good to see you.

[00:19:43.606] Kent Bye: So that was Neil Trevett, he's the president of the Khronos Group, and he was talking about glTF, which was this new interchange format to be able to create this open standard for 3D mesh objects. So, I have a number of different takeaways from this interview, is that first of all, glTF is kind of like the JPEG for 3D objects. And it's actually kind of like the PNG for 3D objects, because there's no compression initially, but eventually there will be compression, so it will be kind of like more the JPEG for 3D objects. And so I think the big takeaway here is that there needs to be some open standards to be able to deal with transferring 3D assets back and forth efficiently so that we can do all sorts of different new things with virtual reality technologies. I think primarily the big use case here seems to be not only with being able to move around 3D assets within the process of the content production line for whatever you're creating within VR, but also within delivering this content over the web. And so being able to actually serve GLTF files over the web browser and be able to unpack them and then be able to render them out within the browser. Now, it sounds like one of the key differences about GLTF is that there's no code or logic about what to do with these 3D objects once you actually open it up. It's just simply the objects without any intelligence or logic built into it. I kind of think about the World Wide Web and how it's kind of got this separation between code and design, with code kind of happening within the combination of HTML and JavaScript, but the design is happening all within the CSS. At least the 3D objects and assets are kind of being referenced within that CSS code. And so in some ways, the previous iterations of VRML as well as X3D seem to be a little bit more fusing those two things of having both the 3D objects as well as some logic.
But it seems like that is a little bit too prescriptive because people may be showing this on the web or they may be actually rendering this out in Blender or in Unity or all sorts of different outcomes. And so to actually define that logic language within the asset, I think, doesn't really make sense. And so they're trying to simplify things in order to come up with the standard format to be able to enable all sorts of new things. And so in a lot of ways, I think about glTF as kind of like this primary fundamental building block for the future of the metaverse, for what it's going to take. And one of the quotes that really caught my eye was from Jules Urbach from OTOY saying that they were going to be able to use this glTF format in what they were doing. And so in my mind and conceptualization, I was really trying to figure out, well, does it mean that this glTF format was going to actually be able to encompass the full complexity of point clouds or digital light field data? And it turns out the answer to that is no, that a lot of what OTOY is going to be doing is doing a bit of translation to be able to take these digital light fields and kind of create this mesh-like format to be able to use GLTF. And so, OTOY essentially wants to build all of what they're doing on top of the basis of open standards, like GLTF, but there's going to be certain extensions that people are going to be able to add on into glTF. And I think they're trying to keep it really simple at first, but essentially some of the extensions, like one of the things that Neil said is that he has a personal favorite of physically based rendering. So these extensions are kind of add-ons that you add in. And so if there's a bit of community consensus in terms of the extensions that everybody ends up using, then eventually they'll start to think about, well, OK, well, maybe this is really meant to be part of the core specification, and kind of roll that into the future iterations of glTF.
And so I think one of the other ones that Jules mentioned was this open shader language as well. So it seems like this is going to be one of the fundamental building blocks and ways for you to export files from, let's say, Tilt Brush or other content creation tools like Maya, Blender, or 3ds Max, and be able to deliver them into many different contexts, whether it's to the web or elsewhere. So one of the things that I wasn't sure of is whether or not this would be the equivalent of kind of like a Unity executable binary, which it's not. It's more like a Photoshop file, or more accurately, like a bundle of 3D assets that are all kind of contained within one file, but with no specific ability to be able to pop it open and have any type of dynamic interaction happening when you're in a VR experience. And so that's going to require adding more code and other logic. So that's all that I have for today. I wanted to thank you for listening. And if you enjoy the Voices of VR podcast, then please spread the word, tell your friends, and tell the world by leaving a review on iTunes. And if you'd like to financially support the Voices of VR podcast, then become a donor at patreon.com slash Voices of VR.
