#648: New ARCore Features, Google I/O Highlights, & Open Questions about Ethics & Privacy

Google announced some new features for ARCore at Google I/O last week, including Sceneform to help Java developers integrate 3D content into apps, augmented images to trigger immersive AR experiences off of trained images, and cloud anchors to enable multi-player AR experiences in the same environment. I had a chance to catch up with Nathan Martz, Lead Product Manager of ARCore, at Google I/O to talk about each of these new features, where AR is at and where it is going, and the fundamentals of ARCore including position tracking, environmental understanding, and lighting estimation. I share some of my highlights from a number of the experiential marketing demos at Google I/O, my impressions on the Lenovo Mirage Solo, and some of the open questions around privacy and ethics at Google.


Members of Google’s engineering teams do not have the authority or expertise to comment on larger ethical or privacy questions about Google’s products or policies, and so there were a number of questions about ethics and privacy that Martz wasn’t qualified or authorized to answer. This speaks to the larger issue that Google currently doesn’t have a specific contact person who is authorized to discuss the larger ethical or philosophical implications of their AI, AR, or VR technologies. Many ethical questions were raised by Google’s Duplex demo of AI talking to a hair salon receptionist without disclosing itself as AI, and there will continue to be many ethical questions around how much of our surrounding environments Google will need to scan and monitor in order to determine the context of computing. There needs to be a better process for technology companies like Google to engage in ethical dialogues with the media and the public about how immersive technologies are designed and deployed.

Ambient computing & spatial computing will have an increasing need to collect more and more information about our surrounding environment in order to fully understand our context and better serve our needs within each of these different contexts. This means that technology will continue to desire more and more intimate and private information about our lives. There will continue to be complicated tradeoffs between the benefits of the amazing functionality that technology applications can afford and the nuanced costs of the overall erosion of our Fourth Amendment rights to privacy, along with the risks of these surveillance capitalism business models capturing data that could also be shared and used by abusive governments, or breached and leaked onto the Dark Web.

The ethics around privacy is a huge open topic with Google, and they symbolically swept privacy under the rug by quietly announcing their GDPR updates to their privacy policy and updated privacy tools on the Friday after Google I/O. Google could have announced their GDPR changes either before or during Google I/O, and made a very public commitment to the changes they’re making with regards to their new privacy obligations. But they didn’t. They waited until the Friday afternoon after a three-day festival, once journalists were finished covering Google’s latest advances in AI and AR. All of these amazing AI innovations would be impossible without the data they’re collecting, and so this type of behavior reinforces the impression that privacy is Google’s unconscious blindspot that they don’t want to have an honest conversation about.

Both Google and Facebook have been taking a very reactive approach to discussing the implications of biometric data. The companies have yet to deploy any technologies that have biometric sensors like eye-tracking or facial tracking for recording emotional sentiment, but Google was showing off emotional sentiment detection in some Google I/O flower experiments and in the TendAR AR demo that originally premiered at Sundance and was shown again at I/O. Neither Google nor Facebook has made any public comments on the unknown ethical thresholds and implications around biometric data from immersive technologies.

Oculus’ privacy policy allows for the tracking of physical movements, and Oculus’ Max Cohen told me that the data they’re recording is at a very low sample frequency. The problem is that Oculus’ privacy policy doesn’t specify any sampling frequency, and there’s no obligation for disclosure if Oculus decides to increase the sample rate of what data are recorded. GDPR has obligations for Facebook and Google to disclose what identified data are being recorded, but there are no policy obligations to report or disclose what de-identified data are recorded. They can capture whatever anonymized data that they want, and it’s not going to show up in any of their privacy tools. Oculus’ privacy policy has a lot of vague and open-ended permissions for what they can record, while Google still hasn’t disclosed many specifics of what they’re recording with AR or VR.

Google’s general privacy policy doesn’t have any specific sections for data collected from the use of immersive technologies. Perhaps more tools will be deployed by GDPR’s enforcement date of May 25th, but until then, there are many open questions: What data are being collected by AR and VR? What data are tied to our personal identities? What are the obligations of Google to disclose and report what de-identified data are captured and stored? As VR & AR immersive technology evolves, Google is going to have a lot more access to biometric data like eye-tracking data, emotional expressions, facial movements, and eventually galvanic skin response, EEG, EMG, and ECG. How does Google plan on treating biometric data? Will they record it? Will they connect it to our identities? Is it possible that de-identified biometric data could actually have biometric keys that could unlock that supposedly anonymous data and transform it into personally-identifiable information? What are the risks of having massive amounts of biometric data breached and leaked onto the Dark Web?

Ethics and technology is going to continue to be a huge topic in the evolution of AI and VR/AR technologies, and companies like Google and Facebook need to evolve how they directly engage with the public in an embodied dialogue about these topics. These companies should really have cross-functional ethics teams focused on bridging the gap between the technological potential and the larger ethical & cultural impact on society. These technology companies are becoming larger and arguably more influential than a lot of governments, but there are few to no democratic feedback mechanisms to engage in debates or dialogues about the trajectory of where technology leads our society. Technology decisions will continue to be made before there’s an opportunity to fully evaluate and discuss the ethical implications of the technology.

If there were cross-functional teams focused on ethics, then representatives from these teams could have an embodied dialectic with journalists and the public about the ethical implications of their technologies. Without a clear point of contact, these types of ethical discussions have a one-way asymmetry where Google takes an action and then there are a lot of reactive discussions in the media and on social media without an opportunity to directly engage in a dialogue in real-time. How resilient is our society to any number of ethical missteps that could potentially be prevented through interactive conversations?

Google announced an AI capability like Google Duplex in a way that wasn’t sensitive to the ethical implications of how this technology would be used. The AI agent didn’t disclose to the human that it’s an AI agent acting on behalf of a human, and it was like watching a prank unfold as to whether or not the human talking to an AI bot was going to determine that this was an AI bot.

How many other humans did Google use to stress test their technology in these types of field tests? Did they ask for consent for these tests? Were these tests actually scheduling real appointments? Or were they cancelled later?

There is a level of human labor involved in training AI, and humans should be able to opt into whether or not they consent to helping train Google’s AI, which could eventually be putting them out of a job. There is a lot of general fear about whether AI is going to be developed and cultivated with humans in mind, and Google’s cavalier attitude around ethics and AI isn’t helping alleviate any of that anxiety.

There were also a number of Google employees who quit in protest due to Google’s participation in supporting the Defense Department’s Project Maven with training and open source AI technologies. Gizmodo’s Kate Conger reports that “One employee explained that Google staffers were promised an update on the ethics policy within a few weeks, but that progress appeared to be locked in a holding pattern. The ethical concerns ‘should have been addressed before we entered this contract,’ the employee said.”

Over 90 academics signed an open letter from the International Committee for Robot Arms Control calling for “Google and Alphabet’s executives to join other AI and robotics researchers and technology executives in calling for an international treaty to prohibit autonomous weapon systems.”

Conger reports that “Google has emphasized that its AI is not being used to kill,” but the open letter written by academics says that it’s headed down that path. The open letter says:

With Project Maven, Google becomes implicated in the questionable practice of targeted killings. These include so-called signature strikes and pattern-of-life strikes that target people based not on known activities but on probabilities drawn from long range surveillance footage. The legality of these operations has come into question under international[1] and U.S. law.[2]

There are a lot of deep ethical questions when it comes to AI, but also issues around privacy, and whether or not we’re on a path towards capturing biometric data from VR or private environmental data for AR. My impression is that most of the technology engineers, software architects, and designers earnestly want to do the right thing in creating useful technology that helps solve real-world problems and helps make the world a better place. The problem is that there isn’t a single individual who can speak to the larger ethical, philosophical, or cultural implications of all these technological capabilities that they’re building.

In the absence of making everyone responsible and enabling every individual to speak about the ethical and moral implications of what’s being built, then companies like Google and Facebook should consider creating cross-functional teams that are having these conversations. If this is already happening, then these representatives should be cleared to be having these ethical discussions with journalists and the public at large. Otherwise, there’s going to be even bigger public backlashes to technology like Google Duplex when it’s exalted as a technological achievement while being completely tone deaf to the moral and ethical implications of unintended consequences of what it means to have embodied conversational AI deceptively interact with us without our explicit consent. Google recently removed the “Don’t Be Evil” clause from their code of conduct, and so let’s all hope that they figure out a way to have larger ethical discussions about the technology they’re creating without being completely blinded by their technological genius.

This is a listener-supported podcast through the Voices of VR Patreon.

Music: Fatality & Summer Trip

Rough Transcript

[00:00:05.452] Kent Bye: The Voices of VR Podcast. Hello, my name is Kent Bye, and welcome to the Voices of VR Podcast. So, the last couple of weeks were pretty intense in terms of traveling from these different developer conferences, and it was quite a contrast, too, to go from Facebook F8 to Microsoft Build to then Google I/O. F8 was kind of like all of the social media marketing and people who are hired by agencies to be able to build different advertising applications. Microsoft Build was the enterprise developers that were very interested in building pragmatic applications that are going to solve business problems. And then like the Google I/O is basically like this open ecosystem, like open source slash people who are makers and builders and trying to take the extent of what technology can do and to build things for the sake of technology. And because there's lots of tools for artificial intelligence and now they have ARCore and all these other sort of exciting new technologies that are out there. So there was a number of different announcements that were made. So today's episode, I'm going to be talking to Nathan Martz, who's going to be talking about the three new features of ARCore. That's the Sceneform, which is basically a new interface for Java developers to be able to work with 3D objects a little bit easier so they can integrate into things like ARCore. And then they have the augmented images, which is going to be able to be trained on images and then translate those images into immersive experiences within AR. And then there's the cloud anchors, which is going to enable these multiplayer types of experiences within augmented reality by sharing the feature points of a common geography of a place and be able to facilitate these multiplayer AR experiences. So before we dive in, I just want to sort of set a little bit of like the larger context of the different types of conversations that were happening at each of these different developer conferences.
At F8, it was basically like a mea culpa of Facebook coming forth and saying, we're sorry that the ethical implications of some of our technological creations have not necessarily ended in a way that we wanted. Here are the things that we're doing to be more proactive, and we're sorry, but there's a vibe of we're not really sorry, we're gonna just keep building, and we're gonna keep pushing forward with this vision that we have of connecting the world, and we think that the risks that are associated with connecting people are worth it, and we're gonna keep doing that. Microsoft came out of the gate saying, like, our top priorities are privacy, cybersecurity, and ethical AI. So they want to make a stand for the enterprise markets of saying, hey, we're really concerned about privacy. But one of the things that Microsoft has been doing is participating in the Decentralized Identity Foundation and self-sovereign identity and actually developing open standards with the W3C to be able to implement something that is essentially the antithesis of the business model for both Facebook and Google. So Microsoft was talking very explicitly about ethics and ethical AI and the importance of like the implications of the technologies that we're building. And when it came to Google I/O, Google was talking about ethics in the context of your relationship to technology, so you being able to like turn over your phone and have it do not disturb, so different features they're baking into Android to just have a better relationship between humans and technology so that our technology isn't like hijacking our attention in specific ways. So it was kind of like this ethic of time well spent kind of being integrated into the operating system level. But when it comes to privacy and larger ethical issues, the metaphoric and symbolic message that I got from Google was that we're not going to talk about it. I say that because the day after Google I/O, they announced all their GDPR changes.
And that was like the Friday after having a Google I/O on Tuesday, Wednesday, and Thursday. And I have to just think, like, why? Why sort of dump the GDPR announcements on the Friday after all the journalists were just there? They could have done it before, like Facebook did, to be able to actually talk about some of these changes and some of these different things that they have to do to meet the obligations of the GDPR. But that's not what happened. But because I was going from Microsoft Build to Google I/O, I went to both and I missed the last day of Build and the first day of I/O, I missed the opportunity to talk to Clay Bavor, an executive at Google, to be able to ask some of these deeper and larger ethical questions. Ethics was also at the top of conversation when it came to like the first day keynote of Google I/O, where they showed Google Duplex basically succeeding in the Turing test of being able to fool someone who is a receptionist at a haircut salon scheduling a time to get a haircut. Now, the weird thing about this was that, I mean, it was amazing technology and a lot of the AI was actually using like ums and ahs to be able to kind of like fool the person on the other end, but there is like all these deeper implications of like, what does this mean to, you know, have AI that's tricking us into being human? Like, what are the ethical obligations of disclosure when it comes to like AI that is, you know, communicating with us? Like in some respects, by not having like an emphasis on the ethical implications of that, it just sort of raised these deeper questions about Google. So they've since come out and said that, you know, they're going to have disclosure. But there's just like these deeper questions about ethics that I think that it's like how we're going to have this conversation between us as the public and entities like Google.
And the challenge for me as a journalist is that as I have these different conversations with people who were pretty specific in the technological track of being able to talk about the actual new API implementations that are out there. They actually like aren't at a position to be able to talk about the higher ethical dimensions of their job. And the fact that there's not a direct contact for me to have ethical conversations with Google means that I've had to talk to people who are too low on the totem pole to be able to actually have those conversations. So I'm hoping at some point to actually have an opportunity to have like interaction or conversation about their new GDPR privacy policies and some of these larger ethical questions about like, you know, biometric data, what data are being collected, some of these deeper ethical questions that I think have to go into like governance and terms of service violations. And as the digital world gets blended with the real world, then what are the implications if they are starting to become larger than governments in terms of their power? Then if you get banned from their service, then what are the implications of you being ostracized from this whole digital reality that's gonna soon be melding into our real reality? So these are these larger questions that I've, you know, started to talk to with Facebook, but also these open questions about Google, and some of these are five to ten years out, but I'm just like thinking about these larger pictures. And so that's just sort of the deeper context for some of the questions that I'm asking Nathan Martz. And hopefully, I'll be able to work with Google and be able to have some deeper conversations here at some point. But with that, we're going to dive into this conversation where we're going to get into the guts of the new ARCore APIs and some of the new features that were announced at Google I/O.
So this conversation with Nathan happened on Wednesday, May 9, 2018, at Google I/O in Mountain View, California. So with that, let's go ahead and dive right in.

[00:07:12.675] Nathan Martz: Cool. Yeah. Well, my name is Nathan. I'm a lead product manager on the ARCore team at Google. And basically this year at I/O, we've released a major update to ARCore, and we're focusing on tools that let developers create richer and more immersive experiences in AR. So there's three major features that we're talking about right now. One of them is a technology called the Sceneform SDK. And one of the pieces of feedback we've gotten from especially traditional Android Java developers is that they're like, oh, AR seems super exciting. I would love to add an AR-powered feature to my app, but my app is written in Java. And if you give me Java and OpenGL, wow, that's like a lot of heavy lifting to go from that to an AR feature. And so Sceneform is designed to actually make it a lot easier for those developers to create AR-powered apps or features that are powered by AR in their existing app. And we do that by essentially providing a higher-level API that makes it really easy to express 3D concepts like objects in the world. We include a physically-based renderer, which is really complicated under the hood, but basically produces visuals that are high-fidelity, that look like they belong in the real world. And then we also have tooling in Android Studio that makes it easy for those developers to bring in 3D assets, automatically optimize them, and then kind of prep them for loading at runtime. So that's one of the big workflow improvements that we have on the native side. We also, in terms of new algorithmic capabilities, there's two big ones that we're talking about. One of them is called augmented images, and this is, if you look at ARCore 1.0 that we released a few months ago, it understands your world, but in a very broad sense. It gets an idea of what are some 3D points in the scene. It can aggregate those points into horizontal surfaces. But it doesn't necessarily understand that that picture is a thing on its own.
And so developers have said, oh, well, I often want to actually have AR experiences that are based on specific physical objects, not just the space around me. And so augmented images allows a developer to specify a catalog of images that they're interested in detecting in the real world. They can specify actually up to a thousand of them. And then we will actually detect whether one of those images is in view. And if it is in view, we'll actually give the developer a 3D position and orientation so they can use that to attach and trigger an experience or even have AR content that's oriented to or aligned with that image. And then finally, the biggest announcement that we have is a new technology called Cloud Anchors. And if you think about, you know, Sceneform is kind of a fundamentally enabling technology, and augmented images is about the objects in the world. [Cloud Anchors] is really about allowing you to interact with other people in the world, or really allowing multiple devices to interact with one another. A lot of smartphone AR to date has been what you might call single player, right? It's an activity that's like you do on your phone. And part of the reason for that is the core way that these algorithms work is, you know, my phone builds its sort of understanding of the space around me and your phone builds its understanding. And by default, those are separate, even if we're standing next to each other. And what we've heard from developers is that they want to build collaborative experiences. And if you think about a lot of the compelling use cases, whether it's, you know, like furnishing a new apartment, planning a remodel, playing a game together, learning a new concept, these are all things that you want to do collaboratively with other people.
And so what Cloud Anchors does is it solves the one really, really hard problem for developers, which is how do you have my phone's understanding of the world and your phone's understanding of the world actually become a shared understanding? And the reason we call it Cloud Anchors is Anchor is a technical concept in a lot of AR libraries, including ARCore. And it's essentially the glue between the real world and a virtual object. And so a Cloud Anchor is an anchor that is shared via the cloud between my device and your device. And we do the really hard work to allow a developer to create that cloud anchor and synchronize it across multiple devices. And once they have that cloud anchor, they can do all of their app relative to it the same way they would do it in sort of quote-unquote single-player mode.

[00:11:26.043] Kent Bye: Yeah, so I had a chance to try out the light board demo here at Google I/O. And so they had to take each of the phones and basically kind of sync it to the shared environment so that they could figure out the anchors and then figure out how to share those anchors. And so it seems like it's not persistent, like you would have to sync it each time. And then so there's sort of this process by which it sort of sends up the scene graph up into the cloud. And then somehow it's able to resolve between the two different perspectives how each one is going to be able to share objects between each other. Is that the idea?

[00:11:57.228] Nathan Martz: Yeah, that's about right. I'll clarify a couple points. So the API is actually really simple. And what we do is that when you want to actually share, one of the devices calls an API that's called host anchor, which is basically, I want to create an anchor that I intend to share with someone else. And it can be actually like many other people, not just one other person. And when you do that, we have kind of a snapshot of that device's understanding of the world, which is, to a certain extent, positional, right? Like, it's what you've seen over the last, you know, n seconds. And not actually the visuals that you've seen, but some lower-level, what we call feature points, which are essentially, like, interesting 3D positions in the environment. And so, okay, that's my phone's understanding of the world. And then for the other phone, for anyone who is going to join that session, it's actually called resolving an anchor in the API. Their view of the world needs to overlap enough with the host's view of the world that we can kind of figure out what that match is. And it's about plus or minus 30 degrees either way. So it doesn't have to be right on top of each other, but it definitely helps to sort of start out side by side. That lets us get a good, confident match. And then from there, the tracking actually stays synchronized. So you can walk anywhere you want, spin around, people can separate from one another, come back to each other, and that hosted anchor will stay fixed for both of them.

[00:13:19.224] Kent Bye: Now, in terms of the data that's sent up, I mean, there's certain privacy concerns around like, you know, what are you recording in the environment? What's being shared? And so what is being sent up into the cloud of that environment? And then what is being used in order to, you know, create this shared geometry? And is it sort of scanning the room for other things as well?

[00:13:37.713] Nathan Martz: That's right. Yeah, so it's actually, if you look at any of the modern smartphone AR systems, the trackers, like ARCore being a good example, one of the things they do is they compute what are called feature points, which you might think of as sort of corners, but really they're little positions in space that we're pretty sure that we can keep track of from one frame to the next. And we track sort of scores of them at any one time. And so if you think about an actual image, which is 1,000 pixels by 2,000 pixels, right? That's what a 1080p camera is. We're talking about a very, very small subset of interesting features in that larger camera image. And all of those features together, you'll hear people refer to it as a sparse point cloud, but it's essentially a collection of these, like, to a computer, interesting single positions in space. And when you host an anchor, that's the data that we send up to our servers. And that, what's interesting is that we're, so we're basically taking data that we're already computing, which is very, very different than a human understanding, we're not like uploading video image frames, we're just uploading the point cloud. And that goes up to the server. And what the server gives you back is actually an ID. Essentially, you're saying, I would like to create a Cloud Anchor. Here's some data. We say, thanks for the Sparse Point Cloud. Here's the ID. Now do what you want with it. And then it's up to the developer to share that ID with other devices in the session. So they share that ID however they want to. It could be through their own client server thing, peer to peer, whatever. And then the person that they're working with, their device, says, hey, I got this ID. I want to resolve it. And so they send that ID and their own feature point data up to the cloud. We try to do a match. And then we send the 3D position of that cloud anchor back to the device.
So there's a couple of really interesting things there. One is that the data that we send is only the sparse point cloud data. That's all that we send to enable the cloud anchor sharing. Also, the data that's shared, like, even that data only goes between your device and Google servers. We're not sending, you know, my feature points don't go to another device, only that ID goes back and forth. So we've tried to make sure that we're using, you know, the minimal amount of data possible, that we're sending it securely, that that data, you know, only goes to Google services, and then the developers only need to share that ID, which is all that they need to synchronize the kind of pose of that anchor across devices.
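
[Editor's sketch] The host-and-resolve exchange Martz describes can be modeled in a few lines. This is a toy illustration only, not the ARCore API: `CloudAnchorService`, `host`, `resolve`, and the `min_overlap` threshold are hypothetical names, and a plain set intersection stands in for the geometric matching a real backend would perform. It also assumes both devices already share one coordinate frame, which is exactly the hard part the real service solves.

```python
import uuid


class CloudAnchorService:
    """Hypothetical stand-in for the server side; not the ARCore API."""

    def __init__(self, min_overlap=0.5):
        self._anchors = {}              # anchor ID -> (feature points, anchor pose)
        self._min_overlap = min_overlap  # assumed threshold, for illustration only

    def host(self, feature_points, anchor_pose):
        """Store the host's sparse point cloud and hand back only an opaque ID."""
        anchor_id = uuid.uuid4().hex
        self._anchors[anchor_id] = (frozenset(feature_points), anchor_pose)
        return anchor_id

    def resolve(self, anchor_id, feature_points):
        """Return the anchor pose if the caller's points overlap enough, else None."""
        hosted = self._anchors.get(anchor_id)
        if hosted is None:
            return None
        hosted_points, anchor_pose = hosted
        if not hosted_points:
            return None
        # Toy matching: fraction of hosted points the second device also saw.
        shared = hosted_points & frozenset(feature_points)
        overlap = len(shared) / len(hosted_points)
        return anchor_pose if overlap >= self._min_overlap else None


service = CloudAnchorService()

# Device A hosts: it uploads feature points and gets an ID back.
host_points = {(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)}
anchor_id = service.host(host_points, anchor_pose=(0.5, 0.5, 0.0))

# The app shares only anchor_id with device B (any transport it likes).
# Device B resolves: it uploads the ID plus its own feature points.
guest_points = {(1, 0, 0), (0, 1, 0), (1, 1, 0), (2, 2, 0)}
guest_pose = service.resolve(anchor_id, guest_points)
```

Note that the only value passed between the two devices is `anchor_id`; each device's feature points go to the service and nowhere else, which mirrors the data flow described in the interview.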

[00:16:09.102] Kent Bye: Yeah, and I know last year Google announced the visual positioning service, the VPS, which was something like Lowe's would be able to map out their store. And then there'd be this kind of interface of mapping up the inner landscape of the store so that if people wanted to find a specific thing, they could be able to be guided towards that. And so there's a certain level of persistence that happened in VPS. Is something like this where you're sharing anchors, is that tied to like a GPS? Is that sort of ephemeral data that disappears or is it tied to a permanent location in any way?

[00:16:39.291] Nathan Martz: Yeah, so the current stuff is not tied to a permanent location. It actually, interestingly, Cloud Anchors uses a very similar backend to VPS. And that core problem of take feature points that you've seen before, e.g. from one device, and then align them on another, like figure out where another device is relative to those feature points, that's actually a similar algorithmic problem. And so we use similar parts of the backend. The big difference is that, as you asked before, does the data persist? Can you rejoin a session? Today with Cloud Anchors, we persist that anchor for a day. And so even calling it persistence is a bit of a misnomer. We're really focused on enabling developers to share experiences simultaneously for people to collaborate together. And so we needed the data to live about as long as your session is going to live. And also we wanted to make sure that it was friendly for people who are doing what might be like multi-session within a day. Maybe you're playing a longer game, a board game together, and you want to play for a little while and then stop and grab a snack and then play again. So we'll actually, for that, we want to enable developers that if they're interested in supporting that pattern, that they can host an anchor and then actually find it a little bit later, that it's not literally tied to the app session. So to do that, what we allow ourselves to do is we keep that data around for, at max, 24 hours. And then if a developer wants to do something longer than that, they need to create more anchors and do the same user flow over and over again.
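
[Editor's sketch] The "multi-session within a day" pattern comes down to a lifetime check on the app side. The snippet below is a minimal illustration under the 24-hour maximum quoted above; `ANCHOR_LIFETIME` and `is_anchor_resolvable` are hypothetical names, and nothing here calls a real ARCore function.

```python
from datetime import datetime, timedelta

# Maximum lifetime quoted in the interview; treated here as a fixed constant.
ANCHOR_LIFETIME = timedelta(hours=24)


def is_anchor_resolvable(hosted_at, now):
    """True while a hosted anchor is still within its 24-hour window."""
    return now - hosted_at <= ANCHOR_LIFETIME


hosted_at = datetime(2018, 5, 9, 12, 0)

# Pausing a board game for a snack and coming back three hours later.
after_snack = is_anchor_resolvable(hosted_at, hosted_at + timedelta(hours=3))

# Coming back the next afternoon: the app would need to host a new anchor.
next_day = is_anchor_resolvable(hosted_at, hosted_at + timedelta(hours=25))
```

An app that wants longer-lived experiences would, as Martz says, have to create new anchors and repeat the same user flow once this check fails.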

[00:18:09.137] Kent Bye: And for the image detection of being able to essentially translate a 2D image into a 3D, like, AR experience, you said there's about 1,000. Does that mean that, like, I would create an app, and that app would be sort of limited to 1,000, one of those 2D image AR stickers that could then, or AR, QR codes that could then get translated into an experience, and that in order to see each of those get translated, they would have to be in the context of that app in order for it to emerge?

[00:18:36.488] Nathan Martz: Yeah, that's right. So I say first off, you know, ARCore itself is a developer tool, right? It's a product to enable developers to make more expressive applications. And so everything that ARCore does is in the context of an application. So the augmented images specifically, the way it works is a developer, either at runtime, they can give us individual images and say, here's a few images I'm looking at, please try to find them. Or if they're looking for a lot of images, there's actually a command line tool in the PC that they can use to say, like, I want to, like, here's 1,000, here's a folder with 1,000 images. Please build, like, a single model, a representation of all 1,000 of them. Get the important information out. And then at runtime, rather than them giving us 1,000 images one by one, they just give us that model. And the result is the same for them. Either way, we let them know if we've detected that image. And then once we've detected it, what its 3D position and orientation are. Also, you mentioned QR codes, and I should be clear that augmented images actually works on almost any picture. Part of what's cool about them, and part of the reason we call them augmented images, is that if you want to use the front of a cereal box, or a toy, or the instruction manual for your coffee maker, those are all images, and we can work with any of them, not just the extra machine-friendly things like QR codes or AR markers.
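
The workflow Martz outlines, building one compact representation of many reference images ahead of time and then matching camera frames against it at runtime, can be illustrated with a toy example. The sketch below uses a simple average hash over tiny grayscale grids, which is a stand-in technique chosen for brevity and is not how ARCore detects images. The function names `build_database` and `detect`, and the `max_distance` threshold, are assumptions made for illustration, and pose estimation is left out entirely.

```python
def average_hash(pixels):
    """Reduce a 2D grid of grayscale values to a tuple of above/below-mean bits."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def build_database(images):
    """Offline step: turn {name: pixel_grid} into {name: descriptor} once."""
    return {name: average_hash(grid) for name, grid in images.items()}


def detect(database, frame, max_distance=1):
    """Runtime step: return the best-matching image name, or None if no match."""
    probe = average_hash(frame)
    best_name, best_distance = None, None
    for name, descriptor in database.items():
        if len(descriptor) != len(probe):
            continue  # this toy only compares grids of equal size
        distance = sum(a != b for a, b in zip(descriptor, probe))  # Hamming
        if best_distance is None or distance < best_distance:
            best_name, best_distance = name, distance
    if best_distance is not None and best_distance <= max_distance:
        return best_name
    return None
```

The point of the split is the same one Martz makes: the expensive work over the whole folder happens once, and the app only hands over the prebuilt database at runtime.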

[00:19:55.325] Kent Bye: Yeah, when I was at the session of What's New in AR, they did sort of a recap of what's to this point released, and they mentioned like three major features in terms of like being able to do lighting effects, so change the lighting and be able to detect the lighting, being able to do surface detection to be able to identify the geometries in a room and be able to place things on those. And then also just to be able to, you know, they panned over and showed a scarecrow standing next to, you know, this food cart. So you'd be able to essentially overmix and overlay the scene and be able to put things in the scene and potentially have different levels of occlusion. So just wondering if you could kind of recount what's out there, if those are the three major things in terms of the features and functionalities of ARCore.

[00:20:36.823] Nathan Martz: Yeah, that's right. We often were explaining like, how does ARCore work? We talk about three basic capabilities. One of them is motion tracking, which is fundamentally the core power of ARCore is that it can compute your phone's 3D position orientation in space. And once you know where your phone is in space, that allows you to do something like place a virtual character and walk around it, get closer to it and further away from it and have it all match up. But the thing is that if all you had was the like position orientation of the phone, you may not know where to place that virtual character, right? The virtual character needs to stand ideally on the floor or on a table. And so you need to actually understand the environment that the phone is in. And that's what we call environmental understanding. And so that's both the sparse point cloud that we've talked about, as well as things like the vertical and horizontal surfaces in that environment. And then finally, so if you think about motion tracking tells you where the phone is, and environmental understanding tells you what the environment looks like, both of those things you can use to create a pretty compelling AR experience. But often developers want to have virtual objects that feel integrated visually into the real world, that have the same kind of like, you know, if it's a bluely lit room, that there's a blue cast on the object. If it's dark, the object should be dark, et cetera. And so the third piece is what we call lighting estimation, which is that we try to understand the illumination of the room and pass that understanding on to developers so they can actually use that to make the virtual objects feel married, to feel integrated into that environment.
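
As a rough illustration of how a developer might use a lighting estimate, the sketch below tints a virtual object's base color by an overall intensity and a per-channel color cast, so a dark room darkens the object and a bluish room gives it a blue cast. The `shade` function and its parameters are hypothetical and chosen for this example. They are not ARCore's lighting estimation API, which the interview describes only in general terms.

```python
def shade(base_rgb, light_intensity, color_cast=(1.0, 1.0, 1.0)):
    """Scale an RGB color (components in 0..1) by estimated scene lighting.

    light_intensity: assumed scalar brightness estimate, 0 = dark, 1 = bright.
    color_cast: assumed per-channel tint of the room's light.
    """
    return tuple(
        min(1.0, max(0.0, channel * light_intensity * tint))
        for channel, tint in zip(base_rgb, color_cast)
    )


# A light gray object in a dim room with bluish light.
dim_blue = shade((0.8, 0.8, 0.8), 0.5, color_cast=(0.7, 0.8, 1.0))
```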

[00:22:12.870] Kent Bye: Yeah, I don't know if it was just a coincidence or if there's specific things built in, but some of the demos they showed during the presentation almost was accounting for where the sun was at that time of the day and where you're at and getting the right shadows. I think shadows in AR make a huge difference in terms of creating this illusion of presence. I think there's just something in a subconscious level that just allows our brain to just believe it more. Are you able to keep track of the sun and be able to look at where north is and be able to know what time of day is to be able to estimate the shadows? There's a lot of things that would go into that, but I'm just curious how sophisticated the shadow projection is within these ARCore.

[00:22:50.625] Nathan Martz: Yeah, right now the API is actually pretty straightforward and we're looking mostly at the overall illumination in the scene, not the location or direction of light sources in the scene. Your idea is really clever, you know, it's like if you want, I mean, you certainly like, if you know where you are in the globe and you know what time it is, you've got a pretty good idea of where the sun is. So those are the ideas that we've like kicked around and discussed, and definitely we expect to advance kind of the complexity of what we do in the lighting estimation APIs. But there's a kind of a wide variety of techniques that you could use, and we ultimately want them to be robust to being indoors or outdoors, and sometimes actually knowing that is the harder problem. So there's no like specific API plans to comment on today, except to say your idea is pretty interesting, and I really agree with you that even though we may talk about lighting, it is the shadowing that's often the more important visual cue. In fact, if you think about video games, going back to the PlayStation 1 or PlayStation 2, one of the most basic techniques that we used to use is you'd put a blob shadow, just a simple blobby black sphere underneath the character, and all of a sudden you're like, I get it! The character's standing on the ground, and you take that out, and everything feels like it's floating. So actually, yeah, there's this powerful like psychovisual cue when you see what's called a contact shadow, like darkening around the edge of an object. That's a really important part of how we understand that two things have a tight physical relationship with one another.
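
Martz is clear that ARCore has no such API today, but the idea Bye raises, that location and time give a good estimate of where the sun is, can be sketched with textbook approximations. The example below estimates the sun's elevation from latitude, day of year, and local solar time using a common cosine approximation for solar declination. The function name and parameters are assumptions for illustration, and the model ignores the equation of time, atmospheric refraction, and the conversion from clock time to solar time.

```python
import math


def solar_elevation_deg(latitude_deg, day_of_year, solar_hour):
    """Approximate sun elevation above the horizon, in degrees.

    solar_hour is local solar time (12.0 = sun at its highest point).
    """
    # Approximate solar declination for the given day of the year.
    declination_deg = -23.44 * math.cos(
        math.radians(360.0 / 365.0 * (day_of_year + 10))
    )
    # The sun moves 15 degrees of hour angle per hour away from solar noon.
    hour_angle_deg = 15.0 * (solar_hour - 12.0)

    lat = math.radians(latitude_deg)
    dec = math.radians(declination_deg)
    hour_angle = math.radians(hour_angle_deg)

    sin_elevation = (
        math.sin(lat) * math.sin(dec)
        + math.cos(lat) * math.cos(dec) * math.cos(hour_angle)
    )
    # Clamp against tiny floating-point overshoot before asin.
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_elevation))))
```

A negative result means the sun is below the horizon, which is one cheap signal for skipping sun-driven shadows. It says nothing about whether the user is indoors, which Martz notes is often the harder problem.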

[00:24:19.138] Kent Bye: And one of the other demos that was shown during the keynote was more of a mapping demonstration where you hold up Google Maps and it sort of translates, like it knows where you're at, where you're facing, and be able to look at the, you know, from the Google Street Map views, be able to give different waypoints and tell which direction you're facing. And I know that there's Google Daydream, Daydream VR, AR, which has got its own building and own entity. And it seems like these types of AR, immersive computing, and AI is kind of spreading out through all these other different branches. But I'm just curious if something like that, a feature like that, is being developed within the context of that mapping application, or if there's anything else that you can speak about to that.

[00:24:58.417] Nathan Martz: Yeah, I mean, I'd say broadly, like, that's why AR is so cool, is that it does go everywhere, right? And that there's, in fact, really broadly, we're seeing the camera, right? The phone's visual understanding of the world permeate more and more use cases. And so whether it's, you know, gaming or self-expression or shopping or navigation and maps, this idea comes up over and over again. And that's one of the big reasons that I'm on the team. On the ARCore team, we actually support both a lot of external developers and internal teams like Maps. So almost always, Google takes a mix of productionized public technology like ARCore and some of our own internal secret sauce to make a given feature. But that's broadly how we work with teams like Maps.

[00:25:45.986] Kent Bye: And so does something like in the ARVR tent, there was a Google Lens demo, for example, that was able to look at things and use object recognition. Is that coming more from the AI side, or is there also other augmented reality dimensions of Google Lens?

[00:25:59.558] Nathan Martz: Yes, I'm actually a PM on the ARCore side, not on the Lens side, but like Maps, it's another team that we work with really closely. We try to understand their needs and what they're building and how we can align the underlying platform tech to support that.

[00:26:14.289] Kent Bye: So I guess there's been a lot of broader conversations about privacy and ethics, I think, within the technology industry. We have GDPR that's coming up. I know there's a concern in terms of personally identifiable information versus something that's de-identified and data that's coming in. I know that Facebook is coming up with new tools to be able to see what data are being collected and then in talking to them basically any de-identified data that's being collected is not sort of within that purview to be able to connect to you in any way. With GDPR coming up, is there anything from a product standpoint that you've had to re-engineer and account for in order to meet the obligations of the GDPR privacy obligations?

[00:26:53.865] Nathan Martz: Yes, I mean, I'm definitely not the right person to talk about that level of policy at Google scale. I think the one thing I will say is, I've been at Google personally four years now, and I've been lucky enough, I spent a year in search, I worked on VR, I worked on AR, and if there's one common denominator with all of that, it's that we take privacy and the security of the data that people entrust us with incredibly seriously. Actually, as a product manager, a big part of the job is with every feature, thinking through what data are we recording? Is that essential? Are we protecting it? Are we compliant with our policy and the policy of the various nations that we work with? It's a huge part of the job. I can't emphasize enough how important it is to us.

[00:27:46.208] Kent Bye: Yeah, and I mean, there was also, I guess, the one during the demo yesterday that I saw there was like this conversation with an AI assistant that was sort of talking to other people out in the world and you know, my reaction was kind of like, hey, there's these AI assistants that are talking with humans, should they be disclosing that? There's these larger ethical questions that comes up with AI, but I think there's going to be similar ethical questions that are coming up when it comes to VR, AR. What do you think is a mechanism by which there can be a conversation with Google and the public when it comes to these various issues? Because right now, it's sort of like a thesis, antithesis, synthesis, where there's a feature that's being introduced, and now, oh, now are these, here's all sort of societal implications of this, like, what is that, what does that dialogue look like?

[00:28:36.060] Nathan Martz: Yeah, again, I'm certainly very reluctant to comment on some of our AI investments way outside of my personal wheelhouse. I think broadly, I can talk a little more specifically within the context of AR, which is that I think we always think about what data are we collecting or even how does the technology work, right? All technologies, computers fundamentally process information, right? That's what they do. But you want to understand, well, what is the necessary data to perform that? Whatever that data is, is it computed efficiently? Where is it stored? Is it stored securely? Do people understand? Which I think is a really important part. Do users actually understand? Like, you know, the extent to which the technology, like, how does it work? What level of understanding does it have? Where does that data go? These are all things that, you know, there's a fine line between, like, this technology is very complicated, right? Even pick one part of the ARCore tracking stack, and it's like, whoa, PhD thesis, you know, and we're not gonna, you know, we don't expect every user to have a PhD in visual and inertial odometry. But we do want to make sure that we explain, that we have, like, a way of explaining to them, like, at least in an approachable, reasonable, you know, truthful fashion: here's what it does, and here's what it enables. And we do that to the best of our, I would say, ability every single day.

[00:29:53.576] Kent Bye: In terms of use cases and applications, looking at the future of AR, they're looking at gaming and entertainment, but also education and shopping, advertising. Maybe you could talk about what you see as some of the most compelling use cases of AR.

[00:30:08.650] Nathan Martz: Yeah, yeah, I think that's a great question. And I think it's one of the fun things about the space. So I would say that personally, there's already self-expression. The social use case, we're seeing a lot of engagement, right, where people are already pulling out their phones to take a picture. AR just helps you express your personality in a whole bunch of fun and interesting ways. Gaming is another category that we've seen be really successful, especially the location-based gaming, where you're out and about in the world, exploring the world around you. And then once you're in the world, there's this fantastic layer, and you want to believe that these virtual characters inhabit the same world that you do. On the utilitarian front, there's a few interesting examples. Education is one that I'm personally really passionate about. You know, we think about if VR can take you anywhere, which we've done with VR expeditions. Well, we do AR expeditions now, and AR expeditions bring anything to you. As an example, you think about learning about antiquity, right? About ancient cultures and the amazing artifacts that they produced. And these are all, like, hand-built pottery, hand-crafted sculpture. These are, like, inherently three-dimensional textured objects. And, you know, probably very few classrooms are going to take a plane trip to the, like, National Museum in London, you know, or to, like, you know, Greece or Rome or Iran. But they still want, like, to really understand these incredible artifacts. They want to be able to see them in 3D and walk around them and get close to them and see how the light reflects off the surface. And so, as an example, it's a really powerful way to get at that. Also, you know, a lot of even abstract concepts are inherently spatial, right? If your student is a bit older, you're learning chemistry. You're learning about, like, molecular interactions and, you know, how do covalent bonds work?
And actually, these are inherently 3D structures and 3D interactions. And by presenting that in a way that's spatial, that, again, allows you to walk around it and interact with it, super, super powerful. Another one I'd highlight is actually what you might think of as utility. I don't know how many people thought that a main use of your phone is as a flashlight, right? But it turns out nobody carries flashlights anymore because your phone is really good at it. Well, with AR, actually your phone can become a measuring tape. If you want to get a quick sense of how tall is my window so I can get blinds, you can do that with your phone right now. I've seen developers with some really cool stuff like building floor plans, letting you walk around your home and annotate it, whether you're like planning a remodel or like actually figuring out where the best place to put your Wi-Fi router is. So some really, really interesting uses of kind of like measurement and spatial analysis in high utility areas. I think this is just the beginning, right? This idea of phones that kind of see and understand the world. This is really, really great. I think super healthy mix of really fun applications like in self-expression and gaming and really, really deeply useful applications like education and modeling and measurement.

[00:33:10.988] Kent Bye: What are some of the biggest open questions or problems that you're still trying to solve with ARCore?

[00:33:16.792] Nathan Martz: Yeah, that's another very good question. So I think there's a few I'll highlight. One of them is you can think a lot of, as humans, our understanding of the world, you can kind of bucket in people and places and things. And we've done a lot of investment in AR to date on the places side of things, where AR basically gives you a sense of the space that you're in. And I'm definitely interested in looking at how we better understand objects, as a good example. And I think especially as humans, we put a lot of emphasis on, we don't just think about atoms and electrons, right? We think about, oh, there's a coffee table, there's the phone that's over here beside us, there's the car that's parked in my garage. And so actually giving developers tools to help them integrate more object understanding into their apps and allow them to build experiences on top of that, I think, is super interesting. And in many respects you can think of augmented images actually as a first step in that direction, since many, many kind of common objects in the world are built to kind of present an image to you. One other one I'll highlight is that for us, we're a technology team. We think a lot about hard problems. But we're also a developer product team. And we know that developers have businesses to run. And to run a business, they need to reach a meaningfully large audience. And so we think a lot about scale. And in Android, part of what's amazing about it, it's a huge and diverse ecosystem. So we want to work really closely with our OEM partners and with developers to ensure that ARCore works on the widest possible range of qualified devices to give developers a really large, meaningful audience, but also to do that in a way that can help insulate developers from some of the complexity of talking to every single device. So it's an interesting challenge to get scale, but do so in a way that's digestible and understandable and approachable for a given developer.

[00:35:14.510] Kent Bye: Great. And finally, what do you think is kind of the ultimate potential of augmented reality and what it might be able to enable?

[00:35:22.872] Nathan Martz: Yeah, I mean, I personally really, this is a phrase we've used a lot, but it's computers that see and understand the world much like we do. And where that understanding enables us as humans to actually get data in the context of the real world. I think a lot of us today, our lives are kind of bisected into two spheres. We have the physical world, the world that we as humans inhabit and have with other humans and couches and animals and all that good stuff. We interact with that world in one way. And then we have our computers, the world of digital information. And by and large, we interact with that in a separate way. We scroll a mouse, we use a touchscreen. And so even if you have like, you know, you're traveling and you visit Paris and you look at the Eiffel, you know, you can like touch the Eiffel Tower with your hand or look up the Eiffel Tower on Wikipedia, but they're very separate from one another, right? They're almost different domains. And what I think is really powerful about augmented reality is it's going to actually erase that division and allow us to combine our understanding and experience of the physical world with the information that we have about it and this incredible corpus of human knowledge we call the internet and applications.

[00:36:34.893] Kent Bye: Awesome. Well, thank you so much for joining me today on the podcast. Yeah, it's my pleasure. It's great to see you again. So that was Nathan Martz. He's the lead product manager for ARCore at Google. So I have a number of different takeaways about this interview. First of all, I just want to take a step back and just talk about the experiential marketing that Google has been doing over the last two to three years at Google I/O. So they basically take over this music amphitheater, and they have in the parking lot all these geodesic domes where you walk into the dome and there's these direct experiences that you can have of some of these different applications. Whether it's like AI machine learning, there are experiments, there's VR and AR, web payments, and you get this impression of like this vast ecosystem of Android and all the different dimensions on the web and applications on automobiles, on watches, on tablets and computers, and Android's in a lot of the different VR devices that are coming out there. So Android is this operating system that's like basically spread out everywhere. And as a developer, it's really exciting to go and kind of have a direct immersed experience of some of these different applications that are possible with our new APIs. And they have lots of different experiences that you can have spread out through the entire conference. So that is really cool just to see what they've been doing with experiential marketing. And I think of all the different companies that are out there, Google has been doing probably the most advanced experiential marketing that's out there. So overall I would say that virtual reality was a little de-emphasized when it comes to like the major keynote announcements and what is happening. There was like kind of a surprise six-degree-of-freedom Lenovo Mirage Solo that was at the booth, and I hadn't heard of it by the time I tried it out, and it was amazing.
It was actually like really impressive, the world sensing technology to be able to do six-degree-of-freedom moving around. I know that UploadVR's Ian Hamilton got a chance to get one of the headsets and to turn off some of the, I guess, boundary limitations. And he was able to basically walk and then walk back and have the WorldSense technology keep track of that. So it's in some sense like this amazing kind of six-degree-of-freedom, greater-than-room-scale potential of a headset. So not a lot that was happening in terms of the announcements or new stuff that came for VR, and I guess we'll see what continues to happen and develop. I think they're still committed to YouTube and all these other VR as an eventual end state, but I think right now they're really focusing on the AR because that's just where people have the phones and a lot of the energy and excitement is. And there's the capability to be able to actually do applications that can do advertising. And so for a lot of agencies and companies, that's going to be funding a lot of that development to be able to do these different types of, like, take a box and take a picture of that box and then use that image of the box to be able to cue a whole like immersive experience around that. So I expect to see a lot more agency type of development when it comes to these different applications that are out there. There's also been a lot of stuff that's been continuing to progress with the web and the open web and WebAR and WebVR. Unfortunately, they didn't give access to Brandon Jones to be able to give a full update as to everything that's happening. I expect that later in the year, they're going to start to potentially be shipping the Chrome version of the WebXR. But I think that they're going to be doing some origin trials on the next like 2.0 spec for what's essentially, I guess, the 1.0 spec of WebXR, but the WebVR 2.0.
But once people get to test that out, then they're going to start to then get that out there. And I think that's a big blocker when it comes to a lot of people really fully adopting WebVR. But Oculus Go came out and their Oculus browser has full support for WebVR 1.1. And so go ahead and fire up your A-Frame and ReactVR and all these other different WebVR frameworks to be able to start putting experiences out there, because people can start to really see it on these seamless user interfaces. But also Supermedium VR is a great way to be able to look at some of these different WebVR experiences if you have a Vive or a Rift and you want to see a curated list of the best of what's out there for WebVR. So in terms of the new ARCore features, you have Sceneform, which is at the infrastructure layer. I think this is huge news when it comes to the future of immersive computing. This is going to just make it easier for Java developers to be able to start to seamlessly integrate augmented reality experiences. The pipeline to be able to do 3D models, it's pretty complicated to know all the nuances of the physically based rendering. And so they did a whole like physically based rendering 101 in their Sceneform session, and a lot of the tools to be able to more seamlessly integrate these 3D objects into their development workflow. So I think it's actually a huge sign that moving forward, this is going to be a part of the immersive computing future. And spatial computing is just making it easier for the developers to be able to put these 3D objects into their experience and make them look good.
So the other two: there's augmented images, which I had a chance to do some demos of, and it was pretty amazing actually to see some of the 3D cubes that were there with some images, and then to see in the augmented reality window this whole explosion of different art that was going up in it. It just has this nice like feel when you're able to overlay an image that is very similar to what it's seeing as the trigger and then have that image be animated. It just does a really nice job of kind of, you know, tricking your mind into stepping into this reality portal. And I'm excited to see what is developed with that. And the cloud anchors, the actual experience of being able to play a game with another person was a lot of fun. And it was interesting to learn that it's using the same technology as VPS. And yeah, I would just say like in the long run, just the implications of being able to scan different aspects of your environment. And if it's what's called identified and de-identified information, if it's not tied back to your identity, then they have a lot more leeway to be able to, you know, get more information in there. And so this is kind of like a realm of data that is being ingested by Google in order to, I guess, in some respects, train their giant AI algorithms that they have. I know there were some journalists that talked to Sergey Brin at some point and were asking him why they were doing something. And Brin was basically like, well, we're building this giant AI. And I think that's been their goal all along, for Google to build this artificial intelligence entity. And so in order to do that, you have to be very data hungry. And so you're gathering all this data in order to train it. And I guess the question moving forward is: how much data are they recording? Is it tied back to identity? Is it de-identified? Or is it identifiable if someone got a hold of it? What are the implications if this data gets out there, if there's a security breach?
You know, there are a number of different Google employees that have actually quit in protest of Google's participation in Project Maven, which is an AI project that is working with the government to be able to do object identification for drone footage. And so that just starts to bring up all these different ethical questions around how much direct work should Google be doing with the government in terms of helping what could essentially be these unmanned aerial vehicles that could eventually be used to be able to do lethal strikes that violate international law? What are the ethical implications of Google participating in some of these programs with the Defense Department and the government? And I think a number of different Google employees were protesting against this. And again, this kind of goes back to like, what are the ethical policies that are guiding some of these different decisions? From the feedback of some of the Google employees, internally they were just having disappointment that there was less of a dialogue with the higher-up leadership. And so, yeah, I guess there's some deeper questions when it comes to Google as an entity, especially when it comes to VR and AR, and what are they doing in terms of what data are being collected, how do we get some transparency around that. Maybe we'll see some movement around that around GDPR implementations, but it was honestly kind of disappointing that Google decided to announce the GDPR changes and their privacy policy changes and the different tools and stuff that they plan on doing with data portability and whatnot. It was like the day after Google I/O. So it was kind of like energetically and symbolically kind of trying to sweep it underneath the rug or at least de-emphasize it. Whereas I think that's not the strongest suit for what Google wants to emphasize.
So I guess I can understand why they didn't want to sort of focus on that, but it's just kind of frustrating to be there and to not sort of have any answers as to what was happening, knowing very well that there was likely something on the horizon that was about to drop. So hopefully I'll be able to talk in more detail with somebody at Google about both like what data are being collected with the VR and AR. And the challenge I think is this whole balance between de-identified information and personally identifiable information and how much of our de-identified information is actually going to be identifiable, what data are being stored over time. You know, I can see why for their cloud anchors, they would want to potentially like just say, hey, you know, after a day, this disappears, to sort of get people comfortable with the fact that they're, you know, scanning their environment and sharing it with Google. And the same type of like issues came up with Facebook with being able to, you know, record your rooms. Just as one potential ethical implication of this: if you're scanning your environment, and let's say you have a certain deodorant that happened to be laying around in the bathroom or something, and they're able to potentially identify that and connect you with a specific type of deodorant, and then that's connected to your identity, then they'll be able to advertise to you specifically for like a competitor for the types of products that you happen to have around in your home. So connecting the fact of where you live with what data is being collected in these different scene graphs and what the level of fidelity is there. I know that Nathan said that it's just the feature points and the corners essentially, but when I was doing one of the line drawing demos at Google I/O, I asked: this is a big geodesic dome, what are you using for the feature points? And they said, oh, well, actually it's the carpet, and the carpet was just like shapes.
And so it was like, the shapes were somehow being translated into those feature points, which told me that it's not always just geometry, that it can actually be the images that are there as well. And if it's a distinct image, they can use that as a feature point to be able to use as a hook to be able to do these shared augmented reality experiences. So just overall, I just think that Google's main business model at this point is still this process of surveillance capitalism. So they're getting as much data on us as they can. The more they know about us, the easier it is for them to be able to put us into different psychographic profiles and to serve us advertising to a certain extent. Both Google and Facebook have taken this approach of ingesting all this data about us. And I think there's just some deeper privacy trade-offs when it comes to participating in these applications, which in a lot of ways, these applications are absolutely amazing. And there is this legitimate trade-off between the benefits that you get from giving access to your data to someone like Google, but there's also a lot of trade-offs when it comes to the privacy implications over the long term. And the risks that are associated with anytime you allow any information to a third party, then there's risks that it could leak or it could get out on the dark web. And what does it mean for entities like Google to have all this information on us? So that's all that I have for today. And I just wanted to thank you for listening to the Voices of VR podcast. And if you enjoy the podcast, then please do spread the word, tell your friends and consider becoming a member of the Patreon. This is a listener supported podcast, and I do rely upon your donations and support in my Patreon as a member in order to continue to bring you this coverage. So you can become a member today at patreon.com slash Voices of VR. Thanks for listening.
