#1234: Geospatial Browsing AR Feature in Google Maps with Screen Reader Support

Ohan Oda works at Google on an AR feature in Google Maps called Live View, which he was showing off during the demo session at the XR Access Symposium. It adds screen reader support for geospatial browsing, where users can search for nearby landmarks and businesses, and Oda wanted to get feedback and raise awareness for these accessibility features in Google Maps.

I had a chance to catch up with Oda, where he elaborated on his team's finding that most sighted users were not returning to the Google Maps features that had implemented the Visual Positioning System (VPS) technology. VPS extracts features from Google Street View imagery and, via computer vision, uses them to anchor AR content, serving as a more precise version of GPS. They found internally that it was blind and low-vision users who were returning to these features more frequently, because they were using them as an essential assistive technology, whereas the features were less useful for sighted users who could orient themselves with other landmarks and street crossings.

Oda wants to lean more into the accessibility use cases of these XR features so that accessibility can become a driver of technological innovation in AR. He speaks about the default utilitarian design approach of developing new features to address the largest potential target audience, which has meant that accessibility functionality has traditionally been deprioritized by management. He wanted to show off these features to the XR Access community and give a heads up that they may be coming soon. The features still have to work their way through the beta testing process before being deployed, and he's hoping that applying universal design principles to these XR features in phone-based AR will create new assistive technology use cases that drive more feature development in the future.

Full text of the XR Access Symposium poster titled "Lens in Google Maps with Screen Reader Support": Lens in Maps is a feature in Google Maps that shows places, iconic landmarks, and streets around you through your phone's camera. It also allows you to search for specific types of places, such as restaurants, within walking distance. This experience matches your camera feed with Street View images to precisely locate where you are and the direction of your camera. This feature is currently available in select metro cities including NYC. Google is making this feature work with screen readers, making it possible for blind and low vision users to identify the places, landmarks, and streets along the direction of their cameras. When a place is centered, the name of the place, the type of place, and the distance to the place are announced. The user can then double tap on the screen to find out more about the place. When a street near the user is centered, the name of the street is announced. We may also announce the direction of the street (e.g., North-South) to make it more understandable. Using the buttons at the bottom, the user can search for specific types of places, such as restaurants, shopping, etc. Places matching the type within walking distance will show up. One potential area of exploration for the future is screen reader support for the AR walking navigation experience, so that blind and low vision users can get more precise navigation than the existing navigation experience that uses GPS and compass.
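To make the interaction the poster describes concrete, here is a minimal sketch of how an announcement for a centered place or street might be composed before being handed off to VoiceOver or TalkBack. This is purely illustrative, not Google's implementation; all type names, wording, and thresholds are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Place:
    name: str          # e.g. "Chrysler Building"
    category: str      # e.g. "landmark", "restaurant"
    distance_m: float  # distance from the user, in meters

@dataclass
class Street:
    name: str
    axis: Optional[str] = None  # e.g. "North-South", when known

def announce_place(place: Place) -> str:
    """Compose the string a screen reader would speak when a place is centered."""
    if place.distance_m < 1000:
        distance = f"{round(place.distance_m)} meters"
    else:
        distance = f"{place.distance_m / 1000:.1f} kilometers"
    return f"{place.name}, {place.category}, {distance} away. Double tap for details."

def announce_street(street: Street) -> str:
    """Compose the string for a centered street, including the axis if we have it."""
    if street.axis:
        return f"{street.name}, running {street.axis}."
    return f"{street.name}."

if __name__ == "__main__":
    print(announce_place(Place("Chrysler Building", "landmark", 320.0)))
    print(announce_street(Street("5th Avenue", "North-South")))
```

The key design point is that the announcement carries name, category, and distance in one utterance, so a screen reader user gets a useful result from a single sweep of the camera.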

This is a listener-supported podcast through the Voices of VR Patreon.

Music: Fatality

Rough Transcript

[00:00:05.412] Kent Bye: The Voices of VR Podcast. Hello, my name is Kent Bye, and welcome to the Voices of VR podcast. It's a podcast that looks at the future of spatial computing. You can support the podcast at patreon.com slash Voices of VR. So this is episode 13 of 15 of my series looking at XR accessibility. Today's episode is with Ohan Oda of Google. He was at the XR Access Symposium showing off an AR feature called Live View, which is Lens in Google Maps. It's something that uses the Visual Positioning System (VPS), so it's using augmented reality features in certain select cities right now. And it's still in beta, and he was demoing what's possible. But something that's quite interesting about this conversation is that Ohan works at Google, and generally accessibility isn't always at the top of the priority list, because most companies, when launching a product, want to reach the most users possible. So there's this de facto utilitarian reasoning that is always trying to get the largest audience, and there are other folks who are always kind of left out of whatever that majority is. In this case, folks who have low vision or blindness or other disabilities aren't necessarily always at the top of the list when some of these features are being developed. And so what they found was that for some of these features they were launching, the folks who were coming back to them again and again were folks who had either blindness or low vision. So Ohan was here to show off some of these accessibility features for geospatial browsing, where you're able to search for different landmarks or coffee shops or whatnot, and as you scan around, it would tell you how far away they were and in what direction. So that's what we're covering on today's episode of the Voices of VR podcast. So this interview with Ohan happened on Friday, June 16th, 2023 at the XR Access Symposium in New York City, New York. So with that, let's go ahead and dive right in.

[00:01:56.417] Ohan Oda: My name is Ohan Oda. I work at Google on Google Maps, and our team works on an AR feature in Google Maps called Live View. The feature has been around for about three years; it was introduced in 2019. It started with navigation, and recently we introduced something called Lens in Maps, which actually allows people to see the things around them, like businesses and streets. They can even do a search within those augmented reality features to find specific types of businesses like restaurants, shopping areas, coffee, and those kinds of things. So this has been available for select cities including New York City, San Francisco, Los Angeles, London, Paris, and various other metro cities around the world. One problem with this feature is that it was designed for sighted people. Even though the navigation part has been around for three years, and this new geospatial browsing capability has been publicly available for about half a year now on both Android and iOS, we never supported accessibility. Meaning that this AR experience overlays information about businesses and street names on top of the camera, but those things are in AR and were never announced to the user. So this feature was pretty much useless for people who cannot see or have a hard time seeing. So the project I've been focusing on, and am actually demoing at this conference, is to make this feature accessible by supporting VoiceOver, and also TalkBack on the Android side. Currently it is in the demo phase on an iOS device with VoiceOver, and what it can do is, as you move your camera around, it reads out the things along the direction of your camera, so you can get an accurate direction and distance to those places from your phone. This capability, I believe, was not possible with any apps before. I know there are a bunch of apps, like Soundscape from Microsoft and other apps that existed before, that could tell you what's around you, but they use GPS and compass, meaning the information cannot be very accurate, both in terms of location and orientation. So they can tell you in general what's around you, but you can't really tell which direction things are in, and you can't really validate what store is in front of you or across from you. This capability that we're introducing can actually do that, because we use a technology called VPS. What it does is, in addition to GPS and compass, we actually match the image from your camera against the Street View imagery that Google has collected over the last decades. Those images from Street View cars have a golden pose, which means there is a very accurate position and orientation for each of the captured images. So by comparing your camera image against those images, we can get very accurate information about where you are and which direction you are facing. By combining those technologies with the map data that Google Maps provides, we can give you a lot of information along the direction you're looking. So we hope this feature will greatly improve the ability of low vision and blind users to do geospatial browsing, which was not very easy, or even impossible, to do before. So now they can go to a place they have never visited before and understand what's around them, including restaurants, coffee shops, shopping areas, and even tourist locations, like the Chrysler Building or the World Trade Center in New York City.
So we hope this feature can help improve the quality of life for low vision and blind users. Eventually we're also trying to support the navigation part as well, which before was based on your GPS and compass and wasn't very accurate. With AR technology and VPS technology, we hope we can actually improve the navigation part as well. Then it becomes an entire user journey, where you find a nearby location you're interested in, and once you find a place you're interested in, you can start the navigation and get accurate instructions for how to get there. One of the pain points that we hear a lot from low vision and blind users is that even though a lot of map systems give you instructions for how to get there and the route to take, once you get there, it becomes very unclear where exactly the destination is. So we see a lot of those last mile problems. Hopefully this feature, because we can provide a very accurate location and heading, can actually tell you which direction you're supposed to walk when you arrive at the destination. So hopefully, overall, whatever we work on will actually be very useful for the community of visually impaired users.
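As a rough illustration of the VPS idea Oda describes (matching the live camera frame against Street View images whose poses are precisely known), here is a minimal sketch. The descriptor matching and the single best-match lookup are stand-ins for a much more involved pipeline; none of this is Google's actual implementation, and all names and data here are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass
class ReferenceImage:
    features: List[float]  # precomputed visual descriptor for one Street View image
    lat: float             # "golden pose": surveyed latitude
    lon: float             # surveyed longitude
    heading_deg: float     # surveyed camera heading, degrees from North

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def localize(query_features: Sequence[float],
             references: Sequence[ReferenceImage],
             similarity: Callable[[Sequence[float], Sequence[float]], float] = cosine_similarity,
             ) -> Tuple[float, float, float]:
    """Return (lat, lon, heading) taken from the best-matching reference image.

    A real VPS refines the pose geometrically from feature correspondences and fuses
    it with GPS and compass; this sketch simply adopts the nearest reference's pose.
    """
    best = max(references, key=lambda r: similarity(query_features, r.features))
    return best.lat, best.lon, best.heading_deg

if __name__ == "__main__":
    refs = [ReferenceImage([1.0, 0.0], 40.7516, -73.9755, 90.0),
            ReferenceImage([0.0, 1.0], 40.7128, -74.0134, 180.0)]
    print(localize([0.9, 0.1], refs))  # closest to the first reference's pose
```

The point of the sketch is the data dependency: because each reference image carries an accurate surveyed pose, a good visual match yields both a position and a heading, which GPS plus compass alone cannot provide reliably.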

[00:07:13.576] Kent Bye: Great. Maybe you could give a bit more context as to your background and your journey into doing this work.

[00:07:18.197] Ohan Oda: OK, sure. So yeah, I've been in the AR field for a very long time. I did my PhD at Columbia University, actually, here in New York City, and my focus was on augmented reality. That's why I work on this augmented reality feature in Google Maps, because that's something that excites me. And the accessibility part that I'm really pushing on right now is because I feel this feature can really help those users. One of the problems with our existing feature is that sighted people can always fall back to the 2D map if they want to find something. So this is not a necessary tool for them, even though it provides accurate information about which direction you're looking and has nice ARCore features. But if we look at the retention rate, how often people come back to this feature, it's not great. And we found that the people who really need this kind of feature, like people who cannot translate from a 2D map to 3D, really do come back, because this is kind of the only way for them to know the right direction to go. Even though they can see the 2D map and know which direction they're supposed to go, they can't actually convert that to 3D and relate it to the physical world. So those people really come back to the feature and use it a lot. So we thought, what other types of users might really need this kind of feature? And I thought this could actually be a really great feature for visually impaired communities. That's why I thought, OK, we should now start supporting accessibility for this one. It's also difficult to push toward this kind of capability, because I know a lot of people have different ideas for sighted users, and those are usually put at a higher priority when implementing features. So we actually thought about accessibility a long time ago, but it was always pushed to a lower priority because the focus was on supporting sighted users. So I thought it's finally time that we should focus on making this work for people who are visually impaired. And that's why I'm really pushing for this feature.

[00:09:30.393] Kent Bye: Yeah, and so with the demo that you're showing here at XR Access, we were upstairs, so we weren't embedded among the buildings. And so when you were showing it, you were pointing to different landmarks off in the far distance as places that you might be going. But is the idea that someone who is either blind or low vision would put in a location for where they're going, and then maybe they get instructions, you know, maybe they need to get onto the subway, and then once they're getting out of the subway they just need to know what direction to go? That's something that I find myself, that when I'm in New York City I always get lost, and I always have to look at my map, move a little bit, look at the little blue dot, and trust a little bit as to what the orientation is. So the idea here is that it seems like you could pull up the AR features, with what seems to be ARCore integrations with the Visual Positioning System, the VPS, and then as you turn your phone, you could get audio feedback as to what direction you're going. So is the idea that this is mostly for people who know where they're going and are getting directions? Or are there other use cases you've found, just to help orient and maybe explore spaces where they're discovering places they might want to go?

[00:10:31.908] Ohan Oda: So it comes in two stages. The situation you described is probably the navigation journey, which is a part that we haven't supported yet and plan to support in the future. The current thing that we're implementing and going to release or launch soon to the public is more the geospatial browsing part, which is the case where you still haven't decided where to go, but you want to find a place that might be interesting to you. And then you may actually dive into the navigation journey that you talked about, and that's where we would like to provide support as well. For sighted users, they already use this for navigation purposes, and it's exactly what you described: you're just out of the subway station and your compass is messed up, so it keeps telling you to go in the wrong direction. And as you notice your blue dot moving the wrong way, you realize it and turn to face the other direction. That is exactly the first problem this feature was developed to solve, because GPS and compass give you the wrong direction, and with this VPS technology, by looking at the camera and comparing it with the Street View imagery, we can give you an accurate heading right away.
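To make the heading correction concrete, here is a small hypothetical sketch (not from Google Maps) of how an accurate VPS-derived heading could be turned into a "which way do I turn?" cue toward a destination. The bearing formula is the standard great-circle initial bearing; the coordinates, thresholds, and wording are illustrative assumptions.

```python
import math

def initial_bearing_deg(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Initial great-circle bearing from point 1 to point 2, in degrees from North."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0

def turn_instruction(vps_heading_deg: float, bearing_deg: float) -> str:
    """Relative turn needed, given the user's accurate (VPS-derived) heading."""
    delta = (bearing_deg - vps_heading_deg + 180.0) % 360.0 - 180.0  # wrap to -180..180
    if abs(delta) < 15:
        return "Destination is straight ahead."
    side = "right" if delta > 0 else "left"
    return f"Turn about {abs(round(delta))} degrees to your {side}."

if __name__ == "__main__":
    # Hypothetical coordinates: user exits a subway exit facing roughly north (10 degrees).
    bearing = initial_bearing_deg(40.7527, -73.9772, 40.7516, -73.9755)
    print(turn_instruction(10.0, bearing))
```

The same bearing math works with a magnetic compass heading, but the cue is only as good as the heading estimate, which is why the accurate VPS heading matters at subway exits.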

[00:11:42.575] Kent Bye: Yeah, and I know that there are also other accessibility features built into the Android operating system. So maybe you could speak about some of the other types of feedback, like haptic vibrations. I know that there are some integrations for the user journey when you're walking, where a series of different haptic buzzes can tell you to turn left or turn right. So yeah, talk about how the haptic aspects of the phone may also potentially start to be integrated into this as well. If people who are low vision or blind are using this, are there ways you're thinking about integrating the haptic experience of the phone, in addition to the audio cues, as a more subtle way to communicate without necessarily always using language?

[00:12:23.688] Ohan Oda: Yeah, that's a great point. Actually, I did get a lot of feedback from visually impaired users who are testing internally at Google asking us to provide ways of giving feedback other than audio. And some of the conference presenters were saying that there is also competition for audio, because there are so many things being announced. So it's good to have different ways of giving feedback, and haptic feedback is definitely one of those. The only thing we have to be careful about is that we're using the camera image to detect where you are and also to do some of the intermediate tracking. The VPS technology I mentioned is not performed on every frame of the video; it is performed every so many seconds to apply a correction. The intermediate tracking between those intervals uses ARKit on iOS and ARCore on the Android side to provide something called VIO, visual-inertial odometry. And that tracking relies on the camera image, meaning that if we give haptic feedback, it vibrates your phone, and what vibration does is make the image blurry. So we can definitely give haptic feedback, but we need to adjust the strength to make sure it doesn't interrupt the tracking, because then you get lost about where you are. So it is definitely a good direction to explore, but we haven't explored it too much, because we found out before, probably because we had the haptic feedback too strong, that it was actually interrupting the experience. So this is something we can explore and make better use of as feedback.
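A simple way to think about the trade-off Oda describes is to scale haptic strength by the current tracking quality. Here is a hypothetical sketch of that policy; the thresholds and the tracking-quality signal are assumptions for illustration, not the actual ARKit/ARCore integration.

```python
def haptic_intensity(tracking_quality: float,
                     desired_intensity: float,
                     min_quality: float = 0.4,
                     max_intensity_when_degraded: float = 0.2) -> float:
    """Clamp haptic strength so vibration doesn't blur the camera frames that VIO depends on.

    tracking_quality: 0.0 (lost) .. 1.0 (solid), e.g. derived from the AR session's
    tracking state. desired_intensity: 0.0 .. 1.0 requested by the navigation cue.
    """
    if tracking_quality < min_quality:
        # Tracking is already shaky; keep any vibration very gentle (or skip it entirely).
        return min(desired_intensity, max_intensity_when_degraded)
    # Scale smoothly so stronger cues are only allowed when tracking is robust.
    return desired_intensity * tracking_quality

if __name__ == "__main__":
    print(haptic_intensity(tracking_quality=0.9, desired_intensity=1.0))  # ~0.9
    print(haptic_intensity(tracking_quality=0.3, desired_intensity=1.0))  # 0.2
```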

[00:14:04.811] Kent Bye: Yeah, I was just talking to a Verge reporter, and she was asking me why accessibility is so far down the priority list. And I said, well, it's a very utilitarian approach, because you're trying to build the features that are going to serve the most people. And rather than a utilitarian argument, you need to take more of a deontological or human rights approach that tries to prioritize the folks who are not being served by the technology. So I'm curious to hear a little bit more elaboration on what the shift was internally at Google, given that some of these accessibility features had been deprioritized over time. Why is now the time to start to take a look at them?

[00:14:40.140] Ohan Oda: So the thing is, I don't have an answer for Google in general. But I feel like, as you said, for any product, they are trying to target the biggest use cases and the most users first, and that's where the sighted people are. And then they think about how to support accessibility once it becomes popular and useful for sighted people. And the initial problem with this AR feature was actually that we were not even successful with sighted people, because not many people were using it and not many people knew about the feature. So a lot of the focus went into supporting sighted people first, trying to make it successful in that area. And the reason we are trying to support accessibility now is very specific to this feature: because we're not seeing a big success with regular sighted people, as I mentioned earlier, I was wondering how we could find a specific target user who would actually come back to use this feature more often. And people who are visually impaired were kind of my thought; maybe those are people who really need this and would come back to the feature and use it more often. So that is a very specific case for this feature. But in general, obviously Google is trying to make products more accessible, and there are even teams internally that do an evaluation of how accessible each Google product is. You probably know that Google defines something called GAR, and every app is supposed to reach GAR level 4. It's a guideline for how Google defines things to be accessible, and it has different levels. So every product has a minimum requirement of GAR level 4 before it gets approved to launch. But it simply asks for things to be accessible, meaning, for example, that if you have an image, it should be read out, those kinds of things. It doesn't always mean it is useful. In a lot of cases it does read things out, but it sometimes doesn't make sense. There's another team within Google that evaluates whether the current level of accessibility is actually useful, and then they give suggestions to each product about what they can improve. For example, there's a detail page in Google Maps you can access that has a bunch of photos. Right now, unfortunately, it just says "photos, photos, photos" as you go through. Obviously, that's something that team has found and is giving feedback to the Google Maps team about, saying, hey, you'd better make it better than just saying "photos," because that doesn't tell them anything. So there's a lot of things. I believe a lot of accessibility features are still deprioritized compared to things targeted at sighted people, because that's really the major user base. But case by case, maybe something works better for visually impaired users, and in those cases, it can be put at a higher priority. But it's always difficult to argue for, because there are limited resources, and with the current layoffs in the tech industry, a lot of teams are losing people, which narrows down their resources even more. So yeah, it is really hard to convince leadership to say, hey, let's prioritize accessibility. I think it's really case by case.
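The "photos, photos, photos" problem comes down to a screen reader being handed a generic label instead of a descriptive one. As a purely illustrative sketch (not Google's code, and with hypothetical function and parameter names), here is the difference between a generic label and one composed from whatever metadata happens to be available:

```python
from typing import List, Optional

def photo_label(place_name: str,
                caption: Optional[str] = None,
                index: Optional[int] = None,
                total: Optional[int] = None) -> str:
    """Build a content description a screen reader can announce for a photo thumbnail.

    Falls back gracefully: a caption is best, the position in the gallery is better
    than nothing, and only as a last resort would we say just "Photo".
    """
    parts: List[str] = []
    if caption:
        parts.append(f"Photo of {place_name}: {caption}")
    else:
        parts.append(f"Photo of {place_name}")
    if index is not None and total is not None:
        parts.append(f"{index} of {total}")
    return ", ".join(parts)

if __name__ == "__main__":
    print(photo_label("Joe's Pizza"))                                      # still better than "photos"
    print(photo_label("Joe's Pizza", "outdoor seating at night", 2, 14))   # descriptive label
```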

[00:18:09.885] Kent Bye: I'd love to hear if you have any reflections on some of the emerging developments in artificial intelligence. We have things like neural radiance fields that are able to take these Google Maps images and turn them into a neural net that has the ability to generate that imagery from many different perspectives. I'm not sure if that's some of the core technology for the Visual Positioning System. And I know there are a lot of other computer vision innovations that I'm sure you're integrating in different dimensions. So yeah, I'd love to hear any reflections on the developments that are happening in computer vision, artificial intelligence, machine learning, and potentially something like neural radiance fields, and whether some of that is being integrated into these products.

[00:18:48.365] Ohan Oda: Great. So I'm not sure if you had a chance to look at Google I/O this year, but there was actually a feature that was announced. I think it was probably at Search On last year, last September: Google Maps announced a feature called Immersive View, which is another new feature. It is actually our sister team working on that. My umbrella team works on Street View, Live View, and now there's this new thing called Immersive View. What it does is use AI to regenerate 3D models of popular places in the world, rendered with a high-end engine, Unreal Engine 5. Obviously, your phone cannot run Unreal Engine, so it is actually doing that rendering on the server side. Basically, it uses AI to reconstruct the scene with very beautiful, more accurate geometry, and it also adds additional simulation, like what the water looks like if it rains, what the rain looks like, and what the traffic looks like at different hours of the day. It has a lot of good things. And it also uses some of the NARF system, I'm not sure if that's...

[00:19:58.135] Kent Bye: Yeah, neural radiance fields, NeRF, yeah.

[00:20:00.217] Ohan Oda: Yeah, so yes, we do use NeRF to do transitions between things. One of the interesting features introduced at Google I/O, or the Search On event, was going from the overhead view to an indoor view. For that, we obviously don't have images from every viewpoint, so we need to use AI technology like NeRF to interpolate from one viewpoint overhead, looking from high above the sky, down to the inside of a store. That part uses NeRF technology. And actually, some of the movement within the store also uses NeRF, based on a bunch of Street View images taken in the store. So it is a very cool thing. You can actually find a video of it on YouTube from last year's Search On, or even this year's Google I/O, which announced something called Immersive View for Routes, which gives you a route summary using Immersive View. It's a very highly detailed 3D reconstruction of the city with your walking paths or bike paths, and it will even tell you that at the hour you're going it could rain, so it will actually simulate raindrops there. It looks really cool. So we use a lot of these AI technologies to make Google Maps better.
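For readers unfamiliar with the term, a NeRF (neural radiance field) represents a scene as a learned function of 3D position and viewing direction, and renders novel viewpoints by integrating color along camera rays, which is what makes the smooth overhead-to-indoor transitions possible. The standard volume rendering formula from the original NeRF paper is:

```latex
% Color of a camera ray r(t) = o + t d, integrated between near and far bounds t_n, t_f;
% sigma is the learned volume density and c the view-dependent color.
C(\mathbf{r}) = \int_{t_n}^{t_f} T(t)\,\sigma(\mathbf{r}(t))\,\mathbf{c}(\mathbf{r}(t), \mathbf{d})\,dt,
\qquad
T(t) = \exp\!\left(-\int_{t_n}^{t} \sigma(\mathbf{r}(s))\,ds\right)
```

How Immersive View applies this internally is not detailed in the interview; the formula is included only as background on what a radiance field computes.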

[00:21:16.834] Kent Bye: And are there any accessibility integrations with things like NeRF or other technologies that you've started to think about, with this Immersive View or other features? It sounds like it's a different team than what you're doing with Street View and Live View, but I'm just curious if you see more of a fusion of these different platforms, and if you see that there's room for different types of accessibility use cases for some of those other immersive views.

[00:21:37.877] Ohan Oda: Yeah, so that's a good point. Actually, my purpose in demoing this feature at this conference, and in making the AR feature accessible, is in part to motivate other teams in Google Maps to make accessibility a high priority. Because, as I said, with limited resources it's usually put at a lower priority. But if I can show this could really be useful for people who are visually impaired, then that becomes a big motivation for other teams to follow. So I'm hoping I can make that happen.

[00:22:15.437] Kent Bye: Great. And yeah, I'm curious what happens next with this feature. I understand it might be entering into a private beta. And then when can folks from the general public expect to potentially see this available?

[00:22:25.812] Ohan Oda: So yes, we did some internal testing within Google, and we got very positive feedback. But you probably know that doing user research with visually impaired users is really hard, because the number of people who are visually impaired is small, and they are usually overwhelmed with these kinds of requests to do testing, which makes it more difficult. So even though we're getting feedback internally, the numbers are small. So we're trying to reach out to external people; Google has resources to reach out to those communities and do some testing there. And once we're convinced this is actually really helpful, and we're also at a place where we're ready to launch, then we will launch at that time. But in terms of timing, it's a little bit hard to say. I guess it also depends on some of the marketing, and there are all sorts of things involved other than just the technology being ready. So it's hard to say when it's going to be ready, but it is going to be ready soon.

[00:23:32.167] Kent Bye: Great. And finally, what do you think the ultimate potential of these immersive technologies and accessibility might be, and what it might be able to enable?

[00:23:42.468] Ohan Oda: So the thing is, even within this conference, I see a lot of talk about VR accessibility and not much on AR accessibility. Accessibility in the AR field has really only just started to happen; I haven't seen a lot of, or really any, AR applications that do a good job of accessibility support. So I hope this becomes a starting point, where if I can show that we can actually do accessibility well in AR, then I hope we can establish a standard for how we should support accessibility in AR applications. That's kind of my goal.

[00:24:22.277] Kent Bye: Great. Is there anything else left unsaid that you'd like to say to the broader immersive community?

[00:24:26.673] Ohan Oda: One difficulty with this kind of feature is actually making it known to the visually impaired communities. So I hope that when we release this feature, you will actually really try it out, and we hope you can provide us with more feedback to make it better.

[00:24:43.340] Kent Bye: Awesome. Well, I was really excited to see the different features here and to have a representative from a big company like Google here talking about some of the different accessibility features that you're working on. And yeah, thanks for coming out and showing it off and helping to tell the story and unpack it a little bit. So thank you.

[00:24:58.272] Ohan Oda: Yeah, thank you very much, Kent.

[00:25:00.313] Kent Bye: So that was Ohan Oda. He works at Google on the AR features called Live View and Lens in Google Maps. So I have a number of takeaways from this interview. First of all, there was lots of really fascinating information here that I think reflects what generally happens across the industry, which is that it's sometimes difficult to get management to agree to prioritize aspects like accessibility, and so usually they're trying to push out features that are going to be used by the most people possible. But the interesting trend that I want to elaborate on here is that for some of the cutting edge of virtual and augmented reality technologies, the primary use case may be folks who have visual impairments. And so using XR or augmented reality as an assistive technology is something that could potentially drive innovation in a way that has high engagement and continues to solve a very real problem that a lot of people have, in this case the last mile problem. They didn't have navigation implemented yet, so it wasn't end-to-end navigation; it's at the very early beginnings of using the Visual Positioning System, and it's just starting with geospatial browsing. So it's less helpful in some sense, because you're only able to understand where to start walking. But I think ultimately that navigation integration, having the Visual Positioning System integrated into the navigation itself, is where things would need to go in order for it to be truly useful for people to actually get to these different locations. You can understand where things might be, but unless you're able to actually get there, I think that's the next step that will need to be there as well. Navigation is something you can already use in some of the 2D modes, which have the existing accessibility features, but there is this last mile problem that Ohan was referring to, which is that once people who are blind or low vision get to some of these locations, it's sometimes difficult to know exactly where to go. So that's where that other part of the navigation is going to have to come in. So yeah, just some interesting reflections that I think are not unique to Google or any other company out there, just kind of stating the fact that this de facto utilitarian thinking pervades a lot of these different companies. And, you know, I have been susceptible to that same type of utilitarian thinking over many years now, and I'm starting to have basic functionality like transcripts available for folks, to provide other modalities for accessing some of these different conversations that I've been having here on the Voices of VR podcast. So yeah, it was really fascinating to hear from Ohan, as a Google employee, trying to raise awareness of these different features, show them to this community, promote them, and get them onto the radars of these different folks.
And, you know, as this starts to come out, like he said, it's still in beta and hasn't been launched yet, but as they continue to potentially solicit beta test users and get more feedback, these types of features are on the bleeding edge, where folks who are blind or low vision can be the drivers of technological innovation, especially when pushing the bleeding edge of technology isn't engaging sighted users, because they can still see. Like, when I get out of the subway, I can just see which way is north and which way is south, see in the distance what the signs are, and also look at the blue dot and start to understand what direction I'm going and whether or not I'm going the right way. That's not always easy for folks who are blind or low vision, and so that's where these different types of augmented reality features come in. Another point that Ohan made is that a lot of the talk at these conferences is very specifically about virtual reality, and I'd say that's also true for Augmented World Expo, which has a head-mounted emphasis for AR as well as VR, but not as much mobile augmented reality, which is something that is already well distributed and out there. And so, yeah, just emphasizing the importance of making these existing mobile augmented reality applications more and more accessible. This is a great use case for these types of screen reader integrations, where, as you're moving the phone around, you're able to do this geospatial browsing that can point you in the direction of the different landmarks or locations you may want to go to. And just a shout-out to the neural radiance fields, the NeRFs, that are in the Immersive View that he said one of his sister teams is working on. They're starting to integrate more and more of these computer vision and artificial intelligence views, going from the overhead view into the inside view and doing these kinds of seamless transitions. So, yeah, we'll see as things move forward whether there are continued uses of things like NeRFs, or whether the Visual Positioning System is already using something like NeRFs to come up with these different features. It's basically like a GPS that's way more precise, because it's using the augmented reality features within the phone. And yeah, it's also just fascinating that when you start to integrate haptics, it can shake the phone and be in direct conflict with the ARKit or ARCore augmented reality computer vision that is happening. When you start to use haptics, it may shake the phone, make the image blurry, and break the overall functionality of the feature in the first place. So that's something where they either have to abandon the use of haptics or just dial down the intensity so that it's not actually shaking the camera and producing blurry images. Alright, well, that's all that I have for today, and I just wanted to thank you for listening to the Voices of VR podcast. And if you enjoyed the podcast, then please do spread the word, tell your friends, and consider becoming a member of the Patreon. This is a listener-supported podcast, and so I do rely upon donations from people like yourself in order to continue to bring you this coverage. So you can become a member and donate today at patreon.com slash voicesofvr. Thanks for listening.
