#954: Stanford Study Shows Motion-Tracked VR Data Can Be Identifiable

Stanford University has just published an important research paper that shows how motion-tracked data in VR can identify specific users. The paper, titled "Personal identifiability of user tracking data during observation of 360-degree VR video," was published in Scientific Reports on October 15th by Mark Roman Miller, Fernanda Herrera, Hanseul Jun, James A. Landay, and Jeremy N. Bailenson.

I had a chance to catch up with Miller on October 12th to discuss their major findings, which included 95% accuracy in identifying one of 511 participants from a 20-second sample drawn from a 10-minute session of watching 360-degree videos and rating their emotional reactions using the HTC Vive's hand-tracked controllers. Even though participants were only watching 360 video, the system had access to a 90Hz feed of 6DoF head-pose information in addition to two 6DoF-tracked hands. From this basic motion-tracked data, a unique signature can be extrapolated from someone's body size, height, and the nuances of how they hold and use the controllers, which ends up being enough information to reliably identify someone given the right machine learning algorithm.

I talk with Miller about the experimental process and analysis, as well as some of the implications of this study. Currently, this type of motion-tracked data is typically considered de-identified, but research like this may start to reclassify motion-tracked data as personally identifiable, and potentially even as biometric data. We also talk about how specific medical information can be inferred from recordings of this motion-tracked data. There are more ways to make this type of research robust across multiple contexts over time, but it generally points to the possibility that some immutable characteristics can be extrapolated and inferred from this data.
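To make the scale of that tracking data concrete, here's a minimal sketch. The frame layout is my assumption from the description above (three tracked points, each with six degrees of freedom, sampled at 90Hz), not code from the paper:

```python
# Sketch of the motion data described above: head plus two hand controllers,
# each with 3 position + 3 rotation values, sampled 90 times per second.
TRACKED_POINTS = 3   # head, left hand, right hand
DOF_PER_POINT = 6    # x, y, z position + yaw, pitch, roll rotation
SAMPLE_RATE_HZ = 90
CLIP_SECONDS = 20

features_per_frame = TRACKED_POINTS * DOF_PER_POINT     # 18 degrees of freedom
frames_per_clip = SAMPLE_RATE_HZ * CLIP_SECONDS         # 1800 frames in one clip
values_per_clip = features_per_frame * frames_per_clip  # 32,400 raw values

print(features_per_frame, frames_per_clip, values_per_clip)  # → 18 1800 32400
```

So a single 20-second identification clip already contains tens of thousands of raw values before any feature extraction.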


This is a listener-supported podcast through the Voices of VR Patreon.

Music: Fatality

Rough Transcript

[00:00:05.452] Kent Bye: The Voices of VR Podcast. Hello, my name is Kent Bye, and welcome to the Voices of VR Podcast. So in today's episode, I'm going to be talking to a PhD student from Stanford who's been working with Jeremy Bailenson on a study that was just published in Scientific Reports. So the study is called Personal Identifiability of User Tracking Data During Observation of 360-degree VR Video. Now, the observation of 360 video throws you off a little bit, because you might think, OK, well, that's just three degrees of freedom. Well, they're actually having people wear an HTC Vive, and they're also tracking people's hands as they're rating different things within this video. So it's actually going to be 18 degrees of freedom between the head and the two hands. And from that data of just 10 minutes, they're taking a sample of that to be able to train this AI algorithm that is categorizing different aspects of your body. And then from that, they're able to run over 500 people through this experience and then correctly identify the person 95% of the time, based upon some core identifying information. I think the larger context is that a lot of this motion track data is considered to be de-identified. It's not like taking a picture of yourself. But what research like this is showing is that it's kind of like taking a picture of yourself, especially because there's lots of immutable aspects of your body that can be captured with some of this motion track data. So that's what we're covering on today's episode of the Voices of VR podcast. So this interview with Mark happened on Monday, October 12th, 2020. So with that, let's go ahead and dive right in.

[00:01:45.459] Mark Miller: Yeah, I'm Mark Miller. I'm currently a fifth year PhD student at Stanford University, which is crazy to say, both the five years and the Stanford part. I do research with Professor Jeremy Bailenson. He is with the Department of Communication at Stanford. But my track, my program, is human-computer interactions. So I'm technically part of the computer science department. And as will become more apparent as we talk, there's a lot of this confluence between the behavior, the media and the technical sides of things that I like to build on.

[00:02:22.144] Kent Bye: Okay. So your, your background is actually in computer science. Cause I know Jeremy's in the communications area, but so is he your advisor or maybe you could give a little bit more context as to your background and your journey into VR.

[00:02:34.748] Mark Miller: Yeah, so I'm co-advised by James Landay in the computer science department. So yeah, as I said, there's a confluence there of technology and behavior. But my track into virtual reality started probably more with augmented reality. I remember in high school hearing about this thing called augmented reality and, hey, you print out this sheet of paper and you put it up in front of a camera and there's this 3D model that appears on it. And I thought it was pretty cool. I've had a history of liking things like Pokemon, Minecraft, these video games that seem to create this alternate world, so to speak. And I think when I saw augmented reality, I really liked that as a medium and a way of creating these worlds that we learn about ourselves through. So it's always stuck with me. I remember my freshman year in college, I wanted to make phones be able to put virtual content in places. The idea was very much what ARKit was about. I thought I could tackle that problem. Then I learned that's a really, really, really hard problem. And it takes a lot of years and some very, very talented people to make that happen. But it got me really excited about what augmented reality could do. And then as time goes on, I had another chance to do research with virtual reality. And this is at the University of Illinois, where I did my undergrad in the computer science department. And I had a chance to work on a project with Professor David Forsyth and a master's student named Pulkit Budhiraja. And we were working on a way to warn somebody if they're wearing a VR headset, but something's coming at them in real life. The way we set up the system was, hey, wouldn't it be nice if you're walking around like a construction site, maybe something goes wrong, how do you get someone to duck really quickly in VR? Do you have something come at them in VR? Do you have a red warning light? Do you have nasty sounds that make somebody take the headset off?
So it was an idea that we kicked around for a while. And I really liked research because, hey, I was able to take the skills that I had been learning in my classes, in the same things that I kind of did in random little personal projects, programming something up, and have them directed at an interesting research question. So it's like, hey, I get to work on more interesting questions than what I would be doing normally. And it's the same kind of day-to-day work that I really appreciated. So I started to consider research as an avenue that I wanted to pursue further. And then the short version of coming to Stanford is that I had a chance to work with a couple of different professors. And what I ended up liking about Jeremy's lab, in addition to the people there, was the confluence of the day-to-day work, but also the purpose of that day-to-day work. I could sit down, read papers, write some code, but also add something that I felt was valuable to the world.

[00:05:49.877] Kent Bye: Yeah, well, I know that Jeremy Bailenson gave me a heads up that this paper was going to be coming out, this paper that you were the lead author on, Personal Identifiability of User Tracking Data During Observation of 360-Degree VR Video. So for me, when I read it, I was like, oh, wow, this is like the bare minimum of data that you could think of in VR. Like, it's just your head looking around. And what you're claiming in this paper is that as you're looking around, that data, just for how you're looking around in a video, could be enough information to identify you, even if it's potentially even three degrees of freedom, or if it's six degrees of freedom, and we can get into some more nuances of that. But the larger context for me, at least, is that when I talk to Joe Jerome, he says that the existing privacy law creates these different classes of data and how it's treated. And there's data that are personally identifiable and data that isn't personally identifiable. And that creates this bifurcation for how you even treat that data. And part of the implication that I suspect with this type of research, and as research goes forward, is that data that we may not consider to be identifiable, or something that is de-identified, may actually turn out to be identifiable if it's applied with the right machine learning algorithm. So that seems to be the thrust of what you were able to prove here, at least in this sample size and these constrained conditions: that you're able to potentially have people watch a 360 video and re-identify them. Maybe you could take it from there and set the larger context for how this came about and this whole journey of doing this specific research.

[00:07:24.542] Mark Miller: Yeah, so the way I describe the study, if I'm explaining it to someone for the first time, is I go through the procedure of what we did. So we had over 500 people come in, took part in this study. Each of them watched five different 360-degree videos out of a set of 80. Originally, this study was meant to be getting a bunch of these emotion ratings on these videos. Does this video make you feel happy or sad or calm or intense? Each clip is 20 seconds long. There are similar databases for regular videos and for still images in psychology. Say if you really want somebody to be happy when they do this part of the study or something like that. So we wanted to do that in 360 degree video. So we had a lot of people for that kind of process. So somebody comes in, they watch five different 360 degree videos, they rate how the videos make them feel. And just as a standard thing that we do in the lab, we very often track people's motions in VR. So maybe we'll talk about this a little bit more, whether it's three degrees of freedom or six degrees of freedom. But in a sense, it's tracking somebody's head and hands in 3D space. And yeah, I'm very happy to go into the technical details of that.

[00:08:47.067] Kent Bye: Just to clarify, you were having them watch a 360 degree video, but you were also tracking their hands as they were moving around. Okay.

[00:08:53.592] Mark Miller: Exactly. Yeah, so both head and hands position and rotation. So we have all this data, it's tracked at 90 times per second, 90 Hertz, and we get these 18 degrees of freedom. Then after we had collected all this data, I think it was around the time Jeremy was either working on or had published the JAMA article, the Journal of the American Medical Association article on children's tracking data in VR. We were thinking a lot about privacy and, me being the computer science person in the lab, he goes, hey, you know, why don't you try throwing some machine learning at the problem, see if you can identify people with this data. And originally I was pretty cynical about it because of the machine learning process in this case. We were dividing up the data. So people watch five videos. We take four of those videos, how people are moving in four of those videos. And that's sort of what the machine learning system can learn from, so to speak. Air quotes on learning.

[00:10:00.510] Kent Bye: So- The learning is that you're also coding it too, right?

[00:10:04.554] Mark Miller: So, that's a good question. And I'll get to that in a bit. I kind of want to finish off the loop on how the participant went through here. Because that is a huge topic. And I'd like to spend some time on that. So, the machine learning algorithm is given 80% of the data. It's able to learn from it. And I'll talk about more what that means in a bit. But the end result is a function, a mathematical function that takes in this other one session. So it takes in as input another 20 seconds of motion data, and then outputs its best guess of who the participant is. So originally, the data is labeled. So you have four sessions from participant A, four sessions from participant B. And the system can learn from that. And then it outputs this function that when you put back in a little bit of tracking data, it says, oh, hey, I recognize that person. That's participant Q. And the end result of that system was putting in those clips, those 20 seconds of motion data, the system recognized who that person was out of the 500 participants that were in this study. The system recognized who that was 95% of the time. So, by chance, you would expect it 0.2% of the time, one out of 500. So, you know, it's not randomly guessing and getting it right some of the time. It's, yeah, picture taking a 500 element multiple choice test where each question out of the 500 questions has 500 possible answers, and scoring a 95% on that test. And that's pretty sizable.
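The train-on-four-sessions, identify-from-one-clip setup Miller describes can be sketched in miniature. The real study trained a random forest over many features; this is a deliberately simplified stand-in using a single synthetic feature (head height) and nearest-centroid matching, with all numbers made up:

```python
import random
import statistics

random.seed(0)

# Hypothetical stand-in: each participant's head height is a stable trait,
# and each "session" is a noisy 20-second recording of it (1800 samples at
# 90 Hz). The real study used a random forest over many features; this
# single-feature sketch only illustrates the train/identify split.
def record_session(height):
    return [random.gauss(height, 0.01) for _ in range(1800)]

heights = {pid: 1.50 + 0.002 * pid for pid in range(50)}  # 50 made-up participants

train = {pid: [record_session(h) for _ in range(4)] for pid, h in heights.items()}
held_out = {pid: record_session(h) for pid, h in heights.items()}

# "Training": learn one summary feature (mean head height) per participant.
centroids = {pid: statistics.mean(v for s in sessions for v in s)
             for pid, sessions in train.items()}

# The learned "function": 20 seconds of motion data in, best-guess identity out.
def identify(session):
    m = statistics.mean(session)
    return min(centroids, key=lambda pid: abs(centroids[pid] - m))

correct = sum(identify(s) == pid for pid, s in held_out.items())
print(f"identified {correct} of {len(held_out)} held-out sessions")
```

Even this toy version identifies nearly everyone, which hints at why a stable physical trait like height carries so much identifying signal.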

[00:11:44.489] Kent Bye: Yeah. So I guess, you know, one of the immediate questions with machine learning is overfitting the data. You're taking these videos that are a certain length and you're taking just 20-second samples. You're only training it on, what I understand, is 80% of all the data. So you're kind of feeding it 80%. When you say that, is it like the entirety of all of the data, or is it these random samples of 20 seconds? I'm just trying to think about how would you go about trying to prevent the overfitting of something like this with this small sample size, so that you pretty much always get the answer right, but make sure that it has a robustness, so that it's not like it only works with these 500 people, and anybody else that comes in, it would be like 0%.

[00:12:27.550] Mark Miller: Yeah, so that's a good question, right? They call it the bias-variance trade-off. You can have a simpler model that's less likely to overfit. There are fewer parameters flying around. But you may not capture the signal that you want, or vice versa. You can have many, many, many, many, many parameters, many, many choices. And it will fit everything down to the noise, and then the signal that you get is useless. Are you going to recognize somebody based upon the color of their hair and only the color of their hair? Or are you going to recognize somebody based upon the clothes they wear? Just kind of focusing on the color of their hair is very biased. It's too narrow. You're only considering one parameter. But recognizing someone on their clothes, in a different outfit each day, that would be overfitting. So, the very, very basic thing to do is to make sure you aren't training on your testing set, and you get that difference. But even then, you know, it's tricky. I don't think there's any hard and fast rule to make sure you're not overfitting. I guess the next step is to break up the data into three sets. The first one is training. The second is validation. So, it works like a testing set, except it sort of tells you when to break off training or not. And then testing is sort of the final result. And it made me uncomfortable as a computer scientist, as someone who is very mathematically oriented, that there's just so much fuzziness when it comes to machine learning. Now that I've gone through the social science lands, where everything's super ambiguous and nothing is as reducible as it is in computer science, and swung back to machine learning, it's like, oh, yeah, this is fine. At least we're not asking questions of people where how the weather was that day might affect their answer. But, I mean, that's always a concern.
I feel fairly confident in it, in that that 95% is not fitting to noise where it should be 10%. One thing I will add, because your question said, hey, what if you pull someone else in? This system is trained only on the people that it's seen. So, if we had someone new come in and we sort of tested their tracking data, it wouldn't be able to recognize them. We didn't program it to say, hey, this is a new person we haven't seen before. The way we sort of set this up was an identification problem rather than an authentication problem. Instead of someone coming in and it working like a password, where you move like somebody and then it gives you, hey, here's access to your bank statements. It's more advertising, or just sort of a situation in which you're identified, maybe you have preferences that automatically get filled out or something of that sort. But I mean, the lesson here, right, is that motion data is identifying. And I'm very confident that will hold outside of this sample size and outside of this task that we were doing.
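The three-way split Miller describes, training, validation, and a final test set touched only once, can be sketched like this. The 60/20/20 proportions are illustrative assumptions, not the paper's actual numbers:

```python
import random

random.seed(1)

# Sketch of the three-way split described above. Proportions are
# illustrative, not the study's actual configuration.
sessions = [f"session_{i}" for i in range(100)]  # stand-ins for recorded sessions
random.shuffle(sessions)

n = len(sessions)
train = sessions[: int(n * 0.6)]                    # what the model learns from
validation = sessions[int(n * 0.6): int(n * 0.8)]   # decides when to stop training
test = sessions[int(n * 0.8):]                      # touched once, for the final number

# The basic overfitting guard: every session lands in exactly one split.
assert len(set(train) | set(validation) | set(test)) == n
print(len(train), len(validation), len(test))  # → 60 20 20
```

The validation set absorbs the model-tuning decisions so that the test accuracy is an honest, one-shot measurement.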

[00:15:31.484] Kent Bye: Yeah, because I mean, there's already existing studies around gait detection, so the way that you walk. And at the VR Privacy Summit that happened at Stanford back in 2018, where I was one of the co-organizers with Jeremy, in the discussion there, someone had brought up bone length, in terms of the length of your bones is pretty well set. So that within itself, when you start to look at someone's resting position, do you imagine that that is a part of it? Like, the way that our bodies are constructed, in the sense of just the size of our bodies, can be enough to be able to determine your head position relative to where your hands are at.

[00:16:08.476] Mark Miller: Exactly. Exactly. So this is a nice transition into, once we found that this data was identifying, of course, we wanted to know, OK, what is this system using? And so I can go a little bit into the technical details of what we did. So the primary machine learning model that we used was something called random forest. And those of you that may know a little bit about machine learning and a little bit about neural networks and all that may recognize random forest as a sort of, maybe not outdated, but a little bit of a lower power model. This is something that, you know, it's still in use, but it's kind of like a first pass approach. It doesn't require a lot of data. It's a little bit closer to the lower power, but less likely to overfit, side of models, a little simpler model. So one of the nice things about random forests is that you can ask it what features it uses often. The basic idea of a random forest is a classification tree. So my wife and I needed to ship a bunch of packages, and I was estimating how much these packages would cost. And you can make a simple decision tree. You could say, hey, is this package more than a pound or less than a pound? Well, if it's less than a pound, we should do this particular thing. If it's more, hey, maybe I'm going to ask another question. There's going to be another split in this tree. I might take another path. And so, what the random forest is doing is saying, hey, here's a moment of data. Let's look at the Y axis. So, the vertical position of this person's head. Which, you know, spoiler alert, that's strongly correlated to someone's height, which is pretty consistent across when someone's in VR and outside of VR. Your height tends to be pretty stable. So the random forest looks at that and says, hey, if it's above a certain amount, if it's above five feet, then here's the 300 people that it might be. And if it's below, here's the 200 people it might be.
And it sort of breaks its way down, oftentimes splitting on height, but sometimes splitting on other things, maybe the hand position or something of that sort. And so there are ways to measure how important a feature is. A feature in this case is one of the position and rotation dimensions, the 18 degrees of freedom that we were talking about earlier. Those come from the fact that position is three-dimensional. We live in three-dimensional space. So the way this is set up in the VR program is you sort of have a vertical dimension, how high or low is someone's head, and then two horizontal dimensions, how forward or back is someone, and then where is their head sort of side to side. Then there's also three rotational dimensions. There's a lot of ways to sort of pick and choose how you want to measure those, but the way we did it was yaw, pitch, and roll. Pitch is the direction your head moves when you nod yes, this sort of tilting up and down. Yaw is the direction your head moves when you shake your head no, sort of side to side. And then roll is the other one, where you make your ear touch your shoulder, basically, back and forth. And so each of these variables is measured and tracked 90 times a second. And that's true for the head in addition to each of the two hands. So you have six dimensions, which is three position plus three rotation. So six times three body parts, the head and the two hands. And you end up with 18 points, 18 degrees of freedom, per each snapshot that is taken, and those snapshots are taken 90 times a second. Going back to the original question, which is, how do you tell what's being tracked? Is it the bones? What might it be? I can definitely say height is a big factor, because very often the random forest would use the Y position, how high or low your head is, as a way to tell people apart. Which sounds very obvious in hindsight, but the first time it popped up to me, like I said, I was cynical.
I didn't think that this would be effective, but I don't know. It's those features that fly under the radar that are often very, very useful to the algorithm. So I thought that that was worth remembering in the future, to be a little less cynical about what you can pull out from data. So I think it's very true that bone length matters, because the next most predictive features were the Y position, so how high or low, of the hands. And so, did people have their hands down when they were watching the video? How long were their arms? Did people have their hands sort of crossed over? These features, I think, very much tie into what you were talking about with biometrics and bone length and things like that, and make VR data pretty well identifying, almost as identifying as gait.
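The height-based split Miller describes can be illustrated with a single decision-tree partition. The names, heights, and threshold here are all made up for illustration:

```python
# One decision-tree split of the kind described above: partition participants
# by whether their mean head height is above a threshold. All values are
# hypothetical.
mean_head_y = {"A": 1.78, "B": 1.52, "C": 1.65, "D": 1.49}  # metres

THRESHOLD = 1.60
taller = [p for p, y in mean_head_y.items() if y > THRESHOLD]
shorter = [p for p, y in mean_head_y.items() if y <= THRESHOLD]

# Each branch leaves a smaller candidate pool; further splits on other
# features (hand height, controller roll, ...) narrow it toward one person.
print(taller, shorter)  # → ['A', 'C'] ['B', 'D']
```

A random forest repeats this over many randomized trees and features, which is why the features it splits on most often (here, head and hand height) reveal what the model finds identifying.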

[00:21:05.497] Kent Bye: Yeah, as I listen to the constraints of this, I guess the more skeptical mind to really stress test it would be like, I would love to see everybody be roughly the same height and maybe similar bone length, to see if there's other aspects of how people are actually moving in these experiences. And also, just knowing that in a variety of different VR experiences and different contexts, you know, there may be different behaviors that are being evoked. Is there a fundamental signature that I have that is kind of transcendent, going from Beat Saber to, like, Tetris to watching a 360 video? I mean, there's lots of different contexts that people are in, and you're looking at 360 video, and those 360 videos, you have some variance, but, you know, maybe one's way more engaging and you're looking around more. So do you have any sense in terms of the motion signatures that people may have, if something like that was able to be captured in this model, above and beyond the bone length and the other biometrics of the height and all that other stuff?

[00:22:06.863] Mark Miller: So the study that we've done, and especially choosing the random forest as kind of our method, you really bias towards these static features, these things that stay constant across time. It's kind of hard to pick out anything else over the course of 20 seconds. You know, you can't gauge how likely someone is to jump back from a virtual monster or something of that sort. I suppose those things would manifest themselves over larger datasets. In addition to the biometrics, the next strongest feature was the hand controller tilt. In particular, not the direction someone was facing, not the horizontal direction, the yaw, the left-right rotation, but the other two, which really threw me off when I saw this. I've done a fair amount of tracking data work in the past, and very often the roll, this sort of side-to-side tilt, you know, where your ears touch your shoulders, it's not something that people do naturally. It's not that useful of a signal. So, I was very surprised to sort of see that. But the more I thought about it, and this involves taking some of the data, plotting it, saying, like, hey, how are the hand controllers rotated over time, and sort of seeing that played back in a sense, I realized that the way people hold hand controllers is somewhat idiosyncratic, at least over the 10 minutes that we had them in the study. So that's another caveat to mention: we had people come in, we recorded them for 10 minutes, and then they left, never to be seen again. Because this wasn't originally an identifying data study, we don't have people coming in a second time, where, hey, maybe instead of wearing flats, they're wearing high heels, or instead of a big bulky coat, it's now a t-shirt or something of that sort. So I think that 95% accuracy would be lower if we had people come in over multiple days.
But I think it would be down to, like, 90 or 80%, which is still pretty concerning when you have 500 people to look at, as opposed to trying to guess someone out of 20. So what else might be trackable? I very often see roll as the strange middle child of the rotation variables, so to speak. No offense to the middle children out there. But it's one you just sort of don't see as often. So I was surprised to see it useful in this context, and I found it was how people were holding their hand controllers. Turns out that's idiosyncratic. Some people might hold them down at their sides, sort of facing forward. Some people have them pointing down. Some people have their arms crossed. Is it left over right, right over left? How do you have the hand controller? Is it under your arm? Is it over your arm? And that was something that would be fairly consistent between days, between sessions, but is not just this sort of biometric, how-long-are-your-arms, how-tall-are-you sorts of questions. And again, like I said, the way we've set up the algorithm is biased towards the static features. So for example, if somebody were playing a baseball VR video game, we couldn't use the same measures, and you wouldn't necessarily be able to take what you've learned watching 360 video and immediately convert it into baseball VR. But hey, instead of how somebody holds the controllers in a 360 video, you now have how someone holds a bat. And maybe there's similarities. I don't know what those similarities might be. I would guess that there's similarities. It's just my experience with looking at people's behavior. It's so strange, all the things that you don't notice until you start looking at the numerical data. So yeah, I don't know what that relationship might be, but I'm fairly confident there is a relationship.

[00:26:11.780] Kent Bye: Well, back a number of years ago, I did an interview with OpenBCI's Conor Russomanno, the open-source brain-computer interface project. And, you know, he was telling me back then that he suspected that EEG data has, like, these biometric fingerprints, so that if you look at the EEG, then that's somebody's brainwaves that you could start to identify that information from. And ever since I had that conversation, it was sort of like there's this assumption within the VR industry that a lot of this information is de-identified, it's not identifiable. But I sort of had this sense that, given enough collection of that data and the right machine learning algorithm, it most certainly would be fairly identifiable, especially when you start to combine these different strands together. And when I talked to Joe Jerome, one of the things he said around biometric data, as an example, like biometric data, I've been sort of maybe misusing it in the sense where biometric has a very specific legal definition, which is that it's information from your body that can identify you. Like your facial features that can be used with facial recognition would be, like, a biometric piece of data. But as we move on, what it seems to be is, like, all this data that we're radiating from our body could potentially eventually be classified as biometric data, even if right now it has a definition that doesn't really even fit cleanly into any of the existing privacy laws at all. It's sort of just data that we're radiating. But yeah, I don't know if there's a deeper thrust here of the previous research of trying to do this, that you're building on top of, in terms of the larger context or why this type of research is important. It seems like it's feeding into what data are recorded and if that data can identify us, in that personally identifiable data is treated differently.
And so maybe you could get into a bit of that other research that's on this line and then maybe some of that larger context for why you think this is important.

[00:28:03.899] Mark Miller: Mm-hmm. Yeah. So one person to definitely mention, I think Joe mentioned her in his podcast: Latanya Sweeney is a professor at Harvard, and she kind of kicked off this whole, wait, you can actually pretty accurately re-identify data tack. Originally, it was merging two kinds of database-style data, so zip code and date of birth, and showing that that combination is really effective at sort of narrowing down into a small enough pool that, oftentimes, if you have two lists that both have those variables, you can match them up. And I'm trying to remember the example that she used, but I think it was merging some voter database with anonymized medical data in a study and showing that, because you had both sets, you could figure out who took part in this study and who has this medical condition. And it was a really excellent demonstration of the fact that de-identified data is not quite as straightforward as we think. And so it's very much in that line of work saying, hey, this data, we thought it wasn't identifying, but it turns out it is. And the sort of other thread that we had to tie in is a couple of other researchers, we're talking about maybe five or 10 research papers on this topic, have used virtual reality tracking data to make some sort of motion password, was one of the studies, or to authenticate, sort of make sure somebody's not walking off with your AR headset, something of that sort, again, using gait data. And one thing that stuck out to me when I was looking at this research was that the assumption was, hey, we have to pick a good task. We have to say, hey, do this thing, and then we can get identifying data. So, one of the examples was having someone throw a ball in virtual reality, saying, like, hey, that's the sort of password box in VR. And the implicit assumption there is that you're not generating, or, I liked how you said it, you're not radiating this identifiable data as you go about your day.
And so that's the framing that we wanted to change, is that this data, you can throw it in a machine learning algorithm and it identifies you, but it's also data that you wouldn't necessarily expect would be identifying. Maybe you would expect it if you're well-versed in how people walk is identifiable and that sort of stuff. But I mean, it was a surprise to me. Like I said, I was originally cynical about this line of work. Jeremy wasn't because he knows better. But, you know, I think it's an excellent example of that thread.

[00:30:50.139] Kent Bye: Well, I think part of the complaint that someone like Joe Jerome as a privacy advocate would have is that, you know, giving access to these data sets sometimes is a big challenge, to be able to, like, de-identify it. And I guess I didn't have time to really dive into it in my conversation with him, but where I would push back is that it seems like virtual reality is a context under which, on your end, you're getting this raw 90 Hertz data stream that is, like, pure. It's like the actual data that is being put into this experience. But a lot of times when you actually translate that data into a virtual reality experience, if you're in a social VR experience, you could potentially be watching somebody and using, like, more of a 2D capture of someone and then inferring and extrapolating this through, like, all sorts of other photogrammetry. I mean, it'd be much more computationally intense to be able to take that degree of data. Maybe it's only at 30 frames per second or 60 frames a second that you could actually record it, but you could potentially start to get all this other nuanced information just by observing other people within these virtual environments. Which would imply that, as you're in a VR environment, even if you're trying to be anonymous, somebody could be able to observe you, and then maybe observe you in different contexts where they know your identity, and potentially unlock your identity. We can already kind of do that when we know somebody and we know how someone moves. We can sort of see someone walk across, a hundred yards away, and maybe identify somebody. And then if you really know someone, you can see what their mannerisms and their behavior are, and you can start to identify people. So if we can already start to do that without any additional technology help, it's just a matter of getting the right machine learning algorithm to be able to do that.
To me, it feels like that complaint that this is more of an academic context is different in VR because you could potentially have people starting to record this information and then use that as the key to unlock people's identities.

[00:32:46.313] Mark Miller: Yeah. The example that I think of when I think of that is, I grew up in a relatively large family. I have three little brothers. And so it was fairly easy to tell who was running down the stairs. Everyone runs down the stairs in a different way. It was my mom, my dad, or which of the brothers. And that's just something you sort of pick up. You realize you're sort of leaking this identifying data in a sense. I think the system that you would use, given a video recording, or you're in VR and you're seeing someone, to convert that into this 3D type of data that we'd use, that is a step. It's not an easy step, but I think it's a very doable step. Of course, there would be bugs, but I think we'd be talking about 50% accuracy or something of that sort, not slightly more than 0.2%. And this kind of gets into one of the first questions that I learned about. So with this paper, I'm not a security expert or privacy expert. My background is computer science. So I can program, I can do data science, I know a little bit about human behavior. And so when we tried to take this privacy tack, I was like, we very much need to do this, but this isn't my first language. So, okay, got to learn another field. And by that, I mean the very basics of another field, enough to get a paper published. And one of the things that stuck out to me from day one was this question of a threat model. We got a critique on our first paper saying, hey, where's your threat model? And we're like, what's a threat model? And I Google it and look it up on YouTube, because I learn so much from YouTube nowadays. And the top result is the very first lecture of MIT OpenCourseWare's computer security course. And it's like, oh shoot, this is foundational. Why didn't I run into this before? But it's a very important idea of: who is doing the attack? What's their motivation? What are they trying to get? And what kind of data do they have access to?
And so this idea of, hey, you know, it's one thing for me, a researcher running this study to collect this data and identify someone. It's another thing for, you know, you're in VR and you're kind of having some fun and somebody is recording your avatar moving around because maybe it's politically charged or maybe it's content that you'd rather keep private. And that feels a lot more plausible as an attack route. Somebody probably isn't going to have your IP address, so they can't trace you to the Bay Area or the Chicago suburbs. It'll be, oh, they move something like this, and hey, they're already on our 20-person list, so maybe it's this person. Which, again, is a somewhat contrived situation right now. But my job as a researcher is to say, hey, we use this method to get this result. And I do admit I'm not well versed in creating threat models and thinking of good ones and all of that. But somebody else who is a lot better at that can take the work that we've done and say, oh, wow, I see this particular thing happening. I have these privacy concerns. I'm going to take them to people like Joe Jerome that are working in this space. So yeah, my responsibility as a researcher is to say we had this kind of data, we did this kind of process, and we got these results.

[00:36:11.060] Kent Bye: And in your paper here as well, one of the things that's really great is just a lot of references for other stuff that is happening from a variety of different sources and contexts. And so maybe, were there any that really stood out in terms of other research along this line of taking either VR tracking data or other ways that you're looking to see information that's personally identifiable in some ways?

[00:36:34.918] Mark Miller: I mean, did any stand out? I think all of them stand out. I'm working on another paper and I have a short section talking about VR tracking data and privacy. And I still cite all these papers, and I'll list them off because, you know, I don't want to have people wade through articles meant for academics. The three that are kind of the most eye-catching are using tracking data to classify students that had been diagnosed with ADHD and not, students that had been diagnosed with autism and not, and people that were in the early stages of dementia or not. And they sound very menacing, but I kind of want to pull back the curtain a little bit and say a little bit about what was tracked in each of these situations, because I think it's a good example of how tracking data is powerful. So for the ADHD case, what they did was they just measured the total amount of motion, head and hands, and found that almost any way you slice and dice motion, the kids that had ADHD moved more than kids that didn't. They had a car in the VR experience drive by the virtual schoolhouse. So this was somebody who's in a virtual school, they're trying to learn something, and they had a virtual car drive by, honk its horn, and then drive off. And that would be distracting. So the kids that had ADHD were more likely to look over at the car, which is very, I guess you'd say, subtle. It was certainly clever when I first saw it. I was like, oh, yeah, of course that would be a difference. And so that was the ADHD case. For the children with autism, it was how often they made eye contact or were looking at this virtual person's face, which, again, if you know a little bit about autism, kind of makes sense as a way to measure it. And also, right, if you're in person, signals like that are signals that you might be able to pick up on.
But instead of having someone there, you now have the power of technology to really scale up doing these sorts of measurements, and that's where it gets tricky. And then with the early stages of dementia, it was motion planning time. So going from point A to point B through obstacles, how long did that take you? You know, when you're planning this route around a space, how efficient was it? And so, again, these things that you sort of subtly pick up subconsciously if you were in person do have some actual manifestation in your tracking data. So those are the three that I like to go to because they're all medical. They're all things that are protected. And there is this weird area where these things do sort of manifest themselves in everyday life, but we're just sort of legally required, if you catch that in an interview, to say, no, I can't let that influence my decisions. And thousands of papers on implicit bias have shown it's really hard to undo that. But yet there's this assumption that we're able to, and we sort of have to, in a sense. And I don't know, it'll be interesting to see how it resolves, right? How psychology thinks about behavior is very different from how law thinks about behavior. And there are reasons for that, but it's interesting to see how this will interplay when it comes to tracking data in VR.
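To make the kinds of measurements Miller describes concrete, here's a minimal sketch of two of them: total head-and-hand motion (the ADHD signal) and the fraction of time the head is oriented toward a target like a face (the autism signal). This is a hypothetical illustration, not code from any of the studies he cites; the function names, the 15-degree gaze threshold, and the toy data are all my own assumptions.

```python
# Hypothetical sketch of the summary statistics described above; not taken
# from the cited studies. Assumes a pose stream sampled at 90 Hz.
import numpy as np

def total_motion(positions):
    """Total frame-to-frame displacement of one tracked point (head or hand).

    positions: (n_frames, 3) array of x, y, z samples.
    """
    deltas = np.diff(positions, axis=0)              # movement between frames
    return float(np.linalg.norm(deltas, axis=1).sum())

def gaze_fraction(head_forward, to_target, threshold_deg=15.0):
    """Fraction of frames where the head points within a threshold of a target.

    head_forward: (n_frames, 3) unit forward vectors of the head.
    to_target:    (n_frames, 3) unit vectors from the head to the target.
    """
    cos_angles = np.sum(head_forward * to_target, axis=1)
    return float(np.mean(cos_angles >= np.cos(np.radians(threshold_deg))))

# Toy comparison: a fidgety head trace accumulates more total motion
# than a still one over the same 20 seconds at 90 Hz.
rng = np.random.default_rng(0)
fidgety = np.cumsum(rng.normal(0, 0.002, size=(1800, 3)), axis=0)
still = np.cumsum(rng.normal(0, 0.0002, size=(1800, 3)), axis=0)
print(total_motion(fidgety) > total_motion(still))
```

The point is only that a few lines of arithmetic over a raw pose stream already yield the behaviorally meaningful features these studies used.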

[00:40:04.385] Kent Bye: Yeah, I was on a panel discussion with Jeremy Bailenson and Donna Davis earlier in the year. And one of the things that Jeremy went through was this Black Mirror-esque scenario in the future where people are developing VR experiences whose whole purpose is to gather inferred medical information, so that if you have people's identities, you're seeing if someone has a risk for certain medical conditions, and then that data is sold to somebody who is selling insurance. So then you could get denied insurance based upon a VR game that you played. But not only that, your sister, who could be genetically predisposed to the same things, is affected too. So it could be things that you're not even doing: your family member is doing a VR experience that is getting medically extrapolated and inferred information, and then you're getting denied health insurance coverage because of some VR experience that was harvesting all this data. This is the Black Mirror scenario that Jeremy was painting, but I think it speaks to that data harvesting context where there's all sorts of information. We're just at the very beginning. We have no idea what type of other things we're going to be able to extrapolate from this data in the future. And currently, as we stand right now, there's nothing to prevent any of these companies from just starting to record all this information at 90 Hz. And on the one hand, they'll say, we need this information in order to improve our technology, in order to do all this stuff, whether it's hand tracking or getting the guardian system to work better. I mean, there's a legitimate trade-off.
And I think that's what Joe Jerome was struggling with: what is the threshold where you say, if this is a threat model that has a 90% accuracy, 95%, 99%, 99.9%, what that threshold is for the risks that are there versus the other benefits that you get from needing to have access to that information to be able to run the technology. My position has always been that, yeah, you need it to be able to do the technology, but do you really need to be recording it and storing it and saving it? It's the saving that allows the companies to do machine learning training on it. But at the same time, it's that data that is discarded as de-identified, but yet, as some of your research is pointing out, is actually identifiable, and then that sort of reclassifies it. So I feel like we're in this situation now where it's just the very beginning. We don't know everything that's going to be determined with this data; if someone saves it for five, 10, 20 years, you'd be able to go back, and who knows what you're able to do with it. But I think, for me at least, I think of this as information that's radiating out: use whatever you need in the moment to make a decision, but let's try not to record all this data and hoard it, because we don't know what kind of risks it's going to have down the road. You couldn't even imagine the type of stuff that you could infer from that later, but you could be sitting on all this information that could be not only identifying people, but revealing very intimate medical information about them.

[00:43:14.046] Mark Miller: Yeah, yeah. I mean, as a researcher, there's the IRB, the Institutional Review Board, that reviews the ethics of the studies that we do. So oftentimes I don't run into trouble, because people come into the lab, they do some stuff, they leave, we don't really have medical information. Well, we might now. And there's usually fewer hoops that we jump through. But one of the things that is always asked is, hey, how long are you keeping the information? And that is important, right? The distinction between you have the information forever versus you have the information until next week, that's a really big difference. So yeah, to your point of how powerful is keeping data, I think keeping data for longer is more powerful. And I think that's not always obvious. I feel like when I have these discussions about data, it's always, oh, who has access to it and who doesn't? Not, hey, how long do they have access for? So I think that's a good point. I remember thinking, as we were working on this paper, about the data availability statement. There's this big thread that I very much like in the social sciences of keeping your data open, keeping your experiments replicable, so people can come back and say, hey, Mark, you actually had this big bug in your code, and it turns out that's a big problem, and holding each other accountable as researchers. And we didn't push it. But there's a fair chance that in this paper where I say, hey, all this data is actually identifiable, we could have released all of that data. And as far as the policies of how we currently work with data, that would have been fine. And I just appreciated that irony. It was funny for about five seconds, and now it's kind of scary. But yeah, that irony of, in the same paper that we say, hey, this is identifying data, we could have had all that data freely available, and that could have happened.

[00:45:09.680] Kent Bye: Yeah, it's quite a dilemma in that sense. But yeah, just a few more questions just to kind of start to wrap up here. This interview that we're doing here is in the week ahead of when this paper is actually getting published. Where is it being published and do you expect it to kind of make waves within this line of research as being a novel enough result that may catch the attention of folks that are generally looking at these different types of issues?

[00:45:36.167] Mark Miller: Yeah, so it's published in the journal called Scientific Reports, which is published by the same company as Nature. It's not as prestigious, but we liked it because we've published there before and they're quick with turnarounds, relatively speaking. This is academia, after all. But as you mentioned, this is good timing when it comes to privacy. You just talked with Joe Jerome. There's a lot of concern that is building. So we wanted it out quickly, and I think it's good timing. I mean, personally, I think it's the most convincing paper that tracking data is identifying by default. That's kind of the thesis of the paper: you record someone for 20 seconds, you probably will know their height and something else about what they're doing. You don't need to contrive a situation in which you get identifying data. It's there. That's the change that we want to see happen in the discussion, is that this is almost always identifying. Jeremy is hopeful for it. I'm always surprised to see how many people read the work that we put out. It's so easy to get down into the weeds of all these little details and all these little changes, and hey, if we did it this other way, maybe we would have learned more. But it's stuff that the world needs to know. So I'm hopeful that it'll catch wind, and we'll see what comes of it. I hope that in five years, I won't be able to share the tracking data as easily.

[00:47:05.635] Kent Bye: Yeah, I know last year at the Greenlight Strategy Conference, I gave a talk on the XR ethical manifesto, where I say that, as we move forward, we have to pretty much assume that all this data that is coming off is going to be eventually identifiable with the right machine learning algorithm, or matched up with other pieces of information from our body. So that's a good place to start, and I think this line of research starts to reinforce that. And this is good timing in the sense of these discussions that are happening right now, both for the privacy policies and all the different strategies for how people are treating this data, but also, as we move forward, thinking about the legal definitions. Because I think that's the other big thing that I got from Joe: right now, motion-tracked data wouldn't necessarily be classified as biometric data. But some of this research that you're doing here, even if it's a 95% or even 90%, it could be enough of some of these primitive pieces of information that are pretty identifiable. And I don't know how that plays out. Like, do you have any sense of the larger legal sphere of making those definitions, and whether or not research like this starts to change how data are classified from a legal interpretation?

[00:48:25.018] Mark Miller: I would hope it would have some influence. It's an important question whether you can de-identify data or not. Of course, there's other steps involved. What are the risks? What are the basic threat models? What are the situations in which someone might use it? And what are the risks that this data sort of produces? Yeah, I feel this research has a role in that process. Of course, it's not the end-all be-all of, hey, this is identifying, therefore we must protect it at all costs in all situations. But I want the people that are making those decisions to know how identifiable this kind of data can be in these situations. And I think that happens, maybe, through me being lucky one day and someone reading the paper. But I think what's more likely is our popular conception of this data will change into one where it is identifying. You think of it more as a photograph of yourself or a video recording of yourself.

[00:49:25.078] Kent Bye: than something else, you know, something that's not just abstract data points that no one can make heads or tails of. I think that's the thing: it is a bunch of numbers strung together that, when you look at it, you can't identify, but given the right algorithm or machine learning process, it can metaphorically create a picture of who you are.

[00:49:47.829] Mark Miller: Exactly, exactly.

[00:49:50.116] Kent Bye: Great. Well, what's next for you in this line of research and as you move forward and finish your fifth or sixth or more years of PhD?

[00:49:58.540] Mark Miller: Yeah. The PhD has been flying by, which is crazy to say. Everyone's like, oh wow, that's a lot of time, but it flies by. So the next step in this line of research, the thing that keeps coming up that I think I'd like most to learn about, is this sort of, hey, if you have a 2D recording of somebody prancing around in VRChat, can you recognize who that person is, and under what conditions? What does that require? At what accuracy? How much data? Does that change when someone goes from the Knuckles character to a giant dragon? How do avatars play a role in that? Are there ways to protect yourself? What's the tinfoil hat of tracking data in VR? Yeah, so that question, that's the one I'm kind of most curious about. I just don't know how that's going to turn out. Like, is it really effective? I mean, my hunch is that it's fairly effective, but is it, you know, we can kind of get there sometimes? Or is it, hey, you may as well just be prancing around in the room that you're in, you're just that identifiable? So that's the question that I think is most up in the air. There's a couple other variables that I'd like to understand more of. What if you're, as we talked about earlier, doing different tasks? You know, you come in, you watch 360 videos one day, you play baseball another day, or Beat Saber. How do those things compare? What can you learn across those days, right? It's building this case of, hey, it's not just if we know who you are, we know who you are 10 minutes later. It's if we know who you are, we now know who you are five months later doing a completely different game. That's a big jump in scariness. And then, yeah, these strategies, what might you do on the computational end? So what can app programmers do?
What can the developers of VR applications do to sort of protect data, you know, when it goes from your Vive to what gets transmitted over the network, those sorts of questions. What can computation and what can behavior do to make you less identifiable? That seems like a question I just have no intuition for, and I'll be very happy to learn the answer.

[00:52:11.413] Kent Bye: Yeah, I know that this issue came up at the VR Privacy Summit. I think Diane Hosfelt was talking about the challenge of just adding random noise to the dataset, because the random noise can be pretty well discerned and averaged out in some ways. And so if you try to add more noise, then algorithmically, if you have certain biometric characteristics of your body, that data doesn't lie in some sense. And so, like, how do you mask that to some degree without creating artificial differences in how you're even embodied in these virtual worlds? And it kind of ruins your experience, because you have this offset. So yeah, how to actually solve it seems to be another challenging issue, where there's not necessarily a clear answer.

[00:52:59.024] Mark Miller: Yeah, it's interesting you mention adding noise, because one of the parts that was originally in this paper, we cut out for space. So big caveat: this isn't actually published, no other research scientists went over this, but I'm well convinced of it. We had two sort of ways of reducing identifiability. One was adding random noise. And as you point out, over time, that noise is very easy to sort of parse out, especially, say, if you're adding noise to someone's height. Your height doesn't change that often. That's a good static feature. So if you have 1,800 samples, 90 times a second times 20 seconds, you can pretty much find that average, and it's still pretty good. So what we tried that was actually more effective was what I call the platform shoes effect, where instead of adding random noise, you just add an offset. Say you're two inches taller than you were before. And the way that worked in this algorithm, which I think would be extensible to other situations, is the fact that now this height range, instead of putting you in your own sort of bracket, your own sort of bin, it's putting all these samples in the bin of someone who's two inches taller than you. Which is often consistent enough that instead of, hey, maybe it's focusing on how you're holding your hands, the algorithm is like, no, their height is consistently this level, which matches this other person. Height tends to be a good measure, so I'm going to trust the height over the way they're holding their hands. I'm very much personifying the algorithm here, but that's the way that kind of works. So it was a neat little finding that did not seem obvious beforehand, but seems somewhat obvious in hindsight. So I offer that as kind of a, hey, maybe this is simpler than we thought, but maybe not. Maybe when everyone does that, now the algorithm's learned, hey, don't focus on height, focus on how their hands are going.
And there's this sort of arms race, so to speak, depending on what's in the dataset and what's in behavior.
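Miller's contrast between random noise and a constant offset can be sketched numerically. This is a toy illustration under assumed numbers (a 1.70-meter height, 5 centimeters of per-sample noise, a 5-centimeter "platform shoes" offset), not the analysis that was cut from the paper: zero-mean noise on a static feature averages out over a 20-second window at 90 Hz, while a per-session offset shifts the recovered value and persists.

```python
# Toy demonstration (assumed numbers, not the paper's cut analysis):
# random noise on a static feature averages out; a constant offset does not.
import numpy as np

rng = np.random.default_rng(42)
true_height = 1.70                 # metres: the identifying static feature
n = 90 * 20                        # 1,800 samples: 90 Hz for 20 seconds

# Defense 1: fresh zero-mean noise on every sample. Averaging the window
# recovers the true height almost exactly.
noisy = true_height + rng.normal(0.0, 0.05, size=n)
print(f"noise defense, recovered error: {abs(noisy.mean() - true_height):.4f} m")

# Defense 2: the "platform shoes" offset, one constant shift per session.
# The window average now lands near the shifted height instead.
offset = 0.05                      # pretend to be 5 cm taller this session
shifted = true_height + offset + rng.normal(0.0, 0.005, size=n)
print(f"offset defense, recovered error: {abs(shifted.mean() - true_height):.4f} m")
```

The standard error of the mean shrinks with the square root of the sample count, which is why even heavy per-sample noise is averaged away in 20 seconds, while the constant offset survives any amount of averaging within the session.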

[00:55:02.037] Kent Bye: Yeah. And even in applications like VRChat, there is a certain abstraction there where you can be different heights. You're not like your actual height. And so depending on whether or not the experience that you're in is actually accurately recreating that as a default versus having these levels of abstraction of something like, you know, allowing you to be a lot smaller than you actually are. Exactly. Cool. Well, finally, what do you think the ultimate potential of virtual reality might be and what it might be able to enable?

[00:55:32.521] Mark Miller: So that's a good question. I find it difficult to think more than just short term, which is odd for a researcher to say, but I think it's true. And so when Jeremy came out with his most recent book, Experience on Demand, there were a couple of points that I really liked and really took to heart. And one is that VR is exceptionally good for short, intense experiences. People aren't going to be doing in VR what they would do in real life, for the most part. When you jump on a computer game or something like that, you're doing something that, well, of all the people that I know that play Grand Theft Auto, none of them have actually committed grand theft auto. So you're going to be doing things that you can't do in real life, in particular things that are expensive, dangerous, counterproductive, or just downright impossible. Oftentimes for short periods of time, at least as things stand right now; it's difficult to get someone in VR for five hours and not get sick. So maybe this is a shorter timeline than the grand potential of VR, but those short, intense experiences: we're talking about empathy experiences, we're talking about training, we're talking about those moments. I think the best experience that I've had in VR was actually in VRChat. Me and four people that I had randomly met were walking around a maze for a good hour and a half, trying to get through this maze. And I'm talking with this robot, this tall animated paint can that just has a mouth, and one of the other generic avatars. And we were just goofing around, all walking around this maze. When do I do that? I'm not even a big fan of corn mazes or anything like that out here in Illinois. But it was just, hey, I'm doing something very different. That's cool. Think of how many times we go to theme parks or travel to do something different. And VR makes that a computational problem rather than a physical problem.
And I think that that has a lot of opportunities.

[00:57:47.052] Kent Bye: Great. Is there anything else that's left unsaid that you'd like to say to the broader immersive community?

[00:57:54.078] Mark Miller: I mean, if it's sort of an advice thing, maybe just keep VR human. I don't know. Ultimately, you know, these technologies need to be supporting us. And so we should encourage the supportive aspects of it and let them grow and let them flourish. And I mean, ultimately, you can't go wrong with that.

[00:58:11.473] Kent Bye: Hmm. Awesome. Well, Mark, I just wanted to thank you for diving into these issues. I think it's a key part of pushing forward this larger discussion, this larger issue. And so I'm glad to be able to have you on and unpack it because I do think it is very timely with all the other things that are going on right now and all these deeper discussions about federal privacy laws and maybe the insights that VR has to these larger issues of privacy can start to give a little more context as we move forward and try to sort it out. Certainly, very complicated for how that may play out, but it's a key part of the discussion. So thanks for pushing it forward and coming on the podcast to help explain it to my audience here. So thank you.

[00:58:51.549] Mark Miller: Absolutely. Yeah, thanks for having me on, and thanks for giving a chance for this work to hopefully reach who it needs to reach and become part of the discussion here.

[00:59:02.070] Kent Bye: So that was Mark Roman Miller. He's a fifth-year PhD student at Stanford University, and he worked on a research paper that was just published in Scientific Reports called Personal Identifiability of User Tracking Data During Observation of 360-Degree VR Video. Some of the co-authors on that paper were Fernanda Herrera, Hanseul Jun, James Landay, and Jeremy Bailenson. So I have a number of different takeaways from this interview. First of all, there's just the fact that this data is personally identifiable, this motion-tracked data that we don't necessarily consider to be sensitive data; it's just considered to be de-identified at this point. But I think studies like this are starting to show that if you apply the right AI algorithm, then there could be these different immutable characteristics within this data. So your bone length is something that's pretty well set within folks. And when you're looking at the six degrees of freedom of the head movement, as well as the six degrees of freedom of each of the hands, that's 18 degrees of freedom of movement that you're able to feed into an AI algorithm that does this ranking, trying to figure out the most reliable features. The height is one of them, and the rotation of the hands also ends up being very personally identifiable. So again, it's not just that you're watching a 360 video. You're actually, at the end, choosing what your emotional reaction is to that. And so you have additional movement data from your hands, and you have these consistent ways of pushing a button and selecting things. And so just from that data alone, of watching a 360 video with your hands usually in some sort of resting position, but with other idiosyncrasies that you could have there, there's enough information to be able to start to extrapolate your identity.
Now, to make this more robust, they would want to have people come back and maybe test it in different ways, different contexts, different experiences. I imagine that there are going to be unique motion signatures, but also quite a lot of variance as you do different VR experiences. I mean, what I do in Beat Saber versus Tetris Effect versus watching a 360 video, that would be a completely different profile. I may be sitting down. So there may be a lot of different variations as you go into different contexts. The thing that Mark said is that if you think about having 511 people, and for each of those people you have a question where there's 511 different choices of who that could be, at chance it'd be around 0.2% to be able to identify people, but they're up at like 95%. So you're at a level where you get a pretty confident sense of being able to identify people by extrapolating these underlying features within this motion-tracked data. But it wasn't originally designed as an experiment to identify people. If it were, they would maybe have done some things differently. But just in talking to Mark, the trajectory is that we're moving towards this future where this type of motion-tracked data should potentially be classified as personally identifiable. I say potentially because I do think it's worth replicating and adding more robustness, with different contexts and different algorithms thrown at this. But I sense that this type of result is going to hold, because there are likely going to be immutable aspects that you're able to extrapolate out from this motion-tracked data, especially regarding your bone length and your size and how you're holding the controllers. From experience to experience, there are going to be certain aspects that are going to be just the same. So what does that mean for the larger VR industry? I think a big takeaway is that we shouldn't just be treating all this data as de-identified.
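To see why 95% against a roughly 0.2% chance baseline is plausible, here's a numpy-only toy model, which is not the paper's actual pipeline (the study used trained classifiers over ranked features): give each of 511 simulated users a stable body signature, sample one noisy 20-second window per user, and match each window to the nearest stored signature. The feature count and noise levels are invented for illustration.

```python
# Toy model of identification from stable body features; invented numbers,
# not the paper's actual features or classifier.
import numpy as np

rng = np.random.default_rng(7)
n_users, n_features = 511, 6       # e.g. height, arm lengths, hand rotation habits

# Each user has a stable "signature" of body-derived features.
signatures = rng.normal(0.0, 1.0, size=(n_users, n_features))

# One held-out 20-second window per user: the signature plus session noise.
windows = signatures + rng.normal(0.0, 0.05, size=signatures.shape)

# Identify each window by its nearest stored signature.
dists = np.linalg.norm(windows[:, None, :] - signatures[None, :, :], axis=2)
predicted = dists.argmin(axis=1)
accuracy = float(np.mean(predicted == np.arange(n_users)))

print(f"chance level: {1 / n_users:.3%}")
print(f"toy accuracy: {accuracy:.1%}")
```

The real study's accuracy depends on which features the algorithm ranks as reliable; the point of the toy is only that a handful of stable features with low within-session variance makes a 511-way identification task easy.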
And if we are going to be recording it, then there's other implications for what that means, especially when you start to be able to potentially infer additional information from this type of data. Mark referred to three different areas where you could potentially start to extrapolate additional information from user tracking data: conditions of ADHD, autism, or dementia. He mentioned some of those studies, and I'll just read them off quickly here. In terms of ADHD, Skip Rizzo in 2004 put out a paper called Diagnosing Attention Disorders in a Virtual Reality Classroom. In 2013, Jarrold had a paper called Social Attention in a Virtual Public Speaking Task in Higher Functioning Children with Autism. And then in 2011, Cherniack had a paper called Not Just Fun and Games: Applications of Virtual Reality in the Identification and Rehabilitation of Cognitive Disorders of the Elderly. So again, these three papers are able to extrapolate additional medical information from folks, above and beyond their identities. And once you tie in the identities and you have different medical conditions, then you start to get into all sorts of other implications. So in the future, Mark's going to try to look at things like recording folks in an experience like VRChat, where there's quite a lot of variance in height, and that height's not necessarily connected to your actual height, but there could be ways to look at some of these immutable characteristics and extrapolate them from these characters. I think there are millions of different avatars within VRChat, and it may be a little bit more difficult to crack somebody's identity, especially if you're only looking at it from the outside, and you have to be able to translate all the different degrees of freedom that someone has within a VR experience into something that you're able to put into an algorithm and approach like this.
There are also emotes and other things as well. One of the ways to at least address this type of identification is to add offsets, height offsets and offsets for your hands, which seems like a way to introduce enough variance across different sessions that you would not necessarily be identifiable. One of the challenges of adding noise like that, though, he said, is that you can average it out: if you look at someone's play over a long enough period of time, you may be able to extrapolate the underlying values, because whatever actions you're taking within a VR experience still have to respect colliders and spatial relationships that remain consistent among your head and your hands. And you don't want to break the experience to the point where you're giving someone a bad experience just to protect their identity, so there are trade-offs there as well. So again, I think this is a line of research that, as we move forward, is going to continue down this road of more and more robust algorithms for extrapolating somebody's identity. But there could also be implications for how people treat this data. And I think the underlying principle is: if you don't need the data, then don't record it. Because if you do record it, then you may be unwittingly capturing all sorts of additional information that could be inferred from this raw motion-tracked data.

So that's all that I have for today, and I just wanted to thank you for listening to the Voices of VR podcast. And if you enjoy the podcast, then please do spread the word, tell your friends, and consider becoming a member of the Patreon. This is a listener-supported podcast, and so I do rely upon donations from people like yourself in order to continue to bring you this coverage.
So you can become a member and donate today at patreon.com slash Voices of VR. Thanks for listening.