Hello everyone, my name is Christian Sandvig and I'm delighted to welcome you to the Berkman Center lunch talk. I'm going to introduce myself and the group here before we begin. I'm an associate professor at the School of Information and the Department of Communication Studies at the University of Michigan, and I'm here visiting with my esteemed colleagues Karrie Karahalios and Cedric Langbort. Karrie is an associate professor of computer science at the University of Illinois, Cedric is an associate professor at the Coordinated Science Laboratory, also at Illinois, and both of them are co-directors of the Center for People and Infrastructures there. Today's talk will be given jointly by the three of us, actually sequentially by the three of us, and our title is Uncovering Algorithms.

To get started I'd like to take you back to the 1960s. Back in the 1960s there was a computer system that was incredibly significant in the history of computing. Some of you may know it: it's called the Sabre system. Anyone know the Sabre system? Sabre was one of the first large-scale commercial applications of computing. After it was built it was the largest non-governmental computer network in the world, and its purpose was airline reservations. Sabre was built by American Airlines, and American Airlines had the idea that rather than using the really interesting paper-and-pencil-and-Rolodex system they had been using to reserve seats on planes, computers might be able to handle the immense logistical challenge of reserving seats across the whole American Airlines fleet. So they distributed terminals. The terminals were to be used by ticket agents, and then later travel agents, and crucially they allowed people to reserve flights on airlines other than American. You might know Sabre because it still exists; it's the engine behind sites like Travelocity and Expedia.

Now, American Airlines, which paid for the Sabre system, was at the time led by a somewhat controversial CEO named Bob Crandall. This talk isn't about price fixing; this is just the picture of Bob Crandall that I found, although he does look slightly guilty. I imagine he provided this picture to the New York Times himself, but he looks slightly guilty. He was a controversial figure because users of the Sabre system found that it seemed to privilege American Airlines flights over other airlines' flights. At first this was just a suspicion among American's competitors, but later it became fairly obvious that it was happening, and it launched a famous antitrust investigation against American Airlines. When Crandall testified before Congress about what he was doing, he gave this quote, which introduces our topic today: "The preferential display of our flights, and the corresponding increase in our market share, is the competitive raison d'être for having created the Sabre system in the first place." So rather than going in front of Congress and saying, well, it's just an unbiased search system, he said: of course it's rigged. Why wouldn't we rig it? Why would anyone invest so much money in a platform and not rig it? We might call this the Crandall theory of algorithmically curated material. And the Crandall theory would be: of course it's rigged. Why wouldn't it be rigged?
Why would we spend so much money on it otherwise? Of course it's going to advantage our interests, sometimes perhaps over the interests of the people looking for flights. Crandall actually pioneered, HCI fans may be interested, a unit at American that he called Screen Science, and these people were tasked with manipulating the order of results on the Sabre system so as to increase profits.

So let's move forward from the 1960s to today. At the time this was a pioneering computer system, but now we live in an online world that's totally awash in algorithmically curated content, from the search results we get from Google and Bing to social media sites where our news feeds are curated. And we often have some uncertainty about what exactly those algorithms are doing. Our talk today addresses a rising chorus of scholarship that says that algorithmically curated material is important, and that there might be reasons we need to know how the algorithms work even if we don't work at the companies that provide them. These reasons could be legitimate. In Crandall's case, some argued that he was breaking antitrust law; that would be a legitimate reason to know how the Sabre algorithm worked. But it doesn't have to be about illegality. You might just be a user of these algorithms, and in order to make an informed decision about which system you want to use, you want to know what they're doing. Maybe you want to know what they're doing with your personal information. Maybe you're a competitor, and so on. So there might be legitimate reasons for you to want to see inside the algorithm even though you don't own the algorithm itself.

This picture is from a great story in ReadWrite (it used to be ReadWriteWeb) about Facebook's system of likes. As some of you may have read in the background material linked for today, the Facebook news feed has some interesting features, and one of them is that when people like things on the news feed, sometimes those likes are repurposed and attached to advertisements shown to their friends. One of the interesting things about this story is that the journalist did a great job describing how people started to notice what we might call implausible like relationships: oh, my friend the rabid vegetarian likes McDonald's; my friend the Marxist thinks Facebook is great; this is strange. So they embarked on a process of investigation. Some people created fake Facebook accounts and tinkered around to see if they could reproduce the behavior; other people just talked to their friends and said, can anyone send me pictures of things Facebook says I like, to try to figure out how the algorithm works? And that is really our topic today: you might be using a system that's algorithmically curated and want to know how the algorithm is working, but it's tricky to figure that out, because we each have a personalized experience with these algorithms.

When I say a chorus of scholars, I'll just briefly name some of them, some in the audience today. People have been writing about this: Gillespie, Nissenbaum, Zittrain, Barocas, Pasquale. A really large number of people are saying this is important stuff and we need to do something about it, but what exactly we're going to do is not always that clear.
And so our topic is the next step. For the remainder of the talk, I'll address these questions with my colleagues. The first question is: how can research on algorithms proceed without access to the algorithm? Is that just impossible? Should we give up? How might we address this very difficult problem? Then we'll present some initial results from our own attempt to look into the Facebook news feed algorithm, a little bit crudely. We'll ask questions like: what is the algorithm doing for a particular person? How can we usefully visualize what it's doing (not visualize the news feed, but visualize the algorithm, which is a different problem)? And how do people make sense of it?

First, a quick clarification. A good word to replace the word "algorithm" is "recipe": an algorithm is a series of steps. My computer science colleague nods. In other words, how does Google decide what search results to give you when you type in a query? There's a series of steps encoded in software that produces a result for you. Technically we might have a more formal definition, but let's not.

Our proposal in a nutshell relies heavily on a methodology called the social science audit. I say "social science audit" because the word audit might conjure a financial audit, and that is similar, but the social science audit is a very well-known methodology in the social sciences, originally pioneered in housing in the United States to detect racial discrimination. The idea of the audit methodology is that you send testers, as they're called, to see what is actually happening in a particular social situation. In the housing audits, testers would try to do things like buy houses and rent apartments, to see if they were discriminated against. That's our metaphorical comparison for what we'd like to do.

A famous audit study that was in the news recently, which you might have heard about, is a study published in APS where a psychologist and people at a business school audited professors. I'll jump to the New York Times headline because it's a better headline: they found professors are prejudiced. They sent professors requests for a meeting that were identical, and then they varied the names of the person requesting the meeting. They varied them using, for example, the US Census list of names in common use, which associates names with different genders and racial and ethnic groups. By varying the name on the message, you could create an association with a gender and a racial or ethnic group. And they found that if you were a woman, or a member of a racial minority, you were less likely to get a response from a professor when you asked for an appointment. That's a standard example of the social science audit methodology.

So our proposal in this talk is that we should have algorithm audits, and I'll describe exactly what I mean by that. I'm going to give you five example research designs for how you might do an algorithm audit.
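As an aside, the mechanics of that kind of correspondence audit are easy to sketch in code. Here is a minimal, hypothetical version in Python; the name lists and message text below are invented placeholders for illustration, not the instrument from the study just described.

```python
import random

# Hypothetical name lists: a real audit would draw on sources like the
# Census lists mentioned above, which associate names with gender and
# with racial and ethnic groups.
NAMES = {
    ("female", "white"): ["Emily Walsh", "Anne Murphy"],
    ("male", "white"): ["Brad Baker", "Greg Walsh"],
    ("female", "black"): ["Lakisha Washington", "Tamika Jones"],
    ("male", "black"): ["Jamal Jackson", "Tyrone Robinson"],
}

# One identical message; only the sender's name varies across conditions.
TEMPLATE = ("Dear Professor {prof},\n"
            "I admire your research and hope we could meet briefly to "
            "discuss opportunities in your lab.\n"
            "Best regards,\n{sender}")

def make_requests(professors):
    """Randomly assign each professor one sender identity and render the
    otherwise-identical meeting request."""
    for prof in professors:
        group = random.choice(list(NAMES))   # a (gender, ethnicity) cell
        sender = random.choice(NAMES[group])
        yield group, TEMPLATE.format(prof=prof, sender=sender)

for group, message in make_requests(["Smith", "Lee"]):
    print(group)
    print(message)
    print()
```

The point of the design is that the message is held constant, so any difference in response rates across cells can be attributed to the manipulated identity rather than the content of the request.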
These five designs line up with things other people have proposed, so I'll introduce them, and I'm also going to say why I think some are a little better than others. Then we'll move to the second part of the talk and say what we actually did. Our example is Facebook, but this is a broad topic; Facebook is not really the overall point, and it could be any platform with algorithmically curated content.

When I tweeted about this talk earlier today, someone who may be in this room tweeted back and said: you should just get them to tell you what the algorithm is, and that would solve the problem. That's the first design, and in fact it has been proposed by many authors; Pasquale is the most associated with this idea. The thing about it, though, is that it seems like a great idea at first, but when you think about it a little more, what would you actually do with that information? For example, say I had the Google search results algorithm here in my pocket. What would that look like, exactly? First of all, it changes all the time. If it were in code, it would be quite lengthy. We might agree on what certain parts of the algorithm are doing, but we wouldn't necessarily be able to use the algorithm to predict a particular outcome in a particular instance unless we also had data about what kinds of things were fed into it. It's not really clear that reading the algorithm is going to be super useful. There are some instances where it might be, where there's some process question we want to settle, but generally, having the algorithm public is not necessarily the answer to our concerns. You can see this with platforms where the algorithm already is public: the people who won the Netflix Prize described their algorithm, and Reddit's platform is open source, so most of its algorithm is public, except for a part called vote fuzzing. The other thing about making it public is that publicizing the algorithm might help people we don't want to help, because many platforms have associated groups of ne'er-do-wells: spammers, hackers, people trying to game the system. So even if the algorithm were comprehensible, we might not want to release it publicly, because people might use it for nefarious purposes.

The second design would be to ask the users themselves how the algorithm works. Does anyone remember Consumer Reports? They're still around, although they're not the cultural force they once were. Consumer Reports used to send surveys to car owners asking: did your car break down? When did it break down? Then they would gather the surveys and issue reliability reports for cars. We might imagine an interesting algorithmic audit along those lines: find users and ask them questions. This has some great advantages. One is that, metaphysically speaking, it might be really important to know what users think the algorithm is doing. That might be more important than knowing what the algorithm is actually doing, because you can imagine users modifying their behavior based on what they think the algorithm is doing, and this could produce a very different overall system, depending on the theories they hold about it. So it has the advantage of finding out what they think.
But then the disadvantage, on a platform like Facebook's news feed, is asking people a bunch of really detailed questions if we wanted to do a large statistical analysis. How would that work? You have a real problem with memory. "Seven days ago, did your news feed contain these four words more frequently?" It's just not something you can ask users.

A third approach that some have taken is: scrape everything. (This slide is an 1875 painting called The Floor Scrapers; it's cut off a bit on the screen, anyway.) In this approach you'd have some programmatic interaction with the platform in order to figure out what the algorithm is doing. But then you run into other problems, and one is that the platform might not like it. Currently, a lot of our interaction with platforms via APIs happens at the platform's discretion, and an audit study is an adversarial relationship, so it's difficult to know how you would manage that programmatically. You also have the problem that in the United States, the Computer Fraud and Abuse Act makes scraping against the will of the platform very problematic. When we were interested in doing the study we'll present later, we sought legal advice, and the legal advice was: don't do it, because of the Computer Fraud and Abuse Act.

The fourth of the five approaches would be sock puppets. "Sock puppet" is internet slang for a false account manipulated by someone else. A nice thing about sock puppets is that they really mirror the classic audit methodology for detecting discrimination, because a sock puppet is a user account that the researcher inserts and then uses to do something. There's an intervention: let's see, if I search for this over and over again, whether the personalization builds up and I then get different results. So the sock puppet gives you a nice way to control what's happening and learn more. But again you have the problem of the Computer Fraud and Abuse Act, and it's not clear that it's entirely ethical, because you're inserting a bunch of fake data into the platform. That might be trivial, but it still doesn't seem like the wisest course.

And finally, the approach we're actually advocating, and will talk about next, is the collaborative audit, which combines features from several of the designs I've just described. You can think of the collaborative audit a little bit like the site biddingfortravel.com, which is obscure. Has anyone heard of biddingfortravel.com? It's a user community that got together and decided: we all like to book hotel rooms on Priceline; what if we all exchange information about what kinds of bids we're putting in on Priceline, and then, by comparing bids and results, see if we can come to a collective understanding of how Priceline works? That's what Bidding For Travel is. You can really get a good deal on Priceline if you read Bidding For Travel. For our purposes it has some interesting features: you've got users working together; it's not clearly unethical, because they're using the system as they normally would; and at the same time they're interested in figuring out how the algorithm works, and they're exchanging information.
Now, that's just a forum, but our idea for the collaborative audit is that there would be some software-assisted way to organize a large group of users and learn from them. On that note I'm going to pass the mic to my colleague, Professor Karrie Karahalios, and she'll talk about our study.

Can you hear me? I think I'm on. Okay. So we began by sitting in a room together playing with the Facebook API, seeing what we could get. Can we get the posts people see, or that appear on their news feeds? Can we get what people like? Which comments they like? Our goal was to better understand the algorithm ourselves, and to create some sort of visualization so that participants in a study might better understand the elements of the algorithm presented in the form of the Facebook news feed. So we gathered lots and lots of data. We were trying to figure out what was going on, and we found it was really hard: we made a Facebook group page, and it turned out that if you looked at how many people had seen a post, when four of us had seen it, it would say that one of us saw it. We were discovering discrepancies in how the Facebook servers synced. How can you collect reliable data? So we spent a lot of time assessing what it takes to get reliable Facebook data.

Essentially, we wanted to use this data to visualize the consequences of an algorithm. We did some pilot studies, and casually, in these user studies, we found that some graduate students, even in the CS department, who routinely used Facebook, were unaware that their feeds were filtered. We had been showing random visualizations and just assuming that people knew about the algorithm. We were wrong. We had to take a huge step back, remove the complexity, remove all the likes and comments and all of that, go back to the bare bones, and create a simple series of visualizations that tell a story about the algorithm: taking users on a visual journey that slowly explores and reveals parts of the algorithm to them.

There is some precedent for this. If anyone's familiar with the work of Kevin Lynch, he studied invisible processes in support of design. In the mid-1900s it was wayfinding studies: he explored how individuals perceive and navigate the urban landscape, and what it is about a city that allows easier perception and more accurate mental maps for the dweller. His work was actually incorporated into practice for better urban design. So we borrowed from him, and the approach we came up with was an interview and survey approach similar to Kevin Lynch's. We then created something we're calling a prompt, which I'll explain shortly, that exposes some of these hidden algorithms to users using the Facebook API. And finally, the idea is to work with many users together, to help them understand what's happening both personally and as a collective. Today I'm going to present these first initial steps as a work in progress.

So what did we do? We brought 40 people into the lab. We did a simple pre-interview to understand their Facebook usage, we showed them our prompt (again, I'll explain that in a little bit), and then we did a follow-up interview. We spoke with each of these 40 people for anywhere between an hour and a half and three hours; we found people really wanted to talk about their experience using the news feed. The study ran from November of 2013 to April of 2014, with a follow-up in June.
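To give a flavor of the kind of API exploration Karrie describes, here is a minimal sketch of how such a tool could pull both feeds for comparison. It assumes the FQL interface that existed at the time of the study (FQL has since been deprecated and shut down by Facebook); this is an illustrative reconstruction, not the actual FeedVis code.

```python
import requests

TOKEN = "user-access-token"  # granted when the participant logs into the app

def fql(query):
    """Run a query against the (now-defunct) FQL endpoint."""
    r = requests.get("https://graph.facebook.com/fql",
                     params={"q": query, "access_token": TOKEN})
    r.raise_for_status()
    return r.json()["data"]

# Stories that actually appeared on the participant's news feed.
shown = fql("SELECT post_id, actor_id, message, created_time "
            "FROM stream WHERE filter_key = 'nf' LIMIT 500")

# The unfiltered alternative: walk the friend list and collect
# everything each friend posted, one timeline at a time.
friends = fql("SELECT uid2 FROM friend WHERE uid1 = me()")
all_posts = []
for f in friends:
    all_posts += fql("SELECT post_id, actor_id, message, created_time "
                     f"FROM stream WHERE source_id = {f['uid2']}")

# Posts friends made that never appeared on this user's feed.
hidden = {p["post_id"] for p in all_posts} - {p["post_id"] for p in shown}
```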
Unlike most of the computer science studies I've done in the past, where we'd get about 40 CS students or anyone we could find down the hall, we went to great efforts to recruit as good a representative sample as we could get. In the pre-interview process we collected demographic info, like I said, and we discussed Facebook practice. I didn't know that people would just glance at the news feed, then go into individual people's timelines, and back and forth; a lot of this was new to me. We got people's general beliefs about Facebook, and after that we showed them our prompt.

FeedVis is the prompt that takes you on this visual journey. It's a set of four panels that emphasize the content and the people reflected in your Facebook news feed. I'll start by showing you the first panel. Like I said, it's a Facebook web app that uses FQL and the API to extract posts and other features from your Facebook feed. What you're looking at here on the left is all the posts that were posted by all of the friends in your specific network (in this case it's actually my network). On the right are the posts that appeared on my news feed. You can't see it very well on this projector, but the posts that appear in both are colored the same way. So, for example, I did not see that Cliff Lampe liked a post, but I did see that James Landay wrote "VPN for China working. Check."

One of the first things people noticed when they saw this is just how long the left column is compared to the right column. It's huge. You're scrolling and scrolling and scrolling and not getting to the end, and we only showed about a week's worth of data. I can tell you that I don't remember the first time I realized there was an algorithm behind the news feed; I just don't remember. Some people did. For most of our subjects, this view was the first time they were even aware of the existence of an algorithm. More than that, they had no idea how the algorithm affected their use, yet almost everyone was very eager to talk with us, start a conversation about it, and probe parts of the feed exposed by the tool. To give you a glimpse of our subject pool: roughly 37 percent of the people were aware there was an algorithm before participating in our study, and 62 percent were not.

So I showed you the content panel. The next panel we took them to was a people panel, a person view. The idea here was to make sense of why people appear on your news feed, and why some folks are hidden. What you're looking at here on the left are three bars representing three people whose posts are completely hidden from you; you don't see their posts. On the right are three people whose posts you see all of. And in the middle you get a hybrid mix of posts you see and posts you don't. So, for example, I saw all of Justine Cassell's posts for the week. Jennifer Chase posted two things; I saw one of them but not the other. And Jim Foley posted a message that I think I really wanted to see, but I don't know, because I didn't see it. You can hit refresh on the screen and keep seeing different people and where they appear in this sort of histogram of whose posts you see and whose you don't. And one of the things we found as people explored this is that they get really, really upset when family members and loved ones appear in the left columns rather than the right.
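A sketch of the bookkeeping behind that person view, under the same caveat as before (an illustrative reconstruction, with the "shown" and "all" post lists gathered as in the earlier sketch; the bucket names mirror FeedVis's categories):

```python
from collections import Counter

def person_view(all_posts, shown_ids):
    """Bucket friends by what fraction of their posts appeared on the
    feed, mirroring the 'rarely / sometimes / mostly shown' bars."""
    posted, appeared = Counter(), Counter()
    for p in all_posts:
        posted[p["actor_id"]] += 1
        if p["post_id"] in shown_ids:
            appeared[p["actor_id"]] += 1

    buckets = {"rarely shown": [], "sometimes shown": [], "mostly shown": []}
    for friend, n in posted.items():
        frac = appeared[friend] / n
        if frac == 0.0:          # completely hidden (left bars)
            buckets["rarely shown"].append(friend)
        elif frac == 1.0:        # every post appeared (right bars)
            buckets["mostly shown"].append(friend)
        else:                    # the hybrid mix in the middle
            buckets["sometimes shown"].append(friend)
    return buckets
```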
As you keep refreshing, these lists keep getting longer and longer, so you can see the full list of everyone whose posts you see all of, and the full list of people whose every post was hidden from you by this interface.

Aggregating these lists, we wanted to see what people would change if they could. So the next view they saw was a modification view, where you could move people around: you could move them from the rarely seen category to the mostly seen or the sometimes seen category. That's the people view; we then showed them a separate content modification view. Here, for example, on the left you're seeing posts that were on your news feed; on the right are posts that were not on your news feed. So, for example, I did not see that one friend wished Lena a happy birthday; I didn't see that Eric wished his dad a happy birthday; I didn't see that another friend wished someone else a happy birthday. But I did see these two images. We then asked people to check off posts: if you hadn't seen this, would you have wanted to see it? If you had seen it, is it something you would have been happy living without?

I'm going to move along to talk a little about levels of awareness, because throughout this prompt participants were talking to us, discussing their thoughts, giving their own interpretations and theories, creating a model of how the algorithm works on the fly as they were shown this information. I'll briefly talk about their paths to awareness. Like I said, some knew beforehand, and some of the comments they gave us were along these lines: "Some people have 250 friends, but some people have a thousand, and if the computer wants to show you everything, there's no way it could. It crashes your computer." "Plus the phone news feed is different from the laptop news feed; sometimes I look at something on my phone, but when I go online there's a ton more." Basically the comments focused on three different areas. First, people felt that things were filtered simply out of necessity: if you have 2,000 friends, you just can't show everything. Second, comparisons: sometimes people look at timelines and at the news feed and see the discrepancies, or they look at the phone platform and the computer platform and notice discrepancies there. And the third way they found out was through blogs or word of mouth; girlfriends came up a lot, as in a girlfriend explaining to a boyfriend why something happened: "I read this a year or two ago," or all of these kinds of tricks you supposedly have to do to get into people's news feeds.

From the folks who were unaware, these are some of the comments we got: "I bet it would be on my news feed; I probably would catch it at some point during the day." "Probably I just don't scroll down enough." People kept thinking that it was because of their own behavior, or their own lack of investment, that they didn't see many of these posts. "I don't know for sure, because there are some friends, I guess, where it seems like I don't see anything by them very often. Maybe they just stopped posting." It didn't occur to them that something was happening behind the scenes.

Exploring the data from these 40 people, we looked for awareness factors that predicted whether somebody knew about the algorithm. We thought that membership duration would affect it.
It did not. We thought that seen content versus total content would matter: if you only saw a small share, you'd notice. It did not. We thought that having a huge network would matter, because then you have all these people and you might not see some of them. It wasn't a predominant factor. What did affect knowledge of the algorithm was usage frequency, how often you used Facebook; activity level, whether you were a heavy poster versus a light poster or a listener; whether you had ever created a Facebook page or group, because then you get those analytics pages that summarize how many people viewed the page; and finally whether you used Top Stories versus Most Recent and switched between the feeds, or whether you blocked people or hid posts.

The reactions we got to FeedVis ranged from people describing folk theories, to conspiracy theories; one person quit Facebook on the spot. But most people learned over time, over the course of our visual narrative, to understand why it was happening, and it made sense to them. From most people we got an initial shock of surprise: "So do they actually hide these things from me?" "Hey, it's kind of intense. You've seen the movie The Matrix? It's kind of like waking up in the Matrix, in a way. I mean, you have what you think is your reality, of what they choose to show you." Or just: "What the hell, Facebook?" Again, I want to stress that these were initial reactions, because very quickly after that people started saying things like: "So there are some algorithms, something, some rules that choose why things appear to me? It's very interesting. I never knew that Facebook really hides something. So are they doing some kind of mining or machine learning?"

And again, I mentioned folk theories. People came up with very clever and quite probable features behind the algorithm. They discussed likes, comments, looking at timelines: if I look at somebody's timeline, I'll see more messages from them; if I like something they wrote, I'll see more messages from them; if I communicate with them via the inbox, I'll see more from them. We call these clicking interactions, and people were really good at grasping them. Someone went further: "I like politics. I read blogs about politics; that's why I get political information." They were moving even outside of Facebook, saying, I read this blog, hence I'm getting these posts on Facebook. So this idea of topics came up a lot.

Looking at the other content visualizations and the feedback from them, in the view where you see content that was hidden from you, someone said: "I wish I had seen this, because I think she needs support for that. I think she needs support. If I see it, then I will say something." She was pointing to a post she had not seen, and she wanted to support this person and wished she had seen it. In the people view, somebody said: "For now I cannot really understand how they categorize these people. Actually, this is my brother, and he needs to be here. I want to see him. I need to see everything my brother says." In the screens where people could move things around, people in one case and content in the other, we found that most people manipulated people across the different views; they moved family into the mostly seen category. It turns out that in the content view, people didn't change very much at all.
There were a few people who changed a lot of things, but for the most part people were quite content with the content that appeared on their news feeds.

So we completed this prompt, and then this past June we sent some follow-up questions to people, just to see how they behave on Facebook now and whether anything had changed. 29 of the 40 people replied to our follow-up, and we found that after learning about the algorithm, people changed their behavior. Specifically, they changed their reading behavior. On the interaction side, people were a bit more stingy with their likes and careful about what they liked. People switched between the Top Stories and Most Recent feeds quite a bit more. They started hiding posts and blocking things. They started being a bit more careful about what they posted, and people started unfriending people more than they had in the past. One quote from this: "After our discussion I actually went back and started experimenting a little with the news feed and discussing with some friends ways to streamline what I was receiving. Since then I've become more interested in checking my Facebook, because it does not seem as cluttered with random information I have no interest in." So people are actually getting more and more involved, playing with it and probing it further on their own, using the feed itself as a prompt. And I am now going to pass the baton to Cedric.

I've been given the hard task of summarizing everything and trying to tell you where we want to go from here. What Karrie summarized for you showed that giving people insight, not necessarily into what's under the hood, but basically allowing people to test-drive the algorithm, actually matters. That's one of the points we're making for the value of the algorithm audit. It had more of an impact than we thought it would, in the sense that, as Karrie said, some people just stopped cold turkey or came up with their own theories. But where do we want to go from here? One of the things we're working on now is to scale FeedVis up, so that it can be used by more than 40 people and be widely available, and then, based on all the data we gather, actually perform the type of audit Christian was talking about. In some sense, if you like, we want to machine-learn the machine learning algorithm. The issue, of course, is that as opposed to Facebook or Google or whoever you like, the data isn't centralized; the data is distributed among the users. So we have to take care of our own algorithmic problems in terms of preserving privacy, distributing data, and sharing information among users. But at the end of the day, instead of a whole bunch of individual folk theories, we would want to arrive at a communal, common view of what we think the news feed is doing, based on all the data that people have gathered.

Also worth noting, and we've been scratching our heads as to whether this is good or bad, is that this idea of giving users insight into why they're getting the content they're getting is gaining wider acceptance.
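To make "machine-learn the machine learning algorithm" concrete, here is one minimal sketch of what that could look like once observations are pooled across users: fit an interpretable model that predicts whether a post was shown. The feature names are invented placeholders (they echo the folk theories above), scikit-learn is assumed, and a real deployment would need the privacy-preserving, distributed machinery just mentioned.

```python
# Sketch: approximate the feed's behavior with an interpretable model.
from sklearn.linear_model import LogisticRegression

# Hypothetical observable signals participants themselves theorized about.
FEATURES = ["likes", "comments", "friend_interaction_count", "is_photo"]

def audit_model(observations):
    """observations: one dict per post, pooled across many users, with
    the feature values above plus 'shown' (1 if it appeared on the feed)."""
    X = [[o[f] for f in FEATURES] for o in observations]
    y = [o["shown"] for o in observations]
    model = LogisticRegression().fit(X, y)
    # The coefficients amount to a communal, data-backed folk theory:
    # which signals predict appearing on the news feed?
    return dict(zip(FEATURES, model.coef_[0]))
```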
You are probably aware that there's a feature in Google where, when you're shown an ad, you can click on a little icon and it will tell you that you're getting this ad for such-and-such reason, and very recently Facebook announced that they're rolling out a similar feature. We did see, through Karrie's FeedVis study, that this can play an important role. The question is: does it matter what the motive is? It's been suggested that Facebook may have ulterior motives; they may want you to go back and correct their predictions so that they get more data. We think the motive might actually matter, but nonetheless we think there is value in showing people why they're getting the recommendation or the ad they're getting, and again, that's the idea that was behind FeedVis.

So, to conclude: a question, an admonishment, and a call to arms. The question, which has been underlying this study, and which really came out of not quite the audit (we're still on our way toward performing the type of distributed audit we want) but even just putting a prompt in front of users, is: what do users really need to know about an algorithm? We hope we've convinced you that there is value in telling people something about these algorithms. It seems you may not have to be able to open the hood and look underneath; maybe test-driving it is good enough. But what kind of test drive do you want? You could see in the wide range of reactions Karrie got that that remains a question. The admonishment is that transparency alone is not enough, and may not even be feasible. We believe that auditability might actually be more valuable, and more feasible, than opening the hood and letting you peer into the specifics of the algorithm, and to do that we need to create an infrastructure not for transparency but for auditability. That's a wide-ranging project.

And there are some obstacles to auditability right now, first of all at the legal level. As Christian mentioned, depending on how you perform the audit, auditing may be illegal or may run against the user agreement of the platform you're trying to audit. That may be a great place not only to extend the law but also to rethink how we give access to APIs, and it may also be a place where a non-profit, or maybe even a government agency, could do a lot of good and valuable work. I'll end with that; thank you for your attention.

So, we didn't arrange in advance how we would do Q&A. Who wants to run it? We're all going to? All right. Well, you don't have to stand up. We're happy to take your questions, and we need you to speak into the mic because of the live stream, which I forgot to mention during the intro: this is being live streamed, so anything you say is public.

My question is not about your conclusions, but about how you did this in the first place. In order to show those screens, it seems like you must have access to some sort of totally unfiltered raw feed that has everything. Does that mean anybody can do that? How did you get it, and if you did it without the cooperation of Facebook, how especially did you do it?
Every single person came into the lab, they signed a consent form, we explained the study to them, and they logged into their Facebook account and used our Facebook application. By using our application, the API allows calls where you can get at everything their friends post. Once they logged out, it was gone, so we could only see the data while we were standing next to them and they were using the application signed in themselves.

Does that mean any of us could write a similar app and basically see our raw feeds anytime we want? You could see your own raw feed if you wrote an app, correct. And actually we hope to release our tool, in which case you could use it to see your raw feed. That's an important point: right now Facebook comes to you embedded with algorithms and filters that you didn't choose, but it doesn't have to be this way. You could pick your filter; you could make your own. Another point is that they're probably listening, so: Facebook, don't turn it off. Do we want auditability to be dependent on their saying it's okay for us to do this? Because they can change the API tomorrow. That's right. Dan, you'll cut that out? No, I'm just kidding. Where's the mic?

Okay, I actually have a similar question about what is actually in the "all" feed. Presumably it doesn't include things that have been caught by Facebook's spam filter, whatever that entails; presumably it doesn't include things that people have reported or marked as spam that have been pulled out of the all feed. And I guess I'm wondering: if Top News is known to be a subset of the all feed, we still don't really know what "all" is a subset of, and I wonder if you think your collaborative audit approach is able to answer that sort of question, or if that's an entirely different black box that we need to tackle through a different approach.

I can take that. Those questions are excellent, and we were struggling with them as well. The "all" feed more closely resembles the Most Recent feed. For someone like me, we took about a week's worth of news feed posts, and it's quite long; it's longer than you could actually sit and scroll through, which is why the studies probably ran over three hours, and people wanted to sit there longer. It's more of a timeline approach that you get. If I wanted to get the Top Stories feed with the API today, I cannot get that, so we used Most Recent. I hope that answers part of your question; I know there was a part two, but I forgot it. I think it was about spam and marked content, and whether that's missing from the "all" feed. I don't know whether we're going to be able to get that, but it's a great question; it would be neat.

So I also have a related question, on the other side, which is that in some ways Facebook is more open than most algorithms you see. Even if their API goes away, you could probably painstakingly redo this by taking all the names someone is friends with and then individually going through every single thing each person posted, because it's on their page, to recreate the "all" feed. So you could recreate it without even the API. But most algorithms are not like that. For most of the things we see that are algorithmically filtered, there is no accessible version of "all." At the beginning
you said that you saw this as being not specifically about Facebook, but about all kinds of things we deal with that have algorithmic filters, from Google to Pandora to whatever (I quit putting things into Pandora because their algorithm bugs me). How do you see applying your technique, your approach, when you don't have any kind of access to something resembling "all"?

So, we chose specific features to look at. If we were to choose Twitter, we might choose some other features, or maybe a subset of "all"; this idea of "all" on Twitter is just huge. I think it costs something like a million dollars a year right now to get all of Twitter, though you can get a subset, you can get one percent. But we chose specific features, and I think for different tools, different social media sites, different news feeds, we could tailor which specific features of the site to expose. We might not necessarily have to choose these particular ones, but there are other features you might expose that would give you the idea that an algorithm is lying under the hood.

My reaction is partly agreement with Karrie's point, but it's also that Facebook's "all versus filtered" is not really a limitation of the collaborative audit approach; it's just what Facebook does. Twitter changes the font size on tweets that are looked at a lot, but it doesn't have a filter that works in the same way, except for search results. You could still do the same collaborative audit, because in its most general form it just involves testing, looking at the results, and comparing information. And you actually have communities that already exist around many of the algorithms we use every day that are trying to do this. Your example, Judith, was Google, and there you have this big SEO community that tries to think about how the algorithm works by querying it over and over again. Our proposal is that if you organized this behavior a little more (because right now it happens on forums where people just say, yeah, I tried that 50 times and I still got this), if it were programmatic and organized, maybe via Mechanical Turk or some other interface, something that supports you with data, I think you could actually learn a lot. It might not be exactly these same things, but that wouldn't be because the collaborative audit doesn't work, just because the algorithm is doing a different thing.

Just quickly, what would you call the algorithm on Twitter? Because it seems relatively non-algorithmic, maybe, relative to Facebook; am I missing something? Well, Tarleton's piece about Twitter says the most relevant one is the trending algorithm. Okay, yeah. And their search results are pretty heavy-handed, aren't they? Search results in All versus Top: that's a similar thing to Facebook, right? Thanks.

So, I'm hugely sympathetic to the questions you're trying to answer, and I think they're fascinating. But I wonder about part of what's coming up in the questions that Chris and Judith had.
It strikes me that the metaphor of the audit is a little tricky, because in some ways using the API is as much critical technical practice, right? You have access to the information; many users don't know how to do that, they're not writing API interfaces themselves, but you can do that, create a different kind of access to the information, and then do this kind of contrast. That feels a little different from the classic audit, where you just send in moles; that's more of the sock-puppet approach, where you send in someone to go do the thing people are supposed to be able to do and get information back on the ground. So part of it is being able to say: sometimes this can depend on the API, sometimes there won't be an API, or the API won't give you the kind of information you need, so you have to play it differently. But I was wondering, with the suggestion you made about taking FeedVis and expanding it past the 40 people you bring into the room, making it available at wide scale: is there also an idea that the people playing with it would then be telling you information through their choices? So, like: boy, I don't care about moving the mostly-seen off, but what I do care about is moving the never-seen on, or vice versa. You might begin to have data on that. Is that part of the impulse? So the truly collaborative audit is that you set this thing loose in the wild, and many, many people begin not only to learn that there's an algorithm, and something about how it works, but then to say: here are the impulses I would have about fixing the algorithm.

Well, that's part of it. We do have the data for the 40 people, which we're analyzing in more depth; this is a work in progress. There are some interesting tools out there that also help expose people more in the wild. There's this wonderful Watch Dogs game that takes your Facebook feed, and the characters in it are inspired by your characteristics: if there's somebody you don't speak to very often, it might claim that you're suspicious of them. So the features of your Facebook actually affect behavior in the game itself, and that's another way to think about getting at awareness. But one of the things we learned doing this study is that you don't know what to expect. So while we do want to get it out into the wild, I think it's going to take a few trials and a few pilots to see what people want to do. We've done other similar types of audits on Mechanical Turk, and we found that people just don't want to fill them out. So it's a trial-and-error piece. That's an excellent question.

I just want to say that I'm pretty delighted by these results. I think they're really interesting, and they give me a lot of hope, because the transparency stuff that has been written about algorithms, I just didn't see how it was going to work, so it didn't leave me feeling like there was anything we could do. And yet we have really important societal problems. Maybe Bob Crandall wasn't enough; I should have mentioned Latanya Sweeney's work about racism and ad placement, for example. We have these really important problems that we would like to address, and we don't really know how. And so
I'm actually quite encouraged, and I like the audit metaphor a lot, even though I admit, as you pointed out, it's not exactly the same as the classic audit. I like it in part because we can ask who does audits now. Housing audits are still performed, and they're done by activists, sometimes in collaboration with researchers who help with the statistics; they're done in lawsuits. So in some ways this is a blueprint for what civil society might do to address this algorithmic structure that people are increasingly concerned about. You're right, it's a fair cop, it's not a perfect parallel, but I actually like the idea of testing over and over again, whether it's with a sock puppet or a real person. I also like the idea of collaborating with the people, partly because that gets around the legal problems, but also because, generally, that seems like what we ought to be doing: working with people to help them understand what's happening. So I don't think I was positive enough.

And again, to some extent, if we wanted to go back to the essence of the algorithm, if we were interested in really learning the algorithm, the fact that things keep changing would obviously be an issue. But that's not even the standard we're going after. In some sense, if a large enough community of people, through multiple testing experiences, arrives at the confirmed impression that something is happening (it's not just you, not just your own experience with this algorithm), that may be grounds enough to go back and say: well, maybe this isn't the effect you intended, but this is what your algorithm seems like to a reasonable number of people. And that seems to agree somewhat with the standard in existing law. So again, the point is not, and we've been having very long discussions about this, including late last night, whether we are reverse engineering or not. We're not reverse engineering, in the sense that we're not going after a sheet of paper that says "this is what your algorithm is." But, not being a lawyer, my feeling is that this is really how law works: there are of course rules, but if a large enough community of people have a common experience that this is how the artifact they're interacting with works, maybe that counts as true, at least as far as the law is concerned.

I would also be curious to see people doing something similar over longer periods of time. Even just talking to them before and after, it was nice seeing the change, and you can imagine that over longer periods there might be larger changes as well, more collective shift. So I would be curious, even from a qualitative point of view, to talk to people once every two years or so.

I was surprised that you didn't mention advertising. I think it's very reasonable to imagine that the actual algorithm, the hidden algorithm, has inputs from people who are buying your attention via the feed. Right now a lot of that is overt: an ad announces itself as an ad, incredibly annoyingly. But more insidiously, of course, if someone mentions something that happens to be the name of a product for which someone has essentially bought advertising space, it could be bumped up in the algorithm, or even more insidious things. Anyway, you mentioned motivation, and I'm relieved to see that Zuckerberg spoke with the Wall Street Journal and said that it has nothing to do with
making money; that the reason the news feed was opened up to advertising was to create the illusion of profitability so he could retain the best engineers. I'm looking at an article here, so don't worry about it. I was curious, though; here's my actual question. There's a lot of hot air right now around consent in research with Facebook. Are you worried? Are you considering your kind of project differently in light of all this, I feel, misplaced concern about consent and research?

On the first part, the advertising part: in the pre-reading, the third link is all about advertising, and we mentioned it a few times, so I guess I'm just going to say bravo, we agree. The corrupt personalization piece is all about advertising, and the example here about liking, with likes being attached to ads, is about advertising. So is Cedric's point about clicking on the icon to see why an ad was recommended. We're totally with you on seeing this as really relevant to advertising and the profit motive: what Bob Crandall wants and what Mark wants. Do you want to answer the consent part?

Oh, consent. If you notice, we were probably more careful than we needed to be. We brought people into a lab; it's very consent-y. They sat down, they filled out a consent form, they opted to log into Facebook to do the study. That's the more consent-y side of things. Moving to the more collective side of things, Christian and Cedric gave a great discussion of this: there are two different things here, there's law and there's ethics, and I see consent as falling into the ethics part. So far we've been as consent-y as we could be. Moving toward a more collective approach, there are interesting ways to do consent; you can do online consent, and that's very commonplace: you get a splash page, and if you agree to go further, you click on it, you give your consent, and you proceed with the study.

Just a minor point there: the idea of a collaborative audit would really be collaborative. There was a really interesting This American Life story about someone who performs housing audits as their job for a non-profit, and they found it to be a transformative experience, because they were shocked at the extent of racism. From my reading of the Facebook environment, people would like to collaborate with us to help figure out their news feeds, so in some sense we don't really have the same consent problem. And that's a key thing we're also saying: learning about the algorithm from three university professors is different from learning about the algorithm from Facebook telling you what it does. Hopefully we have more credibility; we're not psychologists. But there are interesting approaches to things you can do collaboratively and still have consent. People really do want to talk; these are some of the most fun discussions we've had.

I'm Nathan Matias, a PhD student at the Media Lab and a Berkman fellow. Over the last eight months or so at MIT we've been reacting to legal challenges, things like subpoenas served to students for research they're doing on things like Bitcoin, and we're starting to talk to groups
like the EFF to figure out how we can support researchers who are doing work in gray areas of the law, like audits, which could fall afoul of the CFAA, and to figure out where policy changes might be needed. I'm curious to hear how far down the road you've gone; I'd love to chat afterward as well, but I'd love to hear how far down the legal road and the policy road you've been thinking in this space.

Well, the thing about presenting on problems with the CFAA in legal venues is that there's not a lot of debate, because you say, wow, the CFAA really sucks, and all the lawyers say, yeah, it really sucks. So the issue there might be political will, but certainly this is something we've talked about in other venues, and I think it's great that you raise it here. There must be some lawyers around here somewhere, right? Here we are, researchers hopefully on the side of the just, and we're struggling, partly because of legal institutions that seem to be in our way, even though I think what we're trying to do is the right thing. Similarly, audits are really tricky with IRBs, relating to the previous question, because you are in an adversarial relationship with your subjects, and the IRB doesn't really envision that, because it evolved for medical research, where they don't imagine the physician being in an adversarial relationship with the patient. So there's an interesting discussion to be had there that's worth looking into, although I think we've been lucky with a reasonable IRB: we say this is what we're doing, and it's okay. But this is an area where I think we need help. The research community should stand up and say, yes, this is what we need to do. Person whose name I've forgotten: your point is exactly what we should do; we should reform these laws.

A question about how you got the "seen" data: did you have to write your own JavaScript, or does the Facebook API give you the seen content? For the "all" data we go through a list of all of your friends and then look at their timelines; for the seen data, the API provides it. But we don't know if you actually saw a post; we know it was on your news feed, that it was displayed by the computer, and there's a difference. It depends on how far you scroll down, too, because when you get to the bottom the screen repopulates; it does that dynamically, it doesn't load everything up front.

So this is more of a speculative question, then: Facebook has that data, so why don't they share it? When I post something, why doesn't Facebook tell me who saw it? That's a good question. There's a wonderful paper by Michael Bernstein, work he did from within Facebook, looking at the imagined audience,
So—this is more of a speculative question, then—Facebook has that data; why don't they share it? When I post something, why doesn't Facebook tell me who saw it?

That's a good question. There's a wonderful paper by Michael Bernstein, done from within Facebook, looking at the imagined audience, and they found the audience was actually, I think, three times bigger than what you thought it was when you posted something. But "seeing"—we were struggling with the terminology ourselves, because technically it's not seeing, it's more like appearing, whether or not you saw it. One of the points made in the blog post linked from the talk announcement is that they do show the seen analytic in places: if you manage a Facebook page or a Facebook group, it is possible to get that seen count to show up. And it's interesting, because one motive that's been advanced—not by me, but by commentators—is that they do it because they want to emphasize that the number could be bigger if you bought ads. That's why they reveal the seen analytic there: they're saying, look how low it is—what if you ran a Facebook campaign? You're told that very explicitly if you run a page; I've seen it.

Please. This is kind of a silly question, but is it possible that there's actually no one person at Facebook who knows exactly how the algorithm works? I ask because I've been on the user side, trying to find out how an algorithm works in a piece of software we bought, and I could never find anyone who could tell me how it was working, just because the organization was large or the key piece of code was buried very deep in the platform.

I wish I could answer that. It's possible, but I don't know whether any one person knows exactly how it works. I do know that when you build large, complex pieces of software with many, many people, if you have a really good chief architect, that person should know—but I honestly don't know.

Well, I'm skeptical that they know—I don't know what Karrie thinks—but one thing that's important to point out is that in a collaborative audit we're not necessarily interested in assigning blame, because we're interested in the consequences right now, and we aim to detect those consequences. So it doesn't matter that much to us, if there is something bad going on, whether there's a particular person behind it. I think that's important, because some of our discussions about Facebook have devolved into a discussion of whether Mark Zuckerberg is "a nice man," to quote Dan Schiller, and that's really not the point. Google Plus has implemented many of the same algorithms as Facebook; it's a structure we need to address, not the whims or personalities of particular people—although maybe there is an evil person who knows it all.

It seems like your study in large part taught people to be more aware of an algorithmic gaze, with both interesting and complicated consequences, and you got lots of what seemed like really exciting qualitative data. Did people reflect at all on the fact that it was their own choices about relationships and communication that led to the results they were seeing—that the algorithm was in fact interacting with their personal choices? And did you see any self-reflection about how they might manage their relationships separately from Facebook? I mean, the reaction "oh, my brother isn't appearing in this": maybe your brother isn't appearing because you never listen to what your brother is saying, so maybe you should listen to him, if you think it actually matters.
It turns out that what people said about that specific comment was: "I speak to my brother so much face-to-face, I don't need to speak to him on Facebook"—there's no relationship on Facebook because of the face-to-face relationship. I can't go into that much depth here, but in terms of the qualitative data, in the analysis we did of the theories people came up with, they fell into a few main categories: clicking behaviors—things I actively do that publicly articulate that something is happening—down to inbox behaviors; and then, going a bit further, people imagined algorithms that even I don't know how to build yet. They talked a lot about topic analysis, and about reading pages outside Facebook: "because of something on Hacker News, I'm getting this on Facebook right now." So the stories people came up with were pretty complex and very interesting, and at least in the early phase a lot of what they suggested is, I think, very plausible: I don't know the algorithm, but I can suspect that what I like has something to do with it. Actually, from a Facebook paper written by the wonderful Moira Burke, I do know that comments matter more than likes, and people picked up on that.

I think that's a really interesting point, but one thing to emphasize is that while we could read the news feed as a sort of accurate lens on our personal relationships, many of the features of the news feed are arbitrarily defined by Facebook: they're not analogous to any other system, and they're evolving. Like the like—what is a "like," exactly? What does it mean? So rather than emphasizing that this is a result of my own choices, I'd emphasize that these are fairly arbitrary systems that we're teaching people to use. Personally, I never like anything; I'm just very careful about my friend selection, so I usually want to see everyone who posts. But if Facebook's own descriptions of its algorithm are correct, that should produce a really sucky news feed for me, because I don't interact with the like button or the comments enough—and that's what happens. So am I wrong for not pressing the like button more and not using the friend features?
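To picture the kind of folk theory participants were articulating—different interactions weighted differently, with recency mattering—here is a toy scoring function. The weights and the decay are invented purely for illustration; the only empirically grounded ingredient, per Burke's work, is that comments outweigh likes:

```python
# A toy, folk-theory-style feed score: weighted interactions with time
# decay. Every number here is invented for illustration; the one
# empirically grounded point (per Burke's work) is that comments are
# weighted more heavily than likes.
INTERACTION_WEIGHT = {"comment": 4.0, "like": 1.0, "click": 0.5}

def toy_story_score(interactions, age_hours, half_life_hours=24.0):
    """Score a story: weighted interaction counts, halved every 24h."""
    engagement = sum(INTERACTION_WEIGHT.get(kind, 0.0) * count
                     for kind, count in interactions.items())
    return engagement * 0.5 ** (age_hours / half_life_hours)

# A fresh story with one comment outranks an older story with three likes.
print(toy_story_score({"comment": 1}, age_hours=2))   # ~3.77
print(toy_story_score({"like": 3}, age_hours=20))     # ~1.68
```

On this toy model, a user who never likes or comments generates almost no engagement signal, which is exactly the "sucky news feed" worry just raised.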
But your broader point is algorithmic literacy, which is a phrase that's gotten a lot of currency lately, and it's an interesting one; I'm not sure what I think about it. We haven't talked about this, so I may not be representing my co-authors, but in some ways algorithmic literacy seems like a failure: we have collectively built a bunch of systems that we don't like, so the best we can do is teach our children to be skeptical of them. It would be nicer if the algorithms worked and everyone really liked them. If we taught people how to use Facebook and the Facebook algorithm, that might help them, but in some ways, as Karrie has pointed out in other contexts, they don't need that knowledge—it's most useful when Facebook is doing something they don't like. If Facebook did things they liked all the time, we might not need to know as much about how the algorithm works. This is the Zittrain proposal for a fiduciary relationship between you and the platform: if the platform is acting in your interest, that's great—you don't need to know as much about how it works. You hire a lawyer; you don't need to know the law, because they know the law and they're working on your behalf. But right now the platforms aren't working on your behalf, and you're not sure what they're doing. Anyway, there's a lot to your question—I'm sorry.

I don't think you—you don't have a mic—you said that there used to be a way to request to see every post from a particular person? I think they took that out; now you can only hide people.

We need a collaborative audit to figure out this baby.

Who has the mic now? Dan is pointing. Hi, thanks, this is really interesting. I'm Nick Seaver, UC Irvine, anthropology. (Hey, we cite you, Nick.) So I have a question about folk theories, which are of interest to me disciplinarily, I guess. The takeaway seems to be that most of these theories are folk theories—but if you sit alongside someone who's working on a personalization algorithm, say, and ask them to explain things that are happening, you get a lot of similarly—well, I wouldn't say uninformed, but informed in a way that doesn't necessarily have anything to do with what's going on under the hood—explanations. They'll say, "oh, well, this must be because whatever." And this bears on the question of whether anybody at Facebook knows what's really going on: probably not, it's distributed; but also not just because the algorithms are complicated, but because they only do things in conjunction with data, and the thing these people spend all day bashing their heads against is that unexpected data types come in and screw with what the algorithm does—it's all edge cases. So one of the things I liked about the auditing paper was the idea of sampling the algorithm, because you're also sampling the space of data that can go into the algorithm. But I'm wondering where this goes. I like that people are becoming aware of the algorithm—we're moving from ignorance to awareness, or something like that—but I'm not sure what happens when you move from folk theory to something else, especially if the idea of what the algorithm is really up to is fuzzy. As we were just saying on Twitter, one of the only certain things about algorithms is that they're fairly uncertain about what they're doing, because they're changing all the time, and so on. So I'm curious what you're going to do about the fact that, in the end, you might say all theories are folk theories—so why pick one over another?

That's a really nice point. To go back a little: we ran into the same problem in developing our study, because we used to use the term "mental maps" a lot—we wanted to see what people's mental map of the algorithm was—and it turns out that if you don't know one exists, you don't have a mental map of it in the first place, and you don't create one by using our tool once; you need to experience something quite a bit to develop a model in your head of what it means. So one path forward is to collect these folk theories and see what the common denominators are. Another path is to put out a few tasks for people and say: look, if you do this, can you tell us what the consequences might be—and collect somewhat more solid data from people who volunteer to give it.
The idea is not just that—I think starting with the folk theories has been so much fun, because it's amazing how quickly people know the insides of their own networks, in ways that we as outsiders could never fully comprehend. They know the small things they did on Facebook, say, two days ago that might have influenced something—things we could never have picked out in an interview. But one of the things we're exploring right now is specific tasks: if I do this, then what might happen? Christian had a really nice story where one of his posts stayed at the top of the feed for a really long time, and he was asking people how long it was there and why it was there—I don't remember the exact details of your story.

I completely agree with Karrie's response. To get to your first question—or maybe it was your last—about whether we're just going to gather what people think about these algorithms, all of it folk theory: remember that one part of the feedback is that we actually are gathering information about the feed, about what is shown to you and what isn't. That part isn't a folk theory; we ask people to explain it, but it is actual data about what the algorithm appears to be doing. And keep in mind Crandall's theorem—Crandall's view of algorithms is that they're all rigged—and, not to be sensational, but he was testifying before Congress because he was in trouble. Similarly, as in the Latanya Sweeney example I mentioned, there are laws against things that algorithms might be doing, and the consequences are real, and that's really important. So you could imagine ending up not just with an assemblage of interesting things people said about algorithms, but with the discovery of really troubling things algorithms are doing. One example would be Edelman's work here at Harvard, where just by making individual queries on a variety of platforms himself—not a collaborative audit, just an individual one—he has found all kinds of things going on that, as I think he puts it, raise concerns about fraud. His most pithy example is probably the way he added commas to certain search queries on Google and revealed what he believed were hard-coded rules that respond only to certain keywords, even though Google has said it doesn't use hard-coded rules. He found that if you put a comma after certain health-related search words, Google Health would become the first result, and if you removed the comma it would not. That's very interesting, because—I'm not an antitrust lawyer, but I think they would potentially be interested; well, maybe not in the US, but in Europe they care about that.
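The comma experiment is easy to express as a paired-query audit: perturb a query in a way that should be semantically neutral and check whether the top result changes. In this sketch, `top_result` is a hypothetical stand-in for however you fetch the first organic result; the point is the controlled perturbation, not the plumbing:

```python
# Sketch of an Edelman-style paired-query audit: append a trailing
# comma (which should be semantically neutral) and flag queries whose
# top result flips. top_result() is a hypothetical stand-in for your
# actual scraping or API plumbing.

def comma_audit(queries, top_result):
    """Return queries whose top result changes when a comma is appended."""
    flips = []
    for q in queries:
        plain = top_result(q)
        comma = top_result(q + ",")
        if plain != comma:
            flips.append((q, plain, comma))
    return flips

# Usage sketch with a canned oracle standing in for a live engine:
canned = {"flu symptoms": "example-health.org",
          "flu symptoms,": "other-site.example"}
flagged = comma_audit(["flu symptoms"], lambda q: canned.get(q))
print(flagged)  # the pair differs, so this query is flagged for review
```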
Actually, building on this question of sampling, I have a two-part question—the same question from the computer science side and from the law side. The data here is extremely high-dimensional, and the idea that you could sample even a non-zero fraction of it is sort of ridiculous—it's impossible, because there are only seven billion people and easily trillions and trillions of possible inputs. From the computer science standpoint, have you thought about how to deal with the dimensionality and explore the space from a sampling standpoint? And from the legal perspective: does it matter whether an algorithm merely could be doing something illegal—say, discrimination in advertising—or is currently doing something illegal or ethically undesirable? Does that make a difference?

Let me take the second one first. Again, that's where I see a difference between—to be technical—model identification and what might be required for legal purposes. Provided I could come up with a model, I could say: there is a learnable model, the best explanation given the sampling I have, and according to it, you discriminate. I don't have a proof; all that may buy me, in or on the way to a court of law, is perhaps the option to subpoena—if that's what seems to be happening. That's the difficulty: I don't think we'll ever be able to assign intent. I will never be able to come out of an audit and say, I know for sure that you're using race as a discriminating factor. All I can say is: here are very simple models—or here are the most probable models given the traces of data we have—that explain the data, and it looks like you are discriminating. And that might actually be of interest even to the programmer, because you may not know. For example, in France it is illegal to hold data based on religion—you cannot ask about religion—and to some extent you can always take surrogates for it, but if by doing so you were to construct an object that could predict religion, you might be in trouble; you may not want to do that. So again: I do not expect ever to be able to assign intent, or even to come up with a model that is likely to be the true model with high probability. But it might be enough, and that's also what audits do: at the legal level, you never end up able to say that discrimination is at play for sure. All you can say is: it very much looks like you're acting as if you're discriminating—stop, do something about it, and if that means adding noise, so be it. The standard is different, and that's why we think the concept of an audit is actually powerful here.

In terms of sampling, you have a good point; that's definitely something we've started looking into. There are intrinsic limits to what's learnable from what we can do, and maybe we won't be able to—that's one of the things we hope to learn as we scale these studies up. You can go all the way up to fundamental theorems about the sample complexity of learning a particular thing. We're hoping we can still learn something; that would be the great experiment in practice—do we hit those bounds or not? But absolutely, I do not expect to be able to learn everything perfectly, and again, I don't need to, and don't even want to. That's a great question.
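One way to read that "acting as if you're discriminating" standard in code: given audit probes in which only a protected attribute varies, test whether outcomes differ more than chance allows. This is a minimal sketch on fabricated data—it yields statistical evidence of disparate treatment, never intent:

```python
import numpy as np

# Sketch of "acting as if discriminating": given audit probes where
# only a protected attribute was varied, test whether outcomes (e.g.
# which ad was served) differ more than chance would allow. This can
# only ever show the system behaves *as if* it discriminates; it
# cannot establish intent or recover the true model.

rng = np.random.default_rng(0)

def permutation_gap_test(outcome, group, n_perm=10_000):
    """P-value for the observed gap in outcome rates between groups."""
    observed = outcome[group == 1].mean() - outcome[group == 0].mean()
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(group)
        gap = outcome[shuffled == 1].mean() - outcome[shuffled == 0].mean()
        if abs(gap) >= abs(observed):
            count += 1
    return observed, count / n_perm

# Fabricated audit data: 1 = the probe was shown the high-interest-loan ad.
group = np.array([0] * 200 + [1] * 200)
outcome = np.concatenate([rng.binomial(1, 0.10, 200),   # group 0
                          rng.binomial(1, 0.25, 200)])  # group 1
gap, p = permutation_gap_test(outcome, group)
print(f"rate gap={gap:.2f}, p={p:.4f}")  # small p: acts as if discriminating
```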
A closing point on that: the quantitative models we use now for things like this really suck. To stay on the antitrust theme, look at the HHI—the Herfindahl-Hirschman Index. Does anyone know it? No one? It's the quantitative test used to decide whether you deserve antitrust scrutiny, and it's very, very simplistic—in fact somewhat ridiculous—and yet it's in wide use. So we don't actually have to do that much in order to do as well as, or better than, some of the things we're using now. I think we can improve on current practice.

I think Dan is giving us a signal here. Is that the signal you're giving? This signal? We can manage one more question—whoever has the mic is probably the person who gets to ask it.

Am I thwarting this algorithm by never viewing my, quote, raw Facebook feed?

That's a really good question. Can you tell me a little bit more about how you create those lists and who's on them?

These are people I actually want to see. I hate Facebook so much that I'm on it twice, and the account I actually use, which is not the easy one to find, is much smaller than most people's. But even on that account I keep a smaller list—people I know who are actually part of my life—and if I like somebody's stuff when I look at it, it's because I'm hanging out with them all the time. That list is the only thing I ever look at on Facebook. So where does the filtering come in?

So you're seeing that list—I see, I see. So you're not using the smart lists; you're making your own lists. Okay, so they know who you see. I can tell you that the smart lists are more like a recommender system, and from the beginning they just haven't been very good at all. How many people here use smart lists? Exactly. (What are smart lists? I might be using them without knowing they're called that.) They basically pre-populate some groups—Close Friends, Acquaintances, and so on. Zuckerberg actually gave a talk saying, I think, that less than five percent of people use them, so they're not that widely used. In a sense you're creating your own news feed—you've made your own personalized feed—so they see what you see in that feed, and they probably make predictions within it. But right now, with the Close Friends and Acquaintances lists, my interpretation is that there isn't that much meaningful data they can get out of them yet. It's an interesting point, though. We're admirers of a stream of literature you may know, about the tactical use of a system in ways opposite to its intended purpose, for some political end. A good scholarly example is TrackMeNot—Helen Nissenbaum and, I forget her co-author right now—which sends false data about you in order to protect your privacy. So there's a really interesting area there, although there's an interesting critique of it by Bruce Schneier from a security standpoint. Another kind of thwarting might be to put in names of people you don't actually care to read, and then somehow find a way to read what you really want out of that mess—though I don't know how you would do that. And it turns out people don't want to do that: it takes time.
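A coda on the HHI that came up a moment ago, since nobody in the room could expand the acronym: it is the Herfindahl-Hirschman Index, the sum of squared market shares, and the computation really is this simple—which is the point about how low the current quantitative bar is:

```python
# The Herfindahl-Hirschman Index: the sum of squared market shares,
# with shares in percent, so the index runs from near 0 up to 10,000.
# This is the "very simplistic" antitrust screen referred to above;
# HHI above 2500 counts as "highly concentrated" under the 2010 US
# DOJ/FTC merger guidelines.

def hhi(shares_percent):
    assert abs(sum(shares_percent) - 100) < 1e-6, "shares must sum to 100"
    return sum(s * s for s in shares_percent)

# Four equal firms vs. one dominant firm:
print(hhi([25, 25, 25, 25]))   # 2500
print(hhi([70, 10, 10, 10]))   # 5200 -- highly concentrated
```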
Dan's giving me the eyes—so, thank you very much for being a very engaged audience.