Hi everybody, welcome to ML Talks. Today our guest is Julia Angwin. I think not a week goes by when I don't hear somebody say that we need more Julia Angwins, but there's only one, and we have her today. Julia is a data scientist and a journalist, and it's a really important combination, as you'll soon find out. She's a Director's Fellow of the Media Lab and has been working with us for about a year now, I guess. As usual this is being streamed, so if you're watching on the internet you can tweet at hashtag #MLTalks, and toward the end we'll be taking questions from the audience and from Twitter. So comment and feel free to ask questions. But we'll start with a presentation from Julia.

Thank you. Hello, it's great to be here. As you can see, my talk is called Quantifying Forgiveness, which is sort of a strange title for a talk. I'm going to start with a little bit of background about who I am and why I'm standing here, which I think is always helpful, then talk about forgiveness, and then talk about quantifying forgiveness.

So I'll start with me. I grew up in Palo Alto, and I probably thought I was going to end up at a place like MIT. I learned to program in fifth grade because Steve Jobs was teaching everyone in the public schools in Palo Alto. This is my first computer. I worked my summers at Hewlett-Packard. I was really ready to go into the personal computer industry, which is what it was at that time. And I took a wrong turn: I fell in love with my college newspaper and decided to go into journalism. I thought, well, I'll just try it for a few years and maybe I'll go back to the real world of computers, because when I grew up in Palo Alto there were really two life choices, hardware or software. I was pretty much a software girl, but I didn't know there were other choices. So journalism was my rebellion.

I eventually ended up at the Wall Street Journal. I joined in 2000 during the dot-com boom, which I guess is now ancient history, but it was hilarious. They were like, you know computers, we'll hire you to cover the internet. And I was like, well, anything in particular about the internet? And they're like, no, everything. I was like, okay, seems fine. They just couldn't get enough people to write about technology. I was there for 14 years, and then I went to ProPublica, where I am now, which is a nonprofit journalism startup. If you don't know about it, it's investigative journalism, started by the former managing editor of the Wall Street Journal, who left when Rupert Murdoch bought the paper.

So I want to tell you a little story about forgiveness in the real world, based on my experience as a reporter at the Wall Street Journal. I joined in 2000, I left in 2014, and I covered technology, the internet, whatever. During that time I achieved what journalists sadly consider their great dream, which is to get somebody locked up: you wrote such hard-hitting stories that somebody went to jail. Two people went to jail because of my reporting. Strangely, they were both black men. Now, how many black men are in the technology business? Of all the executives I wrote about, it is surprising to me that this was the outcome. So I'm going to tell you the stories. First of all, this was 2003, and spam was a really big deal. So I was like, I'm going to find a spammer; this is exciting stuff.
So I worked with EarthLink, which was looking for a certain spammer. I tracked this guy down, found him at his home in Buffalo, knocked on his door. He didn't answer; he yelled at me through the door. But I wrote a big story about the hunt for spammers and how difficult it was. Eventually he was charged, and he was sent to prison with the maximum sentence for 14 counts of identity theft: three and a half to seven years. Everyone in my office was very excited. Honestly, I felt it was a little weird, but I was young. I was like, okay, journalism.

A couple of years later I was writing about AOL, and I heard a tip that somebody inside was embezzling. So I investigated, and I found out there was a guy, the head of HR, also a black man. He's removed his pictures from the internet, so I can't show you a picture of him. He was caught, and AOL had been trying to cover it up, so I wrote about it. Once I wrote about it, they brought charges, and he was sentenced to 46 months in prison.

Now, both of these people were doing illegal things, right? But think about what I wrote about. What was the most illegal thing I wrote about? It was the round-trip deals AOL used to inflate their revenue so they could increase their stock price. They did these crazy deals during the dot-com boom where, instead of just contracting with their cafeteria vendor to deliver food to the employees, they would say, actually, we're going to overpay you, and then you're going to buy ads from us, because these companies were only being measured on ad revenue, not on net income, not on profits. This was a scheme that inflated their revenue by billions of dollars. They paid 300 million dollars in fines, and they're all doing completely fine. Steve Case is worth 1.36 billion and invests in all sorts of good causes. David Colburn, who led all those deals, is bringing lots of investment to Israel. And Bob Pittman, who was actually an executive at AOL, is the chairman of Clear Channel, which is a huge outdoor billboard company.

You guys know this story, right? In your guts, we all know this story: some people are forgiven for their crimes and some people are not. And they kind of have similar traits; some of them are white, some of them are black. That's just my anecdotal experience, but there's an enormous amount of data that supports it. So that's my personal story of forgiveness. I feel bad about participating in this, and I feel sad when my fellow journalists want to get together and crow about who they got put away. I don't want to participate in that.

So I've started investigating forgiveness in the digital world, because the weird thing about automation and technology is that it is auditable. We can see systemic bias in a way that we can't really see in human minds. And so I'm going to tell you about two different investigations I've done that have led me to some conclusions about algorithmic forgiveness. But first, I forgot: algorithms are very important, and you know about them; they're in your lives all the time.
This is the Facebook Blue Feed, Red Feed, which, if you haven't seen it, is a really great project by the Wall Street Journal. It shows you what your news feed would look like in a blue state or a red state, basically, depending on your political leanings, and how different your news looks.

An algorithm that I've looked at is one that predicts the risk of recidivism. It's used across the nation in criminal justice to decide whether you're likely to go on to commit a future crime. It asks you a whole bunch of questions, they're input into some software, and it spits out a score, one through ten: are you risky or not. It's used pre-trial, for whether you're going to get out on bail. In many states it's used for sentencing, it's often used for parole, and in some places in California it's actually used within the prison system to sort you into medium- and high-risk prisons. There are dozens of risk assessment tools in use in the criminal justice system, but this is one of the most popular ones, and it's proprietary software: not open-source, not inspectable.

But I wanted to look at it, so we fought a FOIA battle in Florida and got the records of 18,000 people who had been scored by this program over a two-year period. In Broward County, every person who comes in for booking gets scored, and that's entered into the system, and the pre-trial judge looks at it when deciding whether to release you on bail. Interestingly, everyone in Broward County that I talked to had no idea they were being scored. They were just asked questions at intake, but they didn't know the answers were going into a scoring system, and the score is not described or discussed in the pre-trial hearing; the judge just gets it as information to be used.

So the first thing we did, after fighting a five-month legal battle to get this data, was just to look at it. What does it look like by race, for instance, since we know race is a big issue in the criminal justice system? And this is what it looks like: black defendants' scores, on the left, were pretty evenly distributed one through ten, and white defendants' scores were strangely clustered at the low end.
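That first look at the distribution is easy to reproduce, since ProPublica later released the underlying data. A minimal sketch, with the caveat that the URL and column names come from their public repo and are assumptions here, not details from the talk:

```python
# A first look at the score distribution by race, as described above.
# The URL and column names (decile_score, race) come from ProPublica's
# public repo and are assumptions, not details from the talk.
import pandas as pd

URL = ("https://raw.githubusercontent.com/propublica/"
       "compas-analysis/master/compas-scores-two-years.csv")
df = pd.read_csv(URL)

# Share of each group receiving each decile score, 1 through 10.
dist = pd.crosstab(df["decile_score"], df["race"], normalize="columns")
print(dist[["African-American", "Caucasian"]].round(3))
# Expected shape: roughly flat for black defendants, clustered at the
# low end for white defendants.
```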
So we thought, okay, if we were lazy we could write a story right now saying the score is biased. But the truth is, who knows? Maybe every one of those people in the low-risk category is actually Mother Teresa; they were picked up for littering and they're the greatest people on earth. So we had to do a very sad thing, which was to look up the criminal records of 18,000 people and their criminal outcomes. Basically, we found everyone's criminal history, and then we also found their actual recidivism outcomes. We had to drop a lot of people from the sample because not everyone had been out for two years, but essentially we got down to a sample of 7,000 people for whom we had full records, meaning we had their criminal history and two years' worth of days that they were free: we took out the time they were incarcerated in jail or prison and added it up to see whether we had a two-year stretch. So then we had this very nice sample, which, by the way, required an enormous amount of blood, sweat, and tears. Joining databases on name and birth date is a task I would wish on no one: typos, aliases, all sorts of terrible issues. But Broward County was very helpful, because they had wanted to join these databases forever to see if their score was working but didn't have the time or interest to do it, so they actually hand-checked 1,500 records of mismatched names and birth dates for us.

In the end, nine or ten months after starting, we could run our five-minute-long logistic regression, which is the fun part. If you don't know what a regression is, it's just a mathematical way to create a balanced comparison: you control for a set of factors, removing their influence, and ask what otherwise-similar people look like. That's a terrible description of a regression, but close enough. We controlled for prior crimes, future recidivism, age, and gender, meaning: if you had two people with the same exact prior criminal record, same outcomes, same age, same gender, what was the difference in their scores? And the difference was pretty stark: black defendants were 45% more likely to be assigned a higher risk score with the same set of facts.

Now, the problem is that it's really hard to write a news article that says "45% more likely." Editors don't like it, readers don't like it; it's very hard to comprehend what "45% more likely" means. So the way to really describe this is in false positives and false negatives. A false positive is somebody who was deemed high risk but actually was not, so they were falsely accused of being at high risk of future criminality. A false negative is somebody who was deemed low risk but turned out to be high risk. And when you look at the false positive and false negative rates, there's this huge disparity: African-American defendants are twice as likely to get a false positive as white defendants, and white defendants are twice as likely to get a false negative as black defendants. What was super weird about this was that the problem with these scores was all in the error rates.
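The error-rate comparison she describes can be sketched directly from the released data. The "high risk" cutoff at a decile score of five or above follows ProPublica's public analysis; treat it and the column names as assumptions:

```python
# Error rates by race: false positives (labeled high risk, did not
# reoffend) and false negatives (labeled low risk, did reoffend).
import pandas as pd

URL = ("https://raw.githubusercontent.com/propublica/"
       "compas-analysis/master/compas-scores-two-years.csv")
df = pd.read_csv(URL)
df["high"] = df["decile_score"] >= 5  # cutoff per ProPublica's repo

for race in ["African-American", "Caucasian"]:
    g = df[df["race"] == race]
    fpr = g.loc[g["two_year_recid"] == 0, "high"].mean()
    fnr = 1 - g.loc[g["two_year_recid"] == 1, "high"].mean()
    print(f"{race}: false positive rate {fpr:.2f}, "
          f"false negative rate {fnr:.2f}")
```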
I forgot to put the slide in, but anyway: the score is 60% accurate for both races. That's a pretty crappy record, to be honest; I'd be fired if my stories were 60% accurate, but in the criminal justice system this was considered an okay finding. So we found it was 60% accurate, but all the bias was in the 40% error rate: one group was getting dramatically over-scored and one group was getting dramatically under-scored.

Here's what that looks like in real life, and this is how I tell these stories: I found people who had a similar crime and described their situations. Vernon Prater got a three, low risk, and Brisha Borden got an eight, high risk. Now let's look at their facts. They were both arrested for petty theft. Vernon had previously had two armed robberies and had already served a five-year sentence for armed robbery. The arrest he was scored for was shoplifting $80 worth of stuff from a CVS, and after this score he went on to break into an electronics warehouse and steal thousands of dollars of goods; he's serving a ten-year sentence right now. Brisha was also picked up for petty theft. Brisha is 18. She was walking down the street with her friend, and they saw a kid's bicycle in the front yard of a house. They grabbed it and tried to ride it down the street; the mom came out and yelled, hey, that's my kid's bicycle, and Brisha gave it back. However, in the meantime a very nosy neighbor had called the police, so she was arrested for petty theft (they actually charged her with burglary too, but I believe they later dropped it), and she was scored high risk. Her previous offenses, I don't know; they're juvenile misdemeanors, so the records are sealed, but I do know that misdemeanors are not usually armed robbery, so I'm guessing they were less than Vernon's. And her subsequent offenses: none.

So this is exactly what a false positive and a false negative look like. She was considered way more high risk than she turned out to be, and he was considered way more low risk than he turned out to be. The thing that's weird about it is, if somebody had asked you what these two people were likely to do, you probably wouldn't have made that mistake, but the computer made the mistake because of the way its inputs are scored. We don't know how they generate the score; it's a secret algorithm, so they don't tell you. I will tell you this, though: the night before we published, the company, which was obviously very upset about this story, said, okay, our secret equation is a trade secret, you can't share it with anyone, but Julia, you can look at it. So they sent it to me. It was a linear equation with constants for the weights on the variables. Well, how am I supposed to know from that whether it's biased? I would defy you, even if you had those weights, to prove the disparate impact. The thing is, you have to analyze the outcomes to really figure out how the score is behaving. There are a lot of interesting things about this, and there have been many papers on this work, because we put out the data and the code for people to analyze, and I encourage you to look at it if you haven't. But I think it really speaks to the idea that we think about bias, when what this was was unjustified forgiveness. Our intuition was correct: it's not the only part of the story, but a big part of the story was that one group was getting a massive break, and it wasn't justified.
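Since she mentions the data and code are public, here is a minimal sketch of the regression she described, the source of the 45% figure. The covariate list and the Medium/High outcome definition are assumptions, so this approximates rather than replicates the published analysis:

```python
# Does race shift the odds of a Medium/High score once priors, actual
# two-year recidivism, age, and gender are controlled for?
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

URL = ("https://raw.githubusercontent.com/propublica/"
       "compas-analysis/master/compas-scores-two-years.csv")
df = pd.read_csv(URL)
df = df[df["race"].isin(["African-American", "Caucasian"])].copy()
df["high_score"] = (df["score_text"] != "Low").astype(int)

fit = smf.logit(
    "high_score ~ C(race, Treatment('Caucasian')) + priors_count"
    " + two_year_recid + age + C(sex)",
    data=df,
).fit(disp=False)

# exp(coefficient) is an odds ratio; a value near 1.45 on the race
# term corresponds to "45% more likely to get a higher score."
print(np.exp(fit.params).round(2))
```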
So I think it's interesting to frame it around forgiveness, because I think that's intuitively what we understand to be going on. That's what I understand to be going on based on my own experience of covering the criminality of the tech industry, which is basically those three examples I told you about.

So I want to tell another story about another algorithm that we were able to quantify. This is an algorithm that predicts the risk of car accidents: it's the one car insurance companies use to set your premiums. Insurance is supposed to be a risk-based metric, where you contribute to the pool based on how much risk you're bringing to the pool. We decided to test that, because it has long been observed that minority neighborhoods get higher rates, and no one has ever been able to explain why. The car insurance companies say those neighborhoods are more risky, but no one had been able to measure it. So my team decided, because we just hadn't had enough fun joining the criminal justice databases, that we would try another gigantic data project. We worked with Consumer Reports, which bought us a data set of 30 million car insurance quotes by zip code across the US, for different driver profiles. We could have obtained this by reading every car insurance filing in every state ourselves and calculating, but it was easier to buy. Then we filed public records requests in all 50 states for the actual payouts that insurers have made by zip code. Tragically, only four states collect that data, so we could only analyze it in four states: California, Illinois, Texas, and Missouri.

We compared premiums versus payouts for a single safe driver, essentially controlling for the risk of the driver. What do you see in the difference between premiums and payouts? Car insurance companies have this extra factor: in addition to your driving profile, they choose to apply a surcharge or discount based on your zip code. This is something they're allowed to do, and they base it on the idea that some zip codes are less safe than others. I don't personally understand this, because I don't know about you guys, but I do drive outside my zip code; that's the whole reason I have a car. But anyway, I guess this is their fun. So basically we wanted to remove everything other than zip code and see what the difference was.

We made this horrifying chart, and I'm sorry if any of you have ever looked at it; we clearly need some data visualization help. The x-axis is risk, the actual payout, scaled from the least amount to the most, so the highest risk is on the right-hand side, and the y-axis is the increase in premiums. The red line, the linear one, is minority neighborhoods: they actually track risk, so the premiums go up as risk goes up. We did this per company; this one is Geico in Missouri, but we saw the same pattern in almost every zip code and every company. What you see is an unexplained discount for white neighborhoods that doesn't track risk.
That was a very surprising result, because everyone thinks about bias, but what we found was an unexplained discount. And this is what it looks like in real life. Otis Nash pays $190 a month for Geico car insurance. He's had no accidents, he works two jobs, he's a really diligent father and a really lovely person who I hung out with in Chicago. He lives in East Garfield Park, which, if you know Chicago, is one of those really run-down West Side neighborhoods, filled with graffiti and trying to emerge, but, you know, what we call the inner city. Now, this is Ryan. Ryan lives in Wrigley Park, a classic bars-and-yuppies neighborhood, and he pays $55 a month for his Geico car insurance, even though his spouse recently had an accident.

The thing is, a lot of the difference was the base rate. These insurers have set a base rate for property damage of $753 a year in East Garfield Park and $370 a year in Wrigley Park, so literally twice as much in East Garfield Park. And when we looked at the payouts, they are actually lower in East Garfield Park than in Wrigley Park. So this is not explained by risk, and the difference in their prices is largely driven by this crazy difference in property damage base rates. The reason is that Chicago actually, weirdly, tried to get rid of redlining in the car insurance market: they said you can never vary the non-property-damage rates by zip code. So insurers lump all of their zip code adjustments into the property damage part, despite the fact that the risk doesn't support it.

So once again this is a question of forgiveness. We have this gap, and it's a gap where we've chosen to give one group of people a pass. I would just like to challenge all of you, when we talk about bias, to also think about forgiveness, because the data suggests, not always, but in these two particular cases, surprisingly, that it was an unexplained discount not based on risk that was really the problem. I think we should consider, as a society, that that's one way to think about the challenges we're facing. And I want to leave you with the fact that, in a weird way, I really am thankful that we're choosing to automate some of these biases, because I think we need to collectively see them, and the ability to audit them is really powerful. We have made change through these investigations: California has forced several companies to change their rates, there are bills pending in other states on car insurance as a result, and the criminal justice field is heavily debating the use of these risk assessment scores. So I am hopeful that this kind of data can help change the debate. Thank you.

So I wanted to start with the last thing you presented, the forgiveness in the Chicago premiums. First of all, who did it? Is it the data? Is it somebody going in there and being racist?

Sure. What we found, and we have the best evidence for this in California because the companies have to file more information there, but what we found in California, which I think is likely to be true in the other states as well, is that the real problem was this:
A lot of these whiter neighborhoods were rural, and there wasn't a lot of data, so they didn't have enough data to make a true risk calculation, and they guessed. In California there was a loophole in the law where they could string together a bunch of neighboring zip codes, so it was transitive: you could take your neighbor's neighbor's risk and put it into yours. They were just transferring one low risk around and assuming it spread. The regulators have stepped in and said they're going to have to work harder to justify their use of those additional neighboring zip codes' risk in places where they have sparse data. But I actually think it was that they didn't have enough data, so they made a guess, and their guess was, look, these are a bunch of nice white people.

But the Chicago one is slightly different, right? Because there they probably did have data.

Yeah, and I'm not sure why, because in Chicago there is plenty of data; there's a lot of history of redlining in Chicago. One thing that is interesting: when I talk to the insurers, and I've talked to them extensively about it, no one's ever said this directly, but there's been a lot of, you know, Julia, it's hard to change people's rates, they might leave. So I suspect there may be some of, oh, these people might shop around, so we want to keep their rates low.

So in a way they were just obscuring it. It's hard to tell whether it's crappy data, bad algorithms, or just somebody hiding behind this veil of data and doing their fiddling, being corrupt. You can't really tell.

Right, or I actually think it could be a loss leader. It's just marketing.

Yeah, marketing: you've got to get those white people in because they're going to bring more people, or something. And in insurance, as you're starting to poke into this, because I was just reading a paper about the use of FICO scores, the credit scores, for really shady things like targeting predatory product sales by household, it's really at the edge of the law. The stuff you showed in insurance, is any of that illegal? I mean, is it regulated?

Yeah, it's interesting. What I learned about insurance is, I don't know if you know this, that they have an exemption from antitrust law. Congress gave them an exemption, so they're only regulated by the states, and I think it's fair to say that a lot of states are not really heavily regulating them. California is the most aggressive regulator. Illinois has chosen, and I'm sure it has nothing to do with the fact that State Farm and Allstate are based there, to entirely not regulate: they don't check anything; I could start an insurance company tomorrow.

They don't pay tax either, right? Well, nobody pays taxes. So I guess one of the things that's interesting is, as you start to shine the light on these things, are the regulators responding, saying, oh wow, we didn't know, maybe we'd better do something about it?

We've had a really constructive dialogue with California. They're the most well-staffed insurance regulatory office; they have tons of actuaries, I think hundreds, and they take their job really seriously. We had some really good back and forth about the data and the real technicalities, and they took it to heart.
Illinois, Missouri, and Texas said, whatever, guys, thanks for your paper.

You didn't exactly find the modern Untouchables there, taking down crime.

No, but I mean, it has generated some conversation; there are these groups, the national associations of insurers and so on, so I think it's being talked about. But it's a hard industry to make change in.

And I think you were mentioning before, in a conversation we were having, that people had anecdotal evidence of this, but the data actually gave a lot of energy to the conversation. Do you find that broadly true?

Yeah. I mean, the one thing that's sort of depressing about a lot of my findings is that people are like, well, obviously, of course this risk score is going to be biased against black defendants; of course they're redlining. And yet it's still important to prove it, because even if you think it's true, we need the data to support it. So sometimes these stories can feel a little underwhelming; my editors are like, whatever, we know that. And yet I think the real benefit, and I think this is why we always release our data and our code, is that that is what propels the argument. We're in a world now where you can talk about whatever you like, but until you lob data over the fence, you don't get a real policy dialogue going.

I heard Cathy O'Neil, the author of Weapons of Math Destruction, on the radio the other day explaining that when you pull out the data and hit people with it, it's really hard to argue with. Because the companies have been doing this forever, right? And even now, I think it was Cathy telling me, actually it was another friend, that even with something like unemployment rates, we're measuring people who are looking for jobs and can't find them, but we're not including people with disabilities or people who've given up, and if you include those, it's actually going crazy. So the other part, you were talking about visualization, is how you present the data. You as a journalist are presenting the data in a specific way, to shine the light on the bad guys, and that's really interesting and important, and also partially where you get criticized, right? Because you obviously have a point of view that you're using the data to expose, and the company will come back and say, no, no, her data only shows this, it doesn't show that.

Right. I mean, the insurers say, look, Julia, you're using the wrong data. What we got was the average of all insurers' losses per zip code over a three-year period, and they're like, our individual losses could be a huge outlier. However, they don't share those; that's secret data. So they're like, well, we have secret data that shows that everything is awesome. And I'm like, fine, share it, let's talk about it. But they don't want to. All these data conversations always come down to that: you're looking at the wrong pool. And that's why I feel so strongly about journalists collecting their own data. We need to know what we're looking for and go get it, because received data sets reflect the people who collected them; there's a reason they don't collect something if they don't want to know it. So you have to go get it yourself. They don't ask what they don't want to know.

Right. But let's talk about some of the criticisms too.
With the first presentation, about the risk scores: that had such a huge impact that the company responded, and then academics responded, and then you responded, and I want to go through some of that, because the word bias can mean so many things. I'm teaching a course with Jonathan Zittrain, and we actually had Cathy O'Neil in; last Tuesday was about algorithmic bias, and bias can mean a point of view, it can mean unfairness, it can mean data that's skewed. One of the criticisms was that if you optimize for the thing you're pointing out, the false positives, then the accuracy rate would go down, and you can't optimize for both. And the argument, I guess, from the company was: we're more concerned with getting the recidivism risk right, and the people who end up a little longer in jail don't matter as much. Is that roughly it?

Yeah. Their basic argument was: it's 60% accurate for both black and white defendants, we've optimized the algorithm so that it's fair in its predictive accuracy, and we don't accept your idea of fairness, where the disparity in the error rates matters. And that's a point of view, and it's a point of view shared by that whole field; all of the current risk assessment scores are designed this way, and it comes from a history that field has had. But if you talk to people in medicine, obviously you're not going to ignore the false positives, the people who died because your medicine is bad; that's a huge part of your decision in diagnostic tests. So I think it's partly a semantic argument. We are pointing out that they've chosen a definition of fairness that has this disparate impact in the error rates, and they're saying, well, that's not fair, because if you changed the error rates you would break the optimization for fairness as predictive accuracy. But in the criminal justice context, to say that you're totally fine with false positives, when the whole point of due process is defaulting to innocence, I find that a really hard argument. But that is the argument they're making.

And then there's the other argument: that even with all that, the scores are not as bad as judges.

Well, yes, and people say that to me all the time: you know, judges are so much worse, Julia, you've got no idea. And I'm like, that may be true; I have no idea. Please present me the data and I will analyze it. I think that's probably true of some judges, and some are probably better, and it's a question of how you do that controlled study. I'm not necessarily the one to do it: in the jurisdiction I was looking at, Broward, everyone used the assessment; there was no control judge not using it whose outcomes I could compare. I think that's important academic work, but it doesn't in any way take away from the fact that the score itself is biased.

And also it's kind of, fair for whom, right? I think that's the weird thing about the word fair: everybody uses it like they want it to be fair for them. Yeah.
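The tension being debated here can be made precise. A well-known identity (this framing follows Chouldechova's 2017 analysis, which grew out of this exact dispute; it is not something stated in the talk) ties the false positive rate to calibration, via the PPV, the share of predicted reoffenders who actually reoffend, and to the base rate $p$ of reoffending:

$$\mathrm{FPR} \;=\; \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot\left(1-\mathrm{FNR}\right)$$

So if the score is equally calibrated for both groups (the "fair in its predictive accuracy" claim) but the groups have different base rates, the false positive and false negative rates cannot also both be equal across groups; one definition of fairness has to give.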
And it's sort of a weird, slightly philosophical question, on fixing bias: what are you solving for? Okay, you're a journalist, so you're trying to be neutral and shine the light. But I wonder. I think Cathy O'Neil was talking about child abuse: you have these predictors now that try to figure out which families are beating their children, and a false positive, where you take a child away from a perfectly fine family, and a false negative, where you don't intervene, have very different outcomes, both horrible. Obviously accuracy is important, but assuming you're going to lean one way or the other, how should we be deciding?

And again, I think their view would be similar to the criminal justice one: our tools are still better than what we have now, which is that we can't predict anything and we just wait for a phone call. I think that's right, but I don't know if you've read Virginia Eubanks' book Automating Inequality; she's really good on this point. She talks about the fact that these child abuse algorithms are themselves too small a lens on the problem. Child abuse and neglect is usually a symptom of poverty, and if you were to bring some resources to bear to help the family, that would probably be better. But instead it's all about predicting this tiny narrow thing that is actually really, really hard to predict. Predicting human violence is extremely difficult. One thing I didn't talk about: COMPAS has a score for violence, predicting violent recidivism. It was only 20% accurate, and it had the same exact bias.

What does 20% accurate mean? That means 80% inaccurate, right? So worse than a coin toss?

Predictive accuracy means: when you predict it will happen, 20% of the time you're right.

I see, okay. So you can't quite compare it to a coin toss, it's not the same thing, but it's not a good number. And 60% for the recidivism score is also not a good number, right?

Right. The industry's gold standard for criminal risk scores, by the way, is 70%; they think they're winning when they get to that point. But I would like to point out that, looking back in the literature, in the 70s psychologists used to make these violence predictions. They would be brought in: is this person going to be violent? They would interview them, and they were judged to be only 53% accurate, and that was actually decided to be not good enough. And now we've come up with automation that is only 20% accurate.
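To make those accuracy terms concrete, a small self-contained sketch with made-up numbers (not their data), showing how "predictive accuracy" in this sense is precision, which is why the coin-toss comparison doesn't quite work:

```python
# Toy confusion-matrix numbers (invented for illustration only).
tp, fp = 20, 80    # predicted violent: right 20 times, wrong 80 times
fn, tn = 30, 870   # predicted not violent: missed 30, right 870

precision = tp / (tp + fp)                  # 0.20, the "20% accurate"
accuracy = (tp + tn) / (tp + fp + fn + tn)  # 0.89 overall
print(precision, accuracy)
# Overall accuracy can look high while precision is terrible, which is
# why precision, FPR, and FNR say more than one headline number.
```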
Interesting. And then, I think you mentioned it, and Chelsea's in the audience: she's gone around interviewing people. Your article actually spurred the creation of, what were they calling it, Humanizing AI in Law? They've been going around talking to jurisdictions and doing ethnography, and one of the things I think she found was that the data is just crappy: underpaid clerks entering data. How much of it do you think is just that?

Oh, for sure. First of all, the data is crappy in the really large sense, which is that even what they're trying to obtain, the questions on the risk score, are things like: do you live in a neighborhood where there's a lot of crime? Has anyone in your family ever been convicted of a crime? Plenty of people had written, before we did this analysis, that this is obviously going to be very biased against poor minority neighborhoods. Secondly, the outcome, what they call recidivism, is actually a new arrest. Now, an arrest is not the same as a new crime; people get arrested for all sorts of things. Chelsea and I were joking outside that we could stand on the street corner and smoke marijuana and would probably never get arrested, no matter how hard we tried, because we're two white ladies.

I hope you didn't try.

No, we didn't try. But we know there's over-policing and over-arrest in some communities.

So I guess I have to be a bit more technical in my terms, because crappy can mean several things. It can mean just noisy, or it can be socially crappy, which is sort of what you're saying. So the question, and this is something Karthik has been working on, is: let's assume you had completely accurate data and you were predicting one hundred percent right. Would it be fair?

That's a question that is hard for me to answer. I personally feel very uncomfortable; I think we should all really question the use of a future crime in the sentencing of a current crime, just on a philosophical level, whether or not the prediction is true. That's a barrier we would all have to cross as a society together and be okay with, and I'm queasy about it. I believe in human change and redemption, so I'm not really on board with that. But we have to make these decisions as a society; we've made a lot of really terrible and really good decisions together as a society.

Yeah. And the work Karthik is doing is trying to stop focusing so much on prediction and focus on causal inference, trying to understand the underlying causes and address them, to maybe lower overall crime or reduce income disparity rather than just more accurately throwing criminals in jail. And the other thing, and this is again something Cathy O'Neil talked a lot about: there were two slides that were pretty amazing about the relationship between arrests and crimes. I think she was saying something like, for homicides only half the people are caught, and most of the people committing crimes aren't arrested, and most of the people who are arrested aren't actually committing violent crimes. So crimes and arrests are not really correlated, but arrest records are what you're using for predictive policing. And obviously, if the police are being guided to neighborhoods where there are lots of arrests, they're going to make a lot of arrests, so they're going to find their share of criminals, and they're not going to catch you guys smoking pot on the corner. And then it actually becomes a self-fulfilling prophecy: you're much more likely to become a recidivism statistic, because you're much more likely to get arrested if you live in a neighborhood with high recidivism scores. It becomes this self-reinforcing positive feedback loop that makes you into a criminal. So even if it were accurate, when you think about systems dynamics, it creates these reinforcements, which compound the social reinforcements we already have, which is that poor people don't get the opportunities, and so on and so forth. So could these algorithms be not just representing societal bias but amplifying it at some terrible rate?
Yeah, I mean, I think they are. And the thing that is slightly hopeful, I think, is that we're in a moment right now where these biases are just starting to be automated and amplified. If we can catch them and diagnose the problem, and we all decide together that it's wrong and fix it, there's an opportunity.

Let me poke at the phrase you just said, decide together. How does that work?

Well, you know, democracy.

Okay, so we decided together on our president; we decided together on a wall. One of the worries I have is, if you're pointing out these biases, first of all you have to actually get people to think they're bad. If you're some white guy saying, I'm getting lower insurance rates and a get-out-of-jail-free card, what's wrong with this? So I guess the question is: do you think of yourself as a left-leaning liberal person trying to hack the system toward your own personal agenda of making things less biased, against the popular norm of society?

No, I don't. I actually see myself as a data terrorist, just throwing data. No, really, I do take my role as a watchdog seriously, and I see that as my role in life, and I really enjoy it. I like to be the thorn in the side.

You are. I was on a mailing list, and I won't say which, but somebody was arguing quite eloquently that we should just ban all algorithms and automation in the criminal justice system. Do you think that's too extreme?

I don't know. Having spent so much time, you know, I was a technology reporter, and then I spent a year and a half basically in jails and prisons, talking to people inside and people who had gotten out, and the terribleness of it is so much deeper than algorithms, and so indefensible on so many levels, that I just don't think algorithms are the only problem. It's a really complex problem. I was so horrified: there's this trend now that you can't ever have human contact, so at all these jails that I visited you can only Skype with your relatives; you'll never see them in person. Many jails are being built without any natural light and no outdoor space, and you can be in there for two years. It's just shocking.

Yeah. But then, being on a couple of foundation boards, I know that foundations and society like metrics, and I think one of the things that both the Koch brothers and the left wing have agreed on is that incarceration is bad and we should be lowering jail populations. So we have foundations like the Arnold Foundation funding a lot of these risk scores, because they do seem to reduce jail populations, and that feels good both to the people who want to save money and to the people who don't want to see people in jail. But we were talking to a judge recently, and I think this is another thing Chelsea's been working on: people are being let out with all these conditions, GPS ankle bracelets, curfews. One of the judges said, you know, these are kids who got stuck with minor infractions because they're not good at following rules, and then the lawyers come in and bargain for less jail time but with tons of rules they're never going to be able to follow, so they're going to get dragged back in again.
So it's sort of interesting to see that, as you optimize for a single score, which seems like a good proxy for bad, because these jails seem so horrible, you may just be smearing the problem around into other places that aren't being measured. As a data scientist, that's also an interesting question to me: are you looking at the right numbers, and could you be reducing the problem to something that isn't actually representative?

I think that's a really good point. My basic feeling about these algorithms, having looked at them for so long, is that the reason they exist is that people need to tell the public they're only letting low-risk people out. It's part of the movement to end mass incarceration, which is a very good goal, and this is basically the political step people feel is necessary to accomplish it: to tell the public, look, science is here, don't worry, science is on it, and science says these are the cool people, they'll be let out and you'll be safe. So it's a political story more than it is a data story, really. The data is there just to solve that political problem.

Yeah. And actually, Kade Crockford from the ACLU is here. You're not from Boston, but in Boston there was an algorithm for scheduling the school busing, a contest that an MIT team actually won. The team that won, and I read some of what they had said beforehand, was a very thoughtful team that wanted to go out and talk to the community and figure out what to optimize for and so on. They created an optimization algorithm for school busing, but the outcome was a terrible system outcome, where you had elementary school kids starting at 7:15 in the morning and getting dumped out of school at 1:30, and the parents were in an uproar. I think the mayor's office initially said they were trying to optimize for high school learning outcomes, and later said something like, oh, we were also optimizing for costs, and maybe, oh, we would save money and pour it into pedagogy, or something. But in any case, a lot of people were blaming the algorithm. Kade allowed me to write one sentence and co-author an op-ed she wrote, and the point was: don't blame the algorithm; it's the political system that created the optimization the algorithm was set for. It optimized for money over the convenience of the families. That's why I was pointing at "we should all decide." I think the really big meta-problem is that we don't really seem to be good at figuring out how to decide. Part of what you're doing is using math and science and algorithms and data to make it so we can see what's going on and reflect, so that we can inform ourselves and then decide things. But the deciding part seems to still be somewhat broken.

I agree. I guess the reason I keep pushing back on that is that essentially I'm really good at problems and I suck at solutions. I am really good at diagnosing problems, and I just want someone to pick up that ball and run with it; I have my skills. But I do think that correctly diagnosing a problem matters: until we did this math, people didn't realize the issue was the optimization of the algorithm for a particular definition of fairness.
I'm glad we brought that to the table, and I hope people can thread the needle from there. I feel like my value, in the work I do and the work I hope more journalists and activists will do, is bringing real quantification to these problems. It makes them addressable. Everybody's writing articles like, Facebook is so bad, they're so big; that's not an addressable problem. The addressable problem is the thing I showed, which is that you could buy ads targeted to "Jew haters" on Facebook because they had that ad category, and then they took that ad category away. So I'm in the world of addressable problems.

Yeah. And I guess in the case of something like Facebook, it's their job to address the problem, and they're probably thanking you somewhere for your service. But when it involves a political system, it's a little bit tricky. We're at about the questions part now. Does anybody have any questions? Maybe Kade, and then okay. It's a throwable-mic situation; there's actually a little warning underneath that says don't throw it at people's heads.

Is this on? Okay, great. Hey guys. I just had a couple of thoughts. One is that it strikes me that if we are producing risk assessment tools, say there was a risk assessment tool in the criminal justice context that, instead of determining whether someone would go to a cage or remain in a cage, would determine whether that person needed maybe direct cash assistance: do you need help getting to court, for example, here's a bus pass. In other words, it seems to me that the risks involved with risk assessments can be substantially lowered, if not eradicated entirely, if the action taken at the end is one where it doesn't matter whether there's a false positive or a false negative, because it doesn't hurt anyone to give them health care or a babysitter or a bus pass. Maybe we should start using risk assessment tools in those types of situations, because it'll help us get more data about how they actually work, and stop using them in contexts where a false positive could be really detrimental.

So that's how they're used in Canada, where they were first developed, and the people who developed them had exactly that intent; they're actually called risk and needs assessments: what are your needs? In Canada they actually try to meet your needs. When the tools came to the US, they kept the needs part, but the judges I have spoken to say, look, I only have three drug treatment beds right now, so I can't give them to all the people who need them. So it's nice to have the needs section, but also, at least with COMPAS, the needs are in green and kind of quiet, and the risk part is in giant red letters: high risk. And the judges are really scared of being the statistic who let the person out, so they're just guided by the risk portion. I will say this, though: in the California prisons they are only using the needs part right now. I went and spoke at San Quentin, and everyone there knew their COMPAS risk score immediately, and they were like, it's actually good to get high risk, because you get more services. So they were fine, although they weren't super happy when I told them about the bias. And then they were like, whatever, we all have high risk scores anyway.
Right, but nobody cares, like you were saying. Yeah, getting a high score, correct, had no punitive effect there; you got to go to the gym for longer. Exactly, special classes or something.

Right. So that was one thought. Another thought is that, just like you said, Julia, the reason I think we are turning to these tools is because of things like the Willie Horton problem. For folks who don't remember Massachusetts political history, there was a crisis here when Governor Dukakis let somebody out of prison and he killed someone, and as a result I think judges are terrified of the political consequences of letting people out of prison, especially in places where judges are elected. So we as a society really have to change the political zeitgeist so that judges aren't relying on tools like this to deflect personal responsibility because they're scared of what may result from a bad decision. And then I had a question: how did the folks in Broward County respond to your work, and did they actually change anything about how they're using it?

Well, they were really happy, because they were like, look, we'd been wanting to join these databases for a long time, so thank you for doing that work. And also, by the way, we're not going to change anything. I was like, okay. They were like, look, the company that built this score says you're using the wrong definition of fairness. And I was like, okay.

And I want to add one thing to what you were saying. We work with the town of Chelsea, and they have this thing called the Hub, which is not only the police department but other social services and supports, and they're not using risk scores; they're trying to address the underlying causes, which is sort of Karthik's causal stuff. There's something really interesting there: "failure to appear" could be, like you said, all these different things, and if, especially with the pretrial stuff, you could just get one layer deeper and figure out what the failure actually was, then you could separate the people who could be helped by a little bit of bus money, or who maybe need some medical support, from the people who are out committing a crime. But they all turn up as "failure to appear"; it's kind of like when diabetes was one thing. So I think a lot of it could be addressed by having more data. But my concern about something like that is, if we created a massive database that identified all the needs and all the vulnerable people, you could use it to help those people, but you could also use it to discriminate against them, to target them with spam from for-profit universities. That's the other fear I have about creating massive databases to help people: you can use the same databases to hurt them.

Thanks. This is pretty interesting, and it's pretty provocative to think of this as a problem of forgiveness instead of bias; I think that has quite a bit of value. You were saying that you're not good at finding solutions but at identifying problems. That's the first step, that's good, but let me see if I can try to help in identifying some solutions. What you're finding hinges on the definition of fairness: what exactly is fair? Well, suppose you identify that, for a particular problem, you have two communities whose experiences follow two different distributions.
Like the case with the insurance companies, in which black neighborhoods had this kind of increase in premiums while white neighborhoods got some kind of better forgiveness. Maybe fairness would not be to make the white communities go up and start paying as much as the black communities, but to have as a policy that whenever communities are having different experiences, you just map everyone to the better one. Maybe that's something that could be constituted as policy. What do you think?

I'm definitely in favor of more forgiveness instead of more punishment, so I agree with you. When you're talking about companies' profit margins, their likelihood of adopting that might be low, but I would do it; that's why nobody's letting me run any kind of for-profit company. And I've heard that a number of times: we're willing to try to be more fair, as long as it doesn't cost us money.

Right. I think you have to throw it. And actually we're going to go here, and then behind, and then to you.

Thank you. It was very interesting, because it illustrated structural inequality in a quantifiable way. And the first question that comes to mind is, when we do that, who do we serve and what do we serve? Sometimes measuring and rating and demonstrating is really the easier task, even though it's very complex, as you said, when we look at constituting fairness. So in order to be fair, we don't necessarily have to be more specialized in analyzing data but, as you said, more specialized in inventing new criteria for how we can resolve the issues. For example, can we quantify the economic loss to society from all the biases being perpetrated? Can you do that?

It's a question that seems hard. I think it's going to be really hard, because there are so many confounding factors. I would say it goes by sector: I can do it for insurance, I can do it for criminal justice in some small ways. But economic projections are a little different from what I do, so I wouldn't necessarily want to try, because I'm sort of against the future: I don't want to project. I'm really into the ground truth. Basically: what is happening on the ground, and can I quantify it? That is my sweet spot. There are people who are really good at predicting the future and spinning out a story from the ground truth, but I'm a specialist in the ground.

Don't you know they say you predict the future by inventing it? The price we are currently paying for structural inequality, we are paying.

Yes, and I'm doing what I can. Let me just say, I'm doing what I can.

Maybe you, then behind you, and then to Judith, and then over there.

Thank you so much. I'd actually like to come back to a point you just made, about data being able to be used either way, and combine it with your point, Julia, about solving, or thinking about, solvable problems. One thing I keep thinking about in particular, as you make your suggestions, is: how are data analysts trained nowadays? I really see that as a key component of a potential solution to the problem, because if we only train them in using data, and maybe targeting it and tailoring it as far as they can, then we will never give them the opportunity to develop the conscience, the awareness of the wider implications. And to be honest, just as a side comment, as you were speaking about the legal system:
a legal background, so it's not that I'm speaking completely off the top of my head. But I would argue that part of the problem is when you try to make the decision based on data, so when you basically have people in charge who think about how we can come to a solution based on the availability of data, then you get these weird outcomes. Whereas if you think about it the other way around, in terms of what, independent of the availability of data, we actually want, then you can get different outcomes. So, to me, if I may make a suggestion, the solution would be to bring more social science education to the data scientists.

I definitely agree with that point. I look at it through the lens of journalism, and in journalism, you know, because the profession is so underfunded and under pressure, but also because it's not really filled with people who choose to go into it because of math literacy, it's only people like me who fell off the train somehow and got there. So journalists are all too happy to receive, and to write about, the available data, right? I always joke that there are like three clean data sets: baseball, the Fed, and polling. And voila, that's 538, right? I mean, it's really easy to receive data sets and then make a visualization and write a hot take. Like, it's not easy, but it's something, right? But I believe we have to be artisanal, and we have to basically collect our own data. What I do is I think: what question do I want to answer? And then I think about what data I have to go get. And it's a total nightmare every time. My editor is like, why do all your stories take so freaking long? Well, they're artisanal, so it takes a while.

I really like the idea of artisanal data, but I also think there's an interesting side to this work that's a little bit less explored. I was very struck, listening to the insurance piece, because it seems that in a rational world the insurance companies would not be doing anything like this, because it would seem like it's costing them money to be treating people without equality. And so what I think you also have is a very, very interesting set of data to help us understand the motivations for some of these structural inequalities. And I think that's a really important thing to understand, because they're not just mistakes, you know; they're systemic things that people are doing that they intend to be doing. I think that's partly, certainly, why you get so much pushback, like: okay, thank you for providing us with this data, now let's put it in a drawer; go away. And so one thing that comes out of it is, you don't have to look at it across all of society, but you can basically say: okay, we can now understand how much our insurance companies are willing to pay to be able to treat people unequally. And that's something we haven't really thought about in that way. What's its value to them? Why is it so valuable? Is there some reason why it's economically valuable? Is it something that's not economic that they're doing? So I think that's an interesting piece of information to understand for itself, and I think it's also essential for understanding how we can change this. Because if we just look at fairness with the assumption that everyone's ultimate goal is to be fair, and leave out trying to understand these motivations, we're not going to make much progress.

Right, I think that's right, although I suspect that what they've done is just raised... I mean, they're not actually going
to lose money, right? So in order to give this discount, which I suspect is some sort of marketing cost in their mind, they've raised everybody higher, right? So that line that looks linear would have been a lower premium to begin with. But I agree with you: the economic incentives are obviously what drive these decisions, at least on the for-profit side of it, and it's definitely worth exploring, and I'm working on more stuff along that line right now. And some of it may not be economic. I mean, there's a woman whose name I've just blanked on who did a lot of interesting work on this; she has a book called Pedigree, about how elite companies hire, which shows that they will systemically make a lot of really, really poor hiring decisions because of just embedded beliefs about what kind of people they want there. So you may uncover that they are doing things that are actually to their own economic harm, right, but it has to do with their view of the culture and society. It's not necessarily a good economic decision they're making.

Right, and I suspect that that is true also. I mean, I haven't yet met anyone in the insurance industry who has led me to believe that there's a person sitting there going, ha ha, I'm gonna figure out how to get these people and really screw them. I don't feel like that. I feel like it's a lot of well-intentioned people. Well, I'll stop short of saying well-intentioned, although I will say they invited me to speak at their convention. And I was like, why are you inviting me? The whole industry's trade group, they have a meeting in Texas for all their top lobbyists, and they said, we want you to do a keynote. I was like, are you sure? And I asked like six times, and I was like, I'm gonna talk about the work. And they were like, fine, it's fine, fine. And then I got there, and they said, send the slides the night before. I fly in and I give them the slides so they can load it up in the machine, and they were like, oh, we didn't know you were gonna mention the names of companies, because it says Geico. And they said, you can only speak if you take the names out. So I didn't; I withdrew. And I sat in my hotel room above the ballroom and did a tweet storm during my supposed session, trolling them about how I wasn't speaking. I was supposed to be on stage, and it was billed as ProPublica; this whole talk was supposed to be an interview with me. But anyway, that wasn't your point at all. I do think, though, that one thing that I think is a fallacy, that is sometimes so easy and such a narrative that we all want to believe, is that you find the bad person, the one who's making this bad decision, and root them out. And I think it's oftentimes not that. I don't think it's necessarily bad decisions; my guess is that it has to do with just fundamental understandings of risk, and just societal, systemic assumptions about what makes something risky. Correct.

Oh, hi. So there seems to be an objective unfairness, and then there is the subjective feeling of unfairness, which may or may not correlate, or show up in a visualization. And it seems quite clear that a system like COMPAS, for example, is being unfair, and at the same time minorities in general feel they are being treated unfairly by the judicial system; they don't really trust the judges or the lawyers they're getting. Is there ever a consideration that, if there is this subjective component, and there is a risk of, for example, breaching the fair trial right, why isn't it
a right of the accused to decide whether or not these systems are put in place, since there is this subjective component: the right to believe that they're being treated fairly, not only objectively but subjectively?

I'm not sure I totally understand, but I'm gonna answer what I think, or want, the question to be, which is: right now, the way our criminal justice system works, all the due process protections, which are the ones designed to embed fairness into the system, are only really required at trial. And nobody goes to trial, right? Pre-trial is really the only decision, and then you plea, and so there are very few trials. So the due process requirement has been totally ignored during the pre-trial phase. For instance, people have argued that defendants should be able to contest their score during pre-trial and say, look, it says I'm a seven; I'm a four. Now, the problem is, I don't know what that debate looks like. I'm a four, I'm not a seven: how is the judge gonna adjudicate that, right? But at the very least you could have the conversation, or you could have some other way to embed that discussion of risk. Right now the defendant really has very few rights to fight that battle about their quote-unquote riskiness. So some of the issue is just how to build more due process into what is effectively the judgment phase now.

Yeah, and there are some cases, like in Wisconsin, where they tried to use due process to go after the COMPAS score being used in sentencing. And, to your point, they say, well, it's a secret, we can't tell you, and it just doesn't make sense.

Well, the challenge, and the COMPAS due process challenge is, I think, up at the Supreme Court now, is that judges are really deferential to other judges. So essentially every ruling so far on due process for risk assessment scores has been like: you know what, judges can consider whatever they want in sentencing; they can just not like you and sentence you, right? Pre-trial, judges can consider whatever they want. Sometimes there's a bond schedule you have to follow, but most judges have extreme latitude, and when it's appealed, the judge above them is like, well, judges are awesome; they should really get what they want.

I think Madars had one, then we'll go to Sean, and then... yes, all right.

We at the Media Lab have been working on new cryptographic techniques that would let you fuse data sets in a privacy-preserving way. For example, you could fuse together data sets that relate to criminal history and data sets that relate to mental health, such that nothing about either data set gets revealed, but you still get utility out of it. And there's a natural question about weaponizing this, because normally what you could compute on, you could also FOIA, at least in certain settings, and here this ability to peek at what computation was done under the veil would no longer exist. Can you share some of your thoughts on what it would be like to do investigative journalism in the future using techniques like homomorphic encryption?

It's really difficult, right? So, I'm really in love with math, as you may have noticed, and I love stories like the ones my friend at BuzzFeed just did; they've done a couple of these, where statistically it's impossible that the judges in figure skating are fair, right? The data shows that they favor their own country in ways that just can't be explained by anything other than bias. But journalism is reluctant to do probabilistic findings. It's difficult terrain for me. I have to produce those people, Otis and Ryan; I have to have anecdotes. That is the currency of journalism: the narrative. And so I love the idea of pioneering these areas where you're like, I can't see it, but I know enough to know. But I think mainstream journalism is not quite there yet.
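A minimal sketch of the additively homomorphic idea behind the kind of privacy-preserving fusion Madars describes, using the open-source python-paillier library. This is a toy under simplifying assumptions: the values and agency names are made up, and a single analyst holds the only decryption key, whereas a real multi-party deployment would need much more careful key management.

```python
# pip install phe  (python-paillier, an additively homomorphic scheme)
from phe import paillier

# One keypair; only the private-key holder can decrypt results.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

# Two agencies encrypt their per-person values (say, incident counts
# from criminal-history and mental-health records) under the public
# key, so neither side ever sees the other's raw numbers.
agency_a = [public_key.encrypt(x) for x in [1, 0, 2]]
agency_b = [public_key.encrypt(x) for x in [0, 1, 1]]

# Addition works directly on ciphertexts: the fused totals are
# computed without decrypting either input.
fused = [a + b for a, b in zip(agency_a, agency_b)]

print([private_key.decrypt(t) for t in fused])  # -> [1, 1, 3]
```

As the question notes, the utility (the decrypted totals) comes out, but the intermediate computation happens under the veil, which is exactly what makes it hard to audit from the outside.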
And I asked Madars if I could actually steal it, since I was here. So my question is: actuarial science is really expanding in the age of big data, and actuarial science is fundamentally based on this idea of risk. So my question is, is risk fundamentally a reductionist, neoliberal concept? And if it is, is there an alternative concept that you'd like to see data science start to orient itself around for modeling purposes?

That's a hard one. I do think risk is often narrowly and politically defined, and people are unwilling to acknowledge that; I think that is true. I still think it's useful, in the sense that what I really love the most is the fact that I'm not expanding beyond the scope of risk, and I'm still showing these companies are not doing it right. So I'm like: I came to your playing field, I'm using your rules, and you still don't have it going on, right? And that, to me, is still the best proof. I agree there are definitely other ways to have those fights, but for me, I like to win on the playing field of the opponent.

Throw it to this corner, and then... oh, whoa, nice.

Hi. So we're talking about structure, and how insurance companies look at zip codes and determine rates based off that. But I was wondering how they respond to changes in the urban environment. For example, if the socioeconomic factors of an environment change, do the risk or the rates change over time in response to those factors? And also, as you were saying, how do the economics correspond to those dynamics?

The insurance companies are interesting because, you know, they were kind of the first algorithm users, but they're sort of really old-school because of that, so their systems are really legacy. They update their rates every couple of years, right? So every couple of years they'll file a new thing: okay, this zip code has adjusted a little bit this way. And it's supposed to be based on the risk that they have seen, in terms of what they've had to pay out in those zip codes. Now, that data is secret, so all I can see is the average that everyone has paid out, and I don't know how well they're policing it, because it feels like it's pretty disparate, right, the reality versus what they're charging. Particularly Geico was insane. So I just don't know how often they update, because they don't have much public scrutiny. I mean, they do file with the regulators, but what's interesting is that the regulators are not looking at this question. They're asking a very different question of the data, which is basically the main question for an insurance regulator: the only thing you care about is, do they have enough money to fund all the possible claims, or are they going to go under, right? That's your main question as a regulator. So they're not really looking at this question, and I don't know how often they check. And I believe that, basically, when people don't check their metrics, they fail to update them.
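To make the shape of that comparison concrete, here is a minimal sketch with entirely made-up numbers standing in for the secret payout data. The idea: if rates simply tracked risk, the ratio of premium charged to average payout would be roughly flat across zip codes.

```python
import pandas as pd

# Hypothetical per-zip-code figures: average annual premium charged
# and average payout (loss) per insured, plus whether the zip code
# is a predominantly minority neighborhood. All values invented.
df = pd.DataFrame({
    "zip":      ["60601", "60617", "60623", "60646"],
    "premium":  [1150.0, 1900.0, 2100.0, 1200.0],
    "payout":   [780.0,   850.0,  900.0,  800.0],
    "minority": [False,   True,   True,   False],
})

# Dollars of premium charged per dollar of observed risk.
df["premium_per_risk_dollar"] = df["premium"] / df["payout"]

# If this ratio is systematically higher in minority zip codes,
# residents there are paying more for the same underlying risk.
print(df.groupby("minority")["premium_per_risk_dollar"].mean())
```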
And then in the back.

So, considering the fact that it might be very hard for us to get criminal justice systems to stop using this data to make decisions based on what we think people might do in the future, do you think it's possible for us to start using this data to get criminal justice systems to repay defendants for wrongs that the justice system has done to them in the past?

To repay them? Can you expand a little bit on what you mean by that?

Like, pay them back their bail money, or for the time they lost being in jail. I know that some places, like Canada and California, are doing things like that for nonviolent drug crimes. So, paying back bail money, or somehow finding a way, and it's a little hard to quantify, to repay the injustice that the justice system has done to a defendant who was wrongly labeled as very high risk.

Right. I mean, generally, I'm philosophically inclined towards reparations. I think if you can quantify a harm and do right by the person who's been harmed, it's a good idea. It's very complex in the details, and I don't know what harm measurement would work in this case. In many cases, like torts and such, the harm is based on what your potential income would be, and if you're a poor person that would be much lower than for a rich person, which would be unfair as well, I think. But I think it's interesting to figure out how you might do a retroactive fairness.

Yeah, and then over that way. She represents Twitter here, I think.

Yes, so, Twitter the community, not the company; I am Twitter. I have a question from Twitter that I'm going to combine with a question of my own, if you don't mind. CJ on Twitter asks: what's your best practice for being such a thorn in the side of unjust systems that they have to listen to the data lobbed at them? And I want to follow up on that by asking you about your experience with the backfire effect, this idea that people, when presented with facts and data about their biases, or facts like climate change or racism, say: that's clearly not true; you've just reinforced my position to the opposite. I guess what I'm kind of asking is, do you read the comments on your articles? But he's asking whether you've found a way to get through that.

Okay, so first of all, I never read the comments. Secondly, that's like my main best practice in life: no comments. So, I have a whole jihad about how I do journalism, which I'll just give you a short version of. I believe that journalism needs a guiding light. For a long time, I was raised under the idea that objectivity was our guiding light, and that really just became false equivalence. And so everyone, I think, has sort of agreed that it's no longer good, but doesn't have a new lodestar. And I'm arguing, mostly shouting into the wind, that we should use the scientific method as our lodestar. The scientific method is super nice because it's actually kind of loosey-goosey when you really look at it. It's just: do you have a hypothesis? Do you collect evidence for it? And do you have reproducible results? That's your goal, right? And those are my goals, and that's how I run my investigations: we come up with a hypothesis, and then we figure out what tools and data we need to test it, and mostly we do lots of testing. I always tell this story about our research on Amazon.
I'd heard that there was price discrimination on Amazon: if you used a mobile browser versus a desktop browser, you would get different prices. So we set up this big experiment in the cloud, and we had all these Amazon accounts, and we ran it for months, and the data was just not there; there was really no difference. So then we were like, but we saw something weird between Prime and non-Prime; okay, let's test that. So for months we tested Prime versus non-Prime: is there some difference in prices? No. Sad. So then I had given up on it, and three months later I went to a bar with a guy who's an expert on antitrust, and he was telling me about how terrible Amazon was to the booksellers, and I was like, yawn. And then I was like, you know, I've been doing this test, but I just don't have anything. And he was like, oh, well, the thing you need to test is Amazon's advantage when it is itself the seller versus third-party sellers. That's the test. So we went back and ran that; we already had all the accounts set up in the cloud, in the Amazon cloud, running away, and boom, immediate results.
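A toy version of what a test like that boils down to, with invented prices; the same permutation logic underlies the figure-skating analysis mentioned earlier. Shuffle the labels and ask how often a gap this large appears by chance alone.

```python
import random
import statistics

# Invented offer prices for the same basket of products, split by
# whether Amazon itself or a third party was the featured seller.
amazon_is_seller = [19.99, 24.50, 9.99, 14.25, 31.00, 12.40]
third_party      = [22.49, 27.80, 11.50, 16.99, 34.95, 13.90]

observed_gap = (statistics.mean(third_party)
                - statistics.mean(amazon_is_seller))

# Permutation test: if the seller label didn't matter, relabeling the
# pooled prices at random should produce gaps this big fairly often.
pooled = amazon_is_seller + third_party
n = len(amazon_is_seller)
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    gap = statistics.mean(pooled[n:]) - statistics.mean(pooled[:n])
    if gap >= observed_gap:
        extreme += 1

# A small p-value means the gap is hard to explain as chance.
print(f"observed gap: ${observed_gap:.2f}, p ~ {extreme / trials:.3f}")
```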
And that is like, I have seven of those going at any time, and most of them are total miserable failures.

Is this legal?

Oh, we have to take a look at prices... So, I believe in the idea that you don't know what your story is until you've done the tests, right? Most journalists get a tip, and then they report out three anecdotes, and they're done. And then they go to the data desk and say, build me a visualization. And then those guys say, actually, you know what, the data doesn't support your anecdotes, and then they have a fight, and then the data guys get really sad, and then they quit. That's what happens. And so I'm trying to build a new way of journalists and programmers working together. My team is two programmers, a journalist, and a researcher; we are four people, and we work collectively from the beginning on these projects.

And the "is it legal" part is only half a joke, in that, at MIT, we actually can't do a lot of the studies that we'd want to do. And I'll just advertise this, because I think people should know about it: there is a law called the Computer Fraud and Abuse Act, which was created after the WarGames movie, because everybody got afraid that people would hack into computers. It says that if you use a computer in a way that is against the intent of the person who runs the computer, and it's online, that is a felony that will throw you in jail. And terms of service have been deemed a description of how the person wants you to use the computer. So if you go onto Facebook and try all these experiments, it could turn into a felony, and we've seen cases of that, obviously. So it's interesting also how, and I'm actually going to sort of avoid naming names, a lot of these companies that are really into trying to do the right thing behave when it comes down to these laws. There's also the anti-circumvention law, which makes it a felony to break copyright protection on anything, except for a very small number of exceptions. So if you have an algorithm running on your computer, but it's protected, you can't audit it. These are all really stifling things for researchers. But you can imagine: Hollywood doesn't want to loosen up copyright protections, and software companies and online companies don't want to loosen up your ability to get in and research how their systems work. And the fact that a lot of people who talk about internet freedom and all that stuff don't talk about the fact that these laws are impeding research, I think, is sort of a shame.

Yeah. I mean, they're not impeding my research.

Your research is all legal. But I do think it's something that we need to push against, because unless you push against it, it won't change. But on that happy note, I'd like to thank you so much. Really. Thank you.