But of course those of us who work in human rights know this is nothing new, and it is not an accident that I put Mr. Milosevic's picture here, because telling grotesque lies is a common feature of people who abuse human rights. So in human rights it has always been our position to struggle to tell the truth, and we are successful because we are persistent and because we speak with moral authority.
I say all of this speaking to you about statistics. I want to emphasize that I think statistics are probably the least important part of a human rights argument. The most important part is the voices and experiences of the victims. Then there are all sorts of other things: the law, the legal argument, the forensic material, the satellite remote sensing imagery. There's tons of other really valuable stuff, and the statistics are there as well. Look, we're a footnote. But we have to be right. Like all the rest of the pieces in the story, we have to get it right.
Here's the problem: when we're doing data collection, we usually don't know what it is that we don't know. And... come on, clicker.
There we go. And this is the key, because then we don't know if what we don't know is systematically different from what we do know. This is a really big challenge, because if we collect a lot of data, the work it takes to collect all that data often convinces us that we have it all, that we have so much data that our data is somehow representative of the total world. And I'm here to say that's not true.
There's a reason that you have the data you have, and there's a reason you don't have the data you don't have. The data you have, you have because people trusted you enough to give it to you. And the data you don't have, you don't have because those people don't trust you. That's going to be true for each of the groups in your space, which is why I say we need many different groups.
Human rights stories are about the worst possible things that can happen to someone, the most terrible days of their lives, experiences that leave them traumatized for literally the rest of their lives. Indeed, that trauma may persist for generations. Why should they trust us?
With their stories? Many of them won't, and that creates bias. Not bias in the sense of racial or ethnic or religious prejudice; bias in a technical, statistical sense. It means that we are more likely to learn some things than others. I'm going to illustrate this by looking at a really, really good database, a database about Iraq called the Iraq Body Count. This is a really good database that nonetheless produces deeply misleading statistics, and I'm going to explain to you why that is.
In Iraq, at the very beginning of the American intervention in 2003, the Iraq Body Count began collecting all the published stories about people who died as a result of the conflict. Every newspaper, I think in 12 languages: they collected them, organized them, categorized them, and put them all in a giant database. From that database they kept a running count of the number of people who died in the conflict. If you asked them, they would admit that there were many people who didn't get documented because their deaths were in no newspaper story. No one ever heard about it. But they said, "Yeah, but we're getting most of it. This is a minimum number."
Here's the problem: which stories were we getting, and which stories were we missing? And here's how I approached this problem.
The Iraq Body Count was good enough to share some of their data with me, and with that data they gave me the sources for every one of the incidents they had in their database. I organized all the incidents by size. This bar represents events where one person died; this bar is events where two to five people died. Each of those bars is shaded by the number of sources from which we learned about each of those events. So what this says is that of these very large events, with 15 or more victims, here...
You'll remember that the problem I'm facing you with is: what is it that we don't know? One way to understand that problem is: what is it that we have zero sources for? If we have zero sources, we don't know, right? So, for what size of events do you think we have more with zero sources? Does anyone want to guess? Just shout it out. Are there more events of size one with zero sources? "Very large." Uh-huh. Anyone want to take another guess? Go ahead. What? Why? That's the answer. The answer is that these events of size one are generally only reported once, and by implication there are many, many more that are never reported at all.
In fact, we did a little bit of calculation, and the probability that an event with 15 or more victims is reported is about one. We always hear about events that are very large. But of the small events, with only one victim, we only hear about a fifth of them, about 20%.
Why is that important? Because they're totally different conflicts. These were two completely different conflicts happening at the same time. The large events were largely committed by al-Qaeda in Iraq or were coalition collateral damage.
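The reporting gap described here can be made concrete with a little arithmetic. The sketch below is illustrative only: the reporting probabilities (about 0.2 for single-victim events, about 1.0 for events with 15 or more victims) are the rough figures quoted in the talk, while the observed counts are invented numbers, not Iraq Body Count data, and the function name is hypothetical.

```python
def correct_for_underreporting(observed, p_report):
    """Divide each observed count by the probability that such an event is reported."""
    return {size: observed[size] / p_report[size] for size in observed}


# Rough reporting probabilities quoted in the talk.
p_report = {"1 victim": 0.2, "15+ victims": 1.0}

# Hypothetical observed event counts (NOT real Iraq Body Count figures).
observed = {"1 victim": 1000, "15+ victims": 200}

estimated = correct_for_underreporting(observed, p_report)

# Share of single-victim events before and after the correction.
observed_share = observed["1 victim"] / sum(observed.values())
estimated_share = estimated["1 victim"] / sum(estimated.values())

print(f"single-victim share in the raw data:      {observed_share:.2f}")
print(f"single-victim share after the correction: {estimated_share:.2f}")
```

Under these assumed numbers, the small events are a much larger part of the picture than the raw counts suggest, which is the point about two different conflicts being reported at very different rates.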
They were committed using either IEDs (improvised explosive devices) or air strikes, and they were massacres: the victims were a random selection of the Iraqi population, and the goal was destabilization or control, depending on which party was committing the violence. The small events were completely different. These were committed almost exclusively by Shia militias using firearms against adult men, and their goal was ethnic cleansing. And they were successful: they drove a million people out of Baghdad and into western Iraq, where of course they welcomed ISIS.
Okay, now I am not trying to say that bad data analysis led to the rise of ISIS. That would be a little strong. However, we look to statistics to correct our understanding of the world. We look to statistics to go beyond what we can perceive ourselves. But if we build statistics simply on the data we can see, we are reinforcing what we see, what we understood. We are simply reinforcing our own prejudices. That is not what statistics does. That is bad statistics. Statistics is not simply counting what you have. That's accounting and bookkeeping. Statistics is understanding what you don't have, what you don't know, and building a probability model that allows you to overcome that.
Let me skip the story and go on to this picture. Let's imagine that we have three databases, denoted in this diagram by the white circles. We put the databases together and we've determined which cases are in common between them. So we're living in the world of the white circles. That's what we can see: what's in the white circles. What does the world look like? Does it look like the left, where we see a third of reality, or does it look like the right, where we see almost all of reality? These are completely different worlds. Which world do we live in?
Well, as long as you're living in the white circles and you haven't done any statistical modeling, you have no idea. And the problem is that those circles may represent only fractions of the overall world. In practice, in Peru for example, we were only able to document about a third of the cases committed by the guerrillas of Sendero Luminoso, while we were able to document almost all of the cases committed by the Peruvian army. Now, we had about the same number of cases for both. If we had just said, "Well, this is how many we have," we might have said the two parties were responsible for about equal numbers of killings, and we would have been wrong. We would have been terribly, terribly wrong about perhaps the single most important question for historical memory in Peru. The stakes are high.
So what I'm going to do now is begin unpacking for you how we can solve this problem, and I'm going to do so by explaining how we study homicides committed by the police in the United States. This came from a government study published in 2014, in which they took two lists, one compiled by the US FBI and another compiled by the US Bureau of Justice Statistics, part of the Department of Justice. These were all cases of people killed by police from 2003 to 2009 and in 2011; 2010 was omitted for reasons that have never really been explained to me. Anyway, they put all the cases together and they said: these are the cases we have from the Bureau of Justice Statistics, these are the cases we have from the FBI, and these are the cases that appear in both. I'll show you the map in just a second; I know you were all eager to see the map. Okay, sorry. Whoa, geez. They put them together and they estimated that 2,100 deaths fell outside both of those databases, and I'm going to explain in a moment how they got there.
I'm not going to do the algebra. Who here loves algebra? Come on, Cliff, put your hand up, put your hand up.
Yeah, okay. So there are three hands, and there are about 80 people in the room, so I'm going to skip that. I would be delighted to do it when we are drinking at a reception later; this math is done very well on the back of a napkin. Instead, I'm going to give you a metaphor.
Imagine that you have two rooms that are completely dark. You can't see inside them, and what you'd like to know is which of the rooms is larger. The only tool you have to measure this is a handful of little rubber balls, and these balls have a curious property: when they hit each other, they make a noise. So you throw the balls into the first room and you hear [several clicks]. Collect the balls, go to the second room, throw them with equal force, and you hear [one click]. Which room is larger? The second room is larger. Why is it larger? You only heard one click, and what does that tell you? They spread out.
That's precisely the intuition we're using with this method. We've taken two databases and we've thrown them into this space, and we observe how often they collide. Because we know how big the databases are and we know how often they collide, we can use algebra to figure out how big the space is. Pretty cool, huh? That's why math is so cool, because now we know something we didn't know before. We didn't know before how big the population of all the people killed by police is. Now we can estimate it, and that's what this study did.
But it turns out, unfortunately, it's a little bit harder than that, as everything in the world always is. What we assumed when we threw the balls into the room is that the balls don't know about each other; they fly around independently. What if the balls are friends and they like each other, and as one ball zooms by the other it goes, "Hey, buddy," boom, and gives it a little bump? Now what happens is, instead of a few clicks.
We hear lots more. We hear too many clicks, and that's called a positive correlation in the probability of reporting. That happens all the time, because when a lawyer is killed at high noon in an urban center, everybody knows about it, but when a farmer is killed at nighttime three days' walk from a road, we never hear about it. The probabilities of reporting, no matter what the reporting mechanism is, are going to be positively correlated. We're going to hear about the lawyer; we're not going to hear about the farmer.
Consequently, we have to fix that, and we have a bunch of statistical methods that adjust for those correlations. So we looked at all our previous projects in a bunch of countries, and we figured out what those pairwise correlations are, and we used those pairwise correlations to correct the estimate of killings by police in the United States. It turns out that the lists collected in Colombia are very much like the lists collected in the United States. And so we estimate that about 10,000 people were killed by police in that period, 2003 to 2009 and 2011.
To put that in context: in the United States, people who die by homicide are mostly killed by someone they know. Three quarters of all homicide victims are killed by people they know. One quarter are killed by people they don't know, strangers. The single most likely category of stranger to kill you in the United States is a police officer. That's why it's such a big deal. It's not gang bangers. It's not terrorists. It's not serial killers. It's not mass shooters. All those things. On the left, people completely freak out about people with handguns. On the right, people completely freak out about terrorists.
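The dark-room metaphor and the "friendly balls" problem can be sketched in a few lines. What follows is a minimal illustration, not the method or data of the study discussed in the talk: it uses the classical two-list (Lincoln-Petersen) estimator, total ≈ n1 × n2 / m, where n1 and n2 are the list sizes and m is the overlap. The population size, the reporting probabilities, and the function names are all assumptions chosen for the example.

```python
import random


def two_list_estimate(n1, n2, m):
    """Lincoln-Petersen estimate of the total from two lists with overlap m."""
    if m == 0:
        raise ValueError("no overlap between the lists; the estimate is undefined")
    return n1 * n2 / m


def simulate(groups, rng):
    """Simulate two lists and return the two-list estimate of the total.

    groups: list of (group_size, p_list1, p_list2); each case lands on each
    list independently with its own group's reporting probabilities.
    """
    n1 = n2 = m = 0
    for size, p1, p2 in groups:
        for _ in range(size):
            on_list1 = rng.random() < p1
            on_list2 = rng.random() < p2
            n1 += on_list1
            n2 += on_list2
            m += on_list1 and on_list2
    return two_list_estimate(n1, n2, m)


rng = random.Random(0)  # fixed seed so the run is repeatable
TRUE_TOTAL = 10_000     # hypothetical population size

# Every case has the same chance of being reported: the "independent balls".
independent = simulate([(TRUE_TOTAL, 0.3, 0.4)], rng)

# Half the cases are highly visible to both lists ("the lawyer") and half are
# nearly invisible to both ("the farmer"): reporting is positively correlated.
correlated = simulate([(5_000, 0.5, 0.6), (5_000, 0.1, 0.2)], rng)

print(f"true total:                     {TRUE_TOTAL}")
print(f"estimate, independent lists:    {independent:.0f}")
print(f"estimate, correlated reporting: {correlated:.0f}")
```

With these assumed probabilities the independent case lands close to the true total, while the correlated case comes out well below it: too many "clicks" make the room look smaller than it is, which is why the correlation has to be adjusted for.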
You know what? Both of those are ridiculous. We need to worry about police. Police are the problem in the United States for homicides by strangers.
So, on that cheery thought, let's move on and look at Kosovo. In Kosovo we faced a question. This question originally came from NGOs, but it also came from journalists, and ultimately it was a key question at the tribunal, which we discussed some this morning. The question is: was the Yugoslav government responsible for the killing and migration of ethnic Albanians? Now, that's a really big and difficult question. So we flipped the question around, because scientists often do this: when we have a question that we can't answer directly, we ask it in a different way to see if it's easier to answer. And so we asked instead: could it have been NATO that caused this violence, or was it the KLA?
What we did is we got data from the border crossing guards in Albania, as well as from UNHCR and from the Albanian government. We also did random sample surveys in refugee camps, and in those surveys we asked people: When did you leave your home? When did you cross the border? How did you get from here to there? How long did you spend on the road? We used all this information to create models of people leaving every village every single day. Then we had data from exhumations, from Human Rights Watch, from a bunch of Albanian NGOs organized by the American Bar Association, and from the observer teams of the Organization for Security and Co-operation in Europe. We put all the data together and created some models, and those models resulted in these graphs, which are the center of the testimony that I gave at the ICTY, which we discussed this morning.
And so now let's go into a little bit more detail about what it was I actually argued. What I said is that at the beginning of the conflict there was a huge spike in the number of people killed and in the number of people leaving their homes. The line on the top is people leaving their homes, and the line on the bottom is people killed. Notice how closely these lines move together. They move up together and then they drop together. Then there's this quiet period, then another spike, then another quiet period, then some kind of noisy stuff, and then it just sort of trails off.
So we said, look, the first finding we can make is that these two series co-vary; they go together. Now, that does not mean that one causes the other. Remember, we all learn that correlation does not imply causation. But when you have two closely correlated behaviors, it often means that they have a common cause. There's something else that's causing both of them. That's why they're varying together: they're responding to something else. And so we asked, what could that thing be?
We had data on NATO bombing, which the Yugoslav government published on a website. We took all the NATO bombings and we plotted them against these series, and we said, well, no: the NATO bombing in Kosovo (it happened in Serbia earlier) generally happened here and after. It couldn't have caused it. It's too late. Then we looked at KLA activity. The tribunal gave us databases of KLA activity, and we said no: the KLA activity happened in different parts of Kosovo. It didn't happen in the same places where this stuff was happening. So we rejected both of those hypotheses. We said those could not be the causes of killing and migration. That's not the same as proving that the Yugoslav government did it.
Let me be clear. But a really interesting thing happened on the night of the 6th of April. That year, Sunday the 11th of April was Orthodox Easter. And on the night of Tuesday the 6th, the Yugoslav government spokesperson went on TV and said, you know, there's a terrible war going on, but we are going to respect a ceasefire starting tonight in honor of Orthodox Easter. So there's a government ceasefire that starts that day. But it turns out that the KLA and NATO totally ignored the ceasefire. In fact, they increased operations. So on the 10th of April, the Saturday before Easter, the government went back on TV and said, well, NATO and the KLA are ignoring this ceasefire, so we're going to resume operations.
So the Yugoslav government stops operations and resumes operations. It's a very strong correlation. It's a coincidence. It does not prove anything. But it's very suggestive, and it's the kind of piece that, when we're using statistical arguments in trials, adds a little grain of sand to the balance of justice. It's not by itself proof. But it is circumstantial, and it is important, and it is useful, and it is something about which the finder of fact can say, "On balance of all the evidence, of which this is one small piece, I can make a finding." So we observe the coincidence, but let's be clear about the grounds on which we are making the claim. We are not claiming proof.
So I want to start closing with some bigger picture stories. Big data: just forget you ever heard the term. It's meaningless. I've looked at a lot of data over many years, and these are just a few of the sources that we've used. The problem is that if you take a bunch of data and you make a graph, or a map (a map is a graph, by the way), or any kind of claim from the data, you're using a model. You don't think you're using a model.
You just counted it up, right? No model. Here's the model you're using. If you just counted up your data and made a graph or a map, the model you're using is: I assume that my data is just like the world. I assume that if 40 percent of the victims in my database are women and 60 percent are men, then 40 percent of the victims in the world are women and 60 percent are men. Why would you assume that? There's no reason to think that's true. If you find that two-thirds of the violence is in the north and one-third is in the south, maybe that's because the people who work for you in the north are just much better at their job. You have no way of knowing. So you have a model that you're using. Statisticians call it the naive model. There's a reason we make fun of it: it's not plausible.
So raw data is never a foundation, never a foundation, for a statistical claim. I know that's tough, because every one of your donors is going to say, "Where are the graphs?" And you have to say, "Well, that would be using the naive model. We'd have no way to justify that graph. That would be incorrect." Unless you do some modeling. I've shown you one kind of model. That's the model my team and I use. There are several others; I just happen to use the one I use, and I'd be delighted to talk to you in Q&A or elsewhere if you're interested.
So there are three ways, and these are the only three, to get reliable statistics. First, you could have all the data. All of it. Statisticians call that a census. A census could be a census of human populations; that's the way we usually use that term. But anytime you have all the data, every single piece of it, you can do anything you like. That's hard, it's expensive, and it's rare. I know of one case in human rights where I think that's true.
Second, you can get a random sample of the population.
This is hard. It's very hard, because it's hard to know what your population is, and human rights violations, even in a place where they happen very frequently, are pretty rare. Two households in a hundred, something like that; three households in a hundred would be a really severe level of human rights violations. Well, that means if you sample a hundred households at random, you only get two or three who have anything to tell you. So it's really hard to get a random sample. Now, there are better ways to do it, and there's a lot of interesting science around this, but it's going to take a statistician and an awful lot of work to get there.
Third, you can do what's called posterior modeling or post-stratification, and that's the kind of thing we do in our team. There are four or five different approaches to this kind of technique, but it requires exactly the right kind of data. It requires a lot of math, a lot of computing, and some sophistication about what it is we're explaining.
That's it, folks. Those are your ways. If you don't have one of those, you don't actually have statistics. I know that's really tough to hear, but that's the reality. Anything else is self-delusion. But it's worth it. It's worth it.
I'm going to tell you the story now of this man here. His name is Edgar Fernando García, and he was a student and labor organizer in Guatemala in the late 1970s and early 1980s. One day in February 1984, he left his office and he didn't come home. His wife, Nineth, knew what that meant. When he didn't come home, she organized all her friends and she went to every police station, every army base in Guatemala, saying, "Do you have my husband? Have you arrested my husband? Where is my husband?" She organized legal challenges to the Guatemalan government. She got the embassies involved. Amnesty International had a campaign.
"Where is my husband?" And the police were like, "We didn't do it. I don't know. You know, I mean, he's a noted left-winger. I'm sure one of the other leftists killed him." Honestly, that's what they said to her.
But in 2006, in a giant warehouse in Guatemala, the human rights ombudsman discovered the archives of the National Police: three warehouses containing 80 million pages of paper. 80 million pages of paper covered in bat feces, dead insects, and mold. Filthy. We spent years cleaning it, processing it, organizing it, cataloging it, and my team took random samples from it. We randomly sampled the documents so that we could statistically characterize the entire archive. And so when they found the document that described the campaign in which Mr. García was arrested, tortured to death, and had his body hidden, the officers who did that were arrested, brought to trial, convicted, and sentenced to 40 years in prison.
Now, part of the trial was showing statistically that the documents used in the case were perfectly consistent, in every statistical way, with the normal flow of documents in the archive. Why is that important? It's important because it's evidence that this campaign was completely normal practice. There was nothing special about it. These guys were not rogue agents. They were following orders. The way bureaucracies work is that goals are set at the top; those become plans; the plans become orders; the orders are sent to the operational units; the operational units go and do their jobs; and then they write reports, and their reports go back up the chain. That's how bureaucracies work, and we found that chain of documents for many, many campaigns. We only found fragments of it for this campaign, but statistically we could show that this campaign was completely consistent with normal police practice. And these guys got up in court and said, "Yeah, your honor, we did it, and we were just following orders." And the judge said, you know, yeah, Nuremberg: not a defense.
Thank you, though. You're guilty. Goodbye. And she turned to the prosecutor and said, "Go find their boss." Go find Colonel Héctor Bol de la Cruz, director of the National Police during this time. And he was arrested, and if you look carefully, he's wearing cuffs, because this photo was taken while I was sitting across the room testifying against him. Our evidence again showed that this was completely normal. He said, "I don't know who these guys are, they're just rogue agents." So we were like, no, this is exactly, perfectly bureaucratic, normal practice. Guilty. Convicted. Sentenced to 40 years.
This little girl is a grown-up human rights lawyer in Guatemala now, and here she is embracing her grandmother, Mr. García's mother. This is what justice means. This is why we do human rights work: because we need a way to help family members know when to speak about their loved ones in the past tense.
Thank you very much.