So, election forensics is a term that I invented back in 2006 to describe work I was doing at that time, especially using some of the digit tests that Jazz was just talking about. And they'll make an appearance here. You can see there's a 2BL, second-digit test thing. So that's where a lot of that stuff will come up. But the main arc of the talk is about a slightly different problem, a problem that I discovered. When I first heard about these digit tests, an undergraduate student approached me (of course I was teaching campaigns and elections and American politics) and asked me if I had heard about them. I said, yeah, I think I'd seen a web page where they were applied to data from the United States, and I thought they were basically like astrology in their reliability, meaning I didn't believe in them at all. So fast forward, and I discovered after a long while that it seemed to me that the tests respond to strategic behavior, people voting strategically, as much as or even more than they do to frauds. Otherwise, there are frauds all over the place, and I'm not sure that's true. Secret frauds that nobody knows about, for example, across the entire United States over many decades? I don't think so. But there's strategy all over the place, of course. So part of it is the ambiguity with that particular test, but other kinds of tests that people have proposed to measure or detect election frauds also appear to respond to strategic behavior. So the generic problem, I think, is that it's really hard to distinguish strategic behavior from frauds. The commonality is that in both cases votes are shifted. One person may decide to vote a different way because someone has a gun to their head, but someone else might decide to vote a different way because they think, my favorite candidate isn't going to win, so I might as well vote for my second choice. And so all these methods detect shifts in votes, but you don't know who shifted them.
So it's not astrology, it's just ambiguity about what these things mean. So I want to show you that ambiguity, along with a couple of slight detours about how easy it is to fool at least one of these tests by faking data very well, and then hint that maybe there's a method that can mitigate, reduce, maybe not completely eliminate this confound between fraud and strategy. And that's the Klimek model and its variants, mainly the variants. So without further ado: election forensics is a term I made up for a paper, but I think I also wanted to create not only a paper and a book or more, but a field. And now a lot of people work on parts of this. I have worked some with computer scientists who are concerned about hacked voting machines and all that kind of stuff, and with election administrators. But specifically, it is the field devoted to using statistical methods to try to determine whether election results are accurate. I think that is close to the best definition of what I'm about. And the definition of an accurate election is one where the outcome, the collective choice, is the one implied by the intentions of voters given the election rules. That turns out to be a pretty theoretically rich definition that I'm not going to spend a lot of time talking about today, but the implicit assumption is that the election rules that are in place are normatively fair. There's a lot of work on collective choice in political science and economics, Arrow's theorem and all that kind of stuff. So I'm basically assuming the kinds of criteria for normatively good outcomes that you have in Arrow's theorem, which of course implies, through Gibbard-Satterthwaite and other kinds of results, that people will act strategically in this election kind of context. Whether people psychologically want to do this is a hard question, of course, but if at least some voters are rational actors, a lot of voters do it. They may not do it all that well, but they certainly do it.
By strategy I mean more than just the kinds of strategies that come up with this, however, so you might adjoin the concept of political mobilization as something that's strategic in my sense here. Again, I'm not going to spend a huge amount of time on this, but one implication is this. Suppose you have a kind of non-strategic sense of what the distribution of, say, vote counts ought to be, based on just a large-sample kind of theory. There are millions of voters in many cases, distributed across, say, polling stations or precincts. So you might just say, okay, large sample, how about normal distributions? Well, the problem is that the independence that drives a lot of those large-sample results is specifically not what politics allows. People form coalitions. You look to your left and you say, well, those people are voting this way, so I will change my behavior. And so strategic behavior in particular introduces these complicated dependencies across individuals that will change the distribution. I say here, make them complicated, okay? And that will maybe have implications for statistical tests, a lot of which are designed ultimately on independence kinds of assumptions. I think frauds break that, you know: give me 100% of the votes from this place, and independence is broken. But strategy can also cause that, right? So the big question, or one of the questions, for election forensics is this: is it the case that data from accurate elections have a typical statistical pattern? Okay, you might also ask the opposite question. A lot of this work, when I started with it, had the idea that we're not looking for accuracy, we're looking for frauds, or fraud, if you think that's a single thing; I think it's multiple things. Maybe the pattern of fraud is typical, okay? And that's a research question, right? So we can do a little intuition here and show that intuition may lead one astray, and it's really better to do statistics.
But strategic behavior, as well as frauds, can produce results that look weird. And "look weird" is in quotes because the interocular trauma test (I see it, it doesn't look strange, or it does look strange) is probably not good enough, okay? And I'll show you a first case where, well, maybe you'll be really good at it, but I think you'll be fooled by the difference between what something looks like and what statistical tests show. But my point is that many of the patterns we see that look weird, fraudulent, could also be produced by strategy, okay? So I just want to slow down for one second, because I started off doing this out of concern provoked by the 2000 election in the United States, actually, which shocked me into starting to worry about these elections. And then a few years later, I got drawn into it and focused on the frauds and the wrong elections. But I think if I'm finding this confusion between strategy and frauds, there's actually a significant discovery in there, and I've come around to see it as a discovery, which is that strategic behavior can cause distinct patterns in voting data. And if they're regular patterns, then that might be useful for diagnostic purposes in political science research, because there are probably more strategic good elections than there are fraudulent ones. So that contribution, if that's what it turns into, shouldn't be overlooked, okay? Frauds can too: it's not that they don't trigger some of these tests, it's that they also do, and the problem is you can't really tell the difference merely using statistics. So another message is in the way I talk about a lot of things I'll show you. I have what I claim is knowledge about a lot of these elections that goes beyond the data. It may or may not be accurate, but you need to go beyond mere statistics to do this diagnosis, either of frauds or of strategy or anything else.
It'd be great ultimately if one could incorporate a lot of this extra information that's used for interpretation, the informal kind of stuff, and build it into a common model or analytical frame. The very end of the talk will show you some steps I'm hopefully taking to do that a little bit, okay? So types of data; I'll just put all these on. The very first type, polling station level data, is the minimum thing that I focused on for a long time. These are the vote counts in each of the precincts or polling stations in an election system, and maybe things like the number of eligible voters and turnout; I count those as vote counts. Ultimately, it'd be nice if you could just look at that alone and determine whether the election was accurate or not. That's sort of where I started off. As for other kinds of data, very recently I've gotten interested in geo data. You can map a lot of this stuff to show where the polling stations or other kinds of locations are, and maybe there's some information in that that's useful. I have some of that geo data that I'll show in this talk. And the third category, which I've also started looking at a little recently in the domain of supplementary information, is complaints that voters or other people file after an election about the election. So I have some data from Germany where they've done this, and in Mexico I also have data, which is not going to be in this talk, where people file thousands, as it turns out, of complaints to nullify ballots and so on. And so that's a signal of some kind that they have a problem. These are filed strategically, of course, but I think they're a good indicator of some kind of maybe fraudulent activity, even if they're not necessarily a perfect indicator. As for where what I'm working on is headed, this is, without some of the colors, the web page we've got. We have a grant from USAID; I guess you call it a contract.
It's USAID. So I'm a co-PI on that, to develop an election forensics toolkit, which will be a website and maybe an R package that's not part of the grant. The website is where people can upload their own data and use a lot of the analytical techniques, some of which I'm going to show you in the course of this talk today. So, little USAID logos in the background. These are the countries we already have data for. So see me in August for the final results of all this analysis. None of these countries show up in this particular talk, however. So, a property; let me give one example. It's slightly off the main theme of strategic behavior, but I think it's kind of fun and it illustrates some of the complexities of doing this work. Unimodality is a property that various people have advocated as one that a good election should have. Most of the proposals that I count as in this area have been focused on defining what a good election should be, accurate in my sense, but they just mean non-problematic. And if it's not that, then it's fraudulent. I think that's weaker than where we can possibly go, but unimodality has been one kind of suggestion. So these authors make a claim that the distribution of the proportion of votes for the winner should be unimodal, given turnout. And they, along with Peter Ordeshook and other folks from Caltech and MIT, who have long worked on this stuff, argue that the distribution of turnout across precincts or whatever should be unimodal. If so, it's fine. If not, it's maybe suspicious. So I'm going to look at two examples. The first one I haven't studied all that much, but it's new data I've got and it's a very recent election, the 2014 election in Brazil. The N is large: 400,000 or more polling stations that we've got from Brazil. If you're not familiar with the Brazilian election, this is the presidential election I'm looking at here. It's possibly a two-round election.
The top two finishers in the first round run in a runoff, and whoever wins that is the winner. So the turnout distribution, at least by the interocular test, maybe looks unimodal, and you can tell me whether any of you don't think it looks unimodal. So here's, I think, the first round of the election, and I think that looks kind of unimodal. You would agree, maybe. At least from the point of view of the usual standard that political scientists doing this have applied: they will show a plot and say, see, it looks unimodal. So this would certainly, I imagine, pass that test of looking unimodal, a single kind of mode. Here's the second round, and again it kind of looks unimodal, maybe. Let's get the distribution of votes for the winner. I'd say these distributions don't look unimodal; maybe they look sort of unimodal. And again, I think that Klimek et al. would not see this distribution for the winner in the first round as unimodal, in a way that I'll explain in a moment. But it's kind of unimodal: it has a dominant peak and then it kind of always goes down, so maybe it's close enough. And here's the second round election, and it has little lumps up here. Whoops, maybe it looks unimodal. Well, my tone of voice suggests what you have already guessed: if you apply a formal test to all of these distributions, none of them are unimodal. The particular test that I'm applying is the so-called dip test, from the article by Hartigan and Hartigan in the Annals of Statistics in 1985. There's an R package that implements this stuff. So basically the p-value is zero; the R package doesn't generate zero, it generates these dinky numbers. So none of those distributions that may have looked unimodal actually are unimodal. Whether this raises a question for the unimodality criterion or these elections have some problems, I don't know yet, but let's go forward a little bit.
So maybe, now that we know they're not unimodal, we can see where the lack of unimodality is. If you look here, at the very tippy top, there's a little shelf at 100%, and maybe you think, okay, 100% turnout, voting is mandatory in Brazil, but 100% is kind of hard to get, so maybe that's ballot box stuffing, who knows, right? And in the second round, the shelf, if that's what it is, is even higher. Maybe that's the trigger for the lack of unimodality. The winner distribution is kind of lopsided, but maybe now if I look closely, I can see that there's a little lump sort of right here, and maybe here, and there's certainly a big kind of lump right there, so maybe the lopsidedness is actually lack of unimodality. The Klimek algorithm that I will talk about later does actually diagnose a lot of fraud in this election, which may not be fraud; it could be strategy, based on this thing out there, I don't know what to call it, right? And these lumps in this vote distribution are frequent: there's one here, there's one here, there's one right there, and there's one maybe down there. So there are lumps, okay? But you might be asking, you know, what about a few lumps? Really, you're getting excited about that? I don't know exactly what's going on in Brazil yet, because we just started that analysis, but let's look at Russia instead of Brazil, because I've done a lot of work on Russia and I know a lot about what happened in Russia. So we're going to look at a couple of elections: the Duma legislative election in 2011, where it's a proportional representation rule. I'm not sure if anybody here doesn't know what proportional representation is. Anybody? I can explain it if I need to. Everybody knows, okay? And the presidential election, which is, you know, one winner, which in 2012 was Putin and in 2008 was Medvedev, okay? Putin's, anyway. Hopes dashed, okay? So in Russia, it looks a little less unimodal, and if you test it, it obviously isn't.
This is the kind of thing that Peter Ordeshook, when he advocated for unimodality in turnout, would look at for Russia. This is the winner's distribution, but still, there's an obvious mode out here near 100%, and I remember him running up at a conference once, saying, see, there are precincts or polling stations with 100% turnout, 100% of the vote for Putin. Obvious fraud, okay? So, lumps. And also along the distribution, you can see a lump here, and also there, and maybe one that's there, and there's one there, and there's one here. So lots of lumps in Russia, yes. Density of what? The question was, why should density... I have to repeat the question because you don't have a mic. Okay, so you asked, why should density be related to turnout? Can you follow up and tell me what you mean by that? Oh, that's basically the large-sample kind of argument, I guess. I haven't seen a formal justification from Ordeshook et al., nor actually from Klimek, but my own version of it is this. Suppose you just have a common distribution across the entire electorate, so the probability of turning out, say, is 0.7, right? And you divide voters up into precincts, and then you compute the turnout in each precinct. So that's the mean of a binary variable, did they vote or not, and the overall mean is 0.7, or whatever number it is. Well, you basically have a bunch of random samples, because of the independence assumption that's made about the voters, and so the distribution of those sample means, essentially, should be normal and so unimodal. Interpreting it that way is what leads to the unimodality insight, right? Okay? That answers your question? Okay. And the winner's proportion has a similar kind of logic, a conditional one, okay? So here's turnout in this 2012 election. This is the vote for the winner, and you can see it's similarly got this lumpiness. Well, let me fast forward here.
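The large-sample argument just given can be illustrated with a small simulation. This is only a sketch under stated assumptions, not something shown in the talk: the precinct size (400 voters), the number of precincts (1,000), and the helper names `simulate_turnout` and `text_histogram` are all made up for illustration. Every simulated voter turns out independently with probability 0.7, which is exactly the independence assumption the talk argues politics does not satisfy.

```python
import random
from statistics import mean, pstdev

def simulate_turnout(n_precincts=1000, voters_per_precinct=400, p_vote=0.7, seed=1):
    """Turnout proportion per precinct when every voter turns out
    independently with the same probability (hypothetical setup)."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_vote for _ in range(voters_per_precinct)) / voters_per_precinct
        for _ in range(n_precincts)
    ]

def text_histogram(values, lo=0.60, hi=0.80, bins=10):
    """Count values per equal-width bin on [lo, hi): a crude density picture."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point edge cases at the top bin
            counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

turnout = simulate_turnout()
print("mean precinct turnout:", round(mean(turnout), 3))
print("sd of precinct turnout:", round(pstdev(turnout), 4))
for i, count in enumerate(text_histogram(turnout)):
    print(f"{0.60 + 0.02 * i:.2f} {'#' * (count // 10)}")
```

Under these assumptions the precinct proportions cluster around 0.7 with a spread close to sqrt(0.7 * 0.3 / 400). Real election data need not behave this way once voters' choices depend on each other.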
So the problem is, if you look at the distribution of the votes under these lumps, it turns out, and we'll test for this, that the number of votes is really distorted even in the maybe slight-looking lumps that we have there. By distorted, I mean this. The lumps coincide with particular turnout or winner's proportions, or percentages; I'll use percentages, the 100% scale, because it makes the most sense. If the rounded percentage ends in 0 or 5, that is, it's evenly divisible by 5, then the number of votes there is higher than if the percentage is not evenly divisible. And the argument in these papers by Kirill Kalinin and me (Kirill Kalinin is one of my grad students) is, well, to make a long story short, it's like any political machine: you have to send a signal that you stole the votes, and the signal is that when I fake the vote count, or steal the votes, or get people to turn out, I produce turnout figures that are evenly divisible by 5. And so that's a way, we argue, that the peripheral actors communicate with the center. So to test for this, I use a randomization test. There are a few equations here; I'll just describe them quickly. Basically, we have the turnout number on this 100% scale. The J here, going 0, 5, up through 100, indexes the percentages of turnout that are evenly divisible by 5, like 65% turnout. So we're going to consider a window around that 65%, or whatever the divisible-by-5 number is; I pick 65 for talking here. For anything that would round to this evenly divisible number, we're going to consider the average number of votes. This V is the actual vote count for, say, Putin in the 2012 presidential election. So this is the mean of those votes in that window, the ones that round to that number. We're going to contrast that with the votes in a slightly larger window that is basically the 2% on either side of this evenly-divisible-by-5 number.
So, with my 65% focus, this would be the votes at either 63 or 64% or 66 or 67%. And the test is whether the vote counts outside of this rounding-to-a-multiple-of-5 window are lower than the votes in the window. So this is the set of those arguments, and we end up computing a randomization statistic. Basically, we keep the turnout the same and randomly permute the vote counts that go with it, and do that a lot of times. And then we can test. This up here is the non-randomized mean that we originally get, and then we have a lot of these randomization-permuted kinds of means that result. We can compute a p-value for this by just seeing how many of these randomized means are less than the actual one we had in the data. And I'm going to use a false discovery rate adjustment to correct for the fact that we're testing this thing at lots and lots of data points and for lots and lots of candidates. The result of that leads to this kind of a graph. So you can see here, this is a display of the p-values that emerge from this procedure for several of the candidates. The "liberal" candidate, which doesn't mean what you think it might mean; in Russia, apparently these are not nice people: LDPR. The Communist Party, KPRF. The Just Russia party, Mironov, who I think is much in the news recently for his anti-gay stuff. Prokhorov, who is the owner, I think, of the New Jersey Nets. A billionaire, that's a billionaire. And Putin himself. The Zhirinovsky, Zyuganov, and Mironov candidacies are widely understood to be phony candidacies, in the sense that they always run, they never win, and they all get along with Putin. And so that's that. So if the null hypothesis of no difference is correct, then the p-values ought to be uniformly distributed on the interval from zero to one, that is, scattered all over the place. You can kind of see that it looks like that graphically for Zhirinovsky, off to the left here.
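The randomization test and the false discovery rate step described above can be sketched as follows. This is a minimal illustration and not the speaker's software: the function names, the half-up rounding rule, the one-sided "at least as large" p-value convention, and the choice of the Benjamini-Hochberg procedure as the FDR adjustment are all assumptions. The data at the bottom are synthetic, with an artificial bump of 100 votes planted at 65% turnout.

```python
import random
from statistics import mean

def rounded_pct(x):
    """Round a percentage half-up to the nearest integer (assumed rule)."""
    return int(x + 0.5)

def window_gap(turnout_pct, votes, center, side=2):
    """Mean votes where turnout rounds to `center`, minus mean votes where it
    rounds to a neighbour within `side` points (63, 64, 66, 67 for 65).
    Assumes both groups are non-empty."""
    inside = [v for t, v in zip(turnout_pct, votes) if rounded_pct(t) == center]
    beside = [v for t, v in zip(turnout_pct, votes)
              if 0 < abs(rounded_pct(t) - center) <= side]
    return mean(inside) - mean(beside)

def permutation_pvalue(turnout_pct, votes, center, n_perm=999, seed=0):
    """One-sided p-value: share of permutations whose gap is at least as large
    as the observed gap. Turnout stays fixed; vote counts are shuffled."""
    rng = random.Random(seed)
    observed = window_gap(turnout_pct, votes, center)
    shuffled = list(votes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if window_gap(turnout_pct, shuffled, center) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Synthetic polling stations: 40 per percentage point from 63% to 67%.
rng = random.Random(42)
turnout_pct, votes = [], []
for pct in (63, 64, 65, 66, 67):
    for _ in range(40):
        turnout_pct.append(pct + rng.uniform(-0.4, 0.4))
        bump = 100 if pct == 65 else 0    # artificial augmentation at 65%
        votes.append(rng.gauss(300, 30) + bump)

p_65 = permutation_pvalue(turnout_pct, votes, center=65)
print("permutation p-value at 65%:", p_65)
print("BH-adjusted:", benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

In a real application one p-value would be computed per multiple of 5 and per candidate, and the whole collection passed through the adjustment together.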
But clearly these values here are... I'm not sure if you can see it in the back of the room, but there's a little dashed line here that indicates the .05 level, in case you think that's an important level. And so these are significantly less than that. And the fact that they're black means that even if you do the FDR correction, they're also significantly less than that. So for these three kind of phony candidates, and even more so for Putin himself, there's a significant number of these vote counts at these evenly divisible kinds of points that are augmented, right? I could go on and on about Russia, but I'll just say it seems to me that this is pretty clear evidence of Putin's fraud in this election. And if you do a similar thing, not only in 2011 but in elections going back to 2004, you find a similar kind of pattern of votes being stolen, either for United Russia, which is Putin's kind of party, or for Putin himself, and also for his stand-in in the 2008 election. So I think everybody also believes there's lots of fraud in Russia, which is what we get here. No, he's supposedly not part of the club. I'll try to use that polite term. He has business interests in the West. I see he owns the New Jersey basketball team or something, so he's the savior candidate in the election, to the extent you had one. So the argument is that he has no significant manipulation. He seems to have some patterning that's not uniform, but not as much as the others. My view of this, and I don't know exactly how it happened, but what I imagine happened, to anticipate the story later, is that the votes in these precincts are entirely faked. And so what they did basically is just augment everybody's votes to make sure Putin had the most votes. And so Prokhorov got drawn up a little bit by some of them, but since he's not part of the package, he doesn't get the results the others have. He doesn't get drawn up as much as the others.
And hopefully in a few slides I'll show you the evidence that leads me to believe that these results are entirely faked. At least these results are faked. So hold on for just a second about that. So if we go back to the unimodality test, I'm not sure if you can tell just by looking at the data. Here's 2008. It's kind of unimodal except for this. So it's not just this obvious peak out here where this fraud is happening, because the percentages here are like 60, 65, 80; all along that route we have these vote augmentations happening, and there's no particular distortion that's as obvious as in the extreme tail. Okay? The other story I want to tell about this is that we first discovered this pattern in the winner's vote, and especially in this, which is turnout in 2008. When I first ran this on my computer, I thought there must be something wrong with the settings of my empirical density estimator, because, really? But Kirill and I wrote a paper, published in Russian as it turned out, about this pattern, and people say it's fraudulent and all that, and Russian experts have agreed that that's the truth. And Kirill made lots of presentations in Russia, to public groups, and also carefully explained to the head of Russian elections not only this test, but all of our tests for Russia. And so he likes to think, I like to think, and he sort of believes as well, that we had an impact on the Russian election, because this was 2008's turnout pattern, and here's what it looks like in 2012. So it was sad for us in a way, because we thought, oh my god, now the paper is going to be harder to publish in English because the pattern has gone away. And then I invented the randomization test, and we got them anyway. They're still doing it, all right. So I'll just... I like to think that my research is having an effect on, like, Russia, but I imagine there are other people who point this out, yeah. Well, I mean, I think...
I don't know exactly, because I haven't asked the people who run the elections, how exactly are you getting your orders, or how are you signaling back that you're doing this? We have a paper where we argue it's directed from the center and involves transfer payments and all that. There's a big set of analyses that this is at the foot of, and I'll probably be able to tell you by the end of the year whether that still happens. They must have found another way. This turned out to be too blatant, so they must have other ways of doing it. It may be that the prevalence of it is much less than it was, but I don't see that in my own analysis. So I'm not exactly sure. I think basically the same kind of control from the center is happening; what changed is the publicity of it. I mean, 2008 was an election that election monitors refused to go to. They said it was so terrible, not only before election day but on election day, so they were just out of control here. I'm not sure if that responds to your question exactly, but that's sort of what I know about it. Yeah. Oh, yeah. Political machines apparently do that a lot. Someone in an audience told me once that he'd seen data from Chicago in the 30s and said, oh, yeah, that's the usual political machine pattern. I haven't seen the data myself, but it was a guy who was reliable. He said you could see the same turnout divisible by five in those wards, because that's the way the ward boss convinced the guy higher up: I did what you asked, because you can see the turnout. So he said it was general. I've seen data from recent Chicago elections, 2008, and in Chicago you can kind of see the same pattern, actually, in 2008 in state legislative races, not so much the national race. But on the fakery thing, I guess I'm... Yes. I think the implication of the fact that it's greater at five means that it's not going to be greater at some other thing, but I haven't formally done the test at other windows. I know the mean is higher at that point than at the surrounding four points.
I don't know whether at three it's higher than at two. I haven't done that kind of comparison. You could easily modify the software I wrote to do that. I just haven't done it yet. Okay. But let me go ahead, because the talk's not at all about Russia. This is supposed to be just the introduction. So let me show you... So I did that work basically for the Wall Street Journal a couple of years ago, when they asked me about the 2011 election. That's when I developed this thing. So the question is about a test that got published in Political Analysis, a political science methods journal, in 2012. They argue that you can use the last significant digits of the vote counts to test for fraud. They also had a Wall Street Journal op-ed where they argued the Iranian election in 2009 was fraudulent because of the last digits. There's more to that story, but I'm not going to tell it right now. It's kind of embarrassing for them. So let me show you this. This last digit thing is the last significant digit. So when I'm talking about the first, second, last digit: if the number is 1234, the four is the last digit and the two is the second digit. It's the second significant digit. So if you believe in lots of distributions about numbers, including the so-called Benford's law distribution, then if the number of significant digits is big enough, the prediction is uniform last digits; they have a theorem in their paper that I absolutely do not understand. But anyway, the problem with this is you can easily fake uniform digits. If you have any pseudo-random number generator, and there's probably one on a calculator now, you can easily fake these numbers. So they have evidence that stupid fraudsters in Nigeria, working on paper (they have pictures of the ballots), are faking it and you can catch them. The Russians are a little more sophisticated than the Nigerians in the countryside: nuclear weapons and all that. So in fact, the fakery in the last digits happened.
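To make the digit terminology and the uniform-last-digit prediction concrete, here is a small numerical sketch. It is not the theorem from the paper mentioned above. It only uses the standard Benford's-law formula for the digit at a given significant position, and the helper names are made up for illustration. It shows that the second digit is clearly non-uniform under Benford's law, while digits further to the right get close to uniform.

```python
import math

def significant_digit(n, position):
    """Digit of a positive integer at a 1-based position from the left."""
    return int(str(n)[position - 1])

def last_digit(n):
    """Last digit of a non-negative integer."""
    return n % 10

def benford_digit_probs(position):
    """Benford's-law probabilities of digits 0..9 at a significant-digit
    position >= 2 (position 2 is the second digit)."""
    lo, hi = 10 ** (position - 2), 10 ** (position - 1)
    return [
        sum(math.log10(1 + 1 / (10 * k + d)) for k in range(lo, hi))
        for d in range(10)
    ]

print(significant_digit(1234, 2), last_digit(1234))   # second and last digit of 1234

for pos in (2, 3, 4):
    probs = benford_digit_probs(pos)
    digit_mean = sum(d * p for d, p in enumerate(probs))
    print(pos, [round(p, 4) for p in probs], round(digit_mean, 3))
```

For the second digit this gives a probability of roughly 12% for a 0 and 8.5% for a 9, with a mean digit near 4.187. By the fourth position every digit is within a fraction of a percentage point of 10%.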
So my position is that I've shown strong evidence that there is fraud in those votes, and so here's what the last digit test results in. Let me walk you through what this table full of numbers is about. These are basically chi-squared tests against the discrete uniform distribution on the digits from zero to nine. Each one occurs one-tenth of the time; that's what the uniform distribution means. And you can look at all of the polling stations, or you can look at only the polling stations where the vote count is greater than 100. So LU is last digit uniform, and with an asterisk it's the big counts. And you can see this is the significance probability. It's kind of the objective Bayes adjustment of the p-value, from a paper in 2000 published in the American Statistician by Sellke, Bayarri and Berger. Pericchi and Torres like it; their names will be on a slide later. This is the formula for that thing. It's basically bigger than p-values and is used the same way. Long story short, you can see there are many, many polling stations, 11,000, 83,000, for Putin and the other candidates. The others all have significant departures from uniformity. Not Vladimir Putin: chi-squared of 13 with, you know, nine degrees of freedom, p-value or significance probability of 0.43, for 83,000, actually for 95,000 polling stations. An N of 95,000 and a chi-squared of 13: too perfect for words, okay? These are the polling stations here; this symbol indicates the precise ones, even a subset of the precise ones, that I showed the augmentation for. These are the polling stations whose turnout is divisible by five, as well as the winner's percentage being divisible by five, and you see the chi-squared there is even more perfect, 5.7, okay? We know those votes were manipulated, so totally faked is my conclusion, okay?
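The table's ingredients can be sketched in a few lines. This is an illustration under assumptions, not the speaker's code. It assumes the "significance probability" is the Sellke-Bayarri-Berger calibration of a p-value, alpha(p) = 1 / (1 + 1 / (-e p ln p)) for p < 1/e. The function names are hypothetical, the chi-squared tail uses the standard closed form for odd degrees of freedom, and the vote counts at the bottom are deliberately faked with a pseudo-random number generator to show how easily such counts pass the test.

```python
import math
import random

def last_digit_chi2(counts):
    """Pearson chi-squared statistic for last digits against the uniform
    distribution on 0..9 (9 degrees of freedom)."""
    observed = [0] * 10
    for c in counts:
        observed[c % 10] += 1
    expected = len(counts) / 10
    return sum((o - expected) ** 2 / expected for o in observed)

def chi2_sf_odd_df(x, df=9):
    """Upper-tail probability of a chi-squared variable with odd df,
    via the closed-form series for odd degrees of freedom."""
    chi = math.sqrt(x)
    tail = math.erfc(chi / math.sqrt(2))            # two-sided normal tail
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term                                # chi^(2r-1) / (1*3*...*(2r-1))
        term *= x / (2 * r + 1)
    return tail + 2 * density * total

def sbb_calibration(p):
    """Assumed form of the calibration: lower bound on the posterior
    probability of the null under equal prior odds."""
    if p <= 0:
        return 0.0
    if p >= 1 / math.e:
        return 0.5                                   # bound is uninformative here
    bound = -math.e * p * math.log(p)
    return 1 / (1 + 1 / bound)

# Faked vote counts from a pseudo-random generator (purely synthetic).
rng = random.Random(7)
fake_counts = [rng.randint(100, 999) for _ in range(20000)]
stat = last_digit_chi2(fake_counts)
p_value = chi2_sf_odd_df(stat, df=9)
print("chi-squared:", round(stat, 2))
print("p-value:", round(p_value, 3))
print("calibrated:", round(sbb_calibration(p_value), 3))
```

Under this assumed formula, a chi-squared of exactly 13 on nine degrees of freedom gives a p-value near 0.16 and a calibrated value near 0.45, in the same neighborhood as the 0.43 quoted from the slide.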
We also have some other things, other subsets of the data: chi-squared of 4.9, 8.5. The problem with this digit test is that it's just trivially easy to fake it out, and I believe the Russians did that. And if they faked this stuff, you know, who knows how far the fakery goes, okay? So that's one beat. So I think the last digit test is not particularly useful, just in case you're wondering. Let me change to the second digit, where I've spent a number of years of my life working on it. I'm not going to say much about Benford's law, but I did put this little table up to show you the implication of Benford's law. I could give a long discussion of Benford's law, so let me just say that Benford's law has implications, under many, many different kinds of circumstances, for the digits in numbers in, say, base 10 arithmetic in this case; it works in arbitrary bases, actually. I want to check time here, okay? Let me just speed up, okay? So the second significant digit has this kind of frequency for the digits from 0 to 9: a 0 occurs about 12% of the time and a 9 about 8.5%. And Pericchi and Torres, in a paper published in Statistical Science (they actually developed this stuff in 2004), argue that if you look at the second digits in a vote count distribution and you don't see them compatible with Benford's law, then there's something wrong. The mean of the digits you get, the unconditional mean, if they do satisfy the Benford's law thing, is 4.187. In vote counts, I've discovered, the first digit does not satisfy Benford's law. The second digit often does. It's not really Benford's law: there's a story about going to a conference with the mathematicians who derive and prove theorems about Benford's law, and they said, we've never seen that before. So I call it the second-digit Benford's-law-like test, because it's kind of like Benford's law but not really Benford's law. But anyway, let me just give you a quick overview of this. I developed a way a few years ago to simulate election data that satisfy this distribution, so I could
artificially manipulate it to create artificial frauds, as well as generate realistic features of the election data. For example, districts are not exactly competitive 50-50 districts between two candidates; under gerrymandering and other kinds of population shifts you get lopsided districts, and so I've simulated that. Things I discover are that merely having a kind of competitive but inferior candidate can produce a departure from the Benford's law mean. So candidates Y1 and Y2 are basically symmetrically opposite on some simulated ideological dimension, but then you plunk down a couple of really less supported candidates near the Y1 candidate and you get slight departures from this 4.187 mean. More interesting to me, and I'm going to carry this forward for a lot of cases here, is some strategic behavior. If voters act in a wasted-vote logic kind of way, which is to say, for candidates who are supported by voters who really like them but who are coming in third in this single-winner case, those voters will abandon that candidate and vote for their second-place candidate in order to maximize the chances that the candidate they hate the most doesn't win, that produces, for both the first-place candidate and the second-place candidate, these mean digits of 4.35, and I've seen that in a lot of data, actually, some of which I'll show you. So those are unconditional means. Under just making the district not exactly competitive, equal, with turnout decline (this is the simulation that matters, over here) you can see that this mean of the second digits really moves around. It has this bow-shaped kind of pattern for the second-place candidate, and the winning candidate also has this kind of bow-shaped pattern. So you could see this and find the Benford's-law-like pattern is not satisfied, and it merely comes from the distribution of voters in the district. With strategic behavior, if you get strategic behavior with more than two candidates, we get second digit means that are really below this 4.187 marker here, and
non-strategic behavior with multiple candidates, not just two, can produce this dramatic pattern of decline. I have several chapters in a book manuscript about this, so I can go on about it, but let me just use this in a couple of cases right away. The first is Canada. These are data from the Canadian 2011 election, and the plots are a little bit complicated, so let me walk you through the first row. We have the Liberal Party against the Conservative Party, so in that row are all of the polling stations in Canada from districts where either the Liberals or the Conservatives won and the other party, Liberal or Conservative, came in second. So they were the top two, the leading parties in that district. And the others have Liberal versus NDP (what does NDP stand for? Whatever, the D is for Democratic). That party came in second in this election overall, nationally, actually. Anyway, I can't remember all that, but this is Liberal versus NDP. But let's look at the first row. This axis indicates the difference between either the first-place or the second-place party and the third-place party. Some theory by Gary Cox argues that that margin with the second loser is informative for strategic behavior; long story truncated with that. To the right of zero it shows the precincts in districts where the Liberals won, and to the left of zero it shows the votes in precincts where the Liberals lost to the Conservative Party, and similarly the Conservative Party over here, where they beat the Liberals and where they came in second to the Liberals. OK, what is that? To the left of zero, for the party that came in second, the distribution here has this hump-shaped kind of appearance. 4.187 is right in here, this little dashed horizontal line, and it looks for all the world just like, or very much like, this pattern here of non-strategic behavior, where the variation in the second digit mean is induced by the
imbalance between the parties. OK, to the right of zero in this picture, and generally for most of the parties (and I've done this analysis for elections going back to like 1997), you get pretty much the same pattern across all the parties, even the BQ, the Bloc Québécois in Quebec. To the right of zero it looks for all the world... it's got a slightly different curvature, but the second digit mean is well below 4.187, and it looks like you have strategic behavior by the voters who voted for the winner. So it looks like a weird case where the people who vote for the winning party are acting strategically, switching their votes to join the winning party, but the losing candidate is not gaining any, or at least not as many, strategic votes. This pattern was first observed, without this micro detail, actually by Ken Kollman and Pradeep Chhibber in a book they wrote where they have a long discussion of Canada, and their evidence is that the effective number of parties in Canada is significantly, or at least substantially and for a long time, greater than two. They have data going way back into the 1950s or so; I have data going up to the 2011 election. So here's the effective number of parties for 2011, computed district by district, and you can see that it's slightly bigger than two. It has some districts that are really bigger than two. This is the median, I believe, this dashed line here, and it would be two if both the winner and the loser had voters who were acting strategically. But this kind of half-strategic behavior may be what explains this excess of parties, so to speak, in Canada. So I use this argument to suggest that these digit patterns do actually seem to be diagnosing strategy, and match other people's interpretation of interesting elections. OK, I have another case. Oh, it would be great if something like that is explaining what's going on in Canada, because otherwise across Canada we have a diagnosis of massive fraud all the time. These are the
chi-squared tests for whether the vote digits satisfy this 2BL, second-digit Benford's-law-like, distribution, and you can see a whole lot of zeros here for the significance probability. The second digit means very rarely contain 4.187 between the confidence bounds. So I don't think there's massive undetected fraud across Canada. I think there's complicated but widely recognized strategic behavior in Canada that's affecting the claimed fraud diagnostic statistics. Here's 2004; there are other years, but I didn't put them on the slides. OK, recall these are the patterns, and now I'm going to look at data from the United States. This came in my own life before the Canadian stuff and is connected to a really large-scale strategic theory about American elections that I've written about separately. Alesina and Rosenthal published a big theory arguing that American voters are involved in a giant coordinating equilibrium whereby the votes for president are tied to the votes for the legislature (read the Congress, read the House of Representatives), so that, long story short, when there are more votes for one party's presidential candidate, there are more votes for the other party in the legislative race. So in a race like 1984, where the Republican presidential candidate is really winning big, Democratic House candidates should benefit from gaining strategic votes. Jazz will remember this, because in 1996 he came to my office and had their book, saying it was a great book, and I declared I'd go home over the weekend and refute the book, and four years later I published a paper that said it was true. So thank you, I guess, for that. And we published a paper a couple of years later in the APSR that showed it's sort of true for midterms as well. So I've gone from a skeptic of their theory to a true believer. I'm not sure if you still believe it or not. "I haven't thought about it." We'll go with that. So in that light we have clear predictions about what should happen in this picture. So these are votes for the second
digit means. The rug plot here shows all of the districts in the United States for which I have data, and I have data from all states except California in these data. So these are the Republican winners, these are the Democratic winners, this is the Republican loser in the district where the Democrat won, and this is the Democratic loser in the district where the Republican won. And again the Republican winners look a whole lot like this non-strategic pattern with the bow-shaped picture. The losers of both parties look a lot like the non-strategic case in this simulation right here. And the Democratic winners have mean digits that are not substantially different from 4.35, exactly in line with the strategic theory of Alesina and Rosenthal with this giant coordinating equilibrium. So I was really excited when I saw this a few years ago. I thought, wow, this stuff really is not fraud, it's responding to strategy. In midterms you shouldn't get the strategic behavior, because people know who the president is, and so in 1986 you can see that the Republican winners still have a slight bow shape to them, but the Democratic winners now, instead of being this elevated picture well above the line, at 4.35, have come down to be mostly not distinguishable from 4.18, and have maybe a slight bow shape that you can see in these non-parametric densities. So that looked to me like exactly what Alesina and Rosenthal's theory says should happen, and it's in the second significant digits of the vote counts. I mean, really. OK, so I'm all excited at this point when I'm seeing this stuff, and it's great if they pick up strategy, because, like in Canada, if you compute the chi-squared test or the second digit means, not only does the presidential election look like it's massively fraudulent, and the House election, but the state-level offices also, and in the longer analysis you can see similar patterns in the states. And so I think it's strategy, not frauds.
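The 4.187 benchmark and the roughly 12% and 8.5% frequencies quoted earlier follow from the standard second-digit Benford formula, P(d) = sum over k = 1..9 of log10(1 + 1/(10k + d)). A minimal sketch, assuming Python's standard library; the function names are hypothetical, and the statistic returned here is a plain Pearson chi-squared without the significance-probability calibration or the confidence bounds used in the talk.

```python
import math


def benford_second_digit_probs():
    """Benford's-law probabilities for the second significant digit 0..9."""
    return [
        sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    ]


def benford_second_digit_mean():
    """Expected value of the second digit under Benford's law."""
    return sum(d * p for d, p in enumerate(benford_second_digit_probs()))


def second_digit_stats(counts):
    """Mean second digit and Pearson chi-squared against Benford.

    Counts below 10 have no second digit and are skipped.
    Returns (mean_digit, chi_squared) with 9 degrees of freedom implied.
    """
    digits = [int(str(c)[1]) for c in counts if c >= 10]
    n = len(digits)
    if n == 0:
        raise ValueError("no counts with at least two digits")
    probs = benford_second_digit_probs()
    observed = [digits.count(d) for d in range(10)]
    chi2 = sum((o - n * p) ** 2 / (n * p) for o, p in zip(observed, probs))
    return sum(digits) / n, chi2
```

Applied to the vote counts for one candidate across polling stations, the first return value is the quantity the plots compare against 4.187 (or against 4.35 in the wasted-vote simulations).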
It's also great because, for the data I have for the 2000s period, 2006, you can see what Republicans might like to call the ACORN effect, which is to say, apparently, if you believe this stuff measures fraud, the Republicans have cleaned up their act, because none of the Republican races have a significant departure but all the Democratic ones do. So: ACORN. OK, I don't believe ACORN did this; I think it's a sign of something else happening. All right, Alesina and Rosenthal may or may not be true for the 2000s, but I think something else is going on, and in the interest of time I'm going to skip by this next set of stuff. The compatibility with Alesina and Rosenthal's theory doesn't work out exactly right for the 2000s, but I have much more I want to talk about, so I'm going to skip by this quickly. In Germany as well I have data, and Germany is great because there are lots of strategic measures you can use because of the mixed system in Germany. The mixed system refers to the fact that they have both proportional representation votes and district-specific votes that happen on the same day with the same voters, so each voter gets two votes. The overall party shares of the legislature are set by the PR vote, but any difference in the vote margins under these two different election rules (well, it's the same election) is entirely or largely due to strategic considerations, and that's a conventional kind of view. So in Germany I pulled together three different elections for this picture, 2002, 2005 and 2009, and you can see, except for the Greens in the first votes, the single-member district votes, you have a lot of zeros here. And so what I have done in this picture is compute the conditional second digit mean, which is really important for diagnosing the strategies for the most part. Here I have along the x-axis the difference between the second vote and the first vote for the CDU, which was the party that got the most votes in this election, and on the y-axis I have the second vote versus first vote
proportions for the FDP. In this picture, the FDP and the CDU are both kind of right-wing, center-right parties. The FDP is really small, so an argument that has some support in some publications says that voters are shifting their vote in the PR vote to get the FDP over the threshold, so-called threshold insurance, contrary to the idea that the PR vote is entirely non-strategic because, well, why would they, because it's PR, right? I'm not sure you can see the numbers in this non-parametric regression that I have here, but the numbers match the shift down in the pattern, much like the simulation I did under entirely different election scenarios with artificial data. They go down from about 4.25 here to about 3.85 right there. So it's a curve that goes up, and you can see it's mostly proportional to the shift in the FDP, and there's a nuance that when the FDP is actually gaining more votes in the second vote than they are [in the first], then you have a shift, and it gets more complicated. But the main point I want to show is that this is strategy affecting the second digits, and arguably not fraud. The literature on this asks the question, do the Greens also benefit from the so-called threshold insurance? And they kind of do in these data. You get slightly different locations, but you have the same movement here, from numbers around 4.15 to about 3.9, for the Greens, the small party on the left as opposed to the right, so the x-axis now is the SPD. OK, so again strategy is affecting these fraud measures, and I'll skip the Baden-Württemberg example, again for time, except that I've got this cool plot that shows that the Greens really benefited after the Fukushima nuclear meltdown. So the election in Baden-Württemberg, the state election, shows this big upswing in support for the Greens, who won the election, and you can see that their significant digit becomes really small and significantly different in that year. But I'm mostly skipping that example in the interest of time. OK, a case in Mexico that involves coalitions, and in this
case I'll just say about it that it shows that the second digit mean pattern shifts in the case where a party runs as a coalition. Here the PRI is part of a coalition, and the second digit means are really elevated above the 4.187 number, but in other districts it runs separately, and so the coalition induces a big shift, or is associated with a big shift, in the second digit means, up compared to here, right? And there's a similar kind of thing over here, of the coalition against another coalition, the PR... the PAN party. OK, so I also have some evidence using geographic data, and I'm using this particular measure for weights that is essentially the loess weight function, the tricube function advocated by a couple of authors. Let me see how much I can get to here. So I do applications of this in Germany and Mexico. The geography is slightly complicated in Germany because we have both in-person districts as well as mail-in ballot precincts, and these don't quite overlap exactly, but I'm not going to get into all that detail here. The point is that we can estimate how big a window, in a sense, or how wide a net you need to take in terms of nearest neighbors, and you can also then use this to see whether this set of geographic estimates for these mean digits, which don't use any district boundaries or anything, are confined to the district boundaries. They have a cross-validation method they suggest for selecting the bandwidth, and the only point here, you can see, is that I'm computing the bandwidth for estimating just the simple proportion of votes in each of the polling stations in the left two columns, and the right two columns are computing the second digit means, and the bandwidths are much wider for the one than the other, and same thing in Mexico. And the results we get, for example, here's an example of Berlin, and the weird thing about this is you can see how the second digit mean, which is color-coded in the map... the blue means that the second digit mean is less than
this 4.187 value, reddish kind of means it's greater than that, and to a very remarkable extent the blue is all inside Berlin, and outside Berlin it's really different. It could be there's, like, massive fraud in Berlin that no one noticed in this election, but I think it has to do with the strategic behavior going on. If I back away from Berlin, and now this is Berlin embedded in Brandenburg, you can see the red kind of spreads out into the rest of Brandenburg, and then it has a little kaleidoscope kind of pattern. I have these maps for the rest of Germany as well, but the fact that it's contained inside this one place suggests to me it's responding to something, maybe strategy. If you look at the vote distribution, you can see that in Berlin, which are the columns on the left, essentially four different parties are competitive in different parts of Berlin, but essentially only two parties are competitive out in Brandenburg. So there's some difference there, and I'll just move on because the talk is not entirely about Germany. In Mexico City, so this is the Federal District in Mexico, it's kind of this heart-shaped region, it's mostly red, and it seems to be red if you back away and look at Mexico, but I'm not going to talk about that much because I want to say a few words about Klimek et al. before I shut up. How much time do I want to take here? OK, five minutes. Well, let me talk while hopefully my computer comes back. OK, so I said that Klimek et al.
guys claim that they believe that there's unimodal turnout and unimodal winner vote proportions if the election is not fraudulent. They have a specific claim about what fraud is, though, and so I love this model, even though I think it has lots and lots and lots of problems. I think the idea of asserting positively what fraud is is a huge advance, and I'm working on a project to continue this that I talked about a little bit on Wednesday here at another session; some of you were there too. So they have extreme fraud and incremental fraud. Incremental fraud is the case where many of the votes are taken for the winner, both from other parties and also from non-voters (we call that ballot box stuffing), and extreme fraud is when you get basically all the votes, again. And so this leads to a multimodal distribution. They have a specific functional form with this notation; I can skip it. So they have a specific functional form here: N is the number of eligible voters that they are conditioning on; these are the turnout and the winner's proportion of votes, which are each assumed to be normally distributed; these are the votes that are stolen from non-voters, that's going to be another mode; and these are the votes that are stolen from other candidates. So they have this model. They have a simulation protocol that I have lots of problems with. Last year I wrote a paper where I developed a finite mixture model kind of alternative to that. So this model is, again, a lot of details here, and I hope you'll bear with me in not bearing down and showing what this all is, but this is entirely a model that's motivated by election fraud, and it's not something esoteric like the digits in the vote counts or something weird like that. I guess this is this weird thing, but it's explicitly framed as a fraud kind of model, right? So here are some estimates, just to show that you get different results with the finite mixture code that I like, sort of; it has some problems as well, but it's better than their
version. For Russia you can see that we get smaller estimates for the amount of incremental fraud. This is the probability, in a sense, that each unit has incremental fraud or extreme fraud, and so 0.1 as opposed to 0.9. If you use my estimates, the finite mixture ones, you do not find that California has more fraud than Russia, which you do find with their method, the same or pretty much. So I think mine is slightly better. OK, another point that I really like with my method, or at least sort of did until a couple of weeks ago, is that the finite mixture method seems to eliminate the association, the ambiguity, of the fraud measure with strategic behavior. So again in Germany, in this case in 2005, I estimated the Klimek model in each of the 299 districts in Germany for the first vote, the district-based vote, and I think this was in the ad material, so I put it on my slide. This is Saxony; you can see that Leipzig over here has a pretty significant incremental fraud number, which is why it's red. If I had more time I'd talk about why Dresden really is the interesting case in this election, but I'll just go quickly through this. So the problem with this... if you look at 2009, I guess I should have said, in this case, for the graph here, there's a pretty strong association between the measures of strategic behavior (the x-axis here, for example, is this difference between second vote and first vote for the FDP that I was showing in the other graph, that is, they switch their vote between election rules, so that's strategy) and the outcome variable in these non-parametric regression plots, which is the f_i statistic, the probability of incremental fraud. And the main point is that the p-values are really small, 10 to the minus 4, with their own approach; with the finite mixture approach you have p-values of 0.9, right? So I was really happy to see this. I thought, OK, finally we have a technique that can mitigate, at least here it looks like it eliminates
completely the confound between fraud and strategy. OK, the problem is that analyzing the same kind of situation for 2005, we come back to pretty hefty associations between strategies and frauds, and not for the minor FDP party but for the main party that won the election, Angela Merkel's party: p-values of 0 for incremental fraud (here, this is f_i), and similarly for the SPD, the left-wing other major party, big associations there. So it may be that 2009 was a lucky fluke, and it could be that the confusion is inherent in the concepts of fraud and strategic behavior, and getting to the method that will discriminate those two is the frontier where my research is right now. Just because I have them, I'll show you that we also are doing latent variable analysis. We're using the complaints that were filed after the election to a committee of the Bundestag in Germany. We code them by type of event, so we distinguish absentee ballot problems from complaints about the electoral system from problems at the polling place and a bunch of other types, and you can see that these are the factor loadings in this model. There are some complexities that I'll skip over talking about, but the f_i and f_e, the fraud probability parameters from the Klimek model, are associated with only one of the three dimensions of these complaints, and so it seems like we may be able to use this auxiliary information to maybe disambiguate the parts of those that respond to frauds as opposed to the parts that respond to strategic behavior, and that's something that I'll be working on over the next couple of years. OK, so in conclusion, I guess I'll show all these bullet points here and conclude. Election forensics began with the idea that you wanted to decide whether the election was accurate using only minimal information. I mean, in my own life history, the idea was, back around 2005-6, there was a big round of people complaining about electronic voting machines with no paper records, so you couldn't audit
them or do anything; you had to trust the machine. And it was also well known that it's basically trivially easy for experts to hack voting machines that are all electronic, and so I had tried to see, is there some way that we can test whether the votes are accurate when we can't audit? That was one of my original motivations for pursuing the digit tests for such a long time, because the only thing you'll have then is the vote counts, and so can you tell from that? I'm not sure that works anymore, but you can get a lot from just the vote counts, even. But the problem is that, A, they can be manipulated. I showed how the last-digit stuff, I think, is manipulated in Russia, because you can do it with your calculator. You can also manipulate the second digits, because you could take the code I have in some of my papers and just run it and simulate anything you want. I showed you simulated elections. Putin could pay me to do that. I'm not paid by him to do that, because I wouldn't be here talking to you if I had been paid by Putin to do that; I wouldn't tell you that I had been paid anyway. And so it's ambiguous what's going on, and they're also ambiguous because of the strategy versus fraud problem. The other things I think are redundant with what I wanted to say. So, questions, comments? Yes. The statement was, this might be an argument for having only two candidates, because then there's no strategic behavior going on. Should I respond to that? Well, the question is how you get down to only two candidates, and so you just move manipulation to the process of winnowing it down to two candidates. Many systems already have restrictions where they eliminate the possibility of candidates being on the ballot. Russia is such a place: people are arrested, jailed, whatever, to prevent them from running, these opposition candidates. We have ballot access problems in the US. I don't think that's a solution to the general problem of guaranteeing that the election is accurate. The rules will then be violated, in a sense.
There also are kinds of strategies you can actually have with two candidates. They're not necessarily wasted-vote strategies, but if you think about a series of elections, then you don't eliminate strategic behavior, you just change its form. Yes? The question was, Russia used to have the option to vote for none of the candidates, and is it still true? Yeah, in 2004, whatever they called it in Russia, the none-of-the-above option was on the ballot, and it got a lot of votes, which is probably why they got rid of it. And basically it behaves like another candidate, in a sense, in that I can see variations of all types, like here. I don't have a sense of exactly... it's not like another candidate, because you're not going to get a seat in the parliament for that: "that's the none-of-the-above seat, we left it empty." Yeah, so the statement was, it's a way for extremists to vote for somebody, to vote at all, but not vote for an extremist party. I don't know the origins of that, to see whether it came up as a way of accommodating extremists. I don't know why it was on the ballot in the first place. I know it was there since the 90s, when the Russian data that I was looking at started, and then it was eliminated in the mid-2000s. They thought there were too many parties in Russia, and so there are books about, let me restrict the number of parties in Russia, and as part of that they also changed the ballot to get rid of that option. But in 2004 I can see the very same pattern, with minor digit differences, that I showed you for 2008 and for 2012, in that there's vote augmentation at the places where turnout is evenly divisible by five and where the winner's proportion is evenly divisible by five, much more than turnout, but you also see it for turnout. So the attitude of Putin and United Russia toward the election is invariant, regardless of them also making the changes that reduce the number of parties that people are voting for. Anything else?
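One simple way to look for the round-percentage pattern mentioned here is to count how many polling stations report a turnout that is an exact multiple of five percent. This is only an illustrative sketch in Python, not the augmentation analysis from the talk; the function name and the `(votes_cast, eligible_voters)` input layout are assumptions. The check uses integer arithmetic: 100 * votes / eligible is a multiple of 5 exactly when 20 * votes is divisible by eligible.

```python
def share_divisible_by_five(stations):
    """Share of polling stations whose turnout percentage is a multiple of 5.

    stations: iterable of (votes_cast, eligible_voters) integer pairs.
    Stations with no eligible voters are skipped.
    """
    flagged = 0
    total = 0
    for votes, eligible in stations:
        if eligible <= 0:
            continue
        total += 1
        # turnout% = 100 * votes / eligible is a multiple of 5
        # exactly when 20 * votes is a multiple of eligible
        if (20 * votes) % eligible == 0:
            flagged += 1
    if total == 0:
        raise ValueError("no stations with eligible voters")
    return flagged / total
```

The same function applies to the winner's share by passing `(winner_votes, votes_cast)` pairs. Small stations hit round percentages by arithmetic alone, so the share only means something relative to a baseline, for instance the share at neighboring non-round percentages or in simulated data.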
The question was whether online voting would be a good thing or a bad thing, internet voting for example. I think my attitude is like Philip Stark's on this, which is that it would be a complete and utter catastrophe to have internet voting. The internet is not ready for internet voting. It's impossible to make the internet secure: massive interference, just changing people's votes, where they go. It's really hard to prevent coercion, people voting next to the person, the abusive spouse or whoever, who's telling them what to do, let alone elaborate vote counting systems. The problems go on and on. You can't protect the software at any point in much of the machinery in the internet, because you don't see most of the hardware that's transmitting the signal from the mobile phone to the voting system. I guess there's a big fight, because there may be a few people who are expert who are advocates of internet voting, but I'm on the side of the majority of computer scientists, who know much more about the technology than I do, for sure, who don't think that... simply imagining, well, I can do banking on my Android device, why can't I vote, or my Apple device, whatever: it's just not ready for that. The features of voting that make it utterly unlike banking are that you're supposed to have a, I'll call it private ballot, or a secret ballot. No one's supposed to know who you voted for, and you shouldn't be able to prove who you voted for. So you may know yourself, but you shouldn't be able to prove who you voted for to anybody else, because if you could, then that opens you up to coercion. So that anonymity, if that's what it is, and that's also one of the normatively desirable features of collective choice procedures, that the vote doesn't depend on who the vote came from, which is not quite the same problem, but it's a feature. And so those good properties of elections make it very difficult to get the kind of security that you want. The way this is going, people are talking about end-to-end encryption as
a solution, and there are a lot of proposals for doing that. Some of them have been realized in kind of smallish elections, a couple of public elections, mostly non-public elections. So it's not impossible to imagine remote voting, internet voting, but the technologies we currently have that are popularly, widely deployed are not up to it. Does that answer your question satisfactorily? OK, thanks. Tell Philip I said that when he gets back. The question was about smartphones, and you said something about Russia and smartphones. Like Estonia, for example, had an electronic voting system, and some of the countries up near there geographically, they had versions of electronic voting problems, right? Well, the problem of... the question is, people take their smartphone and take a picture of their ballot and then preserve the picture and then show it outside of the polling place. So that seems like a bad thing, and that's a problem. I believe there are some US jurisdictions that have made it illegal to take pictures, and I don't know if anybody says you have to leave your smartphone in a thing there. But that's still sort of OK, because you can always fake that picture, and so if someone is relying on the picture to be accurate, they can't really tell if that's your ballot or just some picture of a ballot you got. So it's not quite making the situation of coercion impossible. Again, there are papers on this, and there's a journal called the Journal of Election Technology and Systems that I'm co-editor of. We have a lot of literature about this kind of stuff, so I direct your eyeballs to that, and there are other journals as well that are in this space, and so people have been writing about this in a much more sophisticated way than I'm able to answer right now. Yes? The question was, of all the data I've looked at, what has been the most well-formed, least fraudulent election that I've looked at? To get there... so, well, just to clarify, I don't think strategy is a problem. I, in a sense, love politics and strategy;
strategic behavior is the essence of political... it's a large part of the essence of political behavior. So I do not want to eliminate strategy necessarily. Sometimes I personally don't like the strategies because my candidate or party loses, but if you don't have strategic behavior, then that means you've done something else to, in Arrow's terms, get a dictator or something, so something else is wrong. There are situations where the choice seems like it's a binary choice, like you mentioned before, with only two candidates. So votes for constitutional amendments in the United States, for example, which some states have: Florida; they don't call them constitutional amendments in California, but ballot initiatives. That's a little complicated in California, so it can get kind of strategic. But I've looked at... oh, sorry, cartoon, the Al Gore cartoon. But the constitutional amendments I've looked at in some states, they seem to be not really looking strategic. Those are the ones where I've done second-digit tests and the second-digit Benford's-law pattern works completely. I can't think of a case I've looked at with candidates generically looking like it was non-strategic. Some local elections I've looked at, but many local elections I've looked at, like state legislative elections in the United States, they look just like elections with people doing wasted-vote kind of stuff. Yes? I have not got data from open primaries anywhere. I mean, California, if you call the top-two system an open primary, well, Louisiana... California decided to become Louisiana for some reason. I hate it because it ruined my data series of having the parties, so I would have voted against that if I had been in California. So I haven't really analyzed such data. I've mainly cursed at it, because I don't know exactly what to do with it. But I expect... I have California data, and now it's a lot of data, it's not just Louisiana. In the United States... some of the countries that are in scope for the USAID grant, mostly they are small
developing countries that USAID is interested in. I don't have any primary data from those elections, but we have data that's not in the grant that may be from there, so there may be some other countries. But the answer, honestly, is I haven't looked at those kinds of situations yet, because I haven't had the conceptual ability to think about what they do. Thanks, everybody.