Jeremy Darrington talking about election data, and I'm very excited about this. We've worked with Jeremy quite a bit in the webinar series, and he's very knowledgeable about political issues, political data, and political research, so I'm very excited to have him here. He is the Politics Librarian at Princeton University's Firestone Library. He received his MLIS from the University of Washington and has degrees from BYU and UC Berkeley. He's the past chair of the Law and Political Science Section, which is now the Politics, Policy, and International Relations Section of ACRL, and is the past convener of ACRL's Numeric and Geospatial Data Services in Academic Libraries Interest Group. He is also the Political Science Editor for Resources for College Libraries. In addition to all areas of politics, his research interests include technology and libraries, government information, digital privacy, dabbling in code and data science, and a wide range of social science topics. So I'm very excited to have Jeremy present to us today.

All right, sounds good. Well, welcome, everybody. Like Linda said, my name is Jeremy Darrington and I'm the Politics Librarian at Princeton. I work a lot with election data, and it seems rather apropos at this time of year, just about a week before Election Day, that we should have a session about election data. To introduce this topic: if you've ever dealt with questions about elections data in the past, or are fielding them now, you know these tend to go in cycles. You see a real uptick in these anytime we have a presidential election cycle in the United States. For me, I see them often around midterm elections as well, though more so with the presidential elections. But one of the big challenges, of course, is that in the United States our federal system makes this much more challenging than doing elections research in many other countries in the world.
For a couple of reasons, the first one being that we don't have one national agency that's in charge of administering elections, gathering elections data, and reporting it out, as many other countries do. Obviously this is an issue of federal jurisdiction and scope, and it's nicely illustrated in the next couple of slides here. So first, this is a slide that shows reporting about absentee ballot usage in the 2008 elections; it came from a Pew report from a couple of years ago. But what it does nicely is illustrate really clearly how many different counties there are in the United States. There are more than 3,000 of them. And elections in the United States are governed primarily at the county level, right? So we're not dealing even with just 50 states, which would make things complicated enough. We're actually dealing with more than 3,000 election agencies in the United States that are in charge of administering elections, counting ballots, and reporting results, and they all do it in different ways, right? They all have different systems. They all have different voting technology. They have different rules and regulations about what is to be reported, when, and how. So this makes our job a little bit difficult.

And if that weren't bad enough, in addition to 3,000 counties in the United States, there are more than 170,000 precincts where the actual voting takes place, and which are staffed almost entirely, across the United States, by volunteers: people who generously donate their time to spend all day at a polling place, helping to make sure things run as smoothly as they can in this big, complicated country that we live in. So this is a map from the 2008 presidential election showing the winners by party in and around Cincinnati. You can see each dot represents a different precinct in and around the city. And this is kind of a nice map, I think, because it shows one of the truisms of U.S.
electoral politics, presidential politics at least, certainly in the last several electoral cycles, which is that you see a growing divide between urban and rural voters. So you see the suburbs are all in red, voting Republican, and the key urban areas of Cincinnati all in blue. So that's one of the big challenges with researching elections: the election data is not always easy to find, because it is so fragmented in terms of how it's collected and how it's reported.

And interestingly, that's not the only challenge. The other challenge has to do with the jurisdictions or the geography at which people are interested in getting election data. This is rather complicated as well, because the boundaries for elections can be complicated. A good way to illustrate that is that, for example, in congressional elections, each constituency is a congressional district. Well, those don't always align with county boundaries within states in the United States. It would sure be nice if they always did, but they don't. Let alone other types of geographic or pseudo-geographic boundaries that people sometimes want election data reported at. An example is zip codes: sometimes you get people coming in asking for presidential elections sorted by zip code. Well, it's just not reported in that way. And zip codes are actually sort of a strange artificial figment of our imagination anyway, because they don't really align with real physical boundaries. They're an artifact of the postal system, right? So that makes for additional challenges as well.

So today, there's a lot we could talk about with elections, but we're going to talk specifically about electoral returns. Then I'll talk a little bit about voter turnout data, and then a couple of things about the administration of elections, which I think is interesting and a little bit less studied in political science.
But it's one of those areas that I think is of growing interest, especially in this current electoral cycle, with lots of claims about rigged systems and rigged elections. It's useful to have some knowledge about the different means of measuring how elections actually perform in the United States. And I have the link here to a LibGuide that I put together on this topic. It has all the links that are in these slides, plus many more on different aspects that we won't have time to cover today.

A brief overview of electoral return data. First, I think it's useful to think about, when you're working with your patrons, what types of sources actually report the electoral data that you might be interested in. Specifically, I'm looking at electoral returns here, so the actual tabulation of votes. This could come from a couple of sources. First would be the official sources. These are the actual boards or divisions of elections at the county level. Often these will be aggregated and reported back out at the state level, if you're lucky. Those are usually done through a state secretary of state's office, or sometimes it's a state division of elections that will aggregate all the reports from the county level and report them all in one place.

Then there are unofficial sources that can provide lots of useful data on elections. A good example is news sites: the New York Times always has a great site on the presidential election, and usually on the congressional elections as well. Those are best, obviously, for live coverage. So when things are happening in the current election, that's the place where people are going to go to get up-to-the-minute information on the elections and the returns as they come in.
And those are all pulled, actually, from the Associated Press, which has a very extensive field operation down at the county level, in all the counties across the United States, where they have people basically shadowing the county officials as they tabulate the electoral returns. They then send them back in, by phone or by data entry, to the AP offices, which have a live link out to the major news sites that purchase that data from AP. Those are great. The news sites have some really amazing data visualizations and interactive features.

One of the problems with them, of course, and many of you have probably experienced this, is that, number one, the data on the news sites is not easily converted into actual tabular data if people want to use it for analysis in a paper or some kind of research. Usually they're more about the visualizations and making sense of the data, which is not surprising given that their primary audience is people just consuming news. The other problem is that they don't tend to be preserved over the long haul. So even the New York Times, as venerable an institution as it is, typically won't keep up its interactive material for presidential elections for more than a single cycle. Right now on the New York Times website, you can go back and find that most of the coverage from the 2012 presidential election is still there and fairly usable. But if you go back to 2008, you can find some of it, but half the data is no longer there. The links are dead and completely unusable. And anything before 2008, just forget it. It's not there. So you're obviously going to need to look to other places beyond just the news sites.

Now, a third source that people often don't think about is that the reports of voters themselves constitute important information about election returns, right? About voting in the election, and who people voted for.
And there are various voter surveys that do this. The two primary series of surveys that cover this are, first, the ANES, the American National Election Studies, which is a long-running study of major elections, primarily presidential but also congressional, going back to the 1940s. This is the primary source in political science that's been used to study voter attitudes, as well as to get information on both prospective ideas about who voters think they're going to vote for in the election and retrospective reports of who they say they actually voted for. The second is the national Election Day exit poll series of surveys. These have been done by the major news networks, in recent years through a consortium that they've put together to field these and share the cost. And these are what they sound like: exit polls. So as people are coming out of the polling booth, they ask a sample of them questions about who they voted for, why they voted, what their experience was, and other attitudes. So those can be a rich source as well.

Finally, I think it's useful, before you jump into looking at any data sources, to ask yourself a series of questions to help you narrow this down. These are probably obvious, but I think they're worth repeating. First, you need to know what office you are looking for. Is this a state office? Is it a federal office? Your sources are going to be really different depending on the answer there. What years are your patrons interested in? Do they want one year? Do they want multiple years? What jurisdictions, and at which geographic level? And this is the one that I get most frequently: people come in and say they want presidential election data for, whatever, 2000 through 2012 or something.
But then you have to dig a little bit deeper and try to figure out, well, what is it you're going to do with this data? How are you going to analyze it? Depending on their answer, they may just want more summary-level kinds of data or trends, which you can get from a lot of different sources. But often, if they're doing real data analysis, they're going to want to go to a much narrower geographic level. So often that's the county level, and some people will want to go down even further than that, down to the precinct level if they can get it.

And finally, another question to ask yourself is what format will be useful to your user, right? Some of these sources that I'll show you are specifically put out in a straight data format for use in some kind of statistical software, like Stata or R. For many users, depending on what it is they want to do, that's really going to be overkill, or not useful to them, because they don't know how to use those programs. So you may want to look for something else that comes in a straight spreadsheet format, like Excel or a CSV file, that would work for them. On the geographic level as well, I just want to add that it's the same question of overkill in terms of specificity. People may want stuff at a county level, and some users might get excited to be able to get things at the precinct level, but it's such a huge volume of data that for many people that's going to be overkill as well.

So, jumping into the actual returns, we're going to go through presidential first and then we'll go down the ballot, as it were. I'm not going to cover any state-level sources. There are lots of sources online that do a good job of presenting state-level summaries of vote returns in tabular form.
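As a rough illustration of the geographic-level question above: county-level rows can always be summed up into a state-level summary, but a state-level summary cannot be split back into counties. The following is a minimal Python sketch under stated assumptions. The column names (`state`, `county`, `dem`, `rep`) are hypothetical, and the numbers are invented placeholders, not real returns from any source mentioned in this talk.

```python
import csv
import io
from collections import defaultdict

# Invented placeholder rows for illustration only, NOT real election returns.
COUNTY_CSV = """state,county,dem,rep
AA,Alpha,100,80
AA,Beta,50,70
BB,Gamma,30,40
"""


def state_totals(csv_text):
    """Sum county-level vote counts into one total per state."""
    totals = defaultdict(lambda: {"dem": 0, "rep": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["state"]]["dem"] += int(row["dem"])
        totals[row["state"]]["rep"] += int(row["rep"])
    # Convert to plain dicts so the result is easy to inspect or compare.
    return {state: dict(votes) for state, votes in totals.items()}


totals = state_totals(COUNTY_CSV)
```

The same pattern would apply one level down, rolling precinct rows up to counties, which is one reason the narrowest available level is the safest thing to collect when a patron is unsure what they need.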
And I've listed a couple here. You can probably find many more online. So, really, the heart of this starts happening down at the county level, and that's primarily what most people are interested in. That's the most prevalent form of narrow geographic reporting that you can get. You can get some stuff down to the precinct level, which we'll talk about in a minute, but primarily you're going to find things at the county level. That tends to be how it's gathered, and until recently that's almost always how it's been reported.

One of the first sources that's really worth looking at, and many of you working at academic libraries will have access to this, is from CQ Press (Congressional Quarterly), which has put together a lot of great collections on American politics. They have a collection called the Voting and Elections Collection, and it's quite good. This is a subscription source and it's fairly pricey, so some of you may not have access to it or may not want to have access to it. But to show you what this looks like, and I'll just say this is my elections guide, so there's my handsome mug right there if you're really interested in what I look like. But that just logged me out.

So to show you the CQ Voting and Elections Collection here: there's a lot of information in this collection. It's not just electoral data. In fact, most of this collection is very rich contextual information about elections, and there's a lot that we could talk about that we're not going to, because we're really focused on the data sources today. So if you're interested in the data in this database, you have to click up here. I always find this a little bit counterintuitive, but you click up here on "download data." They give you the option of selecting which office you want (we'll talk about some of those in a minute) and which electoral cycle. And then you can choose to get a national number.
Most of the time, though, you're going to want this kind of county-level detail. These come out as nicely formatted spreadsheets. The one challenge with the CQ Voting and Elections Collection is that they limit you to downloading only 10 states at a time. It's a little bit of a hassle. I guess they just don't want people stealing all the data and then posting it online or something. But anyway, you can deal with it if you have to. It's a good source, and usually the best source for the most recent elections in nicely formatted county-level returns.

The next source is a really good one that many people know more popularly: Dave Leip has put together this wonderful U.S. election atlas. He actually makes a lot of his stuff, especially the state-level summary data that I mentioned up above, available free online for anybody to use. He's got a lot of great maps as well. But you can actually get a site subscription to his site as an academic institution. It's really affordable, and it will give you access to county-level data for presidential elections going back to 1912. In some cases he's got some states done all the way back to 1884. This is a great source. It's not quite as immediately useful as the CQ Voting and Elections Collection, in the sense that to get the county information out of here, and I'll show you this in just a second, you have to go state by state, and you get it in this tabular form. You can cut and paste this into a spreadsheet and it formats really nicely. But it's a little bit more tedious to get the information out if you're going to get a lot of states at once, or if you want the whole country. So it's a little bit more tedious than CQ Voting and Elections, but it's a lot cheaper too. So if your library can't afford something like the CQ Voting and Elections Collection, you should look into Dave Leip's atlas. It's a great site.
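When returns have to be pulled in pieces, whether ten states at a time or one state at a time, the pieces need to be stitched back into one table. The following is a minimal Python sketch of that step, assuming each piece has been saved as CSV text with an identical header row. The function name `combine_batches`, the column names, and the sample rows are all hypothetical placeholders; none of this reflects the actual export layout of any product mentioned here.

```python
import csv
import io


def combine_batches(batch_texts):
    """Concatenate several CSV exports that share one header row.

    Returns (header, rows). Raises ValueError if a batch has a different
    header, and skips exact duplicate rows (for example, a state that was
    accidentally downloaded twice).
    """
    header = None
    rows = []
    seen = set()
    for text in batch_texts:
        reader = csv.reader(io.StringIO(text))
        batch_header = next(reader, None)
        if batch_header is None:
            continue  # empty export, nothing to add
        if header is None:
            header = batch_header
        elif batch_header != header:
            raise ValueError("batch headers differ; check the exports")
        for row in reader:
            key = tuple(row)
            if row and key not in seen:
                seen.add(key)
                rows.append(row)
    return header, rows


# Invented placeholder exports for illustration only, NOT real returns.
BATCH_1 = "state,county,dem,rep\nAA,Alpha,100,80\nAA,Beta,50,70\n"
BATCH_2 = "state,county,dem,rep\nBB,Gamma,30,40\nAA,Beta,50,70\n"

header, rows = combine_batches([BATCH_1, BATCH_2])
```

Checking the header on every batch is a deliberate choice in this sketch: if a column were added or reordered between downloads, silently stacking the rows would put votes under the wrong party.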
Plus he has a lot of other information that he sells: a couple of sources that he sells as spreadsheets, already pre-formatted, and some of those things are not actually fully included in the online web version. Now, the one note I'll make on this, and Dave's kind of funny about this, but he's been doing this for a long time, and interestingly, the digital representation of elections data with Democrats in blue and Republicans in red is a relatively recent artifact. Dave's been doing this for a number of years, and when he and a couple of other places I know of first started putting up electoral data in maps online, they actually had this reversed, right? So the Democrats were represented in red and Republicans in blue. And some of them, Dave included, have stuck to their guns and said, look, I'm not going to go back and reformat everything in my whole entire database just because everybody else has coalesced around the opposite scheme. So just note when you're looking at it that it's a little bit backwards from what you're used to.

Another project that is interesting is OpenElections. Most of you probably haven't heard of this one, but this was a project started, I want to say, two or three years ago with a Knight News Challenge grant, and it's a great project. I've been following it with a lot of interest over the last couple of years.
One of the lead developers is Derek Willis, who used to work for the New York Times as a data journalist and now works with ProPublica. It's almost entirely volunteer driven. They had a big grant from Knight to get stuff started, but most of the work is now happening by volunteers, so they rely on volunteers to gather and then to process the data. So this is an ongoing project, and I'll just put a plug in for this: if you have any interest, I really recommend going and signing up with OpenElections and seeing what you can do to help out this project, because it's great. It's entirely free and open, and they are now posting data on their site. A lot of it is still in process, but they have raw files for many of the different types of things that they collect, so you can go state by state and see what they have. Generally these run from 2000 to the present. They get stuff down at the county level, but a lot of this can actually be at the precinct level, which we'll talk a little bit more about in a minute. It's a great source and has a lot of great things to offer, so if you have any time to spare and you're an elections junkie, go help them out.

Another good source for presidential election returns is a series of data collections in ICPSR. You obviously have to be an institution that is an ICPSR member, but this is a more heavy-duty data format. So if you have somebody who really wants to get a lot of elections all at once, already formatted directly for use in something like Stata or R, this is the source to go to. And I want to mention one thing on this one that I've figured out working with it over time. Number one, it's a big collection of files. This is just the 1950 to 1990 portion, and you can see they're doing files every two-year cycle or so. Some of them are fairly large files; most of them are not too unwieldy. But since there are multiple years in here,
it's a lot of files. You can't take them all out at once, but you get all the states at once, so that's the nice trade-off there. And they come out in multiple formats, as you can see. But the one caution that I'll give on that is that when you first download one of these collections, it takes a little bit of reshaping for most users to be able to use it in any kind of helpful format. So this is the 1984 presidential election. You can see over here on the left we have the state code, we have county names, districts, and the kind of stuff you'd expect. But then you get into here: this is the reported vote by party. Now, this is based on a long-standing coding system that ICPSR uses for elections data, which was one of the first things that ever went into ICPSR, and it's based on an extensive collection of political parties that have contested elections in the United States, of which there are many hundreds. So you can see these are all codes for individual political parties that contested the 1984 general elections. It's quite extensive, and if you're interested in just Republicans and Democrats, and maybe the Reform Party in 2000 or the Libertarian Party or something, this is going to be a bit overwhelming for most of your users. So it takes a little bit of work, but you can reshape it. You have to refer back to the codebooks for the most recent years in the collection, so 1984, 1988, 1990, I think. The earlier codebooks actually don't tell you what all the codes stand for, which is a little bit of a challenge. So if you look in those later codebooks, you can find out which variables represent which parties. And so I created here a reshaped version of this where I dropped all the parties I didn't want. I just wanted to look at the major parties, and coded that in here, and that's a lot easier to work with. So if somebody wants to come in and get
the electoral returns, they can get them rather easily. So that takes a little bit of work.

Beyond the county level, occasionally I get requests, and you may too, for presidential elections reported by congressional district. Now, this is a bit of a challenge, because relatively few states actually compile results by congressional district, and they tend to be only those states where the congressional districts line up really well with the county boundaries in the state. That's not true in many cases: at least 15% of all counties in the United States include portions of more than one congressional district, right? So it's a little bit challenging, and most of the states don't opt to try to figure that out for you. There's only really one good source that I know of that reports this consistently across the entire United States, and it's a man by the name of Clark Benson, who works at a company he founded called Polidata. He's been doing this for a number of years, and he has data going back to about 1992. These are only available for purchase, but they're fairly cheap, something like 100 bucks per spreadsheet, or per electoral cycle. He does great work. It's very time-intensive, labor-intensive work, so I really encourage you, if you get these kinds of questions, to go to his site and buy the spreadsheets from him and help support his work. But that's really the only source. Dave Leip has some presidential elections data at the congressional district level, but like I said, they're select states and they go back to about 2000.

Now, for presidential elections on the primary side, you can get these out of some of these main collections. The CQ Voting and Elections Collection has it for all states, but the years vary quite widely depending on a number of factors. Mostly that has to do with when states actually started introducing primaries, which is a relatively recent phenomenon, mostly since the 1970s, and it's really
variable. So primary data is messy. It's not super comparable, because in most years, until the last maybe 15 to 20 years, you don't have most states even contesting primaries. It's only in recent years that a majority of states do it, and obviously not all states even hold primaries. Many of them still run on a caucus system, so you don't get representative statewide returns in the same way you would with a primary. One of the interesting things as you start looking into this is that many states have really bounced back and forth between primaries and caucuses, and between different types of primaries. Some states have held primaries, and then the next electoral cycle they don't do a primary at all, and then they pick it back up again a couple of years later. So it's really messy data, and that's reflected in the different sources that try to report it.

So CQ is a good source for presidential primaries. Dave Leip has a good collection of this as well, down to the county level, and the primary data in CQ is down to the county level as well. Dave's can also be purchased separately as spreadsheets, which is handy, to have it all in one spot, and they're not too pricey. If you have some room left over in your budget, it's a worthwhile purchase, I think. But this is one of the times when I'm going to mention some print sources. There are two key print volumes done by Rhodes Cook, who did a lot of work with CQ Press on collecting election statistics over time. These cover 1960 all the way up through 2004, and they're very well documented. Over the summer I started working on a project to convert these into a data collection. It's kind of slow going; I'm about halfway through, maybe. So maybe if I do this recap again in four years, I can tell you that we've got it going.

Now, moving into congressional elections. Again, CQ has good
coverage of House and Senate general elections, for the House obviously going back a long time, back to the early 1800s. Senate general elections didn't start until around 1912, 1914, so they don't go any earlier than that. Before that, senators were always appointed by the state legislature. And as far as Senate elections, they're down to the county level starting in the late 60s.

This collection here is a set of spreadsheets that Dave has made available from his election atlas at the county and congressional district level. And again, this is one of those ones where the boundary misalignment is kind of interesting, right? Usually House elections are reported by the constituency, which is a congressional district, but sometimes I have patrons come in who are interested in getting House elections data reported by county. So again, it's a little bit messy, but Dave has done some work and gotten these from different states, and has them back to the early 90s for the House elections. The spreadsheets have it at the state, county, and congressional district level. And I should say that those spreadsheets from Dave Leip are not in the web platform at all. These are only available as spreadsheets, because it's a lot more data and it's a lot harder to represent visually, so he hasn't chosen to do that.

Again, the ICPSR data series has a very comprehensive collection of congressional elections data. It says this collection goes back to 1788; that's primarily for presidential. The congressional stuff goes back to the early 1800s and a little bit before. Some of it's a little bit spotty. Obviously, the further you go back into the early part of the republic, the more spotty it is, with missing data. But that's a great source as well, though again in that more heavily statistical data format. And another one that many
people haven't heard of is this Constituency-Level Elections Archive. This is actually primarily a collection for cross-national elections research, and it's one that is primarily used for foreign elections, but the United States is included in there, and it's all based on this same ICPSR series of data. One of the nice things, and the reason I mention it here, is that they have a really great subsetting feature on their website. So you can go in and just pull out the couple of years you want, instead of having to download electoral cycle by electoral cycle from ICPSR. You select a certain number of years and this gets you a whole spreadsheet all at once, or you can choose selected years. So that's kind of a nice feature. And again, Dave Leip has Senate general elections in his platform by county back to 1990.

Congressional elections at the primary level are only reported, really, by congressional district. Very few places, in fact nobody that I know of, actually report them at the county level for primaries. CQ has some of this going back a ways, primarily for the Senate elections. For House primaries, they didn't start those until the mid 90s. But the second link here is from a scholar named Stephen Pettigrew, who's done a nice collection from the America Votes series. Many of you are familiar with that print series. Well, he went through and compiled them all from 1956 to 2010 and put them into a nice data set, which is in Dataverse, so that's freely available. He's also included a number of other variables at the district level, including some stuff about candidates, their background, and gender, for some of the more recent years when you could collect it. The OpenElections project has primary elections for both the House and the Senate. Again, that's in process, but it's worth a look for the more recent years, especially if you don't have access to something like the CQ Voting and Elections Collection. And then
there are some one-off things in ICPSR, like this one from the South, covering a set of 11 states down in the South, that has both primary and general election data, but mostly with an emphasis on the primary elections.

Gubernatorial elections. The one thing to note about gubernatorial elections is that they happen in all different years, right? As opposed to congressional or presidential elections, which happen in these nice, regular two- and four-year cycles, gubernatorial elections happen at all sorts of weird times. So in any given year you may have a whole bunch of gubernatorial elections, or you may have three. That's one thing to know when you go in searching for this: you need to know what the cycle is for the elections in that particular state, so you know which years to look in. Some of the key sources here cover gubernatorial elections as well, so I won't really go through those.

Now, at the precinct level. This is the really, really disaggregated election data, down to the individual voting precinct where you actually went to the local school or wherever to cast your vote. You can look at these things down at that very granular level. It's a huge amount of data; these are giant election files. But there are some good sources, primarily for more recent years, for getting this precinct-level election data. The most recent one is the Harvard Election Data Archive, which covers some of 2000, mostly 2002, and primarily 2004 up to 2012. This is a big project done by some scholars out at Stanford and a few other places, and it's freely available in the Dataverse. It also includes some gubernatorial elections and a few statewide offices in some cases as well. One of the interesting things about this is that they tried really hard to align the precinct boundaries to the voting tabulation district boundaries as reported in the U.S. Census, so that you could compare things to the sociodemographic data that's in the U.S.
Census. That's not an easy task. Those boundaries don't always align, and many times they had to sort of, you know, basically make a guess. They had to do sort of imputation of certain districts where they couldn't find the actual shapefiles, or in some cases they didn't exist, which is one of those scary things about having more than 3,000 election agencies administer elections in this country: surprisingly frequently, they lose data from previous elections. So all the more reason, I guess, to collect it now. For the 2000 election, out of, I want to say this is out of George Washington University, there's a Federal Elections Project that covered that entire election and, again, does some matching up to the 2000 Census. You can do some really granular analysis down to a small geographic level in conjunction with demographic and other kinds of social data that are included in the Census. That's kind of an interesting and helpful one. The sort of early precursor to this, by Gary King out of Harvard and a bunch of collaborators, was ROAD, the Record of American Democracy, which covered all elections at and above the state legislative level for all the states, that's 170,000 precincts nationwide, from 1984 to 1990. So this was a huge undertaking at the time, and obviously you can see there was a gap there of 10 or 20 years before people really felt up to trying to do this again. But it can be downloaded in a variety of formats and is freely available online as well. And finally, like I mentioned earlier, OpenElections: they're trying, for more recent elections, to keep up with precinct-level stuff, and the hope for them is to eventually automate this so that as things happen in the future they can more easily just kind of pull this all directly in and sort of machine-read it and machine-clean it up. But that's sort of a pie-in-the-sky dream, and we're a long way off from that, because many elections agencies still only report these if you request them. You know, like, precinct-level
election data, or even any kind of election-level data, from some of these places they'll still only give it to you, like, in a PDF format or even sometimes a printout. So maybe someday it'll all be machine readable. Now, on the statewide level, there are a number of sources you can get this from. Several of these are ICPSR data collections. This first one is State Legislative Election Returns. It's done by Carl Klarner; he used to be at Notre Dame, and I think he's sort of gone independent now, an independent consultant. And this includes state legislative general election and a few primary returns, mostly just general elections, down to the legislative district level, and it has a lot of other features in it: coding for candidates and the type of districts and the type of elections and the like. But that's a great collection, probably the best one for state legislative elections. There's an older collection in ICPSR that tried to collect statewide offices, so even these things like lieutenant governor or attorney general, secretary of state and whatever, down to the county level. It only covers a handful of states, states obviously that made this more accessible and easily collected, but it's a good collection if you're looking at more historic elections down at the state level, which are really hard to find. This collection here is really interesting. This was by Philip Lampi, who was a researcher at the American Antiquarian Society, covering the early history of the republic, and he did extensive research over decades and kept all these handwritten notes. This was before, like, you know, well, it wasn't before typewriters, but it was before computers when he started all this. And in conjunction with Tufts University they've gone through and digitized his extensive collections of notes, coded it all, and made it into data formats as well. So it's an interactive format that you can go on to. I think I have this one up; I'll show it to you real quick. And so you can
come in and you can kind of browse by state, or you can go by type of office, and you see there's all sorts of things in here, both state and, it goes down to the local level as well, you know, so, like, county councils, city councils, you know, sheriffs, often at the county level, city councils, mayors, city assemblies, state assemblies. So there's a really rich set of data in here, and you can actually go here to this link, and they've made available all of these data sets as tab-separated files that you can download by state. So that's a pretty cool source for many early years where some of that data was fairly spotty. So that was a life's labor, now wonderfully converted into digital format. Another source, for more recent ones: this site, Our Campaigns, tracks state offices, statewide offices, so primarily governor and secretary of state and attorney general, I believe. With this one the results are all in HTML tables, so it's a lot harder to extract that data. I suppose you might be able to scrape it, write some kind of a JSON or Python script to scrape it. But the one drawback with this one is that it's all done by volunteers, so it can be up on the web fairly quickly after elections happen, but they generally don't source anything, so I'm a little leery of using it. But there may be some cases where you run into statewide stuff that you want to get across multiple states, and it may just be easier to get it here, at least as an initial cut of the data. And finally, OpenElections does have some stuff down at the state legislative level. Now, at the local level: local elections are the most prevalent in the United States. Obviously we have a lot of local election jurisdictions in this country, but they are one of those areas that has generally been understudied in political science, primarily because of the difficulty of getting access to the data. So, like, if you think getting presidential elections down at the county level is hard, you know, try working with municipal elections data. It's just a
nightmare. And so this first one is a project that's been working out of Rice for a couple of years. It's still in process, and they don't actually have any of the data fully up online, although they have an interactive database to show you some of what they've collected. And it covers various years; some stuff goes back to 1970, but, you know, it's mostly going to be more recent years, probably the last 20 to 25 years or so, I guess. But I did get in contact with them. It looked like it had gone sort of dead, because I hadn't seen any change on that website for more than two years, but I reached out to them earlier this year, and it turns out that they're still working on it. They don't have everything up yet, and it looks like they're eventually going to sell the data, I'm not sure in what format or with what pricing structure, but they did say that for academic use people are welcome to contact the principal investigators directly and that they would be willing to share some of that data. So that's a good source to know about. And then for other local elections data, you're really sort of going to have to go state by state, hopefully not municipality by municipality, but, you know, you will in some cases. But there are a couple of places that have done a good job. So, like, California has done a wonderful job: some different institutes in California have created this California Elections Data Archive, which goes back to 1995 but has a lot of really useful data at the local level, including local ballot measures, which is pretty interesting and hard to find. Sometimes you'll even find these from state boards of elections, surprisingly enough, Kentucky being a really good example. This is one that I've found in recent years that, for, you know, roughly the years 2010 to the present, last I looked, included local elections in addition to their statewide elections and federal elections, and they had it down to the precinct level in multiple formats. Now, this is sort of, like, as far as I
can tell, the gold standard for a state board of elections, and not one that happens very frequently. And again, A New Nation Votes, for that older, early-republic time, does have some of that local elections data. I do want to mention some stuff about voter turnout. So just to illustrate this, you know, in terms of the long sweep of voter turnout in the United States, we have a history of being sort of mediocre voters in this country, in the sense that over the long term, most of the 20th century up to today, we've really fluctuated around 60 percent of eligible voters actually showing up to the polls, so higher in presidential years, much lower in congressional midterm elections, and we've never really had more than 80-some percent, as you can see from this nice diagram. And I was looking, we have this digital archive of stuff from our special collections, and this is a piece of art by Thomas Nast, who was an American political satirist, and I really love this one. It's the lion and the lamb. The lion is growling about his rights; as you can read on the little handbill, he says, before the election, "Let the truth come out, if the heavens fall." So people get really exercised about their political rights before the election. And then on election day you can see the lamb when duty calls. His paper says, "Election day probabilities: a little rain with cloudy weather," and so he's looking outside thinking, I'm in my nice smoking jacket, I don't think I'll venture out to actually vote today. I think it's a nice illustration of what too often happens. And obviously it's a little bit more complicated than that. It would certainly help if we had a national holiday for election day; that might encourage some turnout at the polls. But the example of much of the developed world suggests that unless you institute compulsory voting, most of the developed world is stuck somewhere around 70 to 75 percent turnout. So in terms of turnout data, the primary one comes from the Census Bureau, and it's
their Voting and Registration Supplement. Now, this is part of the Current Population Survey, which is primarily an economic survey, but every November, at least in election years, they have a supplement that covers voting and registration. And so they ask their large sample, about 50,000 people, they ask these people about, you know, whether they voted, and they ask them, I have this on that slide, I think, they ask them, you know, whether they registered, whether they voted, and if they didn't register or vote, what was the primary reason for not doing that. And so you can get a lot of information directly from the Census site, but if you want to get a data extract, the best way to do it is through IPUMS, and they have a whole collection dedicated just to the CPS, and that goes back to the early 1960s. Now, the sample size is 50,000 people, which is a pretty large sample size and allows you to analyze breakdowns of reported voting and registration down to the state level. It's usually hard to get much further down than that; you can't really go into any more detail than that because the sample size is not big enough. But it does provide us the best source of sociodemographic characteristics in conjunction with voting, because, you know, with voting statistics, electoral returns or registration or whatever, there are a handful of states that do collect, you know, like, racial data and other types of demographic data about voters, but most states don't. And so we don't really have any good ways of knowing, you know, how young blacks voted in an election or how older females may have voted in an election unless we use survey data. And so the CPS has been one of the primary sources for gathering that for a long time, because, since this is an economic survey, they ask a lot of these other types of demographic questions in conjunction with it. Other sources of data, just for pure registration and turnout
data: Dave Leip has some good stuff on registration and turnout. These come primarily directly from the elections agencies at the state level as well, and they go down to the county level, or to towns in the case of New England rather than counties; several states in New England mostly run their elections through municipal agencies. One of the challenges with voter turnout is trying to figure out how to measure it, right? So often we talk about, there are two primary sources for turnout in terms of the denominator, like, who counts as eligible to vote. So often these are based on voting age, which is just kind of an estimate from census figures of who's old enough, 18 or older, to be able to vote in an election, and that serves as your denominator. That's come under a lot of criticism because it includes a lot of people who are not actually eligible to vote even though they may be old enough and they're going to be counted in the census. So there are primarily two sources there, which would be immigrants, or people who are not actual citizens of the United States, and then, more controversially but also very importantly, felons, right? So we have a huge felon population, prison population, in this country, as you may or may not be aware, and in most cases those people lose their right to vote when they get convicted of a felony. That's governed at the state level, and there's been a push for reform on this more recently, so that a few states have pushed to restore voting rights to felons as soon as they have finished serving their time in prison. There was sort of a controversial case with this in Virginia just recently with Terry McAuliffe, and people were looking at that as, you know, an attempt to push things in the Democrats' favor, because, obviously, or maybe not obvious to everybody but clearly documented, the felon population is disproportionately African-American, and so they're differentially affected by the loss of voting privileges, right? And since
they vote overwhelmingly, like 95 percent vote for the Democrats, that was seen as maybe an overtly political move by Governor McAuliffe. But anyway, there's a nice source from Michael McDonald, who is a professor at the University of Florida and has been doing research on this for a lot of years, and he provides some data on his website about how to recalculate census figures to take into account these different populations that wouldn't actually be eligible to vote. And when you do it that way, one of the things you find is that turnout is still not wonderful in this country, but it's not quite as abysmal as it would appear at first blush. Finally, I'll put these last two up at the same time. These were both print reference sources, but they're on the Sage Knowledge platform as e-books. And actually, I worked with them when they first transferred this over. A couple of these used to be separate subscription interactive sites where you could download the data from CQ Press, and then when CQ was bought by Sage and they converted this all over to the Sage Knowledge platform, these e-books just turned into PDFs. Well, I threw a fit, and maybe some other people did too, I don't know, but they went back in, for these primarily statistical publications, which are just lots of wonderful tables of statistics on turnout more generally and then turnout specifically for the African American electorate, and added in the tables in both Excel and CSV formats. So if you have access to those as e-books on the Sage platform, you can actually download the tables themselves directly out of the publication. So I'm going to wrap up here with just a final brief mention about measuring elections and how elections are administered and performed. So there are a couple of sources for this. This first one is a collection by Charles Stewart, who's a professor at MIT; he studies a lot of congressional elections and more recently has been interested
in studying how elections run in the United States, and he has a huge Dataverse that he's put together that pulls together several of the major sources for measuring these. So the Pew Charitable Trusts has what they call their Elections Performance Index, where they now are measuring a number of things, 17 indicators that look at things like voter registration rates, turnout percentage, how long people waited at the polls, whether provisional ballots were rejected in different states and in what percentage, and the like, to kind of come up with an index of how well states are actually running their elections, how open they are, how well administered. So that's an interesting source. The U.S. Election Assistance Commission is also something that administers a number of federal programs, primarily the National Voter Registration Act and UOCAVA, which is the Uniformed and Overseas Citizens Absentee Voting Act. Basically it's the source for making sure that U.S. citizens who live overseas, primarily active-duty military and their spouses but also other residents who may live abroad, have access to absentee voting privileges. And they have done a series of surveys on their work over the last couple of years to try and measure how well that works. So you get a lot of really interesting things. Primarily, in terms of data that people use in research, it's often questions about, like, how long did you wait at the polls, how easy was it to vote, how much information were you able to actually get and understand from elections websites, and things like that. So the Census Bureau's VRS supplement is a good source for some of that as well, in terms of whether people registered and whether they voted and what mode they used to vote, like absentee or in person, on or before election day. Same with the Survey of the Performance of American Elections. Now, this is a more recent survey, but it's meant to go down to the state level, so it's a large sample of about 10,000 people, 200 drawn from each state,
so you can have good comparable survey data across states. Finally, the Federal Voting Assistance Program, again, is meant to help ensure that people overseas have access to absentee balloting and can get the information they need to vote, and they've done a series of very extensive surveys, actually asking primarily the military but other overseas citizens as well about their experience with voting outside of the country. And the final two sources are actually from a couple of surveys. So the Cooperative Congressional Election Study, which is a very large sample, 30 to 50,000 people, they do it every year, but it's primarily every other year with the congressional midterms, they have a couple of questions they ask about people's experience of voting on election day or what mode they used to vote. Same thing with the National Annenberg Election Survey, which is primarily about political communication and political knowledge stuff, but they usually do incorporate a couple of questions about people's voting experiences as well. So that's a lot. I know I've just, like, talked like a motormouth for the last hour, but I hope that's been useful. And like I said, you can go to the elections guide that I put up, and I have a lot of other information there as well, including more reference materials, stuff about campaigns, which is not so easily quantified, party platforms and nominations, as well as some interesting stuff on election maps and visualizations. And the last source that I didn't get to mention that is actually a data source is data from prediction markets, so primarily betting markets about U.S.
elections, which have turned out to be fairly accurate sources for predicting who the winner, primarily of presidential elections, will be. So I'm going to leave it there, and if there are any final questions people have that they've stuck around for, then I am happy to answer questions. Thank you so much, Jeremy. This is fantastic.