Thanks for having me. So it's about the ideal CSV template. My name is Roger Fisher. I'm working on a project called Datamap, and we focus right now on elections worldwide. We mapped the US in different ways: electoral college, states, counties. And we also started working on the counties themselves, on the precinct maps for the counties. This is Amador in California, for example. This is Jackson, Michigan, and this is where I live, in Berkeley, Alameda County. So blue is for Democrats, obviously, and the red part is for Donald Trump.
While we were working on this, we encountered many, many problems, because there is so much variety. More and more we got to the point where we said, well, this is kind of a big thing, much bigger than we thought at the beginning. We just wanted to map some stuff, I mean some data. So we now call this the election project, or the LAP. The format is CSV, and we thought about a kind of system of modules where we have the election results and maps, that's what we are doing right now, but also some more experimental stuff like color schemes and political positioning, candidates, parties, money in regards to candidates, for example, something the Federal Election Commission does, also voting hardware vendors and stuff like that.
Next to the modules, it's also a way of working with data, basically how to name data. We coined some terms like raw, X for external data, then the data when it comes together with the map, and the data when it's transformed, and I will explain that with an example. So the idea is to have a standard way of accomplishing things so that you can talk about it. It should be easy to compare election data over time and space; that was what we really wanted to do, and it's not.
And the problem, especially for county data, is that it's really hard to find the data. You have to search for every county. For some countries it's easy: in France, you have it on the night of the election, you don't have exit polls. But in other countries, like the US, the official data comes two months later, basically. And I think that's really a big problem in the US, because this kind of thing, Trump saying there is election fraud, comes up because we don't have this data in the open. It's not published, nobody sees it. There are maps all over the place, but these maps are most often incorrect; even the New York Times had a map that was incorrect for about three months. And many publications still have maps online, or people who work with data have maps, that are incorrect. So I think that really has to change.
So the focus of this talk is to have some names so that we really know what we are talking about, which we call the minimal set, and then some issues like write-ins, which are really tricky, the residual vote, then the process, then percentages, which you need to encode color, and then some mapping issues. And at the end something more experimental, which is colors and positioning and the idea of a global color scheme.
So make the implicit explicit, that's the idea of the minimal set. It's very few terms, but these terms we have to get right, and interestingly they change, or people use them differently. In the US we have mostly, or only, registered voters; in other countries we have eligible voters. The difference is basically: in a country like the Netherlands everyone who is 18 or older can vote, more or less, like 56 people can't, so they are eligible voters. In the US you have to register to vote, and so it's a much smaller portion of people.
Ballots cast, that's really when people vote: the ballots they send in, electronic or on paper. So that's all the votes, the total votes if you want. And the turnout is the ballots cast divided by registered, times 100. Sometimes you also encounter turnout for eligible voters, or voting-age turnout. That's something you often see because you want to know how few people basically voted in an election, and in the US that can go down to 30% or 35% in some states. When you have turnout of registered voters, that all looks much prettier: you could have 70% turnout with the registered voters but only 30-35% voting-age turnout.
The next thing that's really important, I think, and gets overlooked a lot: within ballots cast you have valid votes and you have a residual vote. The residual vote is all the invalid votes, that is blank votes, overvotes, undervotes, all that stuff. I will get to an example for Florida; it's something that can be really interesting to track.
So, tricky: the write-ins are really tricky, because it changes from state to state. You have states where every write-in is a valid vote, so basically someone can win the vote if enough people write them in. Then you have ones where they have to be official write-ins, and then you have other states where you can't have any write-ins, only official candidates. In Florida, for example, you really needed to know which ones are the official write-ins, so that if you voted for them it would count; if you wrote in Bernie Sanders, that would be an invalid vote. So we have to differentiate that when we count a write-in either in the valid or in the residual vote.
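To make the minimal set concrete, here is a small Python sketch of the terms defined so far: turnout as ballots cast divided by registered times 100, the residual as ballots cast minus valid votes, and the three write-in regimes. The class name, the function name, the rule labels ("any", "official_only", "none") and all numbers are assumptions made up for illustration; they are not part of the project's actual template.

```python
from dataclasses import dataclass


@dataclass
class PrecinctResult:
    """Hypothetical container for the minimal set of one precinct."""
    registered: int
    ballots_cast: int
    valid: int

    @property
    def residual(self) -> int:
        # Residual vote: every ballot cast that is not a valid vote
        # (blank votes, overvotes, undervotes, invalid write-ins).
        return self.ballots_cast - self.valid

    @property
    def turnout_registered(self) -> float:
        # Turnout = ballots cast / registered * 100.
        return 100 * self.ballots_cast / self.registered


def classify_write_in(name: str, rule: str, official_write_ins: set) -> str:
    """Decide whether a write-in counts as 'valid' or 'residual'.

    rule is an assumed label for the state's regime:
      'any'           - every write-in is a valid vote
      'official_only' - only registered (official) write-ins count
      'none'          - write-ins never count
    """
    if rule == "any":
        return "valid"
    if rule == "official_only" and name in official_write_ins:
        return "valid"
    return "residual"


# Made-up numbers, for illustration only.
example = PrecinctResult(registered=1000, ballots_cast=700, valid=690)
```

With these made-up numbers, `example.residual` is 10 and `example.turnout_registered` is 70.0. Under the "official_only" rule, a write-in for a name that is not on the official list falls into the residual, which matches the Florida case described above.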
And now to the residual in Florida, which is a state that is super important in every election; basically it can decide the election. This time the difference between Trump and Clinton was 112,911 votes, and we already know that about 1.6 million Florida residents are barred from voting, so already there is a huge part of the people who can't vote. And then we also see, and that's the official document from Florida, that the residual vote jumps from 0.75% to 1.69% from 2012 to 2016. Interestingly, the paper which talks about this, from the Florida election commission, only operates with percentages, and it's really easy to get the numbers. They say the reason for this is the mail vote; the mail vote basically is responsible for this increase.
And then I thought, well, let's look and make a comparison with another country. The Dutch elections came up as well, so I just took the numbers from the Dutch, because they pretty much fit with the Florida case. So we have about 12.86 to 12.89 million eligible or registered voters, and if you go down to the residual, that's the third block, residual votes, you see that Florida jumped from 64,085 to 160,450, whereas the Netherlands has only 47,415 residual votes. So the difference, if you subtract the Dutch residual vote from the 2016 Florida residual vote, is 113,035, and that's still higher than the margin. I find that really problematic. And also, if I look up there, almost the same number of people vote, but one country has a population that is 3.5 million higher. So I think it's really worth seeing these numbers, and that's something we don't see if we don't have a name for it: the residual vote.
So two weeks ago I was in Yosemite, and I thought, well, we do all these maps, and how did Yosemite National Park vote? That could be interesting. I already knew that it's a Republican county, even though it's in California, people voted Republican, but I
wanted to know: just the national park, is that also Republican dominated, or is the precinct maybe different?
So when we encounter raw election data, we have the good, that's people who already use CSV, text, Excel. We have the bad, that's PDFs, but they're still extractable, with Tableau for example. And we have the ugly, basically a photo as a PDF, a photo of the data, and you can't extract it. The only way of doing that is by hand so far, unless someone comes up with an idea of how to do this with machine learning, some very intelligent way of doing that.
In Mariposa County, where Yosemite National Park is located, it looked like this at first. We call this the raw. We went from this and brought it to CSV. We cleaned it: one header, no void lines. We reduced it: all the mail votes we brought together with the election day votes; you can also separate them, but in this case we bring them together. We structured it along the minimal set. We use the wide format; if you know tidy data from Hadley Wickham, the long form is used there, but we used the wide form because we want to map it afterwards. So we went to this, and then further down to this, and you see registered, ballots cast, turnout, valid, Stein, Clinton, residual. Very simple. And then we bring it together with the map part, this is the map part, which we call the ref, the X part again, and then we have all the data together. So that's the process, basically.
And now it's still impossible to map that, so we need further data transformations, and there again we have a problem. To color the map we need to normalize the data and have percentages, and so what percentage is it? Is it candidate divided by ballots cast, or is it candidate divided by valid? So I looked at maps, and I looked again at Florida and checked newspapers. In Florida officially it's 49%, the New York Times has 48.6%, Cook
has 49%, I think this is Politico, CNN, they even have, and that was like two weeks ago, 40.6%, 49.1%, yeah. So you have all kinds of data and all kinds of percentages, and we just try to define these things and say, well, it's candidate divided by valid, times 100. It's as easy as that, and you get to the numbers. But I think it's really worth doing that once and for all so that everyone can follow it. And so basically, yeah, Yosemite is a Democratic precinct in a Republican county: Clinton had 67.12% and Trump had 17.81%.
And then, I'll just go over that, we have some edge cases with maps. Maybe one interesting one is Butte County, also in California, which is again a Republican county. You can have issues where we have the data for seven or eight precincts instead of just one, and so you have to solve this problem too.
So now something a little bit more experimental in this election project: it's about color. All entities move and nothing remains still; things change. So how do you handle that with color, how do you have consistency over time? Here you have the Netherlands elections of 2006, 2010, 2012 and 2017, and you see, if you have consistent colors, it's really nice: you can see what's happening over time, how parties fade away, new parties come and take over, or some parties come up again or go back. We also do things like color scales in four countries. Then you have even more of a problem with color, because if you have more parties and you also want to show the differences, you need even more colors. And the problem was: if you're not from the Netherlands, could you read this map? Or could you read these maps here, these French maps? They are all the same, but basically you don't know which party is what. And so we thought it would be really interesting to have a kind of global color space where we have, like, left, right, progressive, conservative. So that's one idea here, a more
subjective one here, I'll just go over that. And if we combine that with this compass idea, like here, and we also see how parties change, we could color our maps in a better way and could probably see some things better than before. So I found the part where Trump, Wilders, Le Pen are all in this dark conservative part really interesting. And here is another one where you see basically a prediction of what would happen with the Democratic and Republican parties, and what really happened. And obviously people also place politicians differently in this color space.
So yeah, that was it. Just a quick recap: the ALAP, the election project, is a set of modules. We looked at election results at the precinct level, the stages from raw to xed, from data to the map, the minimal set, which really tries to define things very, very clearly so that we can use them everywhere in the world, and then some more experimental features: a general global political color space, so that everyone who looks at the map can see what's really happening, and also what's happening over time, with a compass nearby to position candidates and to see how they change. Thanks.
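As a footnote to the recap, the two calculations from the talk can be sketched in a few lines of Python: the candidate percentage defined as candidate divided by valid times 100, applied to one wide-format row, and the comparison of the Florida and Dutch residual votes against the Florida margin. The three residual and margin figures are the ones quoted in the talk; the function name, the column names and the precinct row are hypothetical and only illustrate the layout.

```python
def share_of_valid(candidate_votes: int, valid_votes: int) -> float:
    """Candidate percentage as defined in the talk: candidate / valid * 100."""
    return 100 * candidate_votes / valid_votes


# One wide-format row: one precinct per row, one column per field.
# All values here are made up for illustration.
row = {
    "precinct": "example",
    "registered": 1000,
    "ballots_cast": 700,
    "valid": 690,
    "Clinton": 400,
    "Trump": 250,
    "Stein": 40,
}

# Percentages used to color the map, always relative to valid votes.
shares = {
    name: share_of_valid(row[name], row["valid"])
    for name in ("Clinton", "Trump", "Stein")
}

# Figures quoted in the talk.
FLORIDA_RESIDUAL_2016 = 160_450
NETHERLANDS_RESIDUAL = 47_415
FLORIDA_MARGIN_2016 = 112_911

# Florida's residual vote minus the Dutch residual vote.
residual_gap = FLORIDA_RESIDUAL_2016 - NETHERLANDS_RESIDUAL
gap_exceeds_margin = residual_gap > FLORIDA_MARGIN_2016
```

Here `residual_gap` is 113,035, which is larger than the 112,911 margin, as stated in the talk. Dividing by valid votes rather than ballots cast keeps the candidate shares summing to 100 when every valid vote belongs to a listed candidate.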