I am kind of flattered that so many of you came at this time of night to hear about politics. Just very quickly to introduce ourselves: my name is Lucy Chambers. This is... Tony Budden. You can do yourself. And we're going to talk to you today a little bit about a project that we've been working on, Tony for a very long time in various guises, me since about June, sorry, earlier, May this year, to try and get huge amounts of political data into Wikidata, specifically about national-level politicians. Hopefully though, while we start by talking about politicians, this talk should be relevant and useful to people looking for tools to promote data quality, consistency, things like that, because a lot of the tools which we're going to talk about in this presentation are hopefully much more generally applicable. So yeah, it starts a bit politics-specific, but then you'll get to see lots of other things on which we would love your feedback, because for a lot of these things this is the first time that people are really seeing them, and they've only just been finished. So we would love a load of people to try them out very soon.

So very quickly, just to give a quick lie of the land: when Tony started working on this a long time ago, I think there was an assumption that there was a lot more data available in the world about who politicians are in individual countries. It's actually kind of shocking what is available, or more specifically not available, about who actually represents the people in individual countries. So that means that answering even really basic questions, like this one, which was published last week I think by Spiegel Online, really basic questions like what is the gender breakdown of houses of parliament around the world, how representative are people's governments, even questions like that are incredibly hard to answer at the moment, and that is pretty much one of the most basic types of data.
So aim one of a project like this is to make sure that we can answer questions like this. But going forward, obviously we're interested in more interesting things, like how did people vote, based on whether they were representative or not of the people that they were representing. Sorry, it's late. So yeah, things like this that Spiegel Online are doing actually take a huge amount of effort at the moment, and this is the type of thing that we think should be easily answerable within a Wikidata context within a matter of minutes. Sorry, PhD students: we think that the vision of Wikidata is to put you guys out of a job, or at least to make sure that you don't have to spend three years collecting data for your PhD thesis, because a lot of these things should be able to be built collaboratively and used a lot faster. So I'm going to hand over to Tony to tell you just very quickly some of the specific problems that we faced with political data.

So I think a lot of people assume that one of the reasons that Wikidata doesn't have lots of this is because the world is a complex and messy place, and every country does this differently, so it's hard to model in a sensible way, etc. I actually don't think that's that true. Most of the core underlying concepts are generally the same everywhere, and we have all the right properties and qualifiers to say someone is a member of a certain legislature, representing a certain area, as a member of a certain political party or group, with start dates, end dates, legislative terms, things like that. Having been wrestling with this sort of data for several years now, I actually think that one of the reasons that Wikidata isn't as advanced in this area as in lots of others is that the power-user tools don't work well with political data.
It's relatively easy to put a lot of this data in by hand, but we all know that the kind of dirty secret of Wikidata is that loads of this stuff is put in by bots or by mass imports with QuickStatements and things like that. But most of those tools don't work well, because political data is slightly different. Most of the important information is in qualifiers. People hold the same political position more than once, so someone can be president for a while, leave office, and come back for a second time. QuickStatements, for example, can't cope with that. It adds all the qualifiers from the second one on to the first one. Legislative office people hold way more times, and they can be re-elected six, seven, eight, twenty, thirty times. So a lot of the people who have tried to put a lot of this data in go to these tools, discover that they don't really work very well, and are like, "I'm not going to sit and type this all in, that'll take me years." So we've ended up having to rewrite a lot of these tools to cope much better with political data, which is a lot of what you're going to see.

Can I just get a quick feel? Can you just put your hand up if you have used QuickStatements before? About half of the people. Okay, so that just gives you an anchor for how much you should go into detail there. And I'm just going to talk about how we started, because I think this gets into the nitty-gritty of some of the problems. One of the other things, which obviously isn't unique to political data but obviously throws some, sorry, I've got German words in my head, some little barriers in the way, is that we need to find some way to make entering stuff into Wikidata accessible for people who are domain experts but who aren't necessarily familiar with Wikidata. And with some things we've had a huge amount of success: things which are, for example, relatively general knowledge.
We ran what we called a mission early on in the project, and a mission we defined as essentially setting people a challenge, getting people to enter data into Wikidata. The topic was "help us to find all of the heads of government in the world", and we walked people through all of the different steps that they would need in order to put that data into Wikidata. And that was incredibly successful. We ran a kind of staggered challenge: there were five steps to it, and each of them was done within about two hours, which was much more successful than we thought it was going to be. We then attempted to repeat that with slightly more abstract concepts, such as "okay, so tell us how many seats each of your houses has, your upper house and your lower house". Surprisingly, not quite as much interest, and I think that reflects a bit some of the different levels that we need to tap into. My theory on why the first was more successful: things which are more general knowledge, high-profile politicians, people are very happy to enter that stuff, but for more nerdy things which get into the nitty-gritty of how to actually model things, missions weren't quite so successful. So I'm going to hand back over to Tony to talk about country pages.

So when it came down to doing things at an individual country level, we didn't think it was as easy to replicate what we did with missions, because we'd have to run like 200 of them in parallel all the time, which is outside our ability. But we discovered that one of the big problems that people face is even knowing what the quality of information is like in their country. We've gone through a variety of different ways of teasing that out, and this was largely inspired by the way Sum of All Paintings works.
So they've got a really nice model where you go in for a country and you just quite easily slot in "here is an art gallery in the country", and it instantly gives you a whole series of queries to see what paintings Wikidata already knows are located there, and lots of things about it. So we shamelessly stole lots of the ideas from that, and basically for a country you can just set one up for each house of your national parliament, or even all your regional ones. If you go forward one... It's really simple. You literally just drop in a single row that says what the item is for the legislature and what the item is for being a member of that legislature. And if you go back... Oh sorry, go back. Spoiled it. You get this kind of row, and we're still experimenting with how many things we can fit in here, because we've lots more that we'll show, but it gives you lots of really basic queries that let you see things like every constituency that's currently set on being a member of that, or a list of all the members. It gives you two lists, one for the current members and one for all the members ever. So that seems like it should get you loads of results, but in lots of places it gets you a very small number of results, because even though the people exist, nobody's ever set up the P39 "position held" statement of being a member.

We also do ones for looking for gaps in data. So this is a report that finds all the people who have been a member who we don't have a given name filled in for. I think there's probably about ten or a dozen queries like that that it just gives you out of the box, that you can go through and look for gaps without doing anything more than dropping in those two identifiers. But what we really want everybody to do is actually set up reports for these things rather than just running queries.
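As a rough illustration of how a country page could derive several queries from nothing more than those two identifiers, here is a minimal Python sketch. It is not the project's actual template code. The function name `build_queries`, the three query names, and the placeholder item IDs `Q111` and `Q222` are assumptions made for this example, and "current member" is approximated as "membership statement without an end-date qualifier", which may differ from what the real pages do. The properties are standard Wikidata ones: P39 (position held), P582 (end time), P735 (given name).

```python
import re
from string import Template

# A Wikidata item ID looks like "Q" followed by digits.
QID = re.compile(r"^Q[1-9][0-9]*$")

# SPARQL templates; $legislature and $membership are filled in per country.
QUERIES = {
    # Everyone who has ever held the membership position.
    "all_members": Template("""# All members ever of wd:$legislature
SELECT DISTINCT ?person ?personLabel WHERE {
  ?person p:P39 ?statement .
  ?statement ps:P39 wd:$membership .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}"""),
    # Assumption: "current" means the statement has no end-time qualifier.
    "current_members": Template("""# Current members of wd:$legislature
SELECT DISTINCT ?person ?personLabel WHERE {
  ?person p:P39 ?statement .
  ?statement ps:P39 wd:$membership .
  FILTER NOT EXISTS { ?statement pq:P582 ?end }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}"""),
    # Gap report: members with no given name (P735) recorded.
    "missing_given_name": Template("""# Members of wd:$legislature without a given name
SELECT DISTINCT ?person ?personLabel WHERE {
  ?person p:P39 ?statement .
  ?statement ps:P39 wd:$membership .
  FILTER NOT EXISTS { ?person wdt:P735 ?givenName }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}"""),
}


def build_queries(legislature, membership):
    """Return a dict of ready-to-run SPARQL strings for one legislature."""
    for qid in (legislature, membership):
        if not QID.match(qid):
            raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return {
        name: template.substitute(legislature=legislature, membership=membership)
        for name, template in QUERIES.items()
    }


if __name__ == "__main__":
    # Q111 and Q222 are placeholders, not the real items for any legislature.
    for name, query in build_queries("Q111", "Q222").items():
        print(f"--- {name} ---\n{query}\n")
```

The point of keeping the queries as templates is the one made in the talk: once the two identifiers are the only inputs, the same set of reports can be generated for any house of any parliament.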
Because what we find is that if you turn the SPARQL query into a report... How many of you have played with Listeria? Is this a meaningful thing? Yep. So Listeria basically takes any SPARQL query and turns it into a table that you drop in on the page. That has several really good advantages. One is that you can add it to your watchlist. So then when the data changes underneath, you actually find out: somebody comes in and accidentally deletes something, or changes something, or a core modeling thing changes. If it's just a SPARQL query that every now and again you come and run, it might take you a long time to notice. This way you get to see every day, or however often you look at these things.

The other really useful thing that lots of people, it seems, don't know about is wdedit. How many people here have played with that on Listeria? Yeah, considerably less. Two people. So this is built into Listeria. If you say wdedit equals true in the config, you get the same table, but when you hover over cells you get a little add button where you can type straight into the cell, which is on the next slide. It pops up "add a new value" with the drop-down, instead of your having to go to the item page and work out how to do the addition. So particularly at events and hackathons and things like this, it's a much lower barrier to entry for people to go to a list and fill in all these people's birthplaces, or give them more simple data like that. So again, turning your SPARQL queries into structured tables has loads and loads of really good advantages.

I wish you could film people's faces from this side. That was great. Excellent.

Okay, so the other, much more complex one that we did: reports are really good at showing you what data exists, but obviously what they can't do is show you what data doesn't exist.
So because we've been working with political data for a long time, before we were doing all the Wikidata stuff, we've built an army of, I think, about 1,200 scrapers now that go out and scrape every parliamentary website in the world and lots of third-party sites, and we've basically built up loads of external sources of data. So what we want to be able to do is compare those to Wikidata. So we built a tool where you give it a URL to an external CSV file, and then you make a SPARQL query that you expect to give you the same results that are in that CSV file, and it basically shows you the differences between them. So we did one for the members of, whether that's the senate or the lower house, in Argentina, and we discovered there was loads of really good data, except none of them had the field to say what political group they were in. "Group" here is political group. So it's basically showing you the diffs: if the two are the same it just shows you them in one, but if there's a difference it splits them in two and shows you what it got back from SPARQL and what it got from the CSV. So you can see diffs quite simply.

So we discovered, well, none of these have these groups. Okay, so we need to get that in. So then we have a slight digression into how on earth you get all that data in. As I already said, QuickStatements doesn't really help. A, it doesn't really work very well with information like being a member, because someone will be a member for multiple terms and it'll put it on the wrong one, and B, it's not really very good at just adding a qualifier to something rather than creating a whole new statement. So we ended up having to write essentially a replica, which we call position statements, which works much, much better with position held statements, and instead of trying to be clever and attach everything to the existing statement, it just creates new ones every time. So the input format is identical.
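The difference between the two behaviours described here can be shown with a toy model. This is only an illustration of the problem as the talk describes it, not the code of QuickStatements or of the replacement tool: the statement representation, the function names `add_merging` and `add_always_new`, and the value `Q_PRESIDENT` are all invented for the sketch. P39, P580 and P582 are the real Wikidata properties for position held, start time and end time.

```python
def add_merging(statements, prop, value, qualifiers):
    """Attach qualifiers to an existing statement with the same property and
    value if there is one; otherwise create a new statement."""
    for statement in statements:
        if statement["property"] == prop and statement["value"] == value:
            statement["qualifiers"].extend(qualifiers)
            return statements
    statements.append({"property": prop, "value": value, "qualifiers": list(qualifiers)})
    return statements


def add_always_new(statements, prop, value, qualifiers):
    """Create a fresh statement every time, never reusing an existing one."""
    statements.append({"property": prop, "value": value, "qualifiers": list(qualifiers)})
    return statements


# Two separate terms in the same office (hypothetical dates).
TERMS = [
    [("P580", "2001"), ("P582", "2005")],
    [("P580", "2009"), ("P582", "2013")],
]

merged, separate = [], []
for term in TERMS:
    add_merging(merged, "P39", "Q_PRESIDENT", term)
    add_always_new(separate, "P39", "Q_PRESIDENT", term)

if __name__ == "__main__":
    # One statement carrying both terms' dates: the terms can no longer be told apart.
    print(len(merged), merged[0]["qualifiers"])
    # Two statements, one per term.
    print(len(separate), [s["qualifiers"] for s in separate])
```

With the merging strategy, both start dates and both end dates end up on a single statement, so it is no longer clear which end date belongs to which start date. Creating a new statement per term keeps each term's qualifiers together.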
So anything that you would have been able to pass to QuickStatements, but that QuickStatements doesn't work with, you can pass to position statements and it'll hopefully do the right thing. So then we can get all the things in, and suddenly... yep, yep, go ahead. So the new result from the comparison shows us that the only difference is that Wikidata believes someone is in a different party to what the official website scraper says they're in. So that's perfect, that's exactly what you want to know. You're never going to discover something like this within Wikidata unless that's knowledge you have. It's not going to trigger any constraint report, and it's never going to come up as empty; it's just the wrong information. But you can compare to an external source. The external source might be wrong: we've discovered way too many official parliament websites aren't actually up to date. So we don't want to do anything fancy and automated to keep these up to date; we want a human to resolve this. But it's a tool that will actually show you that.

And it works even better if your external CSV file is one that refreshes regularly, because it's from a web scraper or whatever. We had this live for a day or two and suddenly it changed to be this, where the external source now has three new members, and suddenly the watchlist notification triggers, "hey, this has changed again", and bam, you can go, "ah, three people that we need to go make sure are actually real, and add the data". So obviously you can do this for any kind of external source that you can get a CSV file from. It works best if you have shared identifiers, because then you can know that they're the same people rather than just diffing on names, etc., but it also works really well if you run it against Wikipedia pages, for example, because you can resolve the links that way. And it's a really, really nifty tool for making sure that information that changes doesn't get stale in Wikidata.
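The core of such a comparison is a keyed diff between two row sets. The following Python sketch shows the idea under stated assumptions: it is not the tool from the talk, both sides are given as CSV text so that the example is self-contained (in practice the Wikidata side would come from a SPARQL query), the rows share an identifier column called `id`, and the names `load_rows`, `diff_sources` and the sample data are invented.

```python
import csv
import io


def load_rows(csv_text, key):
    """Parse CSV text into a dict of rows indexed by the shared identifier."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key]: row for row in reader}


def diff_sources(external, wikidata, fields):
    """Compare two keyed row sets and report what a human should look at."""
    report = {"only_external": [], "only_wikidata": [], "mismatches": []}
    for key in sorted(external.keys() | wikidata.keys()):
        if key not in wikidata:
            report["only_external"].append(key)
        elif key not in external:
            report["only_wikidata"].append(key)
        else:
            for field in fields:
                ext_value = external[key].get(field, "")
                wd_value = wikidata[key].get(field, "")
                if ext_value != wd_value:
                    # Record both values; the tool does not decide which is right.
                    report["mismatches"].append((key, field, ext_value, wd_value))
    return report


# Hypothetical sample data: a scraped member list and what Wikidata returns.
EXTERNAL_CSV = "id,name,group\n1,Ana,Red\n2,Ben,Blue\n3,Cat,Green\n"
WIKIDATA_CSV = "id,name,group\n1,Ana,Red\n2,Ben,Green\n"

report = diff_sources(
    load_rows(EXTERNAL_CSV, "id"),
    load_rows(WIKIDATA_CSV, "id"),
    fields=["name", "group"],
)

if __name__ == "__main__":
    for kind, entries in report.items():
        print(kind, entries)
```

The sketch deliberately stops at reporting. As the talk stresses, either side can be wrong, so the differences are surfaced for a person to resolve rather than written back automatically.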
I think that's a good point. If people are looking for things to do that are very useful, and you know that your government data catalogue or whatever has a unique identifier but there isn't a Wikidata ID for it, please register one. That makes things a hell of a lot easier. We really want properties for the official member IDs of every legislature in the world. There's only about twelve to fifteen of them at the minute, I think. We need a lot more. It's one of the most useful things you can do and it doesn't take a huge amount of time.

Okay, so, still me. The other really interesting tool that we've talked about is Listeria, but again it falls into the same trap of not coping with political data very well. So if you wanted to get a list of all the prime ministers of your country, again, if someone has held that office more than once, Listeria ignores all but the first one, which is really annoying if you're trying to build a chronological history of that. So again we built a replacement for it that copes with that. Again, very simple, very straightforward, except if we're going to go to the trouble of having to write that anyway, we might as well make it do more interesting things. So it actually generates feedback on things that look wrong. As it goes down the list, you can see on the first one there's an inconsistent predecessor. So quite often it says this person was replaced by this one, or follows this one, etc., so it looks for gaps, omissions, or those being out of sync with each other, which again is very, very difficult to see if you're just looking at a single item, because unless you're a real expert in that field you're probably not going to notice that it says they replaced someone who actually they didn't, who was two before or something. But when you see them in a list like this and it highlights "oh, this doesn't match with this", or you can see the date overlap there, where one person says he continued until January '45 but the previous person says he started three months
earlier than that. It might be true; occasionally you can have overlap in these things, but actually it's much more likely that one of those is wrong. Again, we can't know which one. Someone needs to go and investigate that and discover that. But again, it's something you're never really going to notice looking at an individual item unless you're a complete expert on that person. But seeing it in a list and going "ha, there seems to be an overlap here that probably doesn't make sense", you see all the things that we think need investigating in that list. So for any position where it's expected that only one person will ever hold it at a time, it's a really nice way both to see what data is there and to see what's probably wrong with it.

So the other big thing is people have no idea where they're starting from. Lots of people in lots of countries assume that the data is probably pretty good, and actually, unless you're in about four or five key countries, all of whom are probably represented in this room by the key people who made the data good, it's really, really not. But we need a really simple way of visualizing that. So we built little simple progress bars for key things: on the qualifiers of being a member, how many of them have start dates, parliamentary group, when they were elected, things like that, and the biographical data of the people. And obviously you can customize that for each country. The really neat thing about this is that this is using Listeria as well. Most people think Listeria can only output tables, but it really can do more: you can specify a row template for each thing that comes back. So a SPARQL query gets each of these things and then just calls the template that shows a little progress bar. So again, this will be updated sort of every night and you get to see the progress in your country.

Okay, and just by way of illustration, one of the things that we have been doing, and again things that we would love to source from you, in order to check how good the data is: we
were trying to do all kinds of random things where we knew the answers to particular questions, just to show us what Wikidata could produce, and if Wikidata could produce the fun queries, then we were pretty sure that it could produce the serious ones as well. So here's just a couple of examples. This is the first time that the UK Parliament had more women in it than people called John. That was 1992, which is somewhat shameful. So, yeah, a kind of indicative but useless query. And this was a map of the UK for MPs born outside the UK or Ireland. So you can see all kinds of things. It's interesting, this one, because you can tweak it, and most of the countries in the world don't have quite this distribution. Finland, I think, most people were born in Finland. But yeah, it's just a silly thing. I think, Andrew, did you do this one, or was that... Tony did this one. Anyway. Andrew? Okay, yeah. Anyway.

We also ran an event in London, and there's an event page for that, which actually isn't linked from here, but there is, thanks to Andrew, who's sat at the back, a very interesting list of, some might say, trivial queries, but very, very interesting things, including political dynasties, including people who, sorry, parliamentarians that killed each other, all kinds of things like that, descendants of slave owners, things like that. So we're just using those as a kind of lever to see, okay, if we know what the answers to these are, how good is our data really?

So one of the interesting things about this as well is that once somebody has built this in one country, ideally it's really trivial to run exactly the same query for any other country, if we all standardize how we add this data. Part of the problem at the minute is different countries have put the data in in slightly different ways, so the queries don't work the same way. And one of the things we're trying to resolve by doing interesting queries like this is that people are going to look at them,
go "ooh, how do I make that work for Germany? Oh, I get no results", and actually say, "ah, but if we all agree on this, all this stuff you get for free". Or if we're consistent enough; we don't have to all agree on everything. Sufficiently consistent. Great.

So just very quickly, a couple of ways, I think I can do this in a minute, a couple of ways in which you can get involved. Between November 20th and November 30th there is what is called Global Legislative Openness Week, and we are trying to run a series of events around the world to get people's data about legislators, obviously, better. We have a small amount of funding so that we can support people if they are interested in running events in that week. So if you are interested, please get in touch. There's details of how there; the presentation is linked from the page, or you can talk to us. And it will be fun, there will be lots of people and lots of noise going around. And Tony, I think you should do this one as well.

So basically there are three or four countries that have set up dedicated WikiProjects. In most countries that's going to be overkill, especially if there is only one person working on this data. So we want there to be kind of mini ones, like I said earlier, shamelessly stolen from how Sum of All Paintings does it. Come along, set up the report that you saw earlier that will generate lots of other reports that you can fill in. Just add your country to this list, see what's there, flesh it out over time, and hopefully we can get lots of people adding lots of really good political data for the whole world. Yay.

And then I guess the last thing I should say is that in this room, after we wrap up, we do have a Wikidata and politics meetup. It's going to be very relaxed, organizing it with us, and we're just going to have some drinks, try and work out what people are struggling with in individual countries, whether we can help each other out. I know it's late, so it's going to be very lightweight and delightful. Yeah, that's everything, and yes, we can, I think, get drinks
from somewhere. Thank you very much.

So we should also explicitly thank the Wikimedia Foundation for giving us a grant to build lots of these tools. Thanks, Wikimedia Foundation. Yay.