Hello, everyone. This is a gentle introduction to Wikidata for absolute beginners. If you're an absolute beginner, if you've never heard of Wikidata, or if you've heard of Wikidata but don't quite get it, don't know what it's good for, have only used it for interwiki links, if you're anywhere on this range, you're in the right place. My name is Asaf Bartov. I work for the Wikimedia Foundation, and I am a Wikidata enthusiast. So the first thing I want to say is that you are lucky. You are lucky because Wikidata is already, and is quickly becoming even more of, an important research tool for anyone who's trying to ask questions about large amounts of information. It will become more and more used across the humanities in particular because of the things it is able to do, some of which we will demonstrate shortly. And you are lucky because you get to find out about it now, before most of the world. So by the end of this talk, you will be a Wikidata hipster, because you'll be able to say, oh, yeah, I knew about Wikidata before it was cool. So before we actually visit Wikidata, I want to share two key problems that Wikidata seeks to solve, and which will help us understand why it exists. The first problem is that of dated data. That is, data that is out of date. And this is apparent on Wikipedia, across our free knowledge encyclopedias. Data on Wikipedia is not always up to date. And the more obscure it is, the more likely it is not to be up to date. So the Polish Wikipedia may have an article about a small town in Argentina. And that article will include information about that town, like population size, name of the mayor, et cetera. And that information, ideally, was correct at the time the article was created on the Polish Wikipedia, maybe translated from another wiki. But then how likely is it to be kept up to date?
How likely is it that the Polish Wikipedia would give us the correct and latest numbers or data about the population size of that town or the mayor? So this is the kind of data that does go out of date. Every few years, five, 10 years, there is a census. And now there are new population figures. Now the census in Argentina will be made available in Argentina, in Spanish, probably, which brings us to another component of the problem of dated data, which is that there are no obvious triggers for updating the data. So the Polish Wikipedia is not sent an email by the Argentinian government saying, hey, we have a new census. There are new population numbers for you to update on Wikipedia. No such email is sent. So it's kind of hard to notice when, and of course, multiply that by all the different jurisdictions around the world, there's no easy way to notice when your data goes out of date. So that's difficult to keep up to date. And even if we were to receive some kind of indication, oh, there's a new census in Argentina, so a whole bunch of population figures have now gone out of date, updating it on the Polish Wikipedia and the French Wikipedia and the Indonesian Wikipedia and the Arabic Wikipedia is a whole bunch of repetitive work that a lot of different volunteers will need to do just for that one updated piece of information about Argentina. So I hope this is clear and resonates with some of your experience editing Wikipedia: data that is out of date, or that needs to be updated manually, menially, on a fairly frequent schedule across the different countries and data sources. The other, and I think maybe more interesting, shortcoming or problem that I want to discuss is what I called inflexible ways of lateral queries, cross-cutting queries of knowledge. So if I want an answer to the question, what countries in the world export rubber? That's a reasonable question. That information is on Wikipedia. Do you agree?
If you go to Wikipedia and read up about Brazil, about Peru, about Germany, somewhere in there, maybe in a sub-article called economics of Brazil, you will find the main exports of that country. And you can find out whether or not that country exports rubber. But what if I don't want to go country by country looking for the word rubber? I just want an answer: what are the countries that export rubber? Even though that information is in Wikipedia, it's hard to get at. It's hard to query. Now you may say, well, that's what we have categories for. Categories are a way to cut across Wikipedia. So if someone made a category called rubber-exporting countries, then you can go to that category and see a list of countries that export rubber. And if nobody has made it yet, well, you can create that category with a kind of one-time effort, populate that category, and you're done. Well, yes, but that's still not very convenient, and it's also still very, very limited. Because what if I only want countries that export rubber and have a democratic system of government, or any other kind of additional condition that I would like to add to this? Or take a completely different example. What if I want to know which Flemish town had the most painters born in it? There's a ton of Flemish painters. Most of them were born somewhere. We could theoretically just look up all the birthplaces of all the Flemish painters and tally up the numbers and figure out what is the place where the most Flemish painters come from. I don't know the answer to that. It would be nice to be able to get that answer. Again, the data is in Wikipedia. Those birthplaces are listed in the articles about those painters, but there's no easy way to get that information. What if I want to ask, who are some painters whose father was also a painter? That's a thing that exists. Some painters are sons of painters. Bruegel comes to mind as an obvious example, but there's a bunch of others. So who are those people?
What if I want to ask that question? That's the kind of question that it's not only Wikipedia that doesn't answer today. If you walk to your friendly university library reference desk and say, hello, I would like a list of painters whose father was also a painter, how would that librarian help you? There's no easy way to get an answer to a question like that. What if you only want a list of painters who were immigrants, painters who lived somewhere other than where they were born? There's no kind of book, I guess maybe there is, but it's not obvious that there's a ready resource that says, list of painters who are immigrants. And the librarian would probably refer you to a book on the shelf called, I don't know, the Complete Dictionary of Flemish Painters, and say, go look up the index. And if you see a similar surname, maybe they're father and son, and you kind of cobble together the answer on your own. The reason I'm comparing this to a library is to show you that this is a kind of question that is not readily satisfiable today. Now, these questions may sound contrived to you. You may say to yourself, well, painters who are also sons of painters, that never occurred to me as a question I might care about. But I want to invite you to consider that this kind of question, questions like that question, may well be questions you do care about. And I also want to suggest that the fact it is so nearly impossible, the fact there's no obvious way to ask that kind of question today, is partly responsible for your not coming up with those questions. We tend to be limited by the possible. Until human flight was made possible, it did not occur to anyone to say, oh yeah, by this time next week I will be in Australia, because that was just impossible. But when flight is possible, there's all kinds of things that suddenly become possible. There's all kinds of needs that arise based on the availability of resources to fulfill those needs.
So many of these research questions, compound, lateral, cross-cutting queries, are not being asked because people have internalized the fact that there is no way to get an answer to questions like, what is the most popular first name among British politicians? I just made that up. Is it John? Maybe. Maybe it's William, for whatever reason. These are the kinds of questions we don't routinely ask because we know that it's like, who are you going to ask? How are you going to get an answer to that? So this problem of not having very flexible ways of querying the data that we already have in Wikipedia, in Wikisource, elsewhere, is a significant limitation. So these two key problems have one solution. And that is an editable central storage for structured and linked data on a wiki under a free license, which is a very long way of saying Wikidata. That is Wikidata. Wikidata is an editable central storage for structured and linked data on a wiki under a free license. So let's take this apart and unpack it. First of all, it's a central storage. This relates to the first problem. If we had one place containing data like population size, we would be able to update that one place and then have all of the different Wikipedias draw the data from that one place, so that we wouldn't have to manually, repetitively update it across our hundreds of projects. So having central storage makes, I hope, kind of immediate intuitive sense. But what do I mean by structured and linked data? So structured data means that each datum, each individual piece of data, is managed on its own, is identified and defined on its own, as distinct from Wikipedia. Wikipedia has articles. The article about Brazil includes a ton of data, all kinds of information, and it's presented as text, as several paragraphs, several pages of text. Now, we do have an approximation of structured data on Wikipedia.
If you've browsed Wikipedia a little, you've noticed that we often have an info box, what we call an info box on Wikipedia. That's the table on the right side, if it's a left to right language. The table on the right side that has information that is easy to tabulate. So birth date, birthplace, death date, death place, nationality, or if it's about a country, area, population, anthem, type of government, whatever you are likely to find. If it's a movie, then starring, genre, box office receipts, whatever pieces of data are relevant to an article about a movie. So we do already group pieces of information on Wikipedia into this structured format. Those of you who have ever looked at the source, at the page, at what the wiki code under that looks like know that it's only semi-structured. It looks neat and organized in a table, but really it's just a bunch of text that is put there. It is not centralized. Every Wikipedia has its own copy of that data. And if I go and update the population size on Spanish Wikipedia of that Argentinian town, it does not get updated automatically on the English Wikipedia or the Arabic Wikipedia. So the structured data that we already have on Wikipedia is not managed centrally. The other thing about structured data is when you have a notion of an individual piece of data, that is the cornerstone of allowing the kinds of queries that I was talking about. That is what will allow me to ask questions like, what is the Flemish town where the most painters were born? Or what are the world's largest cities that have a female mayor? I could come up with other examples all day long. These are all questions that you can ask once you break down your data into individual pieces, each of which you're able to refer to each of those programmatically. The computer can identify, isolate, and calculate based on each of those pieces of data. So that's why the structure is important. Now, Wikidata is also a linked data repository. 
What does it mean that the data is linked? Well, it means that a single piece of data can point at, can link to, another whole bag of data. So if we are describing, for example, a person, and we record the single piece of data that this person was born in Salem, Massachusetts, that single piece of data links to the item about Salem, Massachusetts. Because of course, we know a lot of things about that place, Salem, Massachusetts. So it's not just the text, S-A-L-E-M. It's not just that's where they were born, but it's a link to all the data that we have about Salem, Massachusetts. If we say someone's nationality is French, that is a link to France. That is a link to everything we know about the country, France. The fact that the data is linked and structured allows not only humans but also computers to traverse information and to bring us different pieces of relevant information programmatically, automatically, based on those links. Because it's not just text. It's an actual link to another chunk of data. If this sounds a little abstract, it'll become much clearer in just a second when we see it in action. But the other components of this little definition are, of course, that this central storage of structured and linked data needs to be editable, because we need to keep it up to date. We need to correct mistakes. And we want it on a wiki under a free license. The free license is, of course, essential to enable all kinds of reuse of the data. And Wikidata, unlike Wikipedia, is released under a different free license. Wikidata is released under a CC0 waiver. That means, unlike Wikipedia, where you have to attribute Wikipedia when you reuse information from Wikipedia, you do not need to attribute Wikidata. And you do not need to share your work. It's an unencumbered license to reuse the data in any way you want, including commercially. You don't have to say that it comes from Wikidata. I mean, it could be nice.
But you don't have to. You're under no obligation to do it. And that is important to allow certain kinds of reuse, where, for example, if you're building some kind of device, you may not have a practical way to give attribution. And had we required that to use Wikidata, we would have made Wikidata less reusable. So Wikidata is unencumbered by the requirement of attribution. And of course, because it's on a wiki, we get all the benefits that we are used to expecting from a wiki. So it's a wiki, which means, yes, it has discussion pages. It has revision histories. It remembers everything. So if you screw it up, you can always go a version back. Or if someone else vandalized the content, we can always go back, just like Wikipedia. So we get all the benefits we're used to, user talk pages, group discussion pages, watchlists, all the features that we expect in a wiki. In short, Wikidata is love. I hope you agree with me by the end of this talk. So let's zoom in and see what this structured data looks like. So structured data on Wikidata is collected in statements. And statements have the general form of this triple, this tripartite description: items, properties, and values. Now an item is the subject, is the topic that we are trying to describe. It can be any topic that Wikipedia can cover, and many others that Wikipedia wouldn't. So the topic, the item, can be Germany, or it can be Salem, Massachusetts, or it can be the concept of redemption. It can be anything at all. Anything you could imagine describing in any way with data can be the item. So the item, consider it like the title of the rest of the data. And then what do we say about Salem, Massachusetts, or about Germany? Well, that's a series of properties and values, properties and values. The property is the kind of datum, like birth date, or language spoken, or manner of death. These are all real properties. Or national anthem, if I'm trying to describe a country. These are properties.
And then they have values. So this person, this imaginary person's place of birth, the value of the property place of birth is Salem, Massachusetts. So you can think about it as like a government form, or not government, just any form that you're filling out, where there are field names and then empty spaces for you to fill out. That's the value. So the field names, or the categories, are the properties. So name, language, occupation, date of birth, these are all properties. And the values are the actual piece of data, the actual information that we have. And of course, different kinds of data are relevant for describing different kinds of items. And the key in the value is it can be either a literal value, like if we're describing the height of a mountain, we might say just the number 8,848. That's the height of which mountain? Not everyone at once? Oh, because it's meters, the metric system. Yeah, Mount Everest is 8,848 meters. Yes, get with it, America, the metric system. All right. So it can be a literal value, like an actual number. Or it can be a link to an item, to a pointing at another item. But in this statement, it is the value. So if I'm talking about Germany, the item is Germany, and the property capital city has the value Berlin. But the value is not B-E-R-L-I-N. The value is a pointer to the item Berlin. That's the link. So a single item is described by a series of such statements. There's hundreds and hundreds of things I can say about Germany. There's hundreds of things I can say about a person. And these will generally take the form of a property and a value. By the way, some properties may have more than one value. Consider the property languages spoken. People can speak more than one language. So if I'm describing myself, we can say languages spoken, English, Hebrew, Latin, whatever. So a property can have more than one value. 
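As a rough illustration of the form just described, here is a minimal Python sketch of statements as item, property, value triples, where a value is either a literal or a pointer to another item. The class names (ItemRef, Quantity, Statement) and the helper values_of are hypothetical, invented for this sketch; this is not how Wikidata itself is implemented.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class ItemRef:
    """A value that points at another item instead of holding plain text."""
    label: str


@dataclass(frozen=True)
class Quantity:
    """A literal numeric value together with its unit."""
    amount: float
    unit: str


Value = Union[ItemRef, Quantity]


@dataclass(frozen=True)
class Statement:
    item: str    # the topic being described
    prop: str    # the kind of datum, like a field name on a form
    value: Value # the filled-in field: a literal or a link to an item


statements = [
    # The value is a pointer to the item Berlin, not the letters B-E-R-L-I-N.
    Statement("Germany", "capital city", ItemRef("Berlin")),
    # A literal value: a number with a unit.
    Statement("Mount Everest", "height", Quantity(8848, "metre")),
    # One property, several values: one statement per value.
    Statement("Asaf", "languages spoken", ItemRef("English")),
    Statement("Asaf", "languages spoken", ItemRef("Hebrew")),
]


def values_of(item: str, prop: str) -> list:
    """Return every value recorded for this item and property."""
    return [s.value for s in statements if s.item == item and s.prop == prop]


print(values_of("Asaf", "languages spoken"))
```

Storing a multi-valued property as several statements, rather than one statement holding a list, keeps every statement the same shape, which is one simple way to model it.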
So if the item is about a country, it would have statements about properties like population, land area, official languages, borders with, anthem, capital city, et cetera, et cetera. If I'm describing a person, I have a whole mostly different set of properties that are relevant. Date of birth, place of birth, citizenship, occupation, father, mother, religion, notable works. Now, are all of these relevant for all people? No, of course not. It depends. And different items about different people will either have or not have these fields. So we wouldn't record religion for absolutely every person. Some people manage to do without. And also, it's not relevant for a lot of people, like what their religion happens to be. Date of birth is generally relevant for most people that we're documenting. So some properties crop up more commonly than others. A person's height, for example, is not generally considered of encyclopedic value. We don't, for example, if we have an article about even a really well-documented person, like Winston Churchill, does Wikipedia mention his height? I don't think it does, even though I'm sure we could probably find a source somewhere that lists his height. It's just not a very relevant piece of information about Churchill. With everything else that's written about him and that we know about him that we want to include in the article, a person's height is not really something of great value most of the time. But if we are describing Michael Jordan, it is relevant. I'm dating myself. People still know Michael Jordan, right? You know, a basketball player, that's when height is very relevant, right? That's one of the first things you say when you're describing a basketball player is list their height. So even within the class of person, some properties may be more or less relevant, depending on the context. So let's look at some examples. These are examples of statements. Each line is a statement. So here's the first one. 
I want to state something about the item Earth, our planet. And what I want to say about Earth is that the property highest point has the value Mount Everest. Would you agree with that? That is the highest point on Earth. That's a statement. It says something specific, one piece of information about Earth. Now, of course, there's a lot of other things we want to say about Earth. Circumference, average temperature, I don't know, all kinds of things we can describe the planet with, right? Density, et cetera. The galaxy it belongs to, all that. But here's one piece of information, one very specific field in the detail form about Earth. The highest point is Mount Everest. Now, here's a second statement. This time, Mount Everest itself is the item that I'm describing. The topic has changed. Now I'm saying something about Mount Everest. And what I'm saying about Mount Everest is elevation above sea level. It sounds the same, but it isn't, right? Because the highest point on Earth answers the question, where? Like, on the planet, what is the highest point? It's Mount Everest. But how high is that highest point? It's a different piece of information. Do you agree? It's the actual altitude. It's not where on the planet it is. So it may sound similar, but these are actually very different pieces of information. So that highest point, how high is it? Well, it's 8,848 meters high. Now, the third statement gives another piece of information about the first item, same item. I could have grouped them together. Another thing I know about the Earth is that the deepest point on the planet is the Challenger Deep, right? Part of the so-called Mariana Trench in the ocean. So that is the deepest point. And how deep is it? I again use the elevation above sea level. That's the name of the property. Even though it's not above sea level, I have a negative value, right? Because the elevation of the Challenger Deep is minus 11 kilometers, more or less.
All right, so these are statements. These are four individual pieces of data. And I could also look at it this way. Maybe that's closer to the government form example that I was giving, right? So I want to say something about Earth. What do I want to say? Two things: highest point, that's the field, that's the property, and this is the value. The highest point is Mount Everest. The deepest point is Challenger Deep. And then I have things to say about Challenger Deep, the property of elevation above sea level. The value is minus 11 kilometers. Now, here's yet another view of the same data, once more with numeric IDs. So this is the same information, the same four statements, but this time, in addition to using words, I'm also including weird numbers following either Q or P. So P stands for property. So the highest point property is P610. And the deepest point property is P1589. What do these numbers mean? They don't mean anything at all. They're just numbers. They're just sequential numbers. And if I create a new Wikidata item right now, it'll get just the next available number. So they're just numbers. So P stands for property. What does Q stand for? Does anyone know? It's a trick question because it's hard to guess. But the principal architect of Wikidata, a Wikipedian and data scientist named Denny Vrandečić, is married to a lovely lady named Qamarniso, spelled with a Q. And this is a loving tribute. And she's also a Wikipedian and an admin of the Uzbek Wikipedia. So Q2 is just the numeric identifier of the item Earth. And Q513 is the identifier of Mount Everest. You notice that we use that ID across the statement. So from Wikidata's perspective, this is what the database actually contains. What we were saying with words, right, the Earth, highest point, whatever, never mind that. Q2 has P610 with a value Q513. That's what Wikidata cares about. Now that, you will agree, is a little inaccessible. Just these lists of numbers, that's a little hard.
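To make the numeric-ID view concrete, here is a small Python sketch. It stores the one statement whose IDs are all given in the talk (Q2, P610, Q513) purely as identifiers, and keeps the human-readable words in a separate lookup table used only for display. The names labels, triples and render are hypothetical, chosen for this sketch, and the storage layout is an assumption for illustration rather than Wikidata's actual schema.

```python
# Human-readable words live in a separate lookup table, keyed by ID.
labels = {
    "Q2": "Earth",
    "Q513": "Mount Everest",
    "P610": "highest point",
    "P1589": "deepest point",
}

# What is stored: identifiers only. "Q2 has P610 with a value Q513."
triples = [("Q2", "P610", "Q513")]


def render(triple: tuple) -> str:
    """Show a stored triple with its words next to each ID.

    An ID with no known label is shown as the bare ID.
    """
    return " | ".join(f"{labels.get(part, part)} ({part})" for part in triple)


for t in triples:
    print(render(t))
```

Run as written, this prints `Earth (Q2) | highest point (P610) | Mount Everest (Q513)`: the IDs carry the data, and the words are looked up only when a person needs to read it.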
So Wikidata understands and allows us to continue using our words, but actually it gets translated into numeric IDs. Now why is this a good idea? Why can't we just say Earth or Mount Everest? Any thoughts? This is an open question. Why is this a good idea to use numbers instead of the names of things? Yes, because more than one thing can have the same name. What do you mean? There's only one Mount Everest. Well, yeah. But there's also a movie, probably more than one, called Mount Everest, or a TV documentary, literally called Mount Everest. And of course, if I'm describing a person named Frank Johnson, not the only Frank Johnson on the planet. But wait, you say, on Wikipedia, we deal with that problem. How do we deal with that problem on Wikipedia? Does anyone in the audience know? The standard way to deal with the fact that there is more than one Frank Johnson in the world on Wikipedia is to use parentheses after the name. So there's Frank Johnson actor and Frank Johnson politician, for example, if that's the distinction we need to make. So you put in parentheses the minimal amount of information you need to tell apart these Frank Johnsons. What if there's two politician Frank Johnsons? Well, then you would say Frank Johnson, Delaware politician versus Frank Johnson, California politician. You just put in that bit of context to tell them apart. So that's a solution that Wikipedians came up with years and years ago because they did need a unique name for the article. You can't have two articles literally called Frank Johnson on Wikipedia. So that's a solution on Wikipedia. But Wikidata was designed much later, more than a decade after Wikipedia, and was able to kind of learn from the experience of Wikipedia, which has tremendous experience with multilingualism, much more than most sites and projects, as we know. 
And so the Wikidata team understood from the get-go that this would be an issue, and that it's better to use numbers that are unequivocally different from each other instead of labels, instead of the actual name, the actual text. Because names are not unique. Names can change. Just last year, there was a big naming reform in Ukraine, and a whole bunch of towns and districts were renamed. Does that mean we should change all the data that we have, lose all the data that we have about the old name? No, we ideally just want to change the name without breaking links. So having the links actually refer to the numbers is one way to ensure the integrity of the data, of the links, when renaming happens. Another reason is, well, even if the name doesn't change, not all humans call everything the same. So Earth is Earth in English, but it's ard in Arabic. It's eretz in Hebrew, and so obviously Earth, even that, is not as unambiguous or unequivocal as you might think. And so that is the reason Wikidata, which is built to be multilingual from the start, talks about numbers rather than labels. OK, I had a whole slide about that, and I forgot. Yes, so even London, again, is not just London, England, which is what you were thinking about. It's also a city in Canada, and it's also a family name, like Jack London. It's also a movie company. There must be some hotel named London somewhere, et cetera. This is a good opportunity to remind everyone that the vast majority of humankind does not speak a word of English. That's a statistic worth remembering. The vast majority of the planet does not speak English at all. That does not contradict the datum that English is the most widely spoken language. And yet, in aggregate, a majority of people speak other languages and not English at all. So moving swiftly on, this is a pause for questions about what I've covered so far. Any questions in the audience? If not, we move to IRC. If there are any questions, any questions? No, IRC, any questions?
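The argument for numbers over names can be sketched in a few lines of Python. Labels are kept per language and looked up by ID, so renaming an item touches only its label and leaves every statement that links to it intact. The function names label and rename are hypothetical, the Arabic and Hebrew labels are romanized as spoken in the talk rather than written in their native scripts, and the rename to "Sagarmatha" (the Nepali name for the mountain) is purely illustrative.

```python
# Labels per item, per language code. Statements never mention these words.
labels = {
    "Q2": {"en": "Earth", "ar": "ard", "he": "eretz"},
    "Q513": {"en": "Mount Everest"},
}

# Statements link IDs to IDs, so they do not depend on any label.
statements = [("Q2", "P610", "Q513")]


def label(item_id: str, lang: str, fallback: str = "en") -> str:
    """Return the label in the requested language, else the fallback
    language, else the bare ID."""
    by_lang = labels.get(item_id, {})
    return by_lang.get(lang) or by_lang.get(fallback) or item_id


def rename(item_id: str, lang: str, new_label: str) -> None:
    """Change what an item is called in one language. Links are untouched."""
    labels.setdefault(item_id, {})[lang] = new_label


print(label("Q2", "he"))    # same item, Hebrew label
print(label("Q513", "he"))  # no Hebrew label recorded, falls back to English

rename("Q513", "en", "Sagarmatha")
print(statements)           # unchanged: the link still points at Q513
```

The fallback to another language in label is a design choice made for this sketch; the point it illustrates is only that the statement list is the same before and after the rename.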
OK, we will have additional pauses for questions later. But enough of my hand-waving. Let's go explore Wikidata. So Wikidata lives at wikidata.org, and Wikidata already has more than 25 million items. That is, it collects statements about more than 25 million topics. It has many, many more than 25 million statements, because many of these items have dozens or hundreds of statements. So it documents 25 million things, people, books, rivers, whatever. Just to give us a sense of how big that number is, how many articles do we have on English Wikipedia? More than, yes, more than 5 million articles, and that's the largest Wikipedia. So Wikidata is already describing about five times as many items as even our largest Wikipedia. So obviously, Wikidata contains data about things that have no article on any Wikipedia. It is a much, much larger, more comprehensive project. All right, the second thing we might notice, if we've never visited, is, well, this looks kind of like Wikipedia. It has this sidebar. It has these buttons at the top. It looks like it's from the 90s. Yeah, so the reason it looks like Wikipedia is that it is a wiki running on MediaWiki software. It is running on software very much like Wikipedia's. But it is running on a kind of modification of the standard wiki software. It has an additional, very important component named Wikibase, which gives it all of its structured and linked data power. So let's start exploring Wikidata. Let's take something local. Harvey Milk. Harvey Milk. What does Wikidata know about Harvey Milk? For those on YouTube who may not be local, Harvey Milk was a San Francisco politician and gay rights activist who was murdered in the 70s. He was very significant in the history of those struggles in this country. So what does Wikidata tell us about Harvey Milk? Well, the first thing is, it knows that Harvey Milk is Q17141.
That's the most important piece of information is, first of all, that is the identifier. That is the item number of all the data that we will collect about Harvey Milk. The second thing you see right under the title is this line, this very, very brief summary, American politician who became a martyr in the gay community. This line is the description line. So the name of the item, this is the label. We call it label on Wikidata. That's the label. And this line is the description. Now, why is this description important? This is the description that helps us tell this Harvey Milk from any other Harvey Milk that may exist. So again, this would be useful if I'm looking up someone with a slightly more generic name. That line will help me tell apart the item about Harvey Milk, the gay activist, rather than Harvey Milk, the film actor. And where is it coming from? Well, Wikidata has this whole table, as you can see, with descriptions and labels in other languages. So Wikidata is able to refer to Harvey Milk in Arabic, which, don't panic, is written from right to left. It also knows what to call him in Bulgarian. I mean, it's the same name, but it's in a different script in French and Hebrew. And that's it. Does it not know a name for Harvey Milk in Italian? Of course it does. It actually has labels for this person in many, many, many languages. Doesn't have descriptions in every language, as you can see. So why was Wikidata showing me these languages and not others? I mean, why this somewhat arbitrary collection, English, Arabic, Bulgarian, German, French, and Hebrew? Because I told it to. So if we briefly click over to my user page. Again, like every Wiki, you have user accounts, you have user pages. This is my user page. And as you can see, there's this little user information box here called a Babel box by Wikipedians, where I list the languages that I speak. And Wikidata uses this box just to helpfully show me these languages. 
Of course, all the other languages are still available, as you saw, by clicking More Languages. But this is just a useful little way of getting the languages I care about up there first. By the way, this is a lie. I don't actually speak Bulgarian. That stayed on my user page, because I was demonstrating this in Bulgaria. And I wanted that label to show up there during the talk, just in case you were going to tell me a really good Bulgarian joke. OK, so for example, Hebrew is my mother tongue. And we have a Hebrew label for Harvey Milk. But we don't have a description. So let's fix that right now by clicking the Edit button. Right here, I click Edit, and this table became editable. And now I can very briefly type a description online in about 20 seconds. Are we back? OK, sorry about that. So this was all about what to call him in different languages and scripts, and how to tell this person apart from other people with potentially the same name. Let's scroll down and see what else Wikidata knows about this person. So as you can see, this is a list of statements. The properties are on the left. The values are on the right. So the first thing Wikidata knows about Harvey Milk is a very important property called instance of. Instance of. And the property instance of answers a very basic question: what kind of thing is this that I'm describing? Is it a book? Is it a poem? Is it a mountain? Is it a theological concept? No, it's a human. It's a person. The item about Mount Everest will say instance of mountain. This is a very important property. Why is it important? Wouldn't anyone looking at this know that this is a human being? Yes, anyone looking at this will know. But if I want a computer to be able to pull information about people, I want to be able to easily exclude all the mountains and poems and other things that are not people from my query.
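Here is a toy Python sketch of why that one property matters to a program. It uses a tiny in-memory table of items (the item names come from this talk, but the table layout and the function instances_of are hypothetical) and selects only the items whose instance of value is human. It is a stand-in for the idea of filtering by type, not Wikidata's actual query mechanism.

```python
# A tiny in-memory stand-in for a few items and one of their properties.
items = {
    "Harvey Milk": {"instance of": "human"},
    "Mount Everest": {"instance of": "mountain"},
    "Winston Churchill": {"instance of": "human"},
}


def instances_of(kind: str) -> list:
    """Return the names of all items whose 'instance of' value equals kind,
    in alphabetical order."""
    return sorted(
        name for name, props in items.items()
        if props.get("instance of") == kind
    )


# Only people come back; the mountain is excluded by the filter.
print(instances_of("human"))
```

A person glancing at the page for Mount Everest never needs to be told it is not a human, but a program has nothing else to go on, which is why the filter relies on an explicit field.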
So this single datum, this single piece of data, is what tells computers and algorithms very clearly: this is a human. Things that aren't instance of human are other things. So it may sound very trivial, but it's not. It's very important to have an instance of field for Wikidata items. All right, what else do we know? Well, Wikidata knows about an image for Harvey Milk. Again, we can find a ton of images, or maybe not a ton, but we can find dozens of images of Harvey Milk on Commons, our Wikimedia multimedia repository. So why should we have a single image here on Wikidata? Again, this is mostly for re-users. If I'm building some kind of tool that pulls information from Wikidata, it's nice if there's at least one representative image to use as the default or immediate image for Harvey Milk in some other reuse context. All right: sex or gender, male. Country of citizenship, United States of America. Given name is Harvey, the date of birth is so and so. The place of birth is Woodmere. The place of death is San Francisco. The manner of death is homicide. Wikidata knows that. Now, again, every little datum like that is the basis for later querying and answering questions. So the fact that we record the manner of death of people, at least of some people, will allow us later to ask, you know, who are some people from Belgium who died by homicide? That's a question Wikidata can answer, OK? Thanks to this field. The other thing I mentioned is that things are links. So the place of birth is Woodmere. I don't know where Woodmere is, but I can click that and find out. Here is the Wikidata item about Woodmere, right? It was the value in the statement about Harvey Milk, but now I'm looking at the item about Woodmere, and it turns out it's in Nassau County, New York, right? And of course, Wikidata has a whole bunch of information for me about Woodmere, right? What country it's in, and the coordinates, and the population, and the area, right?
All the things you would expect about a place, OK? Let's get back to Harvey Milk. So the manner of death, the cause of death. Now here, Wikidata gives us excellent information. The actual cause of death, right, is ballistic trauma. That's a professional term. And this statement has qualifiers. So until now, I was talking about triples, right? The item has a property with a certain value. Actually, each statement can also have a number of qualifiers, which add aspects of information still about that one question that we're answering, right? So this property answers cause of death. It's not discussing anything else. It's not discussing languages. It's not discussing date of birth, right? It's talking about the cause of death. But we're not just saying ballistic trauma. We're saying ballistic trauma with the quantity attribute being five. What does that mean? Five bullets, right? There are five ballistic traumas. He was shot five times. And he was shot by this person named Dan White. And this ballistic trauma, like this actual shooting, is itself the subject of this other thing. This is a link to a whole other Wikidata item about the Moscone-Milk assassinations, right? Moscone was the San Francisco mayor at the time. We will see slightly better or easier to understand examples of qualifiers in a bit. So if this was confusing, hang on. So he was killed by Dan White. He spoke English. His occupation: here's an example of a property with more than one value, right? So Milk was a politician. But he was also a Navy officer, at least for a while, right? That was another thing that he did during his life. And he was a human rights activist, right? So some people are writers and translators, right? So people can have more than one occupation. People can speak more than one language, et cetera. Here's a better example of a qualifier. So the property award received has the value Presidential Medal of Freedom.
And that award has an attribute called point in time, like, when was this, right? This was in 2009. Do you see that this piece of data, 2009, is a substatement, or is subordinate to the context of this award, the Presidential Medal of Freedom? It can't just kind of free-float in the article. It's not that 2009 is itself a meaningful thing, right? This medal was awarded in 2009. Wikidata doesn't tell us, for example, when he was a Navy officer. But if we were, for example, to look that up right now and find out that Milk was a Navy officer between 1962 and 1964, we could go back here to the Navy officer bit and click Edit. This is how I edit this particular little piece of information and add a qualifier like this. I click Add Qualifier. And I could pick start time and end time, right? And then I could type, you know, 1962 to 1964. And that would be teaching Wikidata. Oh, I'm sorry. I meant to do that for Navy officer, OK? But that is the exact, the accurate time span of that statement. So it's true to say about a person he was a Navy officer, even if, of course, he wasn't a Navy officer his entire life. But it's better and it's more accurate to say he was a Navy officer between 1962 and 1964. Don't worry. I'm not saving this. No vandalizing of Wikidata in this session. OK. Moving on, what else does Wikidata know? He was educated at this university. He was a member of this political party, right? That is, of course, a very relevant property for a politician. Religion, military branch. What the category on Commons that covers this item is, that is something Wikidata can tell us. And that's it. Now, is that everything that we could possibly say in a structured way about Harvey Milk? No. We could probably find at least a few more things to say. We will see how to contribute new information to Wikidata in just a minute with a different example. But all this was a set of statements, right? This was under the title Statements here.
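The relationship between a statement and its qualifiers can be sketched in a few lines of Python. This is a simplified model under assumptions, not Wikidata's actual data format. `Statement` and `add_qualifier` are hypothetical names, and the 1962 and 1964 years are the speaker's "if we were to find out" example, not verified facts.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One property/value pair, plus qualifiers that only make sense attached to it."""
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)


statements = [
    Statement("award received", "Presidential Medal of Freedom", {"point in time": 2009}),
    Statement("occupation", "politician"),
    Statement("occupation", "Navy officer"),
    Statement("occupation", "human rights activist"),
]


def add_qualifier(statements, prop, value, qualifier, qualifier_value):
    """Attach a qualifier to the one statement matching (prop, value)."""
    for statement in statements:
        if statement.prop == prop and statement.value == value:
            statement.qualifiers[qualifier] = qualifier_value
            return statement
    raise LookupError(f"no statement {prop!r} = {value!r}")


# The speaker's hypothetical time span for the Navy officer statement.
add_qualifier(statements, "occupation", "Navy officer", "start time", 1962)
navy = add_qualifier(statements, "occupation", "Navy officer", "end time", 1964)
print(navy.qualifiers)  # {'start time': 1962, 'end time': 1964}
```

The year 2009 lives inside the award statement, and the time span lives inside the Navy officer statement. Neither floats freely on the item, which is the point made above.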
But at the bottom of the list of statements is another section called identifiers. And I want to spend a minute talking about what that is. So identifiers is a collection of keys, a collection of IDs or codes that are keys to other information sources. And a lot of Wikidata items have whole series of keys to other databases, other sites, other repositories, that help you or a computer not just access some database and look for information about Harvey Milk, but access the exact record relevant to Harvey Milk. And again, if you imagine someone named John Smith, that is really valuable, right? If you're just told, oh, yeah, you can look at the Library of Congress for John Smith, good luck with that. But if I tell you, go to the Library of Congress, to this record for this John Smith, you see the difference. So Wikidata tells us about VIAF, which is the Virtual International Authority File. It's an aggregated master index of people, built by bibliographers, by librarians. It tries to aggregate information about people across library catalogs everywhere. So the VIAF ID for Harvey Milk is this number. And conveniently, if I click that, I'm not taken to some Wikidata item. I'm actually taken to the relevant site. So this took me right to viaf.org, the Virtual International Authority File, directly to their record about Harvey Milk. And that itself leads me to national catalogs of national libraries all over the world. We won't get into the things you can do with VIAF. The point is, Wikidata contained the piece of thread that I could tug on to arrive directly at that information in other databases. Yes. And it has that for many, many kinds of databases. The BnF, for example, that's the National Library of France. And that will take me to that index card. IMDb. We all know IMDb, right? So here I have the key to Harvey Milk in IMDb. And this is what IMDb says about Harvey Milk, right?
They have their own piece of information about him, of course, with filmography and everything else. And see, I did not have to search IMDb for it. I just had the key right there waiting for me. Now, again, this is very convenient for me, as I just showed you the human use case for this. But it's even more powerful in aggregate, when we allow computers to traverse this network of links, not just within Wikidata, but between data storage facilities and repositories. This is sometimes referred to as the linked open data cloud. Cloud, because it's multiple different repositories that are interlinked. And Wikidata is already, and to a growing extent, the nexus, the connection point between a lot of these different databases. So IMDb, for example, and it's a good example because it's a site almost everyone knows, IMDb has information about Harvey Milk, but that information does not include a link to the French National Library, right? Do you see what I'm saying? So IMDb is a data repository with IDs, and allows linking, but it does not give you what Wikidata gives you, which is this kind of collection, it's like a junction of all these different data sources. So Wikidata is the place where you can document these interrelationships or equivalencies, right? So ID 587548 on IMDb is discussing the same topic as French National Library ID whatever. Wikidata contains that piece of information, that this ID in this database is about the same person as that ID in that database. Okay, so that's what identifiers are about. Still scrolling down the Wikidata item about Harvey Milk, we have the site links. The site links are links to Wikimedia projects that are related to this item. So of course there are Wikipedia articles about Harvey Milk in many, many different Wikipedias, quite a few language versions. And there are pages on Wikiquote, one of the sister projects.
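To make the "junction" idea about identifiers concrete, here is a minimal Python sketch, assuming identifiers are kept as a simple per-item dictionary. The IMDb number is the one quoted in the talk. The other two values are explicit placeholders, not real IDs, and `translate_id` is a hypothetical helper.

```python
# item number -> {external database name -> that database's ID for the same topic}
identifiers = {
    "Q17141": {
        "IMDb": "587548",                 # as quoted in the talk
        "VIAF": "PLACEHOLDER-VIAF-ID",    # placeholder, not a real ID
        "BnF": "PLACEHOLDER-BNF-ID",      # placeholder, not a real ID
    },
}


def translate_id(identifiers, source_db, source_id, target_db):
    """Given an ID in one database, find the matching ID in another.

    Returns (item number, target ID), or (None, None) if no item carries
    the source ID. The target ID is None if the item lacks that database.
    """
    for qid, ids in identifiers.items():
        if ids.get(source_db) == source_id:
            return qid, ids.get(target_db)
    return None, None


result = translate_id(identifiers, "IMDb", "587548", "BnF")
print(result)  # ('Q17141', 'PLACEHOLDER-BNF-ID')
```

Neither external database needs to know about the other. The equivalence is recorded once, on the shared item.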
There are pages on Wikiquote with some quotes from Harvey Milk and there is even a page for Harvey Milk on Wikisource, right? So this is a collection of those links and those of you who have maybe only dealt with Wikidata for interwiki links, which we used to do in the old days manually within the article text, now we do it through Wikidata. So maybe that's the only thing you did know about Wikidata is how to update these interwiki tables on Wikidata. All right, so that concludes our little tour of the anatomy of a Wikidata page. I will just remind you that it's a wiki page, which means it has a discussion page, a talk page. This one happens to be empty, but if we have concerns or arguments about some of the data here, that is what we would use to discuss this and to arrive at consensus. It also has a history view just like every Wikipedia article. So you can see here a list of edits. Maybe some of you have never looked at a history page on Wikipedia, so this looks overwhelming, but every line here, every entry here is a single edit, a single revision, single change to this Wikidata item, just Harvey Milk. And you can see at the very top, this edit that I just made, this is my volunteer account, and I just made this edit and in parentheses you can see what I did. I added an HE Hebrew description and this is the text that I added in Hebrew. So we can see who added what to the Wikidata item just like we can do the same on Wikipedia. So we have the revision history, we can undo edits, we can revert just like on Wikipedia. And what else did I want to show here? We can add an item to my watch list using the star, just like on Wikipedia. So we have all these standard Wiki features that we would come to expect. Let's pause for questions. Any questions about what we've covered so far? Are attributes of statements preset for the specific value? No, they're not preset and generally Wikidata does not enforce by default logic. 
So I mean, there's nothing to prevent you from editing the item about Brazil and adding the property height. Now height is not a relevant property for a country. I mean, maybe average elevation maybe, but not just height, which is used for humans or for physical things. So you could add that property to Brazil and save it and the Wiki would not complain. Now in the background there are kind of extra Wiki outside the Wiki processes for constraint validation. So there are bots and other processes that run and occasionally, for example, identify non-living things with a date of birth field. That's nonsensical, that should not exist. If someone mistakenly added that, there are processes that would flag that to be fixed. But the Wiki itself, Wikidata, will not prevent you from adding that and that is by design to keep things flexible so that people don't run into, oh wait, but I can't add this because nobody thought that I would need this maybe. I hope that answers your question. You say helpful answer, question mark. So was it a helpful answer or? Okay, yes, Elena. Excellent question, I'll repeat it. You ask how do I find the Wikidata item number from Wikipedia? If I'm reading about Harvey Milk and I want to look at Wikidata, how do I do that? That is an excellent question and let's skip to Wikipedia. Conveniently I have the link right here on English. So this is the Wikipedia article about Harvey Milk and every item on Wikipedia should have a Wikidata item associated with it. But it doesn't happen automatically so if I just created a page on Wikipedia I also need to create a Wikidata entity for it if it doesn't already exist. It could already exist because it was already covered in a different language for example. So that was parenthetical. But every article on Wikipedia should have here on the side, on the sidebar under tools a link called Wikidata item right here. That Wikidata item is a link that takes you to Wikidata to the entity and there you find the number. 
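For tool builders, the item number can also be read off the link target programmatically. The following Python sketch assumes item URLs have the form `https://www.wikidata.org/wiki/Q…`, as with the Harvey Milk item. `qid_from_url` is a hypothetical helper name, not part of any Wikidata library.

```python
import re
from urllib.parse import urlparse


def qid_from_url(url):
    """Return the Q number from a Wikidata item URL, or None if there isn't one."""
    path = urlparse(url).path
    # Expect exactly /wiki/Q<digits>; anything else is not an item page.
    match = re.fullmatch(r"/wiki/(Q[1-9][0-9]*)", path)
    return match.group(1) if match else None


print(qid_from_url("https://www.wikidata.org/wiki/Q17141"))       # Q17141
print(qid_from_url("https://en.wikipedia.org/wiki/Harvey_Milk"))  # None
```

The sketch only inspects the path, so it does not confirm that the URL points at Wikidata itself. A stricter version would also check the host name.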
You don't even have to click it. The URL itself tells you the number. You see, it's wikidata.org slash wiki slash Q17141. So that was an excellent question. Other questions? Yes. About the additional attributes. The qualifiers. Yes, I answered more generically, but just like the properties themselves are not limited per item, the qualifiers per statement are also not entirely preordained, but there is some structure to it. I don't want to go into it at great length right now. If we have time at the end, we can get back to that. Some qualifiers are, again, relevant for some things, like start time and end time, and others won't be. Wikidata does try to help you: you may remember, when I clicked add qualifier, it gave me a kind of drop-down of some relevant qualifiers. So it does try to help you in that way. Other questions? Are the values for instance of already mappable to external ontologies? That is a complicated question. I'll help people understand the question first. So an ontology is a structure, some kind of hierarchy or cloud of entities and their interrelationships. An ontology would say, for example, a person is a living thing; so is a dog. They're both living things, but they're different things. And then, you know, it says things about those entities and their interrelationships. Now, there are many, many competing or coexisting models of ontologies. Many of them were created for specific needs. Many of them want to be a universal ontology, but of course it's impossible to quite agree on one complete and simple ontology, and so there are many ontologies. Which brings up your question: can we map across ontologies? Can we say that when Wikidata says instance of book, that is equivalent to some other ontology saying instance of bibliographic record? And the answer is yes, there are some such mappings. They are incomplete, and there's no kind of automatic thing happening in the wiki vis-a-vis those other ontologies.
Mappings to many other ontologies are left as an exercise for those dealing with those other ontologies, and for tool builders and other platform improvements beyond Wikidata itself. Okay, other questions? Yeah, we have one from the YouTube stream. Someone asks: why can't I link Howard Carter's occupation to archaeologist when I use an infobox that fetches its info, but I link it from the infobox? Someone on the stream answered, saying: because it's an improper connection, because the target is not about the subject only. The target is not about the subject? If I understand the question correctly, what you would want to be able to do is, from within Wikipedia, be able to say occupation and link to the Wikidata entry about archaeology. It doesn't quite work that way. We will get to a little discussion of that in an upcoming section of this talk, so I will defer the rest of my answer to then. Okay, so we're done with questions for this phase, and my browser got tired of waiting for me, so, all right. So we took a look at Wikidata and we took questions, so now let's teach Wikidata some new things, some things it doesn't already know. Let's look at this item here. So this item is about one of my favorite writers, an American writer named Helen DeWitt. Wikidata, of course, fondly refers to her as Q546374, but we can call her Helen DeWitt. And what can we contribute here? So Wikidata has far less information about Helen DeWitt. Most of you probably haven't heard of her. That's okay. What does Wikidata know about her? Well, instance of human. We have a photo of her. She's female. She's an American. Her name is Helen. Date of birth, place of birth. She's an author, novelist, writer. She was educated at the University of Oxford. And Wikidata knows what her official website is. That's useful. But that's it. Now, we can contribute information here. For example, she's an American author writing in English, so we could add that information. We could click the add button here, and this is a good moment to acknowledge that the user
interface of Wikidata is a work in progress. It's not as intuitive as it might be. So you need to understand that to add a completely new property, you need to click this add button. If you want to add an additional value to the property official website, you need to click this add button. It makes a kind of sense, with the shaded box, but, you know, you need to kind of pay attention, and it's not as friendly as it might be. Excuse me. So let's add a property here. I click the add button. Again, Wikidata tries to be useful by suggesting some relevant properties for humans. A bit morbidly, it suggests: how about date of death? That's not cool, Wikidata. Helen DeWitt is still alive, so I will not add date of death. But I can add languages spoken, written or signed. So I click that, and she writes in English. Not in Hebrew, don't panic. I type English here, and of course Wikidata has autocomplete, so it tries to help me along. But you will notice that it has all kinds of things called English. I mean, it turns out that there's a place in Indiana called English, Indiana. Did I mean that? No, of course I didn't mean that she writes her books in English, Indiana. But Wikidata gives me the option of linking to that. I also don't mean the botanist Carl Schwartz English. I mean the West Germanic language originating in England. That's what I mean. So I click that, and I click save, and that's it. Again, I have just made an edit to Wikidata. I have just taught Wikidata that this author speaks English. Now, again, this may be very obvious. She's American. Of course, not all Americans write in English. It may be obvious if you look at her books. The important thing is that now Wikidata knows this as a piece of data. And again, think ahead to queries, which we will demonstrate in a little bit. Without this piece of information that I just added, if I were to ask Wikidata five minutes ago, give me a list of novelists writing in English, Wikidata would have returned thousands of results, but Helen DeWitt would not have been among them, because
up until two minutes ago, Wikidata didn't know that Helen DeWitt writes in English and not in Spanish. Do you see? It is this explicit statement that will now make her be included in any future query that asks, who are novelists writing in English? Okay. By the way, she's a PhD in classics. She speaks, or at least reads and writes, Latin and Greek. Ancient Greek, I mean. I happen to know that. But wait, wait, wait, wait, you say. What about original research? I mean, to add stuff like that to Wikidata, don't you need sources, citations? Of course I do, yes. Let's add some sources to this. So on Wikidata, just like Wikipedia, things should generally be supported by citations, by references. And just like Wikipedia, they aren't always supported in that way. Okay, so, I mean, I can just add it to Wikidata. Watch me. I just did that, right? I just added English and Latin without any citation, and I will not be arrested for it. Just like I could edit a Wikipedia article and add some information without a citation. It may stick, it may stay in the article, or it may be reverted. It depends on the kind of information I'm adding. It depends on how many people are paying attention to the article on Wikipedia. And it works the same way on Wikidata. So you can add some things without references. Ideally, when you add information, you should include references. So let's be good Wikidata citizens and add a source. Here is an article that I prepared in advance. This is Helen DeWitt, and in this article somewhere it actually says, yeah, right at the bottom here, see: DeWitt knows, in descending order of proficiency, Latin, Ancient Greek, French, German, Spanish, Italian, Portuguese, Dutch, Danish, Norwegian, Swedish, Arabic, Hebrew and Japanese. This may sound excessive, but it's true. I met this woman. So anyway, we don't have to include all of that. From a reasonably reliable source, this magazine, this interview can count as a source for the languages she speaks. So I copy the URL. I just copied it off my browser and, whoops, that's not tab, here we go,
and I can just add a reference here to the information that I just added to Wikidata. I can click add reference, and then just say the reference URL is, and I just paste. I paste this URL, hit enter, and that's it. And now the fact that she speaks Latin has a reference. If you look at the other things here on Wikidata, you can see that these IDs, for example, have references too, right? In this case the reference just says (excuse me, hello), in this case it just says imported from English Wikipedia. But wait, you say, can Wikipedia be a source? Not properly, no. Just like Wikipedia itself doesn't cite itself, right? We don't say this person was born in this city; how do we know? We read it on Wikipedia in another language. That's not a good citation. It's not a good citation, and we don't use it on Wikidata either. So why do we put it here? Well, you can see that the qualifier here is different, right? It's not reference URL, which is what I put in for Latin here. Hello? Yes. It's not reference URL here, it's a different qualifier, saying imported from. So this is not an actual reference that supports this piece of data. It just shows where this data came from. It's a slightly different thing, because this data was mass imported into Wikidata. So it wasn't input by hand by some volunteer. It was imported into Wikidata en masse by a script, by a program, and we want to know where this number came from. Well, it came from English Wikipedia. So again, that's not a proper reference for the validity of the information, but it does at least tell us it came from English Wikipedia. We can click and look on English Wikipedia and find out; maybe there's a footnote there that says where it did come from. Okay, so this was an example of teaching Wikidata something that it didn't know, something about the languages. And of course I could add this reference for English, I could add all the other languages that she speaks, and I won't bore you with that, but that is basically how it's done. So you click this add to
add a completely new statement. Now, by the way, the fact that these are the only two suggestions that Wikidata can think of doesn't mean these are the only options, okay? You can just type anything that may be relevant. We could add, for example, award. Just start typing award, and here I have a bunch of properties that are relevant for awards: award received, together with, conferred by. There's all kinds of properties that I could rely on. And of course there is a list of all the properties of Wikidata, and that list is also sorted by type. So yes, there is a list of properties relevant to people, so that you don't have to guess. But a surprising amount of the time, you do get the right property suggested to you. Okay, so we taught Wikidata something new, and now let's teach Wikidata something completely new, right? So how do we create a new Wikidata item? So like I said, if I created a Wikipedia article about something that was not previously covered on any other Wikipedia, chances are there would not be an already existing Wikidata item. Sometimes there might be, because Wikidata does have 25 million entities, but sometimes there wouldn't be. So first of all, I could search for it. So I could go to Wikidata, to the search box here, and just start typing and search for what I want. So if I'm searching for Helen DeWitt, I just say Helen, and I can see whether or not it exists. And there's a detailed search results page, et cetera, where I can find out if the item does exist or not. This reminds me of a very important thing I wanted to demonstrate, and that is the multilingualism of Wikidata. So remember all these labels in other languages? Wikidata knows what to call Helen DeWitt in Hebrew, and it will show it to Wikidata users whose language is Hebrew. Mine is set to English for your sake, but if I change this, I go to preferences here and change my language and I hit save, Wikidata will start talking to me in Hebrew. Now brace yourselves. Are you ready? Don't panic, it's right to left. Oh my
god, everything is topsy-turvy. So this is the same article in Hebrew. So the sidebar has switched direction, and I know most of you cannot read it. Bear with me. This is the label that you previously saw in the label box. This is how you spell Helen DeWitt in Hebrew. And here is the description in Hebrew. It's not the description in English, this description, American writer, which I was shown previously. Now I'm shown the Hebrew description, appropriately. But more interestingly, oh my god, all the statements are suddenly in Hebrew. How did that happen? This tiny word here is the very concise way to say in Hebrew instance of, and this word here means human. So these are links to the same things. This still links to Q5. Q5 is the Wikidata entity for human. These are still the same things, but because Wikidata has multiple labels for everything, it has multiple labels for items, it has multiple labels for property names. So Wikidata knows how to say instance of and award received in other languages. That is why it is able to show me all this data in Hebrew, even if none of that data was actually input into Wikidata by a Hebrew speaker. The data could have been input by English speakers, but thanks to the fact that someone once translated the word photo into Hebrew, I can see this field in Hebrew. So one of the things you can do to help Wikidata right now, without any special knowledge, is to help translate those labels. Every label only needs to be translated just once. So you can see that all of these properties, date of birth, name, etc., they all have Hebrew labels. No, they all have Hebrew labels. Doing pretty good. And I'm able to search in my own language. I'm able to click add. This word is add. So I click this, and now I have the add screen. It all speaks my language, and it's awesome. And now, for your sake, I will switch back to English. But it is important to know you can edit Wikidata in any language, and it is far more multilingual and multilingual-friendly than, for example, Commons, which is also a project we
all share, but Commons has some limitations on how multilingual it is, for example the category names, etc. Okay, so we were beginning to discuss creating something completely new. Quick questions, if that's okay. So there's two questions on IRC. Show a search for something, like getting a list of things. I want to learn how to search for something properly, like show me all the items with this value of this property. Yes, that is part of this talk, but I'll get to that a little bit later. There's a whole section where I will demonstrate the very, very powerful query system of Wikidata, where I will cash that check that I gave at the beginning about these queries, etc. So I will demonstrate how to do that. Other question: how does Wikidata deal with link rot and other issues stemming from bare URL refs? URLs break. We call that link rot. Wikidata doesn't have any particular magic around link rot, just like Wikipedia. So if you do use a bare URL, it may well rot. You can add qualifiers with backup URLs on the Internet Archive or another mirroring service, and potentially that could be a software feature for Wikidata, to automatically save or ensure that something is saved on the Internet Archive, but I don't know that it is doing so now. So just like Wikipedia, if it is a bare URL, it may rot and may need to be replaced, possibly by a bot. Other questions? So let's talk about how to create a completely new item. It's very simple. You go to Wikidata and you click here on the side. There's a link, create new item, which gives you this screen. And let's create an item about a book that I'm reading right now by this Bulgarian writer. So we have an article about this writer, but we don't have an article or a Wikidata item about one of his famous books, called Circus Bulgaria. That's the book I'm reading. As you can see, it's not a link on Wikipedia. There's no article about it, and there's not even a Wikidata entity, item, about it. But we can totally create it, even without a Wikipedia article. So let's create this new item. Let's create it
in English for the purposes of our demonstration. The name of the item is Circus Bulgaria. Circus Bulgaria, that's the name. Not Circus Bulgaria, parenthesis, book, or anything you may be used to from Wikipedia. It's the actual name of the book. And the description, again, remember, the description field is just to kind of help tell apart this Circus Bulgaria from any other potential Circus Bulgaria. Maybe there's a film or something, right? So it's enough to just say something like short story collection. I might add by Deyan Enev, just in case, again, some future other short story collection by some other author happens to have that same name. That should be disambiguating enough. Okay, short story collection by Deyan Enev. I could have aliases for this. The aliases assist findability. This particular book has just this one name, so that's fine. And I click create. That's it. I just start with a label and a description, I click create, ta-da. I have a brand new Q number for my new Wikidata item, and Wikidata knows what to call it, and a description, in one language at least. And that's it, and I can start populating it. As you can see, it has no statements, no site links, but it's ready to be taught. So for example, I can start by teaching it the name of the book in another language that I happen to speak. Now it has two labels, in English and Hebrew. I could also look up the Bulgarian, the original Bulgarian label for this book. Seems relevant. Again, I do not speak Bulgarian, but I can go to the Bulgarian Wikipedia through interwiki. This is this gentleman. And I can read Cyrillic, so I could easily find it. What do I say, easily? When I say easily, maybe not so easy. But I can, somewhere in here, here we go, that is the name of the book, as in circus. No problem. So I just copy this right here and I go back to my new item, my new item, which is here, and I edit the Bulgarian field and
here it is. Awesome. All right, but I still haven't told Wikidata anything about this. I know I'm talking about a book. Wikidata doesn't know that yet. So let's start by adding some statements. First of all, I click add. Wikidata sensibly says, how about we start with instance of? Tell me what kind of animal, not animal, what kind of thing are you trying to describe here? Well, it's an instance of a book. Not in Hebrew, please. It's an instance of a book. I could even be a little more specific and say it's an instance of a short story collection. There we go, short story collection. I hit save. Awesome. So now we know what kind of thing it is. It's not a human, it's not a mountain, it's not a concept, it's a short story collection. Now I can add some other things. See, Wikidata is already working for me. Because it's a short story collection, it's offering me to populate these properties and not other ones: publication date, original language, genre, country of origin. These are all relevant. So let's start with original language. The original language of the work is Bulgarian. Not Bulgaria, Bulgarian. This is the item I want to link. Hit save. And, whatever, author. Let's identify the author. So the author, the main creator of the work, is that gentleman, Deyan Enev. And remember, he has a Wikipedia article. He also has a Wikidata entity, so Wikidata does know about him. So I hit save. And I can add something about the translator. Translator. And what was that lady's name? Kapka Kassabova. Now it so happens that Wikidata already knows about this lady. Yeah, see? So I can just start typing and then just link to it. Awesome. But what if it didn't? What if it was translated by someone who isn't already covered on Wikidata? Well, I could just type the name as a string, but ideally I could create a Wikidata entity about this translator, so that there is a possibility to link to her. Now, I might actually add a qualifier here, because she's not the translator of the book, she's the translator of the book into English. So the language that she translated into
is English. This book, remember, I'm describing the book, the item is about the book, so the book would have a different translator into Polish. So this is an example of a property or a statement that doesn't make sense without one of those qualifiers. It's just not correct. It doesn't make sense to say that the translator of the book is the English translator. Even for English, in 50 years maybe there would be an additional English translation. So that's an example of needing that qualifier. And of course I could go on and populate the other fields; we don't have to do that right now: publication date, country of origin, etc. So this is already beginning to look like all those items that we already saw, but just a moment ago it didn't exist. Just a moment ago Wikidata had no concept of this work. This happens to be one of his notable works, so I could actually go to the item about Deyan Enev, which has all this information already, languages and so on, and add a property. Remember, I'm not limited to these. I can add a property called notable works and mention my new item, Circus Bulgaria. See, my new item is showing up, and thanks to the description that I wrote, short story collection, it's already appearing here in the drop-down, very conveniently. So I link to this, I hit save. Ideally, again, I should find some reference showing that this is a notable work by him, but we won't spend time on that right now. But the point is, we created a new item, we populated it a little bit, and we linked to it, so that it's more discoverable, by mentioning it in the author's item. And of course the book item itself mentions the author and links to the author, so that's all good. One last thing we shall do is give it some useful identifier. So let's add, say, the Library of Congress record for this book. So I have prepared this in advance. Just in time, with 80 seconds to go before it's giving up on me. Oh, it has already given up on me. That is very unfortunate. Okay, so I go to the Library of Congress and I find this book, I find this entry
right in the Library of Congress database about this book. And it has a permalink, a kind of guaranteed-to-be-permanent link. I can just copy that link, go back to my little book and say Library of Congress, yeah, LCCN, that's what they call their IDs, the control number, and I paste it here. I actually don't need the URL, I need just the number. And there we go, I have added it. And now Wikidata knows how to find bibliographic information about this book, and any re-user of Wikidata, some program, some tool that connects books to authors or does statistical analysis or whatever, some future yet-to-be-imagined tool, could automatically find additional metadata on the Library of Congress site thanks to this connection that I just made. And of course I could add many other IDs to other catalogs around the world, and we won't do that right now, but you can see that it's now showing up under identifiers. So this is how we created a brand new piece of data. Questions about this? About creating new items, I meant. Yeah, all right. So we've seen how to contribute to Wikidata on our own, directly through Wikidata. Now you may be thinking, but Asaf, this sounds like a ton of work, recording all of these tiny little bits of information about every person and every book and every town. And if you think that, you would be correct. That is a ton of work. It's a lot of work. However, it is centralized, so it is reusable on other wikis, and we will show in just a moment how to pull a little information from Wikidata into Wikipedia or other projects. Excuse me. We will show that in just a moment. But here is an awesome little game that Wikidata volunteer Magnus Manske has authored, called the Wikidata Game, in which he tricks people, helps people, make contributions to Wikidata in a very, very easy and pleasant way. Let's look at the Wikidata Game. So the first thing you need to do in the Wikidata Game is to log in, because the Wikidata Game makes edits in your name, so you need to authorize it. It's perfectly safe, and
after you do that you can go to the Wikidata Game. So this is the game. Now I'm logged in, and the Wikidata Game actually includes a number of different games. Let's start with the person game. So the Wikidata Game shows you an item and asks you a very simple question: person or not a person? That's it. So the Wikidata Game goes through Wikidata entities that don't even have the instance of property, which is why Wikidata doesn't know, literally doesn't know, if this is a person or a mountain or a city or a country or anything else. So it asks you, because this is the kind of question that Wikidata cannot decide on its own, but for us humans it's generally trivial to say whether something that we're looking at is a person or not. It gets slightly trickier when the information is in Javanese, as it is here, rather than English. So this item happens to be described in Javanese. My Javanese, spoken in Indonesia, is very weak. However, I can tell that this is not a person. How can I tell without understanding Javanese? I see that it mentions a thousand kilometers and square kilometers, see? So this is about a place or an area or a region or whatever, but not a person. So this is an example of how, even without understanding the language, you can sometimes make a determination. However, of course, you should be sure. This is definitely not what the Wikipedia article about a person looks like. So this is not a person. I just click it and I'm shown the next item. This item is in another language I do not speak, and I just don't know. I do not know if this is about a person or not, so I click not sure. This is in Swedish, and it's about Sulawesi, still Indonesia, and it is not about a person. I have enough Swedish for that. So I click not a person. Now you may say, well, do I really have to deal with all these languages that I don't speak? The answer is no, you don't have to. Here at the bottom of the Wikidata Game there are settings. You can click that and tell the Wikidata Game, you know, I cannot even read Chinese or Japanese, so please
don't show me items in those languages, because I wouldn't even be able to guess. I prefer these languages, in which I can relatively easily make determinations. And I can even tell the Wikidata Game to only show me these languages. You see, this was not selected, which is why I was shown some other languages. I could say only use these languages, and save, and now I can try this game again. However, that can slow it down a little. So here we go. Here's Spanish, which is one of the languages I told the Wikidata Game it can use. This is a Spanish item. Now, is it about a person or not? Um, it is not about a person. Is it about a person? No, yes, it is, right? "Monje cisterciense", Pedro de Oviedo y Falconi, that sounds like a person. "Fray Pedro nació", yeah, he was born in Madrid, 1577. This is a person. I click person. Again, if you're not sure, click not sure. The point is, just by clicking person, and as you can see, this would work very well on mobile, which is why I said you can contribute on your commute. You can just hold your phone or tablet or whatever and just tap person, not a person, person, not a person. The amazing thing is that just tapping person has actually made an edit to Wikidata on my behalf, which I can find out, like on every wiki, by clicking contributions. And as you can see, in addition to the stuff about Circus Bulgaria, my latest edit is in fact about this Pedro de Oviedo y Falconi person, and the edit was, I hope you can see this, "created the claim: instance of human". So I added, I mean the Wikidata Game added for me, the statement instance of human. Now the awesome thing is that it was super easy to do. I didn't have to go into that entity, click the add button, choose the instance of property, choose human, hit save. Instead of all these operations, I just tapped on my screen, person, not a person. And I can do, you know, hundreds of edits during my daily commute. There are other games, like the gender game. So this is when Wikidata already knows that this item is a person, but it doesn't know the gender of this
person, which is another one of the more basic properties. And this is taking a long time because of the language limitations that I set on it. I guess the less exotic languages have already been exhausted in the game. We don't have to wait all this time. We can, hello, yes, we can try something else. How about occupation? The occupation game. Here we go. This is in Russian, and what is the occupation of this gentleman? Well, he is an archimandrite, he's a church person. However, the occupation game is where the Wikidata Game will automatically pull likely occupations from the article text and ask for confirmation. So if this person really is a deacon, I should click that. But I'm not sure. I'm not clear on the Russian church's distinctions. I mean, archimandrite is pretty senior, but I don't know if that automatically also means he's a deacon or not, and archimandrite is not listed here, so I will click not listed. Also, the guesses are not always correct. So this guy, for example, is in Russian. I can read this. He's a philologist, he's a linguist, so I can confirm it and click linguist. And again, if we look at my contributions, we can see the Wikidata Game on my behalf created occupation linguist, just by tapping linguist there. If it's taken from the article, why would it ever be wrong? Well, Jesus was the son of a carpenter. The word carpenter appears in the text. That doesn't mean it's correct to say Jesus was a carpenter. Just a trivial example. So many articles will say "born to a physician", and so the word physician could be guessed, but it wouldn't be correct unless the son is also a physician. So I hope this gives you the gist of it. There is also a distributed Wikidata Game, which is pretty awesome. Here we go. It has additional games. So for example, the Kian game gives you, maybe it gives you some items to play with. Yes, no. So it gives you this little card and asks you to confirm: is this an instance of human settlement? That is, is it a village, town, city, is it a kind of human settlement or not? Or maybe it's
a book, maybe it's a poem. Again, so is it a human settlement? And you can click the languages here to see the information. So I can click English, and indeed the article, I mean the actual Wikipedia article, says Kamiji is a town and territory in this district in the Congo. So yes, this is an instance of human settlement. So I click yes, and just clicking yes, again, went to that item and added the property instance of human settlement. Now the point of all these games is that these are tools written by programmers making kind of semi-educated guesses about these fairly basic properties, and they are meant to semi-automate, to assist in, the accumulation of all these important pieces of data. Now every single click here helps Wikidata give better results, richer results, in future queries. Again, right now Wikidata can include Kamiji if I ask it what are some towns in Congo. Until now it could not, because it literally didn't know. So every time we click male, female, person, not a person, and make these decisions, we help improve Wikidata and enrich the results that we could receive. How about this, about micro-contributions through the Wikidata Game? If that looks appealing, I encourage you to go and visit the Wikidata Game and start contributing in that way. There is a question here: if I make an article about Circus Bulgaria, how should I correctly connect them? That is an excellent question. So now there is an item about that book, but there is no Wikipedia article anywhere. Now suppose I write one, in Bulgarian maybe. You go to Wikidata, you find the item by searching, and then in the empty sitelinks section right at the bottom, there. We still have this Circus Bulgaria, so let's demonstrate this. So here is the item about the book, and now there is an article, because I just created it. I can go here to the empty Wikipedia links section, click edit, type the name of the wiki, let's say English, right, and then type the name of the page that I just created, Circus, and again it offers me autocomplete for
my convenience. Now, we don't actually have the article that I just created, but let's just say this was the article. I can just click this, hit save, and that would associate the new Wikipedia article with this Wikidata item. That is the beginning of the interwiki list for this item. I will not click save now, because we don't have the article yet. So I hope that answers that question. Was there another question that I missed here? No? About this idea of micro-contributions? If not, then we can move on to embedding data, and after that we can discuss queries, how to get at all this data from Wikidata. So the short version of how to embed data from Wikidata is that there is this little magic incantation: curly brace, curly brace, hash mark, property, curly brace. It looks like a template, but it isn't, because of that hash, and that is magic. Take a look at this little demo that I prepared. This page, which is off my user page on Meta, but it could be on any wiki, okay, says: since San Francisco is item Q62 in Wikidata, and since population is property P1082, I can tell you that according to Wikidata the population of San Francisco is this. And this bolded number here was produced with this incantation: curly brace, curly brace, hash, property, P1082, that's population, pipe, from, what item. I'm pulling an arbitrary number. I could put any property and any item here and kind of include it, embedded into my text. You notice this is my user page. This isn't even the article about San Francisco. I just want to pull that number into this thing that I'm writing. So it's fairly simple. I identify the property, I identify the item to take it from, and Wikidata will, I mean, the wiki I'm on, in this case Meta, will go to Wikidata and fetch it for me. Likewise, since Denny Vrandečić, the designer of Wikidata, is item Q18618629, he's a notable person, so he has a Wikidata entity, and since occupation is property 106 and date of birth is 569 and place of birth is 19, because of all that I can tell you that Vrandečić was born in
Stuttgart on this date and is a researcher, programmer and computer scientist. If you look at the source for this page, click edit source, you can see that the word Stuttgart does not appear here, because it came from Wikidata, right? I did not write this into my little demo page here. See, place of birth is, where is it, here: born in, property 19 from Q number so-and-so. That is how easy it is to pull stuff into a wiki from Wikidata. Okay, now there's some nuance to it, and there are some additional parameters you can give, and you can ask Wikidata to give you not just the text of the values but actually make them links. So for example, if I change this from property to values, no, that did not work at all. Values, what was it? Values. The magic word is statements. Statements. So going back here, if I change the word property to the word statements here, then this same value, that did not work at all. Oh, because I'm on Meta. So because I'm on Meta, Meta doesn't have an article named researcher, programmer or computer scientist, but Wikipedia does. If I included this same syntax in Wikipedia, like English Wikipedia for example, so let's go there right now and go to my sandbox. If I just brutally paste this on my sandbox here, so, see, these became links, because Wikipedia has an article called programmer and one called computer scientist. So like I said, there's some additional nuance to the embedding. The important thing is that this is the key to delivering on that first problem that I mentioned: how to get data from a central location onto your wiki, in your language, basically using the property and statements magic incantations. And of course usually this would be in the context of an infobox. Some wikis, and English Wikipedia is not leading the way there, some smaller wikis are more advanced, actually, in integrating Wikidata and embedding like this into their infoboxes, so that instead of the infobox just being a template on the wiki with field equals value, field equals value, that template of the infobox on the wiki pulls
the values, the birth date, the languages, etc., pulls them from Wikidata. Basically I just demonstrated single calls to this, but of course an infobox template would include maybe 20 or 40 such embeds, and that is not a problem. Of course, before you go and edit the English Wikipedia's infobox person and replace it all with Wikidata embeds, you should discuss it with the English Wikipedia community. These discussions have already been taking place. There are some concerns about how to patrol this, how to keep it friendly, etc. So there are legitimate concerns with just moving everything to be embedded from Wikidata, but the communities are gradually handling this. I mean, this ability to embed from Wikidata is not very old. It's been around for about a year, so communities are still working on integrating that technology. But that is just the basics of how to pull individual bits of data. That's not asking those sweeping questions that I was talking about yet. We'll get to that right now. This is how to pull a specific datum, a specific piece of data, from Wikidata. Okay, so here's another quick thing to demonstrate before we go to queries, and that is the article placeholder. It is a feature that is being tested on the Esperanto Wikipedia, and maybe another wiki, I don't remember, and it is using the potential of Wikidata to offer a placeholder for an article, an automatically generated, Wikidata-powered placeholder for articles that don't yet exist on Esperanto. So let's go to the Esperanto Wikipedia. I don't speak Esperanto, but let's look for Helen DeWitt, our friend, in Esperanto Wikipedia. Now, Esperanto is not one of the Wikipedias that have an article about Helen DeWitt, and so it tells me that there is no Helen DeWitt. Maybe you were looking for Helena DeWitt? No, I was not. You can start an article about Helen DeWitt, you can search, you know, there's all this stuff. But there is also this little option here, hiding, which tells me that the Esperanto Wikipedia
is, what's happening here, yes, Esperanto Wikipedia is ready to give me this page. This page, as you can see, is on the Esperanto Wikipedia, but it's not an article. See, it's a special page, it's machine generated. You can see the URL as well. It's not, you know, slash Helen DeWitt, it's slash Specialaĵo:AboutTopic and then the Wikidata ID of Helen DeWitt. And what I get here: an English description, by the way, because there is no Esperanto description. Wikidata can't make it up. But what it can do is offer me these pieces of data in my language, in this case Esperanto. I'm on the Esperanto Wikipedia. Okay, so it tells me that she's American, for example, and it tells me that in Esperanto. Okay, and it tells me that she speaks Latin. Remember, we taught Wikidata that. It tells me that she was educated in Oxford, and gives me the references, to the extent that they exist. I mean, this is not an article. It's not paragraphs of fluent Esperanto text. But it is information that I can understand if I speak this language, and it's better than nothing. And remember, Helen DeWitt was not a very detailed article. If I were to ask about, I don't know, some politician or popular singer that has more data in Wikidata, then this machine-generated thing would have been richer. So this feature is available and is under beta testing right now. But generally, if this sounds interesting to you, especially if you come from a smaller wiki that is missing a lot of articles that people may want to learn about, you can contact the Wikimedia Foundation and ask for the article placeholder to be enabled on your wiki. And again, this is a placeholder. Of course, it exists only until someone actually writes a proper Esperanto article about Helen DeWitt. So I hope this is clear. This is all coming from Wikidata on the fly, in real time. As you can see, it includes my latest edits to Helen DeWitt. Okay, questions about the article placeholder? If there are, try and put them on the channel. And this brings us to one of the main
courses of this talk, which is querying Wikidata. So I've explained how Wikidata works. We've walked through it, we've added to it, we've created a new item, we learned how to contribute during our commutes, and all this while you kept promising us that this would enable these amazing queries. So, time to make good on that. The URL you need to remember is query.wikidata.org, and that will take you to a query system that uses a language called SPARQL, "sparkle" spelled with a Q. This language is not a Wikimedia creation. It's a standardized language used for querying linked data sources, and because of that there are certain usability prices that we pay for using SPARQL, for using a standard language. It's not completely custom-made for querying Wikidata. We'll see that in just a moment. The principle to remember about Wikidata queries is that Wikidata will tell you everything it knows, but no more. I have anticipated this several times already, right? Until this moment, when we taught Wikidata that Helen DeWitt speaks Latin, she would not have appeared in query results asking who are American writers who speak Latin. She would not have appeared. But as of this afternoon she will appear, because I've added that piece of information. So a result of that principle is that you can never say, well, I ran a Wikidata query and this is the list of Flemish painters who are sons of painters, the list, these are all the Flemish painters who are sons of painters. That is never something you can say based on a Wikidata query, because of course maybe not all the Flemish painters who are sons of painters have been expressed in Wikidata yet. Wikidata doesn't know about some of them, or maybe it knows about all of them but doesn't know the important fact that this person is the son of that person, because those properties have not been added, and so they cannot be included in the results. So the results of a Wikidata query are never the definitive set. What you can say about a Wikidata query is, you know, here are
some Flemish painters who are sons of painters, here are some cities with female mayors. Whatever it is you're querying about is never guaranteed to be complete, because Wikidata, like Wikipedia, is a work in progress. And of course the more, excuse me, the more we teach Wikidata, the more useful it becomes. Okay, so let's go and see those queries. So this is query.wikidata.org. It's not the wiki, all right? So this isn't like some page on the wiki itself. This is kind of an external system, so it's not a wiki. You can see I don't have a user page here, I don't have a history tab. You know, this isn't a wiki page. This is a special kind of tool or system, and it invites me to input a SPARQL query. Now, most of us do not speak SPARQL. It's a technical language, it's a query language. Some of you may be thinking about SQL, the database query language. SPARQL is named with kind of a wink or a nod to SQL, but I warn you, if you are comfortable in SQL, don't expect to carry over your knowledge of SQL into SPARQL. They're not the same. They are superficially similar, so they both use the keyword SELECT, and they use the word WHERE, and they use things like LIMIT and ORDER. So again, if you know this already from SQL, those mean roughly the same things, but don't expect it to behave just like SQL. You do need to spend some time understanding how SPARQL works. So by all means, I invite you to go and read one of the many fine SPARQL tutorials that are out there on the web, or to click the help button here, which also includes help about SPARQL. But I also know that most of us, when we want to do some advanced formatting on a wiki, for example, we don't go and read the help page on templates. We go to a page that already does what we want to do, and adopt and adapt the code from that other page. So we just take something that does roughly what we want and just copy it over and change what we need to change. That is a very pragmatic and reasonable way to do things. Engineers know this, which is why they
prepared this very handy button for us called examples. We click the examples button and, oh my god, there's a ton, 312 example queries for us to choose from, and we can just pick something that is roughly like what we're trying to find out and then just change what needs changing. So let's take a very simple one, the cats query, maybe one of the simplest you could possibly have, and let's run it first and then I'll kind of walk you through it. The goal here is not to teach you SPARQL, but to get you to be kind of literate in SPARQL, to kind of understand why this does what it does. So let's run this query first. We click run, and here I have results at the bottom: the item, which is just a Wikidata item, and of course is a number, remember, Wikidata thinks of items as Q numbers, and a label, because we're humans and we prefer words to numbers. So these 114 results are all the cats that Wikidata knows about. Is this all the cats in the world? No, of course not. Remember, it's all the cats Wikidata knows about, which means they're somehow notable enough to describe on Wikidata, and Wikidata was told this item is an instance of cat, right? So these are those cats, and we can click any of them. I don't know, Pixel, for example. Click the Wikidata item, and here is the Wikidata item about Pixel, with the Q number, and he is a tortoiseshell cat, and as you can see, instance of cat. Okay, and he's five inches high, and he is apparently documented in Indonesian, in Bahasa, right here, this is Pixel. And he is apparently somehow related to the Guinness World Records book. I don't speak Bahasa, so I don't know exactly why this cat is so notable, but of course cats can become notable for all kinds of reasons. Maybe they're a YouTube sensation, maybe they were involved in some historical event. I like this cat, the cat named Gladstone. This cat named Gladstone has position held: Chief Mouser to Her Majesty's Treasury. This is an official cat with a job, and he has been holding this job, mind you, since
the 28th of June this past year. That's the start time, and there is no end time, which means he currently holds the position of Chief Mouser to Her Majesty's Treasury. His employer is Her Majesty's Treasury, he's a male creature, and Wikidata knows that this cat is named after William Gladstone, the Victorian prime minister. Of course, if I don't know who this person is, I can click through and learn that he was a Liberal politician and prime minister. He even has a Twitter account, and Wikidata sends me right to it, the Treasury cat Twitter account. And he has articles in German and English and, of course, Japanese, because he's a cat. All right, so this was a very simple query. Let's find out why it works, what we actually told Wikidata to do for us. We said, please select some items for us, along with their labels, okay, along with their human-readable labels, because if I remove this label, what I get is, see, just a list of item numbers. That's not as fun. So that's what this little bit did. I just said, give me the items, but also their human-readable label. And I want you to select a bunch of items, but not just any random bunch of items. I want you to select items where a certain condition holds. What is the condition? The condition is that the item that I want you to select needs to have property 31 with a value of Q146. Well, that's helpful. If I hover over these numbers, I get the human-readable version. So I'm looking for items that have the property instance of with the value cat, right? Because that's literally what I want, right? I want all the items that have a property, a statement, that says instance of cat. That's the condition. I'm not interested in items that are instance of book or instance of human. I'm interested in instance of cat. That is the only condition here in this query. This complicated line I ask you to basically ignore. This is one of those sacrifices that we make for using a standard language like SPARQL, but the role of this complicated line is to basically ensure that we get the English label
for that cat. So don't worry about that, just leave it there. And we run the query, and we get the list of cats with English labels, and that is awesome. By the way, if I change EN, without really understanding this line, if I change EN to HE for Hebrew, I get the same results with a Hebrew label. Of course, these cats, nobody bothered to give them Hebrew labels, unfortunately, so I get the Q number. But if I changed it to Japanese, JA, I would still get a bunch of Q numbers where there isn't a Japanese label, but I would get the labels in Japanese. Okay, so this is an example of how you don't even need to understand all the syntax of this query to adapt it to your needs. If you want this query as is, but you want the labels in Japanese, you can just change the language code here. Okay, so that is all this query does. Again, just give me the items that have property 31, instance of, with a value Q146, which is cat. Let's take a question just about this very simple query before we advance to more complicated queries. Any questions just about this? Did I kind of really lose anyone, talking about this simple query? Again, this query just tells Wikidata, get me all the items that somewhere among their statements have instance of cat. That's the only condition. No questions? Okay, feel free to ask if you come up with one. So let's complicate things a little. Let's ask only for male cats. Okay, remember, this cat Gladstone is male, and we know this because there is a property called sex or gender, and the value is male creature, right? So let's add another condition right here, under the first condition. Okay, this is a new line, and I'm adding a new condition to the query. I'm saying, not only do I want this item that you return to be instance of cat, I also want this same item to have another property, the property sex or gender, right? And I need to refer to the property by number, but don't worry, Wikidata will help you. So you start with this prefix, wdt. Again, just ignore that prefix. It's one of the features of SPARQL that we need to
respect. wdt, colon, and then I can just type Ctrl+Space to do a search, to do an autocomplete. So I can just type sex, and Wikidata helpfully offers me a drop-down with relevant properties. So I click property 21, which is the sex or gender property. And then I say, so I want the sex or gender property to have the value, wd, colon, again Ctrl+Space, and I can just say male creature. See, there's a different item for male as in human and a different one for male creature. So for reasons that we won't go into, let's pick male creature, because we're talking about cats here, and add a period here at the end and click run. And instead of 114 cats, this time we got 43 results, including our friend Gladstone, who is a male creature cat. So that means all the rest are female, right? Wrong. It does not mean that at all. What it means is, of the 114 items that have instance of cat, only 43 explicitly have sex: male creature. The rest of them do not, maybe because they have sex: female creature, but maybe because they don't have that property at all. I'm emphasizing this to kind of help you train yourself to correctly interpret the results of queries from Wikidata. Don't jump to this kind of simplistic conclusion: okay, there's 114 total, 43 male, therefore the rest are female. That is not correct. Okay, but 43 of those explicitly had another statement, sex or gender: male creature. So I just added another condition, and now my query is asking two separate things about the results: they need to be a cat and a male creature. Maybe we should see how many cats have Twitter accounts. But there is a question from YouTube, which is, will you talk about the export possibilities of the results of the query? Absolutely, I will, in just a little bit. I mean, in addition to just getting this kind of table, I can get these results in other formats, and I can also download these results. I can click the download button and get them as a comma-separated file, a tab-separated file, a JSON file, which is useful for programmatic
uses. I can also get a link, so I can get a link to this query. I mean, I spent all this time designing this beautiful query. I can get a short URL that was generated especially for me, right now, with a tiny URL. I can just paste this into Twitter: hey people, look at all the male cats that Wikidata knows about. Okay, this is not a very exciting query, but once I get to a really complicated, exciting query, I can totally share that very easily through this. And we will get to more interesting queries in just a second. Any questions on this kind of basic querying so far? Okay, so that was a very simple example. Let's spend a moment exploring. So this cat, this cat Gladstone, was named after this dude William Gladstone, who was an important British politician. I'm sure he's not the only thing out there in the universe that's named after Gladstone, right? I mean, there's gotta be, I don't know, park benches, planets, asteroids, something other than the cat named after this guy, you know? So we can ask Wikidata to tell us all the things, you know, without saying instance of something, like, I don't know, anything named after William Gladstone. So how do I do that? Same principle. Instead of asking about the property instance of, property 31, instead of that I will ask about the property named after. Named after, I don't need to remember the number, I have autocomplete. Named after is property 138, and I want anything at all that is named after this person William Gladstone, here we go, which is, you know, Q160852, whatever. Okay, you notice I removed instance of cat, I removed the male creature. I'm only asking, get me all the items that are somehow named after that particular politician. And I run the query, and it turns out that Wikidata knows about three such things. Does that mean these are the only three things named after him in the world? Of course not. But these are the only three items that are in Wikidata and explicitly have the property named after Gladstone. For all I know
there may be a village in England called Gladstone, named after this person, but if nobody added the property named after linking to the person, it wouldn't show up in the results of my query. So Wikidata knows about three such things. One of them is something called the Gladstone Professor of Government. I can click through and see that it's a chair at Oxford University, right? So it's a position. Another is the William Gladstone School No. 18. Where is that? That is in Sofia, Bulgaria. So that's a particular school in Bulgaria named after William Gladstone. And finally, the third result is, of course, our pal Gladstone, the chief mouser, right? If I click through, that's the cat. All right, so that was an example. I mean, you saw how easy it was: I just named the property and the value that I care about, and I get the results. Again, it's kind of a silly example, but think about it: how else can you answer that question? There's no reference desk, even at the great University of Oxford, where you could walk in and say, give me a list of things named after Gladstone. Nobody can answer that unless they happen to have a very large structured and linked data store like Wikidata. All right, so that was a silly example. There's a bunch of stuff on there. Oh, OK: can you show Easy Query on the video? And somebody needs to know how to just do property exists, without giving a specific value. And then, once you show Easy Query, you reload the page. I don't know Easy Query. Is that a gadget? I don't know what Easy Query is, I don't use it, so someone can maybe send a link or something. Oh, it is a gadget. I don't have it enabled. That is nice. So now, what I just did by hand was formulating the query named after Gladstone. I guess this is the... is it? I just clicked the ellipsis here, right after the name. You see this? This was just added by enabling Easy Query, which I just learned about. So you just click this and it automatically runs this kind of trivial query. Of course, if I
want a more complicated query, like, I don't know, give me all the things that are named after Lincoln but are a school, I will still need to edit a custom query. But this is a super easy and very nice way of just doing a super quick query for exactly this, right? Like, what other items have exactly this property and value, named after William Gladstone? So thank you, whoever made the suggestion to demonstrate that, and I'm glad I learned something today too. Let's move to another sample query. Here's a fun example: popular surnames among fictional characters. Think about that for a second. Popular surnames among fictional characters. So I'm asking Wikidata to go through all the fictional characters you know, and of those, look through their surnames, group them so that you can count the repetitions of the surnames, and give me the most popular surnames among them. Additionally, I want you to awesomely present the results as a bubble chart. Oh yeah, Wikidata can do that. I run the query, and check it out: the most popular names among fictional characters Wikidata knows about are Jones, Smith, Taylor, etc. I mean, for all we know, the most popular name among fictional characters actually in the world may be Wu or something in Chinese, but if that has not been modeled in Wikidata, we're not going to get that. So Taylor, Smith, Jones, Williams seem to be the most popular names. And again, I could limit this. I could make the same query but add "only among works whose original language was Italian", for example, to get more interesting results if I only care about Italian literature. But this is an example of how I got awesome bubble charts for free, and I can just plug this into an awesome presentation that I make. Of course, I can still look at the raw table. So the query still resulted in a bunch of data: Smith repeats 41 times, Jones 38 times, Taylor 34 times, etc., down that list. And again, I could export this into a file, load it up in a spreadsheet, and do additional processing
on it, I can link to it, I can do all kinds of awesome things with it. So that's another awesome query. We don't have to go into a line-by-line analysis here of why this works the way it does. I want to show you some other queries first. Let's look at, this is just fun, overall causes of death. Again a bubble chart, just looking at people who died of things and have a cause of death listed. And we learn that the most commonly listed cause of death is myocardial infarction, then pneumonia, cerebrovascular disease, lung cancer, etc. Again, in a bubble chart. So how does that work? Just very briefly, the important parts of this query are: I'm looking for some person who is instance of Q5, which is human. So a human, just to limit the query. I'm not interested in books or mountains, I'm looking for humans. That same person, that same variable, pid, should have a P509, meaning, hello, I have P509, which is cause of death, and that cause of death is another variable that I'm calling cid. Now, previously we were saying, I want things that are named after Gladstone specifically, only things that have that particular value. Here I'm saying I'm looking for things that have some cause of death, not a specific one. I just want you to get everything that has a statement with some value for property 509, cause of death. And then this other bit of magic here, the GROUP BY, tells Wikidata I'm not actually interested in every individual thing; I want you to group those causes, and then count them and give me the top ones. So that's how this query works. Here's that query I promised: painters whose fathers were also painters. I can only think of, you know, a couple, I mean Monet and Bruegel, but I'm sure Wikidata knows many more. So let's run this query. And I have 100 results. By the way, I have limited it to 100 results just to keep it snappy. I mean, we could maybe try removing the limit and see if Wikidata can tell us the total number in Wikidata. Yeah, that wasn't too bad. So 1270 results.
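Going back to the causes-of-death query for a moment: the three ingredients described there (a human, some value for cause of death, and a GROUP BY with a count) can be sketched roughly as follows. This is an illustrative reconstruction, not the exact query shown on screen. The function name `causes_of_death_query`, the `limit` parameter, and the way the English label is fetched are assumptions made for this sketch; only P31, Q5, P509, the variables pid and cid, the grouping, and the bubble chart come from the talk.

```python
def causes_of_death_query(limit: int = 20) -> str:
    """Return SPARQL text that ranks causes of death by number of humans.

    Hypothetical helper for illustration; paste the returned text into
    the Wikidata Query Service to run it.
    """
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    # Doubled braces are literal braces in an f-string; {limit} is filled in.
    return f"""#defaultView:BubbleChart
SELECT ?cid ?cause (COUNT(?pid) AS ?count) WHERE {{
  ?pid wdt:P31 wd:Q5 .     # ?pid is an instance of human (Q5)
  ?pid wdt:P509 ?cid .     # the same ?pid has *some* cause of death (P509)
  OPTIONAL {{
    ?cid rdfs:label ?cause .
    FILTER (LANG(?cause) = "en")
  }}
}}
GROUP BY ?cid ?cause       # one row per cause instead of one row per person
ORDER BY DESC(?count)
LIMIT {limit}"""


if __name__ == "__main__":
    print(causes_of_death_query(10))
```

Note that `?cid` is a variable rather than a fixed item, which is the difference from the "named after Gladstone" query, where the value was one specific item.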
Wikidata already, at this early date in its progress, knows about more than 1200 painters who are sons of painters. Sons of male painters, like, their father is a painter. There may be additional painters who are sons of female painters, not included in this query. Again, always remember what exactly you were asking. In this query I was asking about the father; I'm leaving out any possible painters who are sons of mother painters. So how does this work? I'm asking for the painter along with the human label, and the father along with the human label. So Michel Monet is the son of Claude Monet, and Domenico Tintoretto is the son of the famous Tintoretto, whose label is just Tintoretto; like Michelangelo, you don't always have the full name in the common label. Paloma Picasso is the daughter of Pablo Picasso. So Wikidata knows about all these results. Of course, Holbein the Younger, son of Holbein the Elder. And how did we get there? Well, we asked Wikidata to look for something, let's call it painter, which has P106, which is occupation, with a value, this value, hello, painter, right? This unwieldy number, 1028181, that's painter. So I'm asking for any item that has occupation painter, and let's call that item painter. I also want that painter to have a property 22, which is father, OK? And I want it to have some value. I'm putting it into another variable called father. I could have called it, you know, frog; that doesn't change anything, just to be clear. What matters is that this is the property father, right? I could have called it anything I want. And then I have a third condition: that father, like, whatever it says here in property 22, I want that father to have himself a property 106, occupation, with a value painter. OK, these conditions combine to give me a list of people who have a father, and that father has occupation painter as well. Of course, if I suddenly, or if you suddenly, are consumed by curiosity to know who are some, I don't know, politicians who are sons of carpenters, you could just
change that, right? Change the first value from painter to politician, change the third line's value from painter to carpenter. Maybe that list will be very short, because carpenters don't tend to be notable, so they wouldn't be represented on Wikidata. That's why this works relatively well with painters, right? Because most of them are notable. But generally you could do that, right? That's an example of how you can take a query and just replace one of those values, or even the language, right? So again, I could ask for these same painters, that's limited again, these same painters but with Arabic labels. Same query, but I have Arabic labels for these painters, and of course, where there is no Arabic label, I get the Q number. OK, so that's that query that I promised you. Painters who are sons of painters can be done by Wikidata in under one second. How awesome is that? We can also get some statistics. So how about counting total articles in a given wiki by gender? This is what we call the content gender gap, as distinct from the participation gender gap, right? This is the gender gap in what we cover on Wikipedia. So let's take one of these. This is a query: articles about women in some given Wikipedia. All right, so let's take, I don't know, the Tamil Wikipedia. That's language code ta. So I just put ta here and I click Run, and I get this count. That's all I wanted. I'm not actually interested in the items, like in the list of women on the Tamil Wikipedia. I just want the number, so I selected the count here. And this number turns out to be 2159. So there are about 2000 articles on the Tamil Wikipedia about people that Wikidata knows to be female, right? I'm asking about the gender field, property 21. Again, remember, if there's some article about a woman in Tamil Wikipedia but Wikidata doesn't have a statement about the gender, that person will not be counted here. So again, be careful about stating that this is exactly the number of women articles on Tamil Wikipedia. That's probably not true. I'm
sure some of those articles are missing a sex or gender property. But for raw statistics that's probably good, because some men are also missing the sex or gender property. So we could take the same query for men. It's essentially the exact same; it just has this unwieldy number for male, 6581097. I can change this language code again to ta for Tamil. And how many men are covered on Tamil Wikipedia? 14,649. OK, so women 2100, men about 7 times as many. So that's the approximate size of the content gender gap on Tamil Wikipedia. And again, I can complicate this query as much as I want. For example, I can try and find out if this gender gap is wider or narrower among musicians, just as an example, right? I could just add a line here that says, you know, occupation: musician, and then I'm only counting articles on Tamil Wikipedia about musicians who are female versus articles on Tamil Wikipedia about musicians who are male, and I can compare the content gender gap across occupations on Tamil Wikipedia. Do you see? The important point here is that this is not just a one-purpose query. I can, with a single additional condition, suddenly make it a much more interesting query, because I break it down by occupation or I break it down by century. You know, do we have more of a coverage gap in 19th-century people than in 21st-century people? I mean, I sure hope so, right? The patriarchy is weakening somewhat, so I wouldn't be surprised if there are many more notable men covered from the 19th century. But if the gender gap is just as wide for 21st-century people, that would be a little disappointing. Again, that's something I can fairly easily find out on Wikidata Query. Any questions so far, or are you just sharing? Yep, there is one. So somebody is wondering if you can demonstrate, or at least give a short answer to, the latter part of this question: is it possible, using Wikidata SPARQL, to find specific Wikidata articles, e.g.
featured articles of a certain language, which do not exist in another language? I know it is possible to find category-based results using the PetScan tool, but can we specify that by selecting, e.g., featured articles? Yes, excellent question. It is possible indeed, and I will demonstrate one such query. Another query that I already mentioned: largest cities in the world with a female mayor. This query, let's close some of these tabs before my browser chokes. So this query lists the major world cities run by women currently, and the answer is Mumbai, Tokyo City, Tokyo, a bunch of others, and wait, that is not it at all. I clicked the wrong one. That is the map of paintings. Let's demonstrate that for a second. So this is the map of all paintings for which we know a location, with the count per location, and the results are awesomely presented on a map. Again, under the hood this is a table of results, of course, but awesomely I can browse it as a map. So here is a map of the world with all the paintings that Wikidata knows about; not just knows about the paintings, but knows about their location in a museum. Not surprisingly, Europe is much better covered than Russia or Africa. There is a huge gap in contribution to Wikidata from these countries, and some of it can be fixed. And of course there is much more documentation and much more art in Europe. But if we zoom in, I don't know, Rome probably has a few paintings, right? Hello, Vatican City sounds like a good bet. I can zoom in here, I can click one of these dots and see that in this point there are two paintings, and in this one there is one, and it's the Archbasilica of Saint John Lateran. Let's see, this is the actual Saint Peter's, right? Sistine Chapel has 23 paintings. What? The Sistine Chapel has way more than 23 paintings. Correct, but 23 of them are documented on Wikidata and have their own item; an item for the painting, not for the Sistine Chapel. The painting has an item that lists it as being in the Sistine Chapel. There are 23 of those. OK, there is definitely room to
document the rest of the artworks in the Sistine Chapel. So again, this is just not the kind of query you were able to make before Wikidata, and it's a fairly simple query, as you can see. There are others using maps, like airports within 100 km of Berlin, again using the coordinates as a useful data point. And here is a map showing me only airports within a 100 km radius from Berlin. But I wanted to show you the mayors query. Let's click the... oh, I just have the wrong link here, but I can still find it here, typing "mayor". Here we go: largest cities with female mayor. So this is a slightly more complicated query, but if I run it, I get the top 10, because I said LIMIT 10. I get the top 10 cities in the world by population size that are currently run by women: Tokyo, Mumbai, Yokohama, Caracas, etc. And one interesting thing that you may want to notice here is that I am asking for cities, I mean items that are instance of city and that have a head of government, that have some statement about who is in charge, and that person has a sex that's listed up here as female. Don't worry about the syntax right now. I just want to show you one specific angle here: I am further filtering these results. I only want those items where there is not the qualifier end time. Why is that important? Because if a city once had a female mayor, but that mayor is not the mayor anymore, because mayors change, I don't want them in this query. I only want cities currently having a female mayor. And of course, Wikidata may have historical data with start and end time, as we have seen; it documents that this person was the mayor of Tokyo or San Francisco between these years. But if there is no end time, that means they are currently the mayor. So that's an example of asking about a qualifier of a statement, again, to get the results we actually want. If we want current mayors, we need this filter; without it, we will get historical female mayors as well. All right, so these are some example queries. Questions about that? No? The
featured article example. So let's look at that. I prepared such a query recently. Here we go. So this is a query, I just saved it here on my user page. This is not Wikidata Query, right? This is my user page, containing the query, usefully. And let's run this. So this query is actually not very complicated. It just has a long list of countries, because I'm asking about African countries. I'm looking for human females from one of these countries that have an article in English, that's what this line means, but not in French, that's what this part means, these two lines together: but not in French. And this is what's called a badge. That's Wikidata's concept of good and featured articles; it's called a badge. So I want them to have some badge on English Wikipedia. So again, with this query I'm just asking for the top 100 women from Africa who are documented on English Wikipedia in featured or good article status, but not on French Wikipedia. So this is a to-do query; that's a query for French editors to consider what they might usefully translate or create in French. And if we run this, see, we have three results. I mean, we have many women from Africa covered on English Wikipedia, but only three articles have featured or good status among those that do not have French Wikipedia coverage. Let me rephrase that: among the English Wikipedia articles about African women that don't have a French counterpart, only three are featured or good. Do you see this? The badge is good article. This little incantation here is what allows you to ask about the badge, this here. And by the way, the slides will be uploaded to Commons, and we will... how should we make it available on the YouTube thing as well? No, no, but I mean for people who will later watch this video. Oh yeah, we can add it to the YouTube description and the Commons description. So if you're watching this video later, in the description we will add a link to this query specifically, because it's not in the slides right now. It will be.
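Since that badge query is not in the slides, here is a rough sketch of what a query of that shape can look like: articles in one Wikipedia that carry some badge and have no counterpart in another Wikipedia. It is not the query saved on the user page. The function name `badged_untranslated_query`, its parameters, and the use of country of citizenship (P27) to express "from one of these countries" are assumptions made for this sketch; the sitelink predicates (schema:about, schema:isPartOf, wikibase:badge) are the ones the Wikidata Query Service exposes.

```python
from typing import Sequence


def badged_untranslated_query(
    source_lang: str,
    target_lang: str,
    country_qids: Sequence[str],
    limit: int = 100,
) -> str:
    """Return SPARQL text listing women with a badged article in one
    Wikipedia and no article in another. Hypothetical helper for illustration."""
    if not country_qids:
        raise ValueError("at least one country Q number is required")
    countries = " ".join(f"wd:{qid}" for qid in country_qids)
    # Doubled braces are literal braces in an f-string.
    return f"""SELECT DISTINCT ?item ?itemLabel ?article ?badge WHERE {{
  VALUES ?country {{ {countries} }}
  ?item wdt:P31 wd:Q5 ;            # human
        wdt:P21 wd:Q6581072 ;      # sex or gender: female
        wdt:P27 ?country .         # assumption: "from" means country of citizenship
  ?article schema:about ?item ;
           schema:isPartOf <https://{source_lang}.wikipedia.org/> ;
           wikibase:badge ?badge . # any badge (good article, featured article, ...)
  FILTER NOT EXISTS {{             # ... but no article in the target wiki
    ?other schema:about ?item ;
           schema:isPartOf <https://{target_lang}.wikipedia.org/> .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{source_lang}" . }}
}}
LIMIT {limit}"""


if __name__ == "__main__":
    # The Q numbers passed here are placeholders for a list of country items.
    print(badged_untranslated_query("en", "fr", ["Q1033", "Q117"]))
```

Swapping the two language codes turns the same sketch into a to-do list for the other wiki's editors.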
OK, so questions so far? We're almost done. We have a few minutes left. So questions about queries? I mean, I'm sure there's tons of things to do yet, and maybe you didn't really get the sense for SPARQL. It's something you need to really do on your own, on your computer: see how it works, fiddle with it, change something, see that it breaks and complains. But very importantly, oh, I had this in the other question slide: remember Wikidata Project chat. That's the Wikidata equivalent of the village pump. It's the page on Wikidata where you can just show up and ask a question. In my experience, the Wikidata community is very nice, very welcoming, and very eager to help newer people integrate and learn how to do things. There's also an IRC channel; if you know what IRC is and how to use it, by all means go to the IRC channel #wikidata. There's people there all the time, and you can just ask a question. If you're trying to do a query and you don't quite understand the syntax, there's people there who will gladly help you do that. There's also a Wikidata newsletter published by the Wikidata team, which is centered in Germany, and they send out a newsletter in English with Wikidata news: new properties, new items, new things in the project, but also sample queries. So once a week there's an awesome query to learn from, if you want to learn that way instead of reading a whole manual on SPARQL. So I'm just encouraging you to get help in one of those channels. Of course, you can write to me, just reach out to me and ask me questions as well. I hope by now you agree that Wikidata is love and Wikidata is awesome. If there are no questions, we do have a tiny bit of time to demonstrate one more tool. No questions? OK. So let's talk about, well, the Reasonator is kind of nice, but it's a little like the ArticlePlaceholder. So this is not Wikidata; this is a tool, again built by Magnus Manske. There's also one final question, too. The question is: which advantages and disadvantages are there to creating an
item before an article is done on English Wikipedia? Well, I mean, this example that I just made, right? I'm reading this book by a notable author. OK, I want this to exist on Wikidata and to be mentioned on Wikidata, so that when people look up that author in Wikidata, they will know about one of his notable works. But I'm not prepared to put in the time investment to build a whole article on English Wikipedia, either because I don't have the time, or I don't have good sources, or maybe my English is not good enough, but it is good enough to just record these very basic facts and point to the Library of Congress record, etc. So it's better than nothing. That's one reason to maybe do it. Another reason is to be able to link to it. So remember, that translator lady already had an item on Wikidata, but if she hadn't, we could have just created a very basic, rudimentary item about her, just saying, you know, this name, is a human, country Bulgaria, occupation translator. Even just that would have been something, and would have enabled me to link to this person. So these are legitimate reasons to create Wikidata entities without, or at least before, creating a Wikipedia article. If you are going to create, I mean, if you're at an edit-a-thon or something and you have come to create Wikipedia articles, by all means, first create the Wikipedia article, then create the Wikidata item and link to it. I hope that answers the question. So the Reasonator is simply a kind of prettier view of items in Wikidata. So you can just type the name of an item, or the number. Let's pick just a random number, 42, say. 42, which happens to be, maybe you've heard of this guy, Douglas Adams. He happened to have received the Q number 42. I'm sure it's a cosmic coincidence of infinite improbability. And this is a tool that is not Wikidata; it's a tool built on top of Wikidata called Reasonator, and it gives us the information from Q42, that is, from this item in Wikidata, which looks like an item in Wikidata, but it gives it to us in
a slightly more rational kind of layout. It even generates a little bit of pseudo-article text for us: you know, Douglas Adams was a British writer and author, he was born on this date in this place to these people, he studied at this place between these years. That's all machine generated. Nobody wrote this text. That's all taken from those statements in Wikidata, and it generates this reasonably readable summary paragraph. And then it gives us this little table of relatives. It's all taken from Wikidata, but as you can see, this is already better than the essentially arbitrary ordering of statements on Wikidata. And that's OK, kind of by design. Wikidata is the platform. There are going to be many new applications and platforms and tools and visual interfaces on top of Wikidata to browse Wikidata in more friendly or more customized ways. For example, one of the things that Wikidata does for us is give us pictures and maps and a timeline. Check it out: this timeline, machine generated just from dates and points in time mentioned in the relatively rich Wikidata item about Douglas Adams, right? So this timeline, for example, again completely machine generated: he was educated between these years, so I can put it on a timeline, and this is the year he was nominated for a Hugo Award, so I can put that on a timeline, etc. So that's just a super quick demonstration of that tool. The Reasonator links are all here in the slides. And the final tool I wanted to mention very quickly is the Mix'n'match tool. You remember my explanation about Wikidata as a nexus, as a connection between many databases, many data sources. Those depend on these equivalences, on Wikidata being taught that this item is like that ID in this other database. And Mix'n'match is a tool, again by Magnus Manske, maybe you're detecting a pattern here, it's a tool by Magnus that is designed to enable us to take a foreign, external dataset, put it alongside Wikidata, and try and align them. So this item in this external
dataset, is that already covered in Wikidata? If so, by what Q number, by what item? If not, maybe we need to create a Wikidata item to represent it, or maybe it's a duplicate or something. So the Mix'n'match tool has a list of external datasets, as you can see: the Art & Architecture Thesaurus by the Getty Research Institute, or the Australian Dictionary of Biography, all kinds of external datasets here. Oh, I had a specific link, yeah, to the Royal Society. It can also give me some statistics. So there is an external dataset of all the fellows of the Royal Society, the oldest academic learned society in England. And the internet is tired. Here we go. Nope, did not work. Fellows of the Royal Society, here we go. So this one is complete. I mean, people have manually gone over every single item there and either matched it to Wikidata or declared that it was not in scope, or a duplicate, or whatever. But let's look at site stats; this is a fun aspect of this tool. But that is not working, or it's taking too long. So let's just demonstrate how this works. Maybe Britannica, is that done already? Here we go, Encyclopedia Britannica. So in the Encyclopedia Britannica, 40% of the items are not yet processed. So let's process one of them. For example, there is an item in the Encyclopedia Britannica called Boston, England. As you know, all American place names are totally stolen from elsewhere. So there is a Boston in England, though it's no longer the famous one. And the Mix'n'match tool has automatically matched it, based on the label, to Q100, which is Boston, the big city in the United States. And that is incorrect. It's kind of a naive computer going, well, this is Boston and this other thing is also Boston. And it is asking me to confirm this match or not, you see? So this is the Boston, England from Britannica, and the tool is asking me, is this the same as Boston, Q100, in America? The answer is no. I remove this match, and the entry is unmatched, and I can match it to the correct one in England. I
can do this by searching English Wikipedia or searching Wikidata; it has these handy links. So the English town is in Lincolnshire: Boston, Lincolnshire. So I can go there, right, and get the Wikidata item number. So see, this is not Q100, Boston in the States; this is Q31975, town in Lincolnshire. I can get this Q number, go back to the Mix'n'match tool, where was that, here we are, and set Q. I can tell the tool that this is the right Boston and click OK. Now this town in Lincolnshire, you can see this here, this item Q31975 is linked to Britannica. What does this mean? Well, if we go there, if we actually go to the Wikidata entity, you will see that in addition to the few statements that it already had, it now has, thanks to my clicking, another identifier here, Encyclopedia Britannica Online ID, with this link. And if we click it, we will indeed reach this page in the Britannica online, which is indeed about this town in Lincolnshire. You see? So I have contributed one of those mappings, one of those identifiers, into Wikidata, and I didn't have to do it manually. This tool prompted me to confirm; if it was correct, I could have just clicked confirm. Since it wasn't correct, I corrected it manually, but it made this edit on my behalf. So that's another tool that encourages us to systematically teach Wikidata more things. And we're out of time. Go edit Wikidata. Now that you have the power, you know the deal: use it for good and not for evil. If you're watching this video not live, the description will have links to the slides and to a bunch of other useful pieces of information. Any last questions on IRC? If not, thank you for your attention. And if you like this, and if you feel that you now get Wikidata, and you get what it's good for, and you're inspired to contribute, I have only one request from you: besides using it for good and not for evil, I ask that you spread the word. Show this video, share this video with other people in your community or around you. Teach this yourself once you're comfortable
with these concepts. Feel free to use my slides. Yeah, and edit Wikidata. Thank you very much, and goodbye.