Hello, everyone. This will be a tutorial session about querying Wikidata, the Wikimedia Foundation's structured and linked data project. For beginners, we'll start with a super quick tour of Wikidata itself to make sure we're all on the same page in terms of terms, and then we will proceed to learn how to query Wikidata to tap into its vast and awesome power. Because I'm teaching you how to do something technical, how to query Wikidata, I will encourage you to look at the shared screen most of the time, where I will be typing and inviting you to follow along. There won't be much to see other than that on the video. So, let's get right to it. You can switch to the shared screen. So, this is Wikidata. For those who haven't met it, it looks kind of like Wikipedia because it is based on the same software as Wikipedia, plus an additional layer of software called Wikibase. It has all the features we expect from a wiki: user pages, talk pages, history, watchlists, etc. But inside, it does not have articles. It has items. Let's look at a particular item. Let's take a random number like 42. Oh, look, that happens to be the item about Douglas Adams. We can see that Wikidata knows a lot of things about Douglas Adams. First of all, it knows what to call him in English, which is the language I'm currently browsing in. It knows a Q number for the item: every item in Wikidata is identified by a single Q number. And it has some aliases for him in English: with his middle name, with the middle name without diacritics, or with the middle initial. It also knows what to call him in other languages. As we can see in this box here, this is just a selection of languages. If you click down here on "All entered languages", you can see Wikidata knows what to call Douglas Adams in many, many, many different languages. And then, scrolling down, we see the heart of the structured data on Wikidata, which are the statements. The statements are of the form property and value.
On the left side, the shaded area, are the properties. "Instance of" and "image" are properties. And on the right side is the value. In this case, Douglas Adams is an instance of human, meaning the item Q42, the item about Douglas Adams, has a property called "instance of" with the value "human". It may sound trivial to record that Douglas Adams is a human, but it's not so trivial, because it helps tell this item apart from all the other kinds of things that Wikidata can cover, like mountains and rivers and poems and religious concepts and countries. Douglas Adams is none of those things. Douglas Adams is a human. It also helps tell him apart from fictional characters. Arthur Dent from The Hitchhiker's Guide to the Galaxy is also a human, but he is a fictional human. So, in Wikidata, he would not be instance of human, he would be instance of fictional character. Moving on, Wikidata records all kinds of other details about Douglas Adams: sex or gender, male; country of citizenship, UK; etc., etc. We can see that Wikidata can record more than one value per property. For example, Douglas Adams has two given names, and both of them are recorded here, even with information about their order, which goes first. This here, under the value Douglas (forgive my unruly mouse), is called a qualifier, because it is an additional pair of a property and a value. Here, the property is "series ordinal" and the value is 1, but this is a property and value that don't stand on their own. It's not correct to say Douglas Adams's series ordinal is 1. It is in fact only describing this particular value: the name Douglas has the series ordinal 1 in the series of names of Douglas Adams. I'm going to skip a lot of further details here, because an introduction to Wikidata can be had in a different talk, so we will just skip right ahead to querying Wikidata. To query Wikidata, we need to go to the query service, which conveniently is always linked from the left-hand menu in Wikidata.
There's a link here called "Query Service". You can click it to arrive at this URL, query.wikidata.org. It's also easy to remember, and that is the query system for Wikidata. Now, queries are where you can really see the power of having gone to the trouble of collecting all of this piecemeal data about Douglas Adams and about the millions of other items recorded in Wikidata. By the way, Wikidata has, as of this recording, 48.9 million items. That means it has some structured data about 48.9 million things. For comparison, English Wikipedia has 5.6 million articles as of today. So, Wikidata is already much more comprehensive than even the largest Wikipedia in terms of the things it covers. It, of course, covers them in a different way. You won't find narratives on Wikidata, like the causes of World War II, which you could find on English Wikipedia. Wikidata will only store what can be expressed as structured data, but that is surprisingly a lot. Getting back to the query engine for Wikidata: if you follow along with me and click the query service link, your screen may look more like this, with this query helper on the side. I'm going to encourage you to close that query helper for now. We will get back to it a little later. For now, I want you to focus on this. This window invites us to input a SPARQL query. SPARQL is the query language used to query Wikidata. It is a standard language, a standard technology endorsed by the World Wide Web Consortium for querying any linked data source. Because of that, because it's not a custom tool we have built for Wikidata, there are a couple of tiny inconveniences that we have to bear with, but we gain the benefit of a standard language, meaning people with existing SPARQL experience from other linked data sources can use the Wikidata query service easily. And also, once we learn SPARQL, we can apply it to other linked databases.
The problem, however, is that you don't know SPARQL; presumably that is why you are watching this video. So, I have two pieces of good news for you. First, we're going to learn some SPARQL right now, from the ground up. And secondly, the Wikidata developers love you and want you to succeed, so they have provided you with examples. To solve the problem of the horror of the empty page, the yawning void of the SPARQL box here, we are going to follow Picasso's advice when he said, "Good artists imitate; great artists steal." And we're going to steal, sorry, adapt an example by clicking the Examples button here and picking the first example, the Cats example. There are 400 other examples here, as of this recording. So, having picked the Cats example (apologies, something got stuck in my browser), we should be faced with this piece of code that we don't understand yet. But that has already solved our first problem, of having an empty SPARQL box. So, we look at this query. It's called Cats, and what it does is tell us all the cats that Wikidata knows about. Why does Wikidata know about any cats? Well, some cats are notable. If we click the play icon here, that executes the query. And scrolling down, we can see we got a list, or a table rather, with a bunch of item numbers, inscrutable Q numbers, and mercifully also human-readable labels, names, right? Wikidata calls names labels. And we see that there are cats like Mr. White, and Hamilton, and Nutmeg the cat, and all kinds of cats like that. These are the cats Wikidata knows about, because they're notable for one reason or another. Maybe they used to belong to the Clinton family, or maybe they are an internet sensation, and someone chose to document them. Let's take a quick visit with one of my favorite cats, which is Gladstone.
So to learn something about Gladstone, we can click this Q number, open it in a new tab, and that takes us to the actual Wikidata item about Gladstone the cat. Gladstone the cat is an instance of house cat. We have an image for him, and he's male. And the reason he's documented on Wikidata is that this cat has an employer, and the employer is Her Majesty's Treasury in the United Kingdom. Yes indeed, this cat in fact holds the position, according to Wikidata, of Chief Mouser to Her Majesty's Treasury, starting June 2016, and he's still holding that position as far as we know. He's of course named after William Ewart Gladstone, the Victorian prime minister and politician. All right, so we have some information here about this cat. We are satisfied that this query indeed returned a bunch of cats. Now let's try to understand how it did that. Let's scroll back up to the query itself and take it line by line. The first line begins with a hash mark, and that makes it a comment. That means the computer does not care what we say here. We can say anything we like. We said "Cats" to remind us that this piece of code is about cats, because, as you can see, nothing else about it particularly screams cats. So the comment is useful, but it could have said anything else at all. It could say, for example, I don't know, "Moose" instead of "Cats". And if I run this query again, I still get the same 120 results with the same cats. This here is the number of results and how long it took: we got 120 results in about half a second. So, having satisfied ourselves that the comment is ignored, we can restore it to "Cats", just to prevent confusion, and move on. The second line is the SELECT line, and this line tells the query engine what we would like to see returned from this query. As you can see, it names two elements, ?item and ?itemLabel, each following a question mark. These question marks denote them as variables.
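For readers following along without the shared screen, the Cats example being discussed looks roughly like the sketch below. It is reconstructed from the description in the talk rather than copied from the recording, so the exact whitespace and the language list inside the label line are assumptions; the example currently offered on query.wikidata.org may differ slightly.

```sparql
#Cats
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P31 wd:Q146.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

In this layout the comment is line one, the SELECT line is line two, and the long label-helper line is line six, which matches the line numbers mentioned in the walkthrough.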
So we're asking the query engine to return to us the values inside the variables ?item and ?itemLabel. And indeed, if we look at the results down the page, our table has two columns, one of them called item and the other itemLabel. As we see, item returns a Q number and a link to the item identified by that Q number, whereas itemLabel returns the squishy human-speak name for that item. So let's keep this in mind: when we end up wanting additional data in our results, more than these two columns, we will need to change the SELECT line. Moving on, the next line says WHERE and is followed by a block. The highlighted segment here is a block: it starts with a curly brace and ends with a curly brace. So every query for the Wikidata query engine is essentially saying, "Out of all the items on Wikidata, I want only those where a certain condition is true." The condition is expressed within these curly braces. So let's zoom in and see how we expressed the condition "I only want cats". First of all, let's add a blank line here before line six, like so. This scary-looking line we will discuss a little later. It's a bit of a helper. It's what helps us get labels easily, but we don't have to understand how it works just yet. So actually the condition, what tells the query engine that we want cats, is just this simple line, this line that has three elements in it and ends with a full stop. I'm going to add a space here so you can see the full stop a little better. The space is not mandatory. So this line has three elements, and for those of you who are familiar with Wikidata, who have watched the Wikidata introduction, these three elements recall the triple of item, property, and value. That is exactly what these three are supposed to remind you of, because this line is essentially a pattern that we are asking the query engine to match items against.
So it goes through all of Wikidata and returns to us only those items where there is a match between what we specified here and what the item contains. So let's add a little comment here to help us. Comments start with this hash mark, and we'll say item, property, value, just to remind us what is what. So we're asking the query engine to match any item; that's this part. We're not asking the query engine to match a particular item ID, because, remember, we want all the cats on Wikidata. We don't want only cats with Q number so-and-so, right? We want any and all cats on Wikidata. So under item we don't want to specify a specific value. In fact, whatever the QID is, we don't care; we want it. That's when we use a variable, here ?item. This is where the variable gets its value; this is what populates it with a value. So we say: look, match whatever the QID is, and just put it into the variable ?item, but I do have conditions about the property and the value. I want items that have specifically property 31, which is "instance of", with specifically the value Q146, which is "house cat". How do I know? Well, you can always hover over these and you get the translation from Wikidata numbers to squishy human speak: instance of, house cat. And you don't have to memorize this. You never have to memorize the numbers for Wikidata IDs. There is an autocomplete function that I will show in just a moment. But to understand why this line finds us the cats, we need to rephrase our question from human terms to Wikidata terms.
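The item, property, value reading of the condition line can be sketched as an annotated version of the same query. The wording of the comments and the column alignment are illustrative assumptions, not what was typed on screen; the identifiers P31 (instance of) and Q146 (house cat) are the ones named in the talk.

```sparql
#Cats
SELECT ?item ?itemLabel
WHERE
{
  # item    property   value
  ?item     wdt:P31    wd:Q146 .   # any item whose "instance of" (P31) is "house cat" (Q146)

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

The variable in the first position is left open so that every matching item is collected into it, while the property and value positions are fixed.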
So in human speak we might say, "Get me all the cats", right? Like this. But in Wikidata terms, "all the cats" translates to "get me all the items that have instance of with the value house cat", because that is what a cat looks like on Wikidata. What a mountain looks like on Wikidata is an item that has instance of mountain; a cat has instance of house cat. So saying "I want all the cats on Wikidata" in technical terms is saying "I want all the items that have the instance of property with the value house cat". So that's what we're saying here: get me any item, I don't care about the ID, that has specifically the instance of property, property 31, with specifically the value house cat, Q146. All right, again we ignore this long and scary-looking line. So that's really it; that's the condition. I simply gave it a pattern to match against, and the pattern says: I don't care about the item ID, but it has to have instance of, and it has to have the value house cat under instance of. Okay, so now we know how to find all the cats on Wikidata. That's moderately useful, but really we would like to do more than find cats on Wikidata. How about all the dogs? Can we find all the dogs on Wikidata? Yes, we can, and what that means in Wikidata terms is, again, all the items that have instance of with the value dog. So all I need to do to change this query to find dogs for me is go here, and instead of specifying house cat, which is Q146, specify a different Q number. Before we do that, let's say a few words about these extra characters here beside the Q146 and beside the P31. These are prefixes, and they are just for our convenience. Technically, everything in a SPARQL query should refer to URIs, but those are a little lengthy to type all the time, so we have these handy prefixes. So just remember this: every time you specify a Q number in a Wikidata query, it has to be prefixed by wd:, for Wikidata, right? wd:. And every time you mention a property that you want to match against, you want it to be
prefixed with wdt:, not wd: but wdt:, for "proper-T". Remember this funny pronunciation; it will help you keep it in mind: wdt: for "proper-T". So we leave the wd:, but instead of Q146, which is cat, we need to put in the value for dog. But what is the value for dog? I don't have to memorize this. Right after the colon, I can press Ctrl+Space. After pressing that, and only after pressing that, I am invited to type a human-speak name, and Wikidata will try to help me find the right number. So I can just type "dog", and I have this drop-down now, and I can pick the right item. It's very important to pick the correct item. In this case it's the first one, domestic animal; yep, that's the one I meant, Q144. I did not mean the sign of the Chinese zodiac, and I did not mean the heraldic animal, hound, nor did I mean a painting by Theo van Doesburg. So it's important to pick the right item. If you look for all the items that are instance of dog, the sign of the Chinese zodiac, you probably won't get what you expect. So I click this one, and my human speak is changed into reassuring Q numbers, in this case 144, and that's it. I can run my query, and now I got 304 results, not 120, and indeed these are dogs, like this famous dog, Laika. So that's a whole bunch of notable dogs. We won't go and explore them, but you're welcome to. The only thing remaining to do with this query is maybe to make it less confusing by changing my comments, although again, the computer doesn't care. But for us humans, I'm changing "cat" to "dog" so that nobody gets confused. So this is now a query that finds all the dogs. All right, so we know how to find all the cats and all the dogs. That's still not terribly useful. Can we expand this a little? Well, can we find all the countries that Wikidata knows about? Yes, we can, with exactly the same method. Instead of looking for instance of dog, we are now looking for, pardon me, instance of country. Not country music, nope. This one, distinct region in geography, that's what we want. We change it to
country, we run the query, and now we got a list of countries that Wikidata knows about. So in other words, we now know not just how to find cats and how to find dogs; we know how to find anything. Well, almost. We've only been using one property of Wikidata, the instance of property. So if we're looking for, say, American journalists, we cannot search using what we've learned so far. We cannot say, "I'm looking for all the items that are instance of American journalist", because Wikidata takes instance of to mean the essence of a thing, and people are not born journalists, right? They're born humans. So, as we saw in the case of Douglas Adams earlier, all humans should be instance of human; all real humans, that is, not fictional humans. So those American journalists will also be instance of human. They would not be instance of American journalist, which means we need to actually query other properties. In fact, we don't just want all the humans, and we don't even just want all the journalists; we want all the journalists who are also American. We want to combine more than one condition. Happily, that is very easy to do with the Wikidata query service: all we need to do to add a condition is add another line. So first of all, let's start by making sense and changing the title of this query to "American journalists". Now, I still want this line that says instance of, but this time, what should the American journalist be an instance of? It should be an instance of human, which is just Q5. That's easy to remember. Yes? "Just a quick question: how do you get that autocomplete search thing to show up? I tried to do that on mine and it didn't work. You know, when you type in a word and it does that auto search." Yeah, so what you need to do is, let me do that again: here, after the colon, you press Ctrl and Space. Ctrl and Space, then it shows up, then you can type. "Ah, okay, cool." All right. So yes, human, Q5. Great. So now I'm looking for all the items that
have instance of with the value human. All right, and now, on the second line, I'm adding an additional condition. Not only do I want the item to have instance of human, I also want this item to be American, right? And how is that expressed in Wikidata terms? Being American is expressed using a property called country of citizenship, and the value would be the United States of America. Now, for the property, I need country of citizenship, and I need to refer to it by number. I don't actually remember its number, so I just type wdt:, for property, and you saw that brief message, I hope, at the bottom, that reminded you that you can press Ctrl+Space to activate autocompletion. And I can just type the property I want. In fact, properties have aliases too. So for example, if I don't remember, and I say, "Oh, I want to ask about the nationality of this person", I can start typing "nationality", and even though you can see I typed nationality, I'm being offered, correctly, the property country of citizenship. That's what it's actually called. All right, so P27. And the value of the nationality I'm looking for in this query is USA, right? So wd:, Ctrl+Space, and I can just type "USA", which is one of the aliases, of course, for the United States of America, federal republic in North America. Yep, that's the one I mean. Not USA Network, the TV channel; not the Union of South Africa; not any one of these other meanings of USA. This is the one I mean, so I pick it, and I end my line with a full stop. Don't forget the full stop, or you will encounter an error. All right, so this line says "and who are citizens of the US", right? That's what this line added. So now I'm not looking for all the humans; I'm looking for all the humans who are US citizens. But that's not enough. I also want them to be journalists, so I add a third condition. This time I'm asking, about the same item, about the property occupation. That is property 106; I happen to remember it, but I could have looked it up. And the value here would be journalist, right? People
whose occupation is journalist, or one of whose occupations is journalist. I don't remember the Q number for journalist, so I press Ctrl+Space and start typing. Here we go: person who collects, writes, and distributes news. That's the one I mean, journalist. Not journalism; journalist, that's the occupation. Full stop. "And who are journalists." Remember, this English text I'm adding is for your benefit; the query engine absolutely doesn't care. All right, let's run this query and see what happens. Queries can take a long time, depending on the complexity of the query and on the size of the returned results. So Wikidata actually knows no fewer than 13,488 American journalists; that's why it took a little while. And we got a list of these journalists, as you can see, living and dead, right? Mark Twain is included; Mark Twain was, among other things, a journalist. Great, we have a list of journalists. Now I want to reflect for a moment on what we've just learned. We haven't just learned how to find American journalists. I hope you realize that by changing the Q numbers here, as we've done from cats to dogs, you now know how to find Serbian football players, or South African poets, right? Or any other combination of nationalities and occupations. In fact, you've basically learned how to make a complex query, or compound query, adding conditions one on top of the other. And you could change the P numbers, the properties, and ask about completely different things. Instead of asking for, say, journalists who are US citizens, I could ask about journalists born in Chicago, just as an example. Let's prove that to ourselves. All right, so you know what, we don't care about the nationality, so I removed the line about the nationality, but I do want this person to have been born in Chicago. So there is a property for that, called... is it called birthplace? Oops, misspelled it. Is it called birthplace? Well, not quite; it's called place of birth, but Wikidata did the right thing. Place of birth, all right, P19, and
the place of birth I want is wd:, and I do not remember the Q number for Chicago, but Wikidata has my back, right? This one: county seat of Cook County, Illinois. Perfect. And let's add a comment on the query. This was an easier query, because there are fewer journalists born in Chicago than American journalists, of course. And we got a list of people who are journalists and who were born in Chicago. Let's satisfy ourselves on that point. Let's click one of these at random: William H. Hinton. So, instance of human, that satisfies our first condition. Occupation journalist, right here, that satisfies our second condition. And place of birth, here we go, place of birth, Chicago. All right, so all three of our conditions were met. Wikidata is working. Now, this already shows us the kind of question that would have been really hard to answer without Wikidata, right? If we walked up to even the Harvard University reference desk and asked, "Can you give me a list of journalists born in Chicago?", unless the Harvard reference desk is already all over querying Wikidata, I don't know how easy that would have been for them to do. The standard resources that are at libraries may not give you the ability to answer this kind of question without a lot of paging through Who's Who or similar books. And we got this answer in under a second, 600 milliseconds. So I hope I've convinced you that, a little more than half an hour into our tutorial, you already possess the awesome power of combining arbitrary conditions and finding the results. You could come up with a list now of people who were born in Paris, died in Moscow, and are painters. You could run that query now. Any questions so far? "No, I was just playing around, running queries for rockets already." Using your awesome power. "That's already using my awesome power. It's quite cool." Beware: you may become drunk with power by the time we're done. So our power is boundless; we can query for absolutely
anything now. Well, almost. There are some other useful things to learn. For example, what if we want to exclude something from our query? So I want all the journalists who were born in Chicago... oh, you know what, let's go with our previous query, but leave out those journalists born in Chicago. Okay, so let's get back to our previous query by undoing a bunch. Here we go: the citizens of the US who are journalists. But instead of getting all 13,000, whatever it was, I want only the US journalists who were not born in Chicago, for some obscure reason. The way to do this is actually very simple, but it requires a new command. That command is MINUS, and MINUS is followed by a block. So we start the block with a curly brace, and inside the block we put the undesirable pattern, and that's this pattern, the born-in-Chicago pattern, and I close the block. So I'm saying I want only items that match each and every one of these three conditions, right? They have to be human, they have to be citizens of the US, they have to be journalists. But then I want you to subtract from that collection anything that matches this pattern, born in Chicago. And the fact that it is a block should tell you that I could have added another condition here. I could have excluded, I don't know, only people who were born in Chicago and died in Los Angeles. Let's stop short of that level of arbitrariness and run our query again. There are plenty of journalists, so that takes a few seconds. By the way, if one of you has tried to run a query for all the humans, you will have encountered a query timeout. That means it took too long to get the result, because there are millions and millions of humans on Wikidata, and the Wikidata query engine cannot answer that general a query. There shouldn't really be a business use for it, right? There's never a moment when you want all the humans on Wikidata. All right, so we got 13,261 results instead of whatever it was before, so the 300 or
200, however many Chicago-born journalists there were, are now not included in our query. So this is how you exclude people, or items; it doesn't have to be people. I could say I want all the mountains in the world except those in Canada. So that's how we exclude things. By the way, if we're really proud of this query, because now it includes some complicated syntax, and we want to share it with our grandmother or something, we can click on this link here. Just above the play button there's a link-shaped button. If we click it, we get a tiny URL generated for us on the spot, and this tiny URL, if you send it to someone or post it on Twitter, and someone pastes it or clicks it, will take them directly to the Wikidata query system with your query already filled in. So that's a way of sharing queries. It is also the way to save a query. There's no other way: there's no save button here, and you don't log in or anything, so there's no real way to save your work. So if you spend a lot of time crafting a very crafty query and you don't want to lose it, you need to create a link for it and then save that link somewhere, you know, in an email, or on a page somewhere. Or you could of course copy the entire SPARQL text and paste that somewhere; that's another way of saving the query. Just remember, it won't wait for you here in the query system. When you come back next time, you will start with an empty page, and you will need to reconstruct your SPARQL query if you have not saved it. Remember the old Nintendo saying: all that is not saved is lost. All right, so we know how to share our query. By the way, if you're worried your SPARQL will terrify your elderly relatives, you can also share just the results, and that's with this link button here, by the results. You can click "Short URL to result", and this tiny URL, when shared (I'm now pasting it in a separate tab), will show them the nice Wikidata logo and then take them directly to the
results, without the scary SPARQL. All right, so these are two ways to share your magnificent SPARQL queries. This would be a good moment to check in on the query helper, the one I told you to dismiss. That's here, with the "i" button. We click this, and now we see that, with some slight interface difficulties, it contains a representation of my existing query in nice, handy drop-downs. So if I wanted to change my query now from American human journalists to, let's say, Brazilian, I just select Brazil from the drop-down, very easily, and instead of journalist I, let's say, select politician, just by clicking in the drop-down. I don't know if you've noticed, but these numbers have changed in my query: instead of Q30 there is now Q155, etc. And now my query is no longer about American journalists; now it is about Brazilian politicians, of which there are many, so this query may also take a few seconds. Yeah, so now we have a list here of Brazilian politicians. It's a nice and friendly way of tweaking the values, instead of deleting characters, pressing Ctrl+Space, and typing over them. But it is a limited thing; it can't write the query for you. It's nonetheless useful. So let's dismiss this again. And we got 7,856 results. That's a lot. What if I just wanted, you know, 20 Brazilian politicians? I just need some examples; I'm not really going to go through 7,000 results anyway. I can actually tell the Wikidata query service to give me only a limited set of results. That's done very simply, by saying, after the WHERE block (the query helper has moved things around for me, for some reason), after the WHERE block, I can say LIMIT 20. That's it: LIMIT 20. Note that the LIMIT has to be outside the WHERE block, right after the closing brace of the WHERE block. Having typed this, I run my query and I get just 20 results. That is also one of the techniques to reduce the query time if you are encountering a
timeout: just ask for fewer results, if that works for you. Excuse me. Okay, let's move on. So we know how to exclude things. What about, let's see, instead of politicians, let's go with poets. Poets are great. So we want poets, and we are still excluding people born in Chicago, but that shouldn't be the problem. Oh, because I selected the wrong item. So here's a teaching moment for us, right? I expected to get a list of Brazilian poets, but I got nothing. When that happens to you, you need to start examining your query. Now, this is the line I changed, this one about the occupation, so that is the immediate culprit. And if we look at this value, you can see that the description is "song". So this is a song called Poets, because I typed "poets" and selected the first one that came up, because I was distracted. Of course, what I meant to do is say occupation poet: this one, person who writes and publishes poetry. That's what I meant to do. But this could happen to you as well, of course: you thought you were picking the right thing and you didn't. Now that we have picked poet, we did indeed get a list of 20 Brazilian poets. Why only 20? Because we still have our LIMIT 20 here. Let's remove this limit and run our query, and now we have 1,132 poets, Brazilian poets that Wikidata knows about. Always remember that the results are to be qualified with "well, this is what Wikidata knows about". Wikidata will tell you everything it knows and no more. That, of course, can never be taken to mean that the number of poets in Brazil is 1,132; that is of course incorrect. Just remember that when you interpret the results you got. So let's remove this exclusion of people born in Chicago, and let's restore the comments that were deleted by the query helper, which is another reason I don't like using it. So let's remind ourselves: this line is occupation poet, this line is citizenship Brazil, and this line says humans, right? Let's also fix the title: "Brazilian poets". All right. Now, what if... so I have, uh,
1,132 poets. What if I want all the Brazilian poets or novelists? Poets or novelists: that is actually something we don't yet know how to do. We could add a line here, of course, that says the item should have occupation, P106, with the value novelist, this one here. Occupation novelist. But what would that do? That would create an AND condition. It would look for humans, Brazilians, who are poets and novelists. And there are some, of course. If we run this query, we find that no less than 103 of the 1,132 are both poets and novelists. That's great to know. That is how I find people who are poets and novelists. But if I want not the intersection of poets and novelists but the union, the superset of poets and novelists, then I need some new technique, and this is the keyword UNION, just like in set theory. UNION connects two blocks, so we need to surround the poet line with a block. The braces don't have to have their own lines; this is just a little more readable. I want the occupation-poet crowd, UNION, the occupation-novelist crowd. And now how many results do I expect to get? I don't know exactly, but I definitely expect to get more than 1,132, because those are just the poets. And yes, I got 1,516 results. So some of these people will be novelists and not poets. In fact, I know exactly how many, because we just found out in the previous query how many are both. Questions about this, about OR conditions? No, none. Okay. All right, so are you drunk with power yet? Not quite drunk with power, but okay. So we need to add some more superpowers. The goal of this talk is definitely to get you drunk with power, but remember, use it for good and not for evil. Okay, so we know how to exclude things, we know how to OR things. It does look like we can really query for just about anything. What if I want a list of American politicians? That's already something we know
how to do. But what if I want a list of American politicians whose father was also a politician? Let's think about that for a moment. As with every query, we need to translate it, in our minds or in writing, into Wikidata-speak. All right, so let's start a new tab here. And by the way, instead of remembering the long scary line, we can always just start with the Cats query, so that you don't have to remember this line. And let's call this query "American politicians whose father was also a..." Oh, sorry, before we do that, let's learn something else. Let's go back to our Brazilian poets. What if I want to find out where these Brazilian poets and novelists were born? I want to find out where they were born, not make a condition about it. I'm not looking for only Brazilian poets born in São Paulo or elsewhere. I just want to know where these 1,500 Brazilian poets or novelists were born. I want more information in the results. Until now, we've been satisfied with just the labels, just the names of these people. So when I want more information per result here in the table, what do I need to change? You may remember that is the role of the SELECT line. So I can just add the variable I would like to output. In this case, let's call it birthplace. Just adding another variable, starting with a question mark: ?birthplace. And if I run my query again, lo and behold, now my results table contains three columns, and the third one is indeed birthplace. Just one problem: there's nothing in it. It's completely empty, all the way through. The reason it's completely empty is that we told the query engine "I would also like there to be a column with the values inside the birthplace variable", but we didn't put anything in the birthplace variable. To do that, we need to somehow mention this variable that we just invented inside our query. So how do we do that? We include this variable in a
statement inside the WHERE block. So in this case, we want this poet or novelist, this item, to have some kind of birthplace. Remember, that was a property, so we can just look for it again: birthplace, here we go, place of birth. All right, now what should the value of the birthplace be? Remember, I'm not looking for a specific birthplace. If I were, of course, I would just use the Q number for that place. But I'm not looking for a specific birthplace, so, just like the first element in these lines, where I'm not looking for a specific Q ID, I use a variable, meaning I start with a question mark. The variable I use is the one I declared here above, the birthplace variable. What this line, line number nine in my view, says to the query engine is: look, I want the item to have a P19 property with some value. Whatever that value is, I don't care, just stick it into the variable birthplace for me. That's what I'm asking. Let's add a comment here: "and put the birthplace into a variable". Let's run our query, and wonderful, now the birthplace column is no longer empty. However, awkwardly, it includes inscrutable Q numbers, which is perhaps not what we meant. We probably just wanted to see the name of the city or town these people were born in. Of course, I could click through. I could click Q174 and discover that it is São Paulo. That's great, but I don't want to have to do that. So the Wikidata developers love you and want you to succeed, which is why they have created the label service. That's the thing this scary line is about. The inclusion of this scary line means that we can simply add the word Label, with a capital L (yes, it does matter), to any variable that we define, to automatically get the human-readable label instead of the Q number. If I change my SELECT line, not the variable itself but the SELECT line, to say: don't give me
the actual value, which is a Q number, give me the label for the value, please, and I hit play, this time I will get the actual names of the birthplaces of these poets and novelists. So this is how we add data to our query: by including another variable in the SELECT, and making sure that somewhere in my condition block, my WHERE block, I'm actually putting a value into the variable. This is the way to do it. Any questions on this? This is a new level of power for us. "Yeah, just a quick question. So, line nine, where you've specified that you want to call the property birthplace: will this query result in only showing humans who are both poets and novelists and who have a property for birthplace?" Yes. So thank you for this question. If you had not asked it, I would have asked it myself. The keen among you may have noticed that instead of our 1,516 novelists or poets, we now have 928 results. We have lost five or six hundred results, and the reason is precisely what you just pointed out. The reason is that's what we told the query engine to do. It may not have been what we meant to do. We just wanted to see the birthplace, but what we've actually said with this line number nine is, remember, "I demand that the item have a P19 property". So those items that do describe novelists or poets from Brazil but don't have a birthplace specified, and of course there are such items, do not meet our criteria. Our pattern in line number nine is excluding them. That's perhaps not what we meant. We may have meant: if there is a birthplace, I'd like to see it, and if there isn't, well, just leave it empty. That might have been our intuitive expectation, but that is not the query that we ended up writing. The query we wrote says no, there must be a property P19. It's true that I'm not demanding it have any particular value, but it must exist. And that, as it turns out, excluded
nearly 600 novelists and poets from my list. So maybe that's not what I intended to do. I do want to include novelists and poets without a birthplace, but I also want to show the birthplace for those novelists and poets for whom it is known. The way to do this is a new command that we will learn now. What we want to say is: this pattern is optional, I don't want to make this an exclusionary pattern. And the way to express that is the OPTIONAL keyword, which also receives a block, like MINUS and UNION. So we surround the pattern that we want to be optional with the OPTIONAL keyword. All right, and now when we run this query, we expect it to return 1,516. Well, 1,517; maybe someone has just added a poet or novelist as we were speaking, because Wikidata is a wiki and it changes all the time. But certainly we didn't expect there to be fewer than what we got ten minutes ago. So 1,517 results, that looks good to me. And now you can see, if you scroll down these results, that a lot of them have the birthplace, but sooner or later we find an item (I'm highlighting it, I don't know how clearly it is seen in the recording) where this person does not have a birthplace and yet appears in our results. So this nuance of whether a certain line in your query should or should not exclude results is very crucial. When you start building your own queries in Wikidata, you need to be very mindful of what you may have excluded without intending to, like in this example of the birthplace. Does that answer your question? "Yes, perfectly." All right, so we know how to produce additional data, and that's a very important superpower. Now we're ready for our next challenge, which is the American politicians whose father was also a politician. Let's try to translate that into Wikidata terms. So let's just write, in a comment here, what that means in Wikidata terms. It means, well, I want real politicians, you know, not President Bartlet, so let's start with
instance of human. That has to be part of it. Secondly, I want citizenship US; I've seen this before. Then I want occupation politician. That's the easy part; that's just a list of American politicians. Now how do we express the condition that the father was a politician? Well, we're not sure. When you find yourself facing this, where you have a research question or a query that you want to run but you're not sure how to translate it into Wikidata terms, one way to go about it is to actually browse Wikidata and find an item that you know would satisfy your query, and then see where in this item you can find properties and values to hang your query on, or to express your query with. So let's take someone like George W. Bush. George W. Bush, as we know, is an American politician, and we also probably still remember that his father was also an American politician. So in this item we expect there to be some kind of expression of that. We could just start browsing, but it's actually very easy to find: down here, there is a property simply called father. And what is the value of this property? Well, it's an item, a link to Wikidata's item about the father, George H. W. Bush. Okay, so there is a father property that we can use. So let's get back to our query, and we say: all right, we're looking for humans, US citizens, occupation politician, with a father property. With what value? What am I looking for here? Am I looking for all the US politicians whose father was this person specifically? No. Look at our problem statement in the first line. We're looking for American politicians whose father, whoever he may have been, was himself a politician. That's what we're looking for. We're not looking only for US politicians whose father was specifically person X or specifically person Y. When we are not looking for a specific value, that suggests we
need a variable. Call that variable father. Now let's pause here and implement this much: just the politicians for whom Wikidata knows of a father at all. So let's start with instance of human, citizenship USA, occupation, P106, politician. And then we want this same item to have a father property, that's P22, with what value? No specific value, just some variable. Let's call it ?father. Comment: "who have a father". Now, do I make this optional? I don't make this optional, because remember, my query wants specifically politician fathers. So if there is no father property at all, the item obviously doesn't satisfy my conditions. So in this case this line will not be optional; I do intend to require that the item have a father property. Now let's also make sure it comes out in the output by including this variable here, and run our query. This will give us a list of American politicians for whom Wikidata knows of a father. Oh, but guess what, I forgot to say fatherLabel, so I got inscrutable Q IDs. If I change it to fatherLabel and run it again, I should get human-readable results. So we can see, for example, that there was an American politician named Charles Edison whose father is none other than Thomas Alva Edison. But Thomas Alva Edison, as far as I know, was not a politician. We haven't yet filtered for only politician fathers. The way we do this, and this is our next stage of power, is we pose a condition like the other conditions we have posed, but this time our condition is not about ?item. This is the key: now we add a condition about the linked item. Remember, our politician will have a father property that links to whoever that father was. That father entity, that item that we're linked to, is the one we want to also require to be a politician. So our next line begins with the variable ?father. This is the key here. Now we are posing a condition, and we could pose a whole series of conditions, about the linked item. This is new power, something we haven't done before. So this father
item should also have a P106 occupation property with the value politician, which I can copy from above: it's Q82955. And the comment will say "who himself was a politician". Okay, this is kind of the one-two move here. Number one, we put a value into a variable, pointing to some other item linked from my original item in the query. And then, now that I have that item under a name I can refer to, that name can be the item in a pattern that matches other properties and values. I hope that is clear. We run this query now, and we get fewer results, of course. We don't get Mr. Edison anymore, but we do get the Bushes, and we get Lincoln's son, for example, and LBJ's daughter, and Teddy Roosevelt Jr., etc. People we would have expected to find here, Roosevelts, etc. So this is a nice query to study political dynasties in America. And of course, by simply changing Q30 to Q155 for Brazil, I am suddenly exploring political dynasties in Brazil, to the extent, as always, that Wikidata knows about these people. The data is certainly still partial everywhere. But again, remember, any one of these is tweakable. So having spent the time to build this query about political dynasties in the US, with a simple change I can now get political dynasties everywhere in the world. I can also get painterly dynasties: by changing occupation politician to occupation painter, I can find painters whose fathers were painters. Questions about this? No? Okay. Now, this is perhaps the biggest dimension of power that we've added, in that we can now build complicated queries to answer complicated questions. At this point I will mention a few other tools and features, and then we'll proceed to showcasing, rather than teaching, some more advanced tools. Oh, but you know what, before we do that, let's show one more thing here. So I went back to the American politicians, no longer the Brazilian ones. This query resulted in some 19th-century figures, and let's say I don't want that. I want only
more recent people. To do that, I would like to add some kind of additional constraint. Let's do this. Let's say that of those politicians, I mean the children, the children of the politician fathers, I want only those born, say, after 1950, to keep it a little more recent. To do that, we need another piece of data. By the way, we're calling the first item ?item just because we've been copying it over and over from the Cats query, but remember, it doesn't have to be called that at all. We can call this ?child instead of ?item, as long as we change all the occurrences. So we say this is the child and that is the father. This is a little more readable, so we know what refers to what. And now that we've done that, we can pose a further condition and say: I want the child's date of birth. And of course we have a property for that, date of birth, here we go, P569. And now again, am I looking for a particular date of birth? No, I'm not. I am looking for it to be in a certain range, but I'm not looking for it to have a certain value. So because I'm not looking for a particular date of birth, let's put it into a variable. Let's call it ?dob, date of birth. All right, so the child should have a date of birth. Let's also add it to our SELECT, ?dob, so that we can see the dates of birth. So now we have 443 results, with these dates of birth here on the right. Remember, these are the children's dates of birth. So Lincoln's son was born in 1843. That's the kind of person I want to exclude. Also LBJ's daughter: she was born in 1944, and I want to only have people born after 1950. So now that we have the value of the date of birth inside a variable, ?dob, we can add another feature, and that is a FILTER. When we say FILTER, only things that meet whatever function we
put in the filter will make it through. So we can say: I want that date of birth's year. This is a little auxiliary function, the function YEAR, so that I don't have to deal with the whole date, because I don't care about the day or the month. I want the year to be larger than 1950, and I close the parentheses here. These are parentheses, not curly braces. Comment: "only allow children born after 1950". And now instead of 443 results we have 82 results, and indeed, you can see 1951, 1952, only people born after 1950. Again, the FILTER command is extremely powerful. You can make all kinds of manipulations with it, including "I want people born in towns that have E as the third letter in their name", as arbitrary as that. Now, you may be saying to yourself: well, I don't know what functions I can put inside the FILTER. You just came up with this YEAR function; how do we know all that? And the answer is: I understand, I sympathize. We don't have time in this particular tutorial to go through all of these options, but this is a good moment to introduce you to the help resources that are available to you. I know, I know, nobody reads the help; that's why you're watching this video. But still, if you are now convinced that this awesome power is worth exploring and utilizing, you will run into cases where you will want to learn some more of these functions. Under the Help button here, there is a fine user manual that you could read, you know, when all else fails. There are the example queries, the same examples that are available here, and by the way, you can type something here; I'll show this in a minute, we'll get back to this. You can get help on the SPARQL syntax itself, the language itself: what command goes where, what goes inside the WHERE, what goes outside the WHERE, the list of prefixes, a whole bunch of things. There's also a very useful page that I encourage you to use, called Request a query. This is just a wiki page on Wikidata where you can
request queries that you are not able to produce yourself. Here you will find people saying "hey, I want to do this thing", and the page is watched by old Wikidata query hounds, people who have a lot of experience with SPARQL and Wikidata queries, and whose joy in life is to take up your challenges, your human-speak query, and translate it into a beautiful and highly performant Wikidata query. So if you have an idea for a query and you don't know how to do it, or you have a query that is timing out and you don't know how to reduce it so that it runs within the minute or so that it has to run in, use this page. Do not hesitate, and request the query. Usually within 24 to 48 hours you should receive a response with the query help that you need. All right, so that's the help. Another thing I will show you about the interface: now that we have these results that we're really proud of, recent political dynasties in the US, we can download them over here in the result area. We can download them as a JSON file, which is useful for developers, or as a tab- or comma-separated file, or as an HTML table, so I can paste it on the wiki. So that's very useful. And with the remaining time that we have, I want to show you some more advanced uses of Wikidata that we won't have time to parse line by line, but that you can explore on your own, or that we may cover in a future training on advanced queries. So we'll just look at examples and see the results, to convince us further of the awesome power of Wikidata, without interpreting them line by line. So let's go back to the Examples button and take a sunny topic like death. Just by typing some keyword, the list, which was originally of, let's see, 456 examples, is filtered automatically to only the queries that somehow mention or have something to do with death. So let's take "average lifespan by occupation". As you can see, a complicated query that we won't have time to
cover, using all kinds of functions and things. Let's just run this query and see what happens. While we're waiting, let's also queue up another query. Here's an easier one: "presidents and their cause of death ranking". This should say US presidents, by the way. What do US presidents tend to die of? We run this query, and hey, the Wikidata query service made us an awesome bubble chart. I wasn't expecting that. Well, I was; you maybe weren't. So actually this is just a query with results in a table, like all the other queries: in this case, how many presidents died of stroke, how many presidents died of ballistic trauma, which is the medical term for being shot, etc. But because of this little line here, this comment... Remember when I told you anything you put in the comments totally doesn't matter? Well, I was lying. There is a special kind of comment, with a keyword like this, that will actually tell the Wikidata query engine: do the query, and then immediately present it as a bubble chart. But even if you didn't include it, as long as you have a query that results in some numbers, you can always just switch here to bubble chart, and you get an awesome bubble chart that you can just paste into your presentation and pretend you have graphic skills. Let's go back to the other one. Did this work? Yes, it did, here we go. So this, remember, was average lifespan by occupation. Again, an interesting question when you think about it, but maybe it didn't occur to you that this is the kind of thing you can get easy answers for, at least if you're not in the actuarial line of work. But at least based on the data in Wikidata, which, remember, is always partial and incomplete (and by the way, Wikidata covers notable people, so that's a bias right there), among the notable people covered in Wikidata you can look at the average age people reach. So poets reach the age of 63 on average. And of course these are all the poets, going back to ancient times, so this
is not like the life expectancy of the 21st-century poet. It's just an interesting little query, produced with this complicated SPARQL. But remember Picasso: don't be ashamed to steal, to adapt other people's queries. They're all released under CC0, so there's no licensing problem here. You can just take this query and change it to whatever you want. You could introduce, for example, a condition that only takes into account people born in the 20th century or later. So even if you don't know how to build a query from the ground up, you can always take an existing query and tweak it, adapt it: change the country, change the language, change whatever you need. Speaking of changing the language, here's an example. This cause of death of presidents query doesn't use the big scary line about the wikibase:label service. It produces the labels in a different way that we haven't really gone through, but even without understanding how and why this works, we see the English language ISO code here. And if we wanted to show the same query but use the lyrical German terms for these causes of death, we just change it to the German ISO code, de, run this query again, and what do you know, most US presidents have died of Schlaganfall. That's lovely to know. All right, so that's how easy it is to leverage Wikidata's true and profound multilingualism. You can get the labels, the results, in any language you want. Going back to our poets and novelists query from Brazil, we can look at the labels for those cities in, say, Hebrew. Now don't panic, Hebrew is written from right to left. So by just changing the language code to Hebrew here, what happened? Well, the names of the poets and novelists are shown in Hebrew where available, and where it's not available we have Q numbers. And the names of the birthplaces are shown in Hebrew where available.
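For readers following along without the shared screen, the query being edited at this point might look roughly like the sketch below. It is a reconstruction, not the exact text on screen: P106, P19 and Q155 are the numbers named in the session, while P31/Q5 (instance of human), P27 (country of citizenship), Q49757 (poet) and Q6625963 (novelist) are filled in here as assumptions and are worth double-checking on Wikidata.

```sparql
# Brazilian poets or novelists, with birthplace where known, labels in Hebrew.
# Sketch only; item and property numbers not named in the talk are assumptions.
SELECT ?item ?itemLabel ?birthplaceLabel
WHERE
{
  ?item wdt:P31 wd:Q5 .                      # instance of: human
  ?item wdt:P27 wd:Q155 .                    # country of citizenship: Brazil
  { ?item wdt:P106 wd:Q49757 . }             # occupation: poet
  UNION
  { ?item wdt:P106 wd:Q6625963 . }           # occupation: novelist
  OPTIONAL { ?item wdt:P19 ?birthplace . }   # birthplace, if there is one
  # The "scary line": the label service, here asked for Hebrew labels.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "he" . }
}
```

Changing the language code in the last line changes the language of both label columns at once. One thing to keep in mind when counting rows: UNION returns a row for each branch that matches, so a person recorded as both poet and novelist can appear twice unless the query uses SELECT DISTINCT.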
So Rio de Janeiro is available in Hebrew; someone has put in a Hebrew label for it. But whatever this place is does not have a Hebrew label. Now, this may not be what I want. If I want Hebrew labels where available, I still probably want a fallback, maybe to English, where there is no label, instead of the ugly Q numbers. So when using this service, this magical line, I can add a comma here and add en, English, as the fallback language, and run this query again. Now I get Hebrew labels where available, and where not available I get the English labels. So that's again a very easy way to get the results in the language you need where available, and in a fallback language where not. You can even add multiple commas to have multiple fallback languages. Okay, back to interesting examples. So we saw this awesome bubble chart. What about maps? From death let's proceed to hospitals. Here's a sample query called "map of hospitals". A very simple query, and by running it we get an awesome map already done for us by the Wikidata query service. And this map is as good an illustration of our content gaps as any. We can see that Europe and North America are really well covered in terms of documenting the existence of hospitals and their coordinates. Because, of course, if Wikidata doesn't have the coordinate property for a particular hospital, it wouldn't be able to display it on the map. So there may be hospitals on Wikidata in, say, Chad. I mean, Chad has some hospitals, presumably, but none of them is documented on Wikidata with a coordinate. If you look at South Africa, for example, we see something interesting. Apparently nobody has documented any of the hospitals in the Northern Cape, although they undoubtedly exist. Even around Cape Town, only hospitals in Cape Town itself have been documented, as we can see when we zoom in. Whereas someone has gone to the trouble of documenting what
looks to be a good number of hospitals in the Eastern Cape. Which only goes to show, and this could have been maybe a single afternoon of work, just getting a list of hospitals and making sure they're all listed in Wikidata with their coordinate locations. This shows you that Wikidata still has huge gaps in content, and these queries, especially with visualizations like a map, are a good way of finding them. Western Europe has a lot of its hospitals documented with coordinates; Ukraine, Moldova, Romania, not so much. This is something you can fix. So again, the magic here, let me scroll up: this is the map visualization. If you click on Table, you are shown what the query actually gave you, which is really just a list of items with coordinates. The moment your results include coordinates, you can just switch the visualization to a map. What else can we show here? There's a fun project called Sum of All Paintings. It's like a side project or subproject of the sum of all knowledge that we're working on: to get the sum of all paintings documented. And this one is "map of all the paintings for which we know a location, with the count per location". So this will also give us results on a map, and will also show us the content gaps that we have on Wikidata. It takes a while, but it's worth the wait. So in this query we will see points on a map, which are essentially locations where paintings are on display: museums, galleries, etc. And with each point it will also tell us how many paintings are known to Wikidata to be in that location. So if we go to Egypt, what do we know about Egypt? Well, we know that in the Mohamed Mahmoud Khalil Museum there is one documented painting, according to Wikidata. I'm sure that museum has more than one painting, but Wikidata volunteers have not yet documented more than one painting there. Again, something we can work on. Conversely, if we go to, say, Rome: Rome, in Italy, has one or two museums, right, including the Vatican. Let's see
what we have here. So this dot, for example, is the Apostolic Palace, and Wikidata knows specifically of two artworks there. Again, this means there is an item on Wikidata about a painting that knows what collection or institution that painting is in, and further knows the coordinates for that institution. There are two of those documented; again, there are probably many more artworks. Five artworks from the Niccoline Chapel, 25 works from the Sistine Chapel, one from St. Peter's, etc. And there are all kinds of other places in Rome with their own numbers. If we go to the Netherlands, which so far has been the center of the Sum of All Paintings project, simply because Dutch volunteers have spearheaded it, you will see that some of the museums in the Netherlands, although far less famous than the Sistine Chapel, have hundreds, some of them even more than a thousand, paintings on this map. That's simply because people have gone to the trouble of putting in that data: that the paintings exist, that they belong in a certain collection, and that puts them on the map. So again, queries and maps are an excellent way to both explore and demonstrate coverage, and to show impact on it. So if we, or maybe Egyptian users, were to do some kind of Wikidata-thon documenting artworks in the various museums in Cairo, they could show a before and an after image of this map, which would show the impact of their work. And the state of Wikidata coverage is such, at this point in time in 2018, that there's still plenty to do and a lot of low-hanging fruit to cover, to really improve what Wikidata can tell you. I return to the warning: when you cite the results that Wikidata gave you, whether it's about paintings or political dynasties or causes of death, remember the context and remember the inherent biases. If we look at causes of death... does "overall cause of death ranking" work? I hope this doesn't time out. But if you look at the overall cause of
death ranking... yeah, it does work. Okay, so we have this bubble chart of overall cause of death ranking, and of course the major cause of death is myocardial infarction, which is medical-speak for heart attack, followed by pneumonia, cancer, various cancers, etc. Now, something like dengue fever, which is a huge killer in Asia, is severely underrepresented here because of Wikidata's bias. Because Wikidata covers notable people, and notable people by and large, at least since the early 20th century, have access to antibiotics, and when you have access to antibiotics you don't tend to die of dengue fever. But a lot of people still die of dengue fever because they don't have access to antibiotics, and those people are disproportionately less likely to be represented here on Wikidata. So this chart, lovely though it is, cannot be called... I mean, it is called here, for brevity I suppose, "overall causes of death ranking", but if you were to present it and say "this is what people die of in the world", you would be mistaken and misleading. This is what the people documented on Wikidata, whose cause of death was documented on Wikidata, tend to die of. That is the accurate description of what we're seeing here. Of course, the more people we describe, and the more causes of death we note on Wikidata, the less biased this table will be, but it will forever be biased so long as Wikidata only covers notable people. So I hope that makes the case for how carefully you need to interpret the results you get from Wikidata, the results you share from Wikidata. I guess I'll leave a couple of minutes for questions. I will just mention one other thing, which is that Wikidata queries, SPARQL queries over Wikidata, are also useful as an intermediate step in other tools. A good example: let's take this Brazilian poets query and go to a tool some of you may know, called PetScan. If you don't know it, you should know it. It's a very powerful tool
and deserves its own tutorial session. But PetScan is a tool that helps you generate lists of pages on the wiki. If I wanted to get all the English Wikipedia articles about those Brazilian poets and novelists, and the best way I found to get at the Brazilian poets or novelists is through Wikidata, I can click on the Wikidata tab here, sorry, the Other sources tab here, and simply paste my SPARQL query. This is just a text box, so it doesn't have any autocomplete or anything like that, so you need to build your query over in the query engine, but you can paste the query basically as is, and then PetScan runs the Wikidata query to feed the results. See, now I have these 1400 results in PetScan, and when you have results in PetScan you can do all kinds of amazing things with them. For example, I could ask for this output to come in wiki format, which is kind of awesome, because it would essentially generate this for me. This horrible-looking thing is actually a very elegant wiki table. Those of you who have dealt with wiki tables may know the pain. I can just grab this whole piece of text, let's even show it right now, go to Wikipedia, or any wiki really, edit my sandbox, PetScan example, and just paste this monstrous table. I have in fact pasted it; it just took a few seconds because it's a long table. And now I can publish it, or even just show a preview. Here we go. So this generates a nice, elegant table with the articles and data about them from PetScan, with a link to regenerate the table. And it's a useful thing to use PetScan for: if you use a Wikidata query as a maintenance query, to come up with things to do, it's really nice to have this option to output it as a wiki table, so that you can then just paste it on your editathon page or editing workshop page as a to-do list. I do also want to show you very quickly a sample
query of this sort that I just described, a maintenance query. On my own user page on Wikidata, this is my own user page, Ijon, that's my volunteer account, there are a bunch of query links at the bottom. Each of these is a link to a whole page of useful queries. I'm going to go to my own examples from a talk I gave in Estonia and check out this query, for example. This query, which we won't go through and explain, as it does include some things we haven't learned today, shows the distribution of biography articles by occupation on Estonian Wikipedia. It also gives you the labels in Estonian, which is confusing, so let's set that to English and run this query. And, without speaking a word of Estonian, which statistically most of you do not either, we are able, when the query ends, yes, we are able to see the distribution of occupations covered in biographical articles, in articles about humans, on Estonian Wikipedia. So we know they write a lot about politicians and writers and actors, much more, for example, than they do about football players. That gives us an instant x-ray of a certain aspect of Estonian Wikipedia without speaking any Estonian. I already have this picture of how things are. I could add a line here that asks about the gender of the person, setting it to female, so that I can see the distribution of coverage of occupations among women, where we can maybe see some kind of gap. I could of course change this line from et, Estonian, to, say, af, Afrikaans, and get the same kind of results about the Afrikaans Wikipedia. So now I can also compare the preoccupations of certain Wikipedias: are they more or less interested in certain professions? On Afrikaans they also write a lot about politicians, writers, and actors, and then football players, etc. In some other wikis it would be quite a different distribution. So Wikidata is also useful in this way, to make all kinds of queries like that, and
those of you who have heard about the gender gap tracking project, the gender gap index: it is also using Wikidata to reach its results. So we'll take a few minutes for questions. I hope I have at least convinced you that Wikidata is powerful, that you are powerful when you use Wikidata queries, and that there's plenty more to learn and dig into. Questions? No, I think that's it for myself. Nothing for me. Okay. Those of you watching this recording, if you have questions, I remind you of the help button, which will take you to a help portal where you can just ask questions. I remind you of the Request a query page. Wikidata in general is a friendly project. It's a new project. It understands that most of us are also new at Wikidata, or new at linked data in general, certainly new at SPARQL. So don't hesitate, don't be shy, just ask and you'll receive help. Thank you. I remind you: use your newfound power for good and not for evil. With great power comes great responsibility, etc. Thank you for your attention.
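For those following along afterwards, the occupation-distribution query described above, with its swappable language line and optional gender line, can be sketched roughly as follows. The actual query from the user page is not reproduced in this transcript, so this is a reconstruction under assumptions: the function name and its parameters are hypothetical, and the query body uses the standard Wikidata identifiers P31 (instance of), Q5 (human), P106 (occupation), P21 (sex or gender), and Q6581072 (female), plus the usual sitelink pattern of the query service. The Python part only builds the SPARQL text; it does not run it.

```python
def occupation_distribution_query(wiki_lang="et", label_lang="en", female_only=False):
    """Build a SPARQL query counting biography articles per occupation.

    Hypothetical helper for illustration; not the query shown in the talk.
    wiki_lang   -- language code of the Wikipedia to inspect ("et", "af", ...)
    label_lang  -- language in which occupation labels are returned
    female_only -- if True, add the line restricting results to women
    """
    lines = [
        "SELECT ?occupation ?occupationLabel (COUNT(DISTINCT ?person) AS ?count) WHERE {",
        "  ?person wdt:P31 wd:Q5 ;          # instance of: human",
        "          wdt:P106 ?occupation .   # occupation",
    ]
    if female_only:
        # The optional extra line: sex or gender is female.
        lines.append("  ?person wdt:P21 wd:Q6581072 .")
    lines += [
        # Keep only people who have an article on the chosen Wikipedia.
        "  ?article schema:about ?person ;",
        f"           schema:isPartOf <https://{wiki_lang}.wikipedia.org/> .",
        f'  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{label_lang}" . }}',
        "}",
        "GROUP BY ?occupation ?occupationLabel",
        "ORDER BY DESC(?count)",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Estonian Wikipedia, labels in English, as in the demonstration.
    print(occupation_distribution_query())
    print()
    # Same picture for Afrikaans Wikipedia, restricted to women.
    print(occupation_distribution_query(wiki_lang="af", female_only=True))
```

The printed text can be pasted into the query service as is, or into PetScan's text box. Changing `wiki_lang` corresponds to editing the single language line in the query, and `female_only=True` corresponds to adding the one gender line.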