speaker, he's Lucas. He's a SPARQL magician, I'm told, and he will introduce you to his favorite query language, SPARQL, and give you a little introduction. In the second part he will do some live coding, which is always really interesting and funny: you can give him some things to query for you, and I'm sure we'll have lots of fun and learn interesting stuff here. So give a warm round of applause to Lucas.

Is this better? Aha, it's a bit too loud, so I'll just talk a bit until they have figured it out. Yeah, so this is going to be kind of two parts, but not really that separate. In the second part I'm basically going to write the queries that you suggest, so if you see what I'm doing here and think, oh, I have a great idea for something we could query, then just remember that and we'll get back to it, hopefully. Otherwise the second half is going to be really short, if I don't get any ideas from you.

But yeah, so this is about querying linked data, which allows you to do all kinds of crazy things and answer all kinds of crazy questions, such as, I think I had it on the slide, what are the largest cities with a female mayor? If you wanted to find that out traditionally, you could go through Wikipedia and try to find all the largest cities and see which ones have a female mayor and which ones don't. Or perhaps there's a category with all the cities with a female mayor, but then you have to sort them by population, and it's a whole mess. With linked data you can find that out much more easily, and also all kinds of other things.

But let's start with some simple fantasy linked data. This is a tiny snippet of linked data, a data graph. It's just composed of a load of nodes, which are these ovals and rectangles here, and they're connected with arrows. Each of these forms a kind of triple, consisting of the start node, then the arrow, then the end node, and that's how we represent all the information we have in
this linked database. For example, we can read this as: this talk, right now, happens in Esszimmer, the dining room, which is the name of this stage here, and it's going to be followed by the live querying session, which also happens in Esszimmer, and the live querying session in turn follows this talk. Esszimmer, the dining room, is next to the kitchen, and the kitchen is next to the dining room again, and both of them are part of the WikipakaWG, which is part of 36C3. The talk happens right now, and at the same time there's also some talk about how state elections are climate elections, or something like that, on the Chaos West stage, which starts at the same time. The Chaos West stage is part of the Chaos West assembly, which is part of 36C3 as well.

This graph has a few important properties. For example, there are some redundant connections here. You could say that if this talk is followed by the live querying, then you don't really need to know that the live querying follows this talk; it's redundant information, you already know it. But it doesn't hurt to have it, and it often makes your life easier if you have a little bit of redundancy in your graph. And if you find that one half of this connection is missing, for example, you can still investigate what's going on.

Also, in here we have a kind of bidirectional connection: Esszimmer is next to the kitchen, which is next to Esszimmer. But these are two separate arrows, and it could also be that only one of them is there. You don't have arrows which go in both directions at once in this data model; if you want something like this, you have to have two separate arrows, because that keeps the data model very simple. You just have subject, predicate, object, and that's everything you have.

Then, to query this graph, you select a tiny part of it and remove some part that you don't know about. For example, we know that this talk is followed by live querying, and if we remove the live querying part
then we can ask something like (okay, I did it the other way around, never mind) this talk is followed by which talk? Then you have a question, because you've left out this part. If you ask this question to a query service, you can think of it like a, I only know the German word for this one, a template. You put this over the graph: this has to match the existing node, this has to match the existing arrow, and then you see which nodes you can put in here. In this case that's only the live querying. Or the other way around: which talk follows this one? So the beginning of the triple can be a variable, like this one, or the end of the triple can be a variable, like in this case.

You can also have more complicated patterns, like... no, that's not a more complicated pattern, this is the same pattern. You have the question, which talk happens in Esszimmer, and you have two answers: this talk happens in Esszimmer, and live querying happens in Esszimmer.

But you can also combine more graph nodes, like this, for example: which talk happens in some room which is part of the WikipakaWG? We have one free part here and one free part here, but we know that these two have to be connected with "happens in", and then this has to be connected with "is part of" to the WikipakaWG. If you can phrase your question as a kind of graph like this, where some parts are predetermined, which you already know about, and the other parts that you want to find are these variables, here indicated with dashed lines, then you can ask that question to the graph and find the matching results. In this case you have two matches: this talk happens in Esszimmer, which is part of WikipakaWG, and live querying happens in Esszimmer, which is part of WikipakaWG. And if we had more information in this graph, we might also have other rooms. For example, there's this library over there, which is also going to have some talk. If we had the
whole schedule in here, we would find those as well. We could also adapt the query so that we don't even make the WikipakaWG part fixed. We could ask for anything that happens in 36C3: some variable happens in some room, which is part of some assembly, which is part of 36C3. Then we would find this thing as well, because it's the same kind of pattern: happens in, is part of, is part of 36C3. Does that make sense? Hopefully. I'm seeing a lot of nodding heads, okay, that's great.

So then we can move ahead to actually asking some of these questions to a real query system, because in reality you're not going to draw these graphs; you have a language in which you phrase them instead, which looks a bit like this. You have the part with SELECT and WHERE, and this is kind of like SQL, and then everything else is not like SQL. Forget SQL. I hear this is easier to understand if you don't know SQL. I didn't know SQL that much when I learned SPARQL, and I think it helped me, apparently. What you write down here is this description of the graph, and the dashed parts, the variables which you don't know, are marked with a question mark, because that's what you use to ask a question. In this case I've just called it talk, but it could be any name, basically. And instead of "happens in" as two words I've written it as one word, with the prefix 36C3, and it happens in the 36C3 Esszimmer. I don't really have a separate dining room at home, but a lot of people do, so if we just wrote that it happens in Esszimmer, that would be pretty ambiguous and no one would know which dining room you're talking about. By adding this prefix, we know we're talking about just the dining room at 36C3. I assume there's no other assembly that has something called the dining room; if there is, then we would have to add something else here to make it clear. And I've used the same prefix for "happens in", to make it
clear which kind of "happens in" relation we're talking about: that it's one specific to Congress events. Then you could ask this to a query service which has this example graph in it, and you might get the response that it's these two talks. At the end you have this period here, because if you read the whole thing, it's kind of like a sentence again: the talk happens in Esszimmer. And if you have two sentences, then you have two periods: the talk happens in some room, and this room is part of the WikipakaWG. Because we've used the same variable name here and down here, this has to be the same room; it can't be two different things. If we used two different variable names here, room and something else, then we would just get all the combinations of talks happening somewhere and rooms being part of WikipakaWG, without them being connected in any way. But because they use the same variable name, they have to be connected like this, and then you get the results we've seen earlier.

What you can also do is leave out the room. When I translate this into English, I could say: the talk happens in the room, and the room is part of WikipakaWG. But I could also say: the talk happens in some room which is part of the WikipakaWG, as a kind of, I don't know what that's called in English, a relative clause, a subordinate clause, where we don't really talk about the room in itself, just as a part of this larger sentence. You can write that in SPARQL as well, and then it looks like this. These square brackets describe what the room looks like without giving it a name. So in this case you can only select the talk up here, and we don't have a room variable, but if you don't care about what the room is, that can be very useful.

I've also changed something else here: I've replaced the 36C3 prefix in "is part of" with schema, which is another prefix. Schema is this collection of useful predicates and other nodes that you can reuse. For example, if
you're describing things you have on your website, you might say you have an article with a schema title and a schema publication date. This was mainly introduced by Google and some other search engines, but we can use the same vocabulary to talk about our talks, because "is part of" is one of these standard terms we can use for that.

What else do I have? Okay, the next thing I have is actual queries, so I'm almost going to switch to Wikidata, and I should talk a bit about Wikidata. All the examples so far were just on an example graph which I made up and threw on a slide with a lot of probably over-engineered TikZ and LaTeX magic, which I shouldn't have spent that much time on, but it looks nice. If we want to write real queries, we could load this thing into a query service, but it wouldn't be that interesting, because it's kind of small. But there are a lot of real data graphs out there that you can query with this query language, SPARQL, and one of the coolest ones, at least in my opinion, is called Wikidata (there's some discussion about how it's pronounced). It's a free database of anything that's relevant, it's part of the same family of projects as Wikipedia, Wikimedia Commons and others, and it's also maintained by the same community of volunteers. You can find all kinds of really interesting and cool and funny data there.

So all these example queries which I have here, we're going to ask to Wikidata. But first I will give you one or two minutes to try to imagine what this question would look like, either in the graph format or in the SPARQL format. Just try to figure out how you would formulate "which software is written in Bash" as this kind of graph query, and then we can see what we come up with. I didn't think this through, I need some hold music now. Does anyone have an idea of what the graph looks like? Because I'm going to uncover
it now and then you can compare if it looks the same way. So it would look like this, at least using the Wikidata terminology: instead of "is written in", the property is called "programming language", and this could be called Bash, or Bourne-again shell, or GNU Bash, or something; it doesn't really matter. And in SPARQL it looks like this, which is a lot less readable than what we had before. Wikidata is multilingual, and that means that instead of "programming language" we say P277. I think that's wonderful, but it is just an ID, and we can look up what it's called in German or English or any other language when we go to wikidata.org, I hope that's readable, and type Property, colon, P277. That is the property; in English it is called "programming language", in German "Programmiersprache", and there are a lot of other languages. That means you can use Wikidata in any language, which is very nice. I could switch this page to another language and then everything looks different. But the disadvantage is that the SPARQL query is not very readable, because we have to use these numeric IDs everywhere.

So let's write this query. We have SELECT and WHERE, and then we have a software, and it has the programming language Bash. Bash is a Wikidata item, so we use wd: as the prefix, and if I now press Ctrl+Space, it searches for "bash" and shows me these suggestions, and I can choose one, in this case only Bash. When I move the mouse cursor over it, it shows me what this ID means. So on the website query.wikidata.org you can at least see what these IDs mean, even if it takes a little longer. For this part of the query, the property, we use a slightly different prefix, wdt:, where the t stands for truthy. Then we press Ctrl+Space again to search for the property. Let me run this query with Ctrl+Enter, and we get this list of IDs. I don't know, maybe you want to guess what it is.
A package, a package that is written in Bash; I have never heard of it. Here you can see all its statements, and "programming language: Bash" is what we looked for. Unfortunately, this list of IDs is not particularly useful. One thing we can do in the Wikidata query service, and this is relatively specific to Wikidata, is to use the label service. That is magic that you don't have to understand: you type SERVICE and press Ctrl+Space again, and then it inserts this here, and you just keep that in your query all the time. Then you also write ?softwareLabel, which says that you don't just want the software but also the label of the software. I can also ask for a description, so ?softwareDescription, and that makes the results much more usable. I can also rename the variable to item, and then the variable name changes in the other parts of the query as well; it's no longer called software. The name software is a bit misleading anyway. All these programs are written in Bash.

I have several more examples here which are quite simple. Should I skip ahead? Is that fine? Good.

Who was born at sea? There's a special item for that, but that only gives five results, because most ...

Where does The Neverending Story take place? This is interesting because in this case the variable is in the last part of the triple and not in the first part. We write The Neverending Story as the first part, then the narrative location, and then the item at the end of the triple. That works very similarly, but most of the results don't have an English label, so we add German as a fallback language. And then we get all these places that someone has added to Wikidata. Let's see if there is any useful information about them. These are all very similar.
They were all added at about the same time. And we find information about where this fictional place is.

Let's talk about a more interesting question. What is this? Which popes had a child? What is the graph going to look like? How many triples are we going to have? Who thinks we need zero triples? Who thinks we need one triple? Two triples? That's more people. Mostly two, but some people think one. The people who think it might be one triple are perhaps thinking of something like: Pope, which is the leader of the Catholic Church, has a child. That's not going to work; you don't get any results. Or perhaps: the item has the father Pope. But that won't work either, because the children are not directly connected with the office of pope; it goes over two levels. One is that a certain person is the father, and the other is that this person holds the office of pope. In the graph it looks like this: a child has a father, some pope, and that pope has this child. That's an example of the redundancy again. We have these two directions, and it doesn't really make much of a difference which direction we use. Let's try one of them. The item does not have Pope as father; it has someone as father, the variable pope, and this pope has a position held, so an office held, Pope. Let's add a pope label, and then we get 24 results. We have a Duke of Parma, the son of Paul III. Alexander VI was very busy. And some of them are just duplicates. That shows you a bit how Wikidata works: someone imported a lot of information from this peerage database, and that's why we have duplicates, but let's just leave them alone. These look very similar: Giovanni Borgia appears twice with the same name, but here we have a date of birth in the 1470s and here in 1498, so these are probably different children. Wait a minute, there's a pope who is the child of another pope.
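A minimal sketch of why the pope question needs two triples joined by a shared variable. It is a toy matcher in Python over a made-up graph, not the query service itself; the names `child_a`, `pope_x`, `father` and `position_held` are illustrative placeholders and not real Wikidata IDs.

```python
# Toy data: (subject, predicate, object) triples.
# All names are made up for illustration; they are not Wikidata IDs.
TRIPLES = [
    ("child_a", "father", "pope_x"),
    ("child_b", "father", "pope_x"),
    ("child_c", "father", "duke_y"),
    ("pope_x", "position_held", "pope"),
    ("duke_y", "position_held", "duke"),
]


def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Variable: bind it, or check it against an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            # Fixed term: must match the graph exactly.
            return None
    return result


def query(patterns, triples):
    """Return all variable bindings that satisfy every pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return bindings


# One triple is not enough: nobody has the office itself as father.
print(query([("?item", "father", "pope")], TRIPLES))  # prints []

# Two triples, joined through the shared variable ?pope.
rows = query(
    [("?item", "father", "?pope"), ("?pope", "position_held", "pope")],
    TRIPLES,
)
print(sorted(row["?item"] for row in rows))  # prints ['child_a', 'child_b']
```

Because `?pope` appears in both patterns, the second pattern only accepts the fathers already bound by the first, which is the same effect as reusing a variable name in the query.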
Popes as the children: for that we have to add that the item itself also holds the position of pope. So the item should have the father pope, the item should hold the position Pope, and the father should hold the position Pope as well. Now, maybe it would be enough to rename the variable to child. But I can future-proof this a little bit, because strictly we would have to say that the father or the mother should be a pope. We currently only have male popes, but in case there's ever a female pope, let's just switch it around and go from the pope to the child, so we don't ask for the father but for the child. We save that and open a new tab for the other example queries.

Okay, now: which Microsoft software runs on Linux? Can you have that on Linux? Okay, yes, it's not as funny as it used to be, and that's what it looks like.

What are the pieces of music for organ and orchestra? Composition, that's basically the piece of music. It has the property instrumentation with organ, and the property instrumentation with orchestra, and I'm writing that down now: a composition, and organ, and, I forgot, I also add the label service. If you want to listen to them, we can also see whether there is audio on Commons, so Wikimedia Commons, and that's just one in this case.

What's a little bit annoying here is that I had to repeat the item and the property, and I can avoid that. You could already see it in the previous example: I didn't write the software twice. If I use a semicolon instead of a period, then it's basically an "and": software that has the developer Microsoft and the operating system Linux. And here you can do the same with a comma: it's enough to put a comma and then list the two values. That has exactly the same results, and it's just easier to read and write.
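A small Python sketch of what the two abbreviations stand for, under the usual SPARQL reading: a semicolon repeats the subject, and a comma repeats the subject and the predicate. The helper `expand` and the plain-word predicate names are hypothetical and only serve the illustration.

```python
def expand(subject, predicate_objects):
    """Expand the shorthand into full (subject, predicate, object) triples.

    `predicate_objects` maps each predicate to a list of objects.
    Several predicates correspond to ';' (same subject), and several
    objects under one predicate correspond to ',' (same subject and predicate).
    """
    return [
        (subject, predicate, obj)
        for predicate, objects in predicate_objects.items()
        for obj in objects
    ]


# Semicolon: "?software developer Microsoft ; operating_system Linux ."
software = expand(
    "?software",
    {"developer": ["Microsoft"], "operating_system": ["Linux"]},
)

# Comma: "?composition instrumentation organ , orchestra ."
composition = expand(
    "?composition",
    {"instrumentation": ["organ", "orchestra"]},
)

for triple in software + composition:
    print(triple)
```

Either way the result is the same list of full triples that would otherwise be written out one by one, so the abbreviations change only the spelling of the query, not its meaning.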
Easier to write in any case, and hopefully to read. The comma is actually not used that much, though; the semicolon is used quite a lot. So when we go back to our pope query from before, we can delete the repeated pope. It means exactly the same thing, but you can see that both parts are about the same pope.

Okay. Then we have this one, which is not funny: there are many people who were once in the NSDAP and then returned to normal life and received the Bundesverdienstkreuz of the Federal Republic of Germany. And you can find these. I've done that with three triples: a person who is a member of a political party, a person who is an instance of human, so who actually exists, and a person who received this award, the Bundesverdienstkreuz. That is because there are also a lot of fictional people in Wikidata, from books and so on, and they don't interest us in this case. Adding that costs us basically nothing, because the query service is very well optimized.

Okay, now I'm writing this: a person who is an instance of human, and a member of the political party NSDAP, the National Socialist German Workers' Party, and I can just search for the German abbreviation in this case, and who received the award Bundesverdienstkreuz, where I can again just type the German name. I think there are eleven results now. And that's actually not quite right, because you don't receive the Bundesverdienstkreuz as such; there are different classes of it. Theoretically, each of these eleven people should have one of those classes, but these are just the ones who have the Bundesverdienstkreuz itself recorded. Now let's see if we find more when we look for an award that is part of the Bundesverdienstkreuz; we use "part of" for that. And let's see if there are results. Yes, there are a lot of results. That's depressing. Last time I didn't find any results.
I just did something wrong then, but this time we find a lot more results. If we aren't looking for the exact Bundesverdienstkreuz, because we don't care which class someone received, whether it's first class, second class or third class, but just some class of it, then we can shorten the whole thing a bit. We can do that with a slash, and that's a path through the graph. Then we get 802 results, including some very well-known people. And if we also want the 11 people who received the Bundesverdienstkreuz directly, we can do that too: either they received the order of merit itself, or some award which is part of the order of merit. Then we should get 813, so 802 plus the 11 from before.

And I'm starting this with the instance of human. The service will reorder that first: if it first looked for all the humans and then checked who was in the party, that wouldn't be very efficient. Instead, it estimates how many results each part has and does the part with the fewest results first. That makes it much more efficient.

I think I have one more example of a complicated query: the biggest cities, in terms of population, which have a female mayor. This is the graph for it. This is a city; it is an instance of city, so it really is a city. It has a population, and it has a mayor, or as it's called here, a head of government. Maybe you wouldn't call a mayor a head of government at all, but that's the property we can look for. The mayor should be a human, and she should be female. The question mark before female shouldn't be there; that's a mistake, I'm sorry. Okay, now I'll write it down again. We're looking for the city and the mayor, the head of government. The city is an instance of city.
It has a population and it has a head of government, and this head of government is a human, a real person, and is female. And now we want the label for the city, and maybe also the population. Wikidata knows 83 cities with a female mayor, though the notability criteria in Wikidata are not as strict as in Wikipedia. If we want to sort them by size, we can say ORDER BY DESC of the population, so descending by population, and then we only want the ten biggest cities. Now we see Tokyo, Hong Kong, Baghdad, Surabaya, and Caracas has two mayors. We're only supposed to get the current mayor. Does anyone know which one is the current one? Or we could just check Wikipedia. Who is the current mayor? Let's look at Wikipedia; hopefully Wikipedia doesn't get this data from Wikidata. The current mayor is Carolina Cestari, who isn't there. So now we add someone new who doesn't have an item yet. First we have to find out if Carolina Cestari is actually the mayor; that's all a bit unclear. Doesn't she have a Wikipedia article? No. She only appears in a list, and she's not on Wikidata yet. So now we edit the data live. Which country? Venezuela.
It sounds like a female name, so I'm just going to guess and check that after the talk. So she's a human, and her gender is female, and she is going to be our mayor. Do this search again, there we go, and set this to preferred rank. That's how the query service knows that this is the current value. The other values are still there, and we can use them if we want to, but they are not the main value, and you don't get them in a simple query. It's a kind of political territorial entity, and the statement should have a start time, but I don't care about that right now. And hopefully we now get just one result. No. So the query service is not up to date yet. I think it just keeps watching for changes, so it will get there; it might take a few minutes. Okay, that's how that works. Does that make sense? Okay.

Yeah, I think this is almost exactly what I'm looking for. There are some labels in the label service. I happen to know that Mexico City is a very large city with a population of almost 9 million, so it should come right after Tokyo, ahead of Hong Kong, and the head of government is Claudia Sheinbaum. The reason why we didn't get this is that Mexico City is not an instance of city; it is classified as an instance of big city (other languages have a term for big cities). And what we know is that big city is a subclass of city. So we ought to arrive at city somewhere along the way, but the query doesn't do that yet.
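A minimal Python sketch of the Mexico City problem, assuming a toy hierarchy in which every item has exactly one class and every class at most one superclass (real Wikidata allows several of each). The dictionaries and the helper `is_instance_of` are hypothetical. On Wikidata, the same idea is commonly written as the property path `wdt:P31/wdt:P279*`, that is, instance of followed by any number of subclass of steps.

```python
# Toy class hierarchy; simplified to one class per item and one
# superclass per class, which real Wikidata does not guarantee.
INSTANCE_OF = {
    "Tokyo": "city",
    "Mexico City": "big city",
}
SUBCLASS_OF = {
    "big city": "city",
}


def is_instance_of(item, target):
    """True if the item's class is `target` or reaches it via subclass links."""
    current = INSTANCE_OF.get(item)
    seen = set()
    while current is not None and current not in seen:
        if current == target:
            return True
        seen.add(current)  # guard against cycles in the data
        current = SUBCLASS_OF.get(current)
    return False


# Matching only direct instances of "city" misses Mexico City.
direct = [item for item, cls in INSTANCE_OF.items() if cls == "city"]
print(direct)  # prints ['Tokyo']

# Following zero or more subclass links finds both.
with_subclasses = [item for item in INSTANCE_OF if is_instance_of(item, "city")]
print(with_subclasses)  # prints ['Tokyo', 'Mexico City']
```

Allowing zero subclass steps matters: it keeps the items that are a direct instance of city while adding those that are only reachable through a subclass.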
That means, if we tell it to please follow the subclass connection, then it would get there. One way to do that is to say that it should be an instance of something that is a subclass of city. But then we have lost the other cities that we had before, because those are directly an instance of city. So even better, what we should do is make that step optional, or rather allow as many steps as you like, with a star, just like in a regular expression. Then we can say exactly which paths we want. Now it's either an instance of city, or an instance of a subclass of city, or an instance of a subclass of a subclass of city, no matter how many levels. And now we get all cities.

So it is important to look at what kinds of results there are, and how the data is modelled, when you write these queries. You have to check: are the results plausible? Ideally, you already know a little bit about the results, and then you can notice: oh, something is missing here. Why is something missing here?

Okay. If we are not interested in who the mayor is, then we can shorten the query again. And you can also order by a variable that you don't select, that is not returned.

Okay, that was my last slide. If you want to see more queries, there are a great many example queries on Wikidata. The page has become so big that it can hardly be handled any more, so we have to split it into new pages. But I still have a few additional queries that I thought of, for example: in which films did more than one future head of government act? And that does not all fit on one slide.
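A short Python sketch of the sorting step from the city query: order by population descending, keep only the first few rows, and return only the city, so the sort key itself is not part of the output. The rows and figures are hypothetical placeholders, not real results, and `top_cities` is a made-up helper.

```python
# Hypothetical result rows; the figures are placeholders, not real data.
rows = [
    {"city": "A", "mayor": "M1", "population": 500_000},
    {"city": "B", "mayor": "M2", "population": 9_000_000},
    {"city": "C", "mayor": "M3", "population": 2_000_000},
    {"city": "D", "mayor": "M4", "population": 40_000},
]


def top_cities(rows, limit):
    """Sort by population, largest first, and keep the first `limit` rows.

    Only the city is returned; the population is used for ordering
    but is not part of the output.
    """
    ordered = sorted(rows, key=lambda row: row["population"], reverse=True)
    return [row["city"] for row in ordered[:limit]]


print(top_cities(rows, 3))  # prints ['B', 'C', 'A']
```

This mirrors ordering by a variable that is not selected: the population decides the order, while the mayor and the population never appear in the returned list.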
What is important is that you have a film, an instance of film, and it has a publication date and a cast member, the head of government. This person once held a position as head of government, and that position should have started after the film was published. You get, for example, Arnold Schwarzenegger and another actor, who both became a head of government in the USA. And you get some films with people who were future French heads of government. So the question is whether they were really actors, or whether they somehow played themselves. Now I'm going to run this query. The service seems to be busy. You get 60 seconds in this query service, and if the query hasn't finished by then, it is aborted. There are a few examples here, for example Charles de Gaulle and Georges Bidault; somehow he was prime minister. We have some Indian films, further down we have a few Canadian politicians, and then we have Arnold Schwarzenegger and Jesse Ventura, who both became governors.

As for the other one: we have a lot of data about the British Parliament, because a lot of freely available information was added. I think we actually have all the members of parliament and also their parties for at least 100 years, and even some data from before that. For example, you can see how many people called John are in each parliament, and how many women, and when there were more women than men called John. This query takes a while as well. I hope it doesn't take 50 seconds, but it looks like the query service is busy at the moment. I think it was relatively late, so 1991. Everything we saw until now was just tables, but you can also show results in different ways, for example as a graph. In 1992 there was the first parliament in which there were more women than Johns. You can see that the Johns became fewer and fewer and the women more and more.
Now there are 220 women, but we don't see what percentage of the whole parliament that is.

So the query looks like this. It is broken into several parts. The members of parliament should be humans again. Then they hold the position of being a member of this chamber of parliament, and that gives us all these members of parliament. Then we look for those who are called John, and then for those who are female. Currently the data model is that there is a separate item for trans women, so there is still an optional subclass step in here: the model right now is that trans woman is a special subclass of woman. In the long term we should change that, so that trans women are simply recorded as women, and maybe also as trans. But at the moment you still have to write this more complicated query. And we have exactly 100 years of history here. We could also look at this as a bar chart, but the line graph is really the best in this case.

I think I have talked for 50 minutes now. This is the point where we start the live querying, and I was told that I should take a short break first. So let's take a 10 minute break, and then we start the second part at 3 o'clock. Is that okay, or is that too long? If you stay here, then please think of a few questions that I can write queries for. Otherwise I have nothing to do. Thank you so much. Thank you very much.