Hi everyone, so next up we have Lucas with the Wikidata live query session, so let's just hand it over and get started.
All right, thank you. I am basically looking for suggestions from you for what kind of queries we should write here. I shared this Etherpad already before; I'll just paste the link again in the hackathon channel on Telegram, so if you have any ideas, you can put them there. One thing we can start with is the data challenge ideas from yesterday, which were great. Someone picked out this one: a list of all the COVID-19 vaccines, including the developers of those vaccines.
I started this query by remembering that one of them was called Covaxin. So I looked for that in the search and checked what kind of statements it has. Is it an instance of vaccine? Is it an instance of something else? It's a subclass of COVID-19 vaccine. It's also an instance of vaccine type. Most importantly, it has this property "vaccine for" with the value COVID-19, which is great. So I decided to ignore all the "instance of" or "subclass of" statements and just say it's a vaccine for COVID-19. That's how we find them.
So to start by listing all the vaccines, we do SELECT anything WHERE the vaccine is a "vaccine for", and then Ctrl+Space, if I can type, gives me the property ID, and we say it should be a vaccine for COVID-19. If we add the label service and then select not just the vaccine but also the vaccine label, we get a list of 46 results, which is pretty good. I didn't know there were that many. I assume some of them might be experimental, or ones that didn't work out, that failed the clinical trials or something. This one is a candidate vaccine. This one is also a candidate, experimental. But I think we're interested in all of those. And then the question was to include the developers of these vaccines. I should zoom in here a bit so we can read it.
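The first query described above can be sketched roughly as follows. The session never reads the IDs aloud, so P1924 ("vaccine for") and Q84263196 (COVID-19) are filled in from memory and should be checked before relying on them.

```sparql
# List every item that is a vaccine for COVID-19, with its label.
# Assumed IDs (not stated in the session): P1924 = "vaccine for",
# Q84263196 = COVID-19.
SELECT ?vaccine ?vaccineLabel WHERE {
  ?vaccine wdt:P1924 wd:Q84263196.
  # Automatic mode: the label service fills in ?vaccineLabel by itself.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

This is meant for the Wikidata Query Service, where the `wd:`, `wdt:`, `wikibase:` and `bd:` prefixes are predefined.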
We can get the developers from another statement. It's right down here: "developer" is the property. I assume that's the right one, and we don't want "manufacturer" or something, but the developer. So we add the developer and select the developer label. The issue is that we now get a lot of results, and they could in theory be in any order. I first thought we could order by vaccine to ensure that the rows for the same vaccine always stay together. But we can also combine them in a better way, which is to use something called GROUP_CONCAT. We group by the vaccine and the vaccine label, which means we get one result per vaccine, and then we combine all those developer labels with GROUP_CONCAT and a custom separator, maybe a semicolon, and call the result "developers".
Now the issue is that this column is actually empty, because the label service is quirky. We have to tell it explicitly which labels we're interested in. So that's vaccine, rdfs:label, vaccine label, and more importantly developer, rdfs:label, developer label. Once the developer label is hidden inside the GROUP_CONCAT, the label service can't find it automatically anymore, so we need to tell the label service that we would like this label. And now we get that this one comes from the National Institute of Allergy and Infectious Diseases, from MIDI Data Solutions and from Moderna. This one just has one developer, this one has two, and so on. So that's how you can use GROUP_CONCAT to combine those developers.
Someone was also asking, let's close this, how to concatenate the vaccine developers into a single value. We've just done that. And then: is it possible to show the countries the developers are from on a map? I'm sure we can do that. In that case, I guess we would stop selecting the vaccine and instead get the developer with its label and also the country label. Remove the manual label lines again so that we are back to the standard label service, and remove the grouping.
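The GROUP_CONCAT step, including the manual label service entries it needs, might look like this. It is a sketch rather than the query from the session; P178 ("developer") and the other IDs are assumptions from memory.

```sparql
# One row per vaccine, with all developer labels joined into one string.
# Assumed IDs: P1924 = "vaccine for", Q84263196 = COVID-19, P178 = "developer".
SELECT ?vaccine ?vaccineLabel
       (GROUP_CONCAT(?developerLabel; separator="; ") AS ?developers)
WHERE {
  ?vaccine wdt:P1924 wd:Q84263196;
           wdt:P178 ?developer.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en".
    # Manual mode: ?developerLabel only appears inside the aggregate,
    # so the label service has to be told about it explicitly.
    ?vaccine rdfs:label ?vaccineLabel.
    ?developer rdfs:label ?developerLabel.
  }
}
GROUP BY ?vaccine ?vaccineLabel
```

Note that vaccines without any developer statement drop out of this version; wrapping the developer triple in OPTIONAL would keep them.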
And the developer should have a country, ?country. That's just a list of countries, and now we want them on a map somehow. I guess one option is that the country should have a geoshape. We can select that shape and set the default view to a map. Now it will load the shapes from Commons, and in a moment, when it's done loading, we should see them on a map. But we will probably see some overlapping shapes, which might not look very useful. This isn't working for some reason. Let's just try that again. There we go. So those are the countries that have developed COVID-19 vaccines. This one is shaded a bit differently; I'm not sure why.
Maybe one thing we could think about would be to show the number of different vaccines, so that it's not just overlaid areas. In that case we would again group by the country and country label. We would have the shape and then COUNT DISTINCT vaccine as "vaccines", or call it the layer maybe. And then we should see... oops, we need to group by country, country label and shape: group by everything that doesn't have an aggregate function like the count. And then we have from one, that's orange, to... I think my screen just froze. Two in red, three in blue, four in pink. India has four vaccines, apparently. And also the other countries down there that I don't recognize as well... but that's still India. That's all India; I thought some of those were going to be other countries. Maybe not. And then five: the United Kingdom is participating in five, apparently. Seven for China, and then 15 for the United States.
And if we wanted to be... "Do all countries have shapes on Commons already?" As far as I know, yes. I think so, possibly; I'm not sure. When I checked this two years ago or so, there might have been some confusion where the Netherlands had a geoshape but the country, the Kingdom of the Netherlands, didn't have one. But I think apart from that, all the UN member states had a geoshape already two years ago or so.
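The map of countries by number of vaccines, as described above, could be written along these lines. Again a sketch: P17 ("country") and P3896 ("geoshape") are assumed IDs, and the query was not run for this write-up.

```sparql
#defaultView:Map
# Countries of vaccine developers, one map layer per vaccine count.
# Assumed IDs: P1924 = "vaccine for", Q84263196 = COVID-19,
# P178 = "developer", P17 = "country", P3896 = "geoshape".
SELECT ?country ?countryLabel ?shape (COUNT(DISTINCT ?vaccine) AS ?layer) WHERE {
  ?vaccine wdt:P1924 wd:Q84263196;
           wdt:P178 ?developer.
  ?developer wdt:P17 ?country.
  ?country wdt:P3896 ?shape.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
# Everything that is selected without an aggregate must be grouped.
GROUP BY ?country ?countryLabel ?shape
```

Naming the count `?layer` makes the map view put countries with the same count in the same coloured layer, which is what produces the legend mentioned in the session.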
But we can also try that out. I have a query for UN member states, because you need it pretty often. Let's BIND EXISTS state, geoshape, shape as "has shape" and select this. And two UN member states do not have a geoshape, which are the Danish Realm and the Kingdom of the Netherlands. But then you get into the question: what is the country, so the P17, of a company in the Netherlands anyway? Do people set Kingdom of the Netherlands as the country, or do they set Netherlands as the country? Because it must be down here somewhere, right? Netherlands, I'm pretty sure, has a geoshape; just the kingdom doesn't. Yeah, this one does have one. And for the Danish Realm, I assume Denmark as a country also has a geoshape. Yeah, there it is. So it's only those, I guess, wider states that don't have geoshapes out of the UN member states, because of complications. What's going on in the background? Nothing related.
So, I guess, should I put this query somewhere already? This is a map of countries developing COVID-19 vaccines, by number of vaccines. If we wanted to get really fancy, we could try to color this better, because you can control the color. You can say something like "0FFDD" as ?rgb, and then all the layers are shown in this color. Apparently I picked some kind of pastel blue or cyan. So if we got really fancy, we could select the right RGB value according to the number of vaccines, so that we would get a nice scale that way. But that's pretty complicated and I don't think I want to do that right now. Having it like this, with a legend over here, is probably good enough. Let me just make a short URL for that and drop it in the Etherpad. I did not mean to make that into a new line.
Two experimental ones: possibly expand the data challenge tasks related to maps to fetch geometries from OpenStreetMap using Sophox, or plot all the values of a datetime type on a timeline. Do you have some more details for that?
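The check for UN member states without a geoshape, from the start of this part, can be approximated as below. The session's own "UN member states" query is not shown, so selecting them via P463 ("member of") and Q1065 (United Nations) is an assumption, and the result set may differ slightly from the one seen on screen.

```sparql
# Which UN member states have a geoshape statement?
# Assumed IDs: P463 = "member of", Q1065 = United Nations, P3896 = "geoshape".
SELECT ?state ?stateLabel ?hasShape WHERE {
  ?state wdt:P463 wd:Q1065.
  # EXISTS yields a boolean, which BIND stores in ?hasShape.
  BIND(EXISTS { ?state wdt:P3896 ?shape. } AS ?hasShape)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY ?hasShape ?stateLabel
```

Sorting by `?hasShape` first puts the states without a shape at the top of the table.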
I have to write in here because the YouTube stream apparently has some 30-second delay or something, so if I write in the Etherpad it's probably going to go faster. "For a single item." That could be something like the population of Berlin. SELECT date, population WHERE Berlin, that's Q64, has the population... No, a timeline, not the values at a certain time. I was thinking of something else. What would be a good example for that? We can also change this into the times at which the population of Berlin is known. So that's pq:, point in time. We actually ignore the population value, and default view timeline. And then we get... well. Okay, I think we need the value after all: ps:P1082, population, add that as well. And now, why do we not see it in the table? Oh, because I called it "time" here and "date" up there. That's why. There we go. So that's a timeline of all the times for which we know the population of Berlin. There are some big gaps here, and then towards the modern day it gets pretty crowded.
But maybe it was something else: a biography would do, with any property. Right, I missed that part. I forgot about it. For Q42. So let's try that. Sure. Timeline of Q42. Q42, any property predicate, time, and the property has wikibase:claim ?p. So this ?p is then something like p:P31. And then the other property has the statement property or the wikibase:qualifier predicate, and that can be something like ps:P31 or pq:P31. We select the time and, let's say, the predicate so far, and make that a timeline. And then we get mainly qualifiers, but there are also some main statements. So maybe let's turn that into a two-part query. Let's make that a UNION. We start with some ps: time, where the same property has wikibase:claim ?p and wikibase:statementProperty ?ps, UNION a case with Q42, ?psOrPq, time, where the property has wikibase:claim ?p and the other property has wikibase:qualifier ?pq. And then we need to create a nice label out of that. Let's still include a value here. Do it like this.
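The Berlin population timeline from the first half of this part comes down to a short query. Q64, P1082 and "point in time" are named in the session; P585 as the ID for "point in time" is filled in from memory.

```sparql
#defaultView:Timeline
# Every point in time for which a population of Berlin is recorded.
# Q64 = Berlin, P1082 = population; assumed ID: P585 = "point in time".
SELECT ?time ?population WHERE {
  wd:Q64 p:P1082 ?statement.        # the full population statements
  ?statement ps:P1082 ?population;  # main value of each statement
             pq:P585 ?time.         # its "point in time" qualifier
}
```

The variable names in SELECT have to match the ones in the pattern, which is the "time" versus "date" slip mentioned above.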
And then let's do something like this for now: BIND "main statement" as ?kind, and over here BIND "qualifier" as ?kind, and then select the property label and the kind. And also add a label service. What does this look like? We get date of birth, date of death. And where did all the other results go? Select the table... and all the other ones? The time is... oh, we should limit the other property to be wikibase:propertyType wikibase:Time. The same goes for the property up here, by the way. Okay, now we only get two results, which probably means that something is wrong here, because I assume Douglas Adams has some time qualifier somewhere, such as when he was educated somewhere or something, right? So we would expect to see qualifiers, such as on the spouse or the child. Why is the date of birth a qualifier here? Novelist, start time. So we would expect to see those. Why do we not see them? Any property with a value and then pq... Because I misspelled "qualifier". That's all. That's the whole reason.
Now the query is being pretty slow, which is strange. Let me just catch up on Telegram in the meantime. Yeah, okay, someone noticed it already in the YouTube chat, but I did not see that in time, sorry. And now someone noticed it in Telegram as well. Why is this being so slow? Yes, pq. It's probably running something in the wrong order. Did I not bind some property the right way? No, this one is definitely... Let's try disabling the optimizer. That might help in this case, because we want the query service to run all of this forward. No. If it's still not returning quickly, then I probably did something wrong. But this part worked, so it would be something in this part. Let's put that into a separate query. SELECT anything WHERE, LIMIT a thousand. ?psOrPq. No, I think that should work. Let's see why it's not working. Let's open another tab. Make the limit one, even, and also a hint. Maybe a hint: hint:Prior hint:gearing "forward" helps. No, that is a QueryHintException, blah, blah, blah.
StatementPatternNode. No, that doesn't work; that is not a pattern where we are allowed to use the gearing hint. But it doesn't work. Why doesn't it work? Let me just check that I use the right variable names everywhere. ?psOrPq. Property and then other property, but that's intentional: the statement and the qualifier can come from different properties in this case. Let's try to have this in a separate query again. Remove this part, maybe. Also limit one. I do not understand. It worked earlier, didn't it? We saw some more values on the timeline. Let's reload this and remove both of these lines. In fact, let's remove this one as well. This should definitely work. If we remove the limit, we'll get some thousands of results. Twenty thousand results, but still, that's pretty fast.
Now we restrict it to this. That took half a second for a thousand results. The ?p is P108 and the time is whatever. Then we say that some other property should have the qualifier ?pq. That still works. Then we add this property type Time. If that's the expensive part, then we can work around it and instead say FILTER DATATYPE of ?time equals xsd:dateTime. That's an option. It still takes a while here, but I'll reload this. Does it help here? Now it returns in one second. This one still doesn't work, maybe because I still have the pq time here. Let's just reload this, also stop disabling the optimizer, and add the same FILTER DATATYPE of ?time equals xsd:dateTime. There we go: 300 milliseconds. Why does the query service do that? I don't know. Also, my screen is frozen. No, it's not frozen; the scroll wheel just doesn't want to work for some reason. There we go. I thought the wikibase:Time up here might be less of a problem, which is why I only replaced it down here first. Apparently this works. SELECT property WHERE property type Time: 56 properties is not that many. Apparently the query service picks the completely wrong direction with that.
What if I add hint:Prior hint:gearing "forward"? Does that help? This needs a semicolon. No. But if I write the other property with a dot instead of a semicolon? No. What's wrong? StatementPatternNode. Just "forwards". Then I don't know how the gearing hint works anymore. We have something anyway. The results look like... there's the date of birth statement. It probably makes sense to put the kind in parentheses, I think, to distinguish it a bit from the other things. What we don't select here at the moment is the value, if there is any. Let's add the value label. Then we see them in a weird order, but we see one with the value 10, where the point in time of that qualifier is 13 April 2017. I don't remember if there's any way to affect the order of these. And it's unfortunate, of course: if the value is actually not an item, then using the value label is a bad idea. But, yeah, we have something. Let's create a link for that and put it in the Etherpad. And then close Douglas Adams. Oh, I still had the query with the timeline where it worked. Too late now.
Okay, do we have any other suggestions for queries? What did I miss in the chat? "Can you share that query?" I put it in the Etherpad now, which is also linked on the schedule page. "Expand the data challenge tasks related to maps to fetch geometries from OpenStreetMap." How much time do we have left, half an hour? Not... Does anyone have ideas for specific data challenge tasks related to maps? Let's just go through the tabs I still have open. All the water bodies. Nope. "List of all the rivers that end in the Mediterranean and rank them in descending order of length." We could try that on Sophox, maybe. So, "end in the Mediterranean": get that bit from Wikidata in Sophox. So, there's the user interface for the query service. I need to start with... Oh, good, there are no examples. No, there are examples. Okay. I would need to start with some kind of river. "Other water reservoirs and dams."
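Putting the pieces of the Q42 timeline together, the final query might have roughly the following shape. This is a reconstruction, not the query shared in the Etherpad, and it was not run for this write-up; variable names other than ?p, ?ps, ?pq, ?kind and ?time are illustrative.

```sparql
#defaultView:Timeline
# Every date attached to Douglas Adams (Q42), either as the main value
# of a statement or as a qualifier on a statement.
SELECT ?time ?propertyLabel ?kind ?valueLabel WHERE {
  {
    # Main statements whose value is itself a time.
    wd:Q42 ?p ?statement.
    ?statement ?ps ?time.
    ?property wikibase:claim ?p;
              wikibase:statementProperty ?ps.
    BIND("main statement" AS ?kind)
  } UNION {
    # Statements with any main value that carry a time-valued qualifier.
    # The qualifier may belong to a different property than the statement.
    wd:Q42 ?p ?statement.
    ?statement ?ps ?value;
               ?pq ?time.
    ?property wikibase:claim ?p;
              wikibase:statementProperty ?ps.
    ?otherProperty wikibase:qualifier ?pq.
    BIND("qualifier" AS ?kind)
  }
  # Workaround from the session: test the datatype of the value instead of
  # requiring wikibase:propertyType wikibase:Time on the property, which
  # sent the optimizer in the wrong direction.
  FILTER(DATATYPE(?time) = xsd:dateTime)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

As noted in the session, `?valueLabel` is only really meaningful when the value is an item.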
osmt:waterway. Okay, let's try... let's just guess that something has... ?subject osmt:waterway "river". LIMIT 1. And osmt:name. And what's down there? osmm:loc. Thank you. I hate this toast down there; I can never get to it in time. Okay, now we have the Chico River with coordinates, which is this subject. So let's copy this and comment all of this out. That was with the Alt key held down, by the way. And then DESCRIBE this thing. Let's see what else it has. It has a way... no, the way is what we're looking at. And the way is waterway river. User name. Okay, let's use the rel instead and see if we can find something useful there. It has ways, ways, lots of ways. And the ways have coordinate locations, but I'm not seeing a line path or polygon or something there. So I guess I would want to look in the wiki for that... OSM. How data is stored. Polygon. Generating polygon files. What does this do? "Sophox only provides access to centroid points, not geometries." It doesn't have the info for the river. Okay. That is unfortunate.
Okay. Then maybe we have to give up on the idea of using Sophox. Or at least I think it would be better if we go through some other queries, because trying and failing to figure out Sophox is probably not the best content for this session. So if you have any other ideas for queries, feel free to dump them in the Etherpad in the meantime. I could maybe look through the queries I wrote for the data challenge yesterday and see if there's anything interesting in there I could talk about.
"Plotting the centroids shows something as the result, as rivers are split into many short parts." Okay, so maybe that would be useful after all. But we would also need the relation between something and the Wikidata item. But okay, let's go with that. Let's DESCRIBE a Wikidata item: this one, which I think is a major river in France, isn't it? In Switzerland and France, okay. I don't think that one ends in the Mediterranean? No, it does. Great.
Then that's an example. Yes, okay. osmt:wikidata. So we SELECT... we have a ?rel, osmt:wikidata, this thing. And then for the rel, let's look at everything those rels have. Oh, that is a lot of results. What? It's supposed to be all the ways as the predicates and "inner" or "outer" as the value. Okay, that's weird. But then osmm:has was the one I remember from the other results. So we have ways: ?rel osmm:has ?way. And then the way has... what were the coordinates down there? osmm:loc, coordinates. And then we want the coordinates, and display those on a map. And now we have... oh yeah, that's certainly a river course. The river seems to be branching out here, which is unusual. I assume those might be tributaries of the river. Let me zoom in. Okay, we only selected the coordinates, so we don't get to see anything more.
But okay, so we need a SERVICE, a federated query, to query Wikidata. We have our SPARQL where we get all of these rivers that end in the Mediterranean. So that is all of this. We're not interested in the length part. We're only searching for rivers, not watercourses, and ones that end directly, not indirectly, in the Mediterranean. Then we have a relation for that river, and the way has coordinates. And let's use the river as the layer, so then we should see the different rivers in different colors. It might take a while, too. That's interesting: there are three update icons, for different parts of the metadata, I guess. This is taking a long time. Let me reload that and add a limit of 1,000 maybe. 10,000 would also be okay, but if this returned like 100,000 points, then my browser would probably struggle to even display it, so I don't want that. Taking a long time. This part should be fairly fast, right? SELECT WHERE... Yeah, that's very fast and returns 60 rivers, which is not the end of the world, really. But it might be too much for Sophox already. I don't know. There might be many, many, many ways linked to those rivers.
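The federated Sophox query described above might be sketched like this. The predicates osmt:wikidata, osmm:has and osmm:loc are written as they were heard in the session, and the Wikidata IDs (P31, Q4022 for river, P403 for "mouth of the watercourse", Q4918 for the Mediterranean Sea) are assumptions from memory. In the session this query did not return before timing out, so treat it as an illustration of the approach only.

```sparql
#defaultView:Map
# Intended to run on Sophox, not on the Wikidata Query Service.
# Assumes Sophox predefines the osmt:, osmm:, wd: and wdt: prefixes.
SELECT ?river ?coordinates (?river AS ?layer) WHERE {
  # Federated part: ask Wikidata for rivers that end directly
  # in the Mediterranean Sea.
  SERVICE <https://query.wikidata.org/sparql> {
    ?river wdt:P31 wd:Q4022;
           wdt:P403 wd:Q4918.
  }
  # OSM part: the relation tagged with that item, its member ways,
  # and the centroid point of each way.
  ?rel osmt:wikidata ?river;
       osmm:has ?way.
  ?way osmm:loc ?coordinates.
}
LIMIT 1000
```

Using the river as `?layer` gives each river its own colour; the LIMIT keeps the browser from having to draw a very large number of points.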
Let me see if there was anything else in the chat. Let me see if there was anything else here in the meantime. Speaking of all these data challenge queries: I don't remember if the organizers of the data challenge were going to publish the queries that were sent in; otherwise I will put them on the wiki page somewhere, if you want to look at them. But while we wait for this: for the list of all the rivers in France in descending order by length, I used normalized quantities, so length in meters, just in case anyone is specifying river lengths in the birthplace of the International System of Units in miles or something like that. I thought that might be a possibility, so let's use normalized units just in case; or someone might be using kilometers and someone else might be using meters. Anything in there? That's a bit weird.
I'm curious if other people had a map that looks similar, because "all the rivers in France, highlighting the ones that end in the Mediterranean" means that you get some rivers in French Guiana and, well, whatever other overseas parts of France there are. I assume this is something in La Réunion, right? No, that is... Where is this? I don't know my French territories. But yeah, that's how I did the highlighting. I'm curious if other people tried to filter out these points, these various other rivers, which are technically also in France. Oh yeah, I was confused for a second why some rivers not on the coast here were still highlighted. Of course: if they're in Corsica and they end in the Mediterranean, then they're still going to be marked like this. So that is actually correct. I did not notice that yesterday. Did this return in the meantime? No, but it also didn't time out yet, so Sophox gives us more time, apparently. Right, this one was funny: all the images of different goat species.
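The normalized-length idea for the rivers in France can be sketched as a table query. This is not the map query from the data challenge; the IDs (Q4022 river, Q142 France, P2043 length, P403 "mouth of the watercourse", Q4918 Mediterranean Sea) are assumptions from memory.

```sparql
# Rivers in France, longest first, with lengths normalized to metres.
SELECT ?river ?riverLabel ?length ?endsInMediterranean WHERE {
  ?river wdt:P31 wd:Q4022;
         wdt:P17 wd:Q142;
         # psn: points at the normalized value node, so kilometres and
         # miles are all converted to the same base unit.
         p:P2043/psn:P2043/wikibase:quantityAmount ?length.
  # Flag used for highlighting the rivers that end in the Mediterranean.
  BIND(EXISTS { ?river wdt:P403 wd:Q4918. } AS ?endsInMediterranean)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?length)
```

Because country is simply France, rivers in the overseas territories show up as well, as observed on the map in the session.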
I interpreted this as "give us all the images you can", which obviously means using Structured Data on Commons. But it has to be a Wikidata Query Service query, so I'm using the MWAPI service to make a search query against Commons with haswbstatement P180 for all the species items I could find. And then also get the images from Wikidata, from the image statement, just in case, and then combine all of this. Also, "goat species" was a bit ambiguous for me. I first searched for goats and items which have parent taxon pointing to this item, Q2934, but that turned out to be no other item, only goat. So I looked a bit further up, or I looked for goat items, and then found Capra and even Caprinae, probably "caprine" in English or something, which are some more general families or genera of goats, and decided to pick this one. And then you get 820 results of many, many, many pictures of goats. I assume there are goats in here; they're just very well hidden. "Goats in Booty Booty Group." Oh, petroglyphs. Oh, those things here, those are goats, or depictions of goats. Wow, that is amazing. Okay. Oh, that's a great goat. Yeah, so that's how I found my goats for this query.
I thought this might be a language-dependent thing, because if we look at this item, for instance, in English that's Capra, but in German that's "Ziegen", and "Ziegen" is just "goats". So if I had searched for goats in German, I would have found this item rather than this other one over here. So how you write the query might actually depend on which language you're using, I feel like. And then these Caprinae are... I think the common name was "goat-antelopes", for some reason. And in German, they're "goat-likes". So that's what I picked as "all the species of goats" in the end. Yeah, using MWAPI to find all the images on Commons. "Sorted by conservation status": this was the one where I realized it can't be correct that it's just the one goat item. "Coordinates of all the shipwrecks on Earth."
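The MWAPI search against Commons mentioned above follows a standard pattern in the Wikidata Query Service. The sketch below searches for files that depict (P180) the single item Q2934 named in the session; the actual query looped over many species items and also merged in image statements from Wikidata, which is left out here.

```sparql
# File pages on Commons whose structured data says they depict Q2934 (goat).
SELECT ?title WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "commons.wikimedia.org";
                    wikibase:api "Generator";
                    mwapi:generator "search";
                    mwapi:gsrsearch "haswbstatement:P180=Q2934";
                    mwapi:gsrnamespace "6";   # namespace 6 = File:
                    mwapi:gsrlimit "max".
    ?title wikibase:apiOutput mwapi:title.
  }
}
```

The MWAPI service only returns a limited number of results per call, so counts obtained this way should be read as a lower bound.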
I think I can skip a few here. We've discussed this one already, with the developers of the COVID vaccines. And then a list of all the scholarly articles on COVID-19... I actually did that completely with another MWAPI search, because something like searching for "Wikipedia" in the label would be terribly inefficient in the Wikidata Query Service. You could do something like ?item rdfs:label ?itemLabel and FILTER CONTAINS ?itemLabel "Wikipedia". Or if you wanted to be more inclusive and include more items, you could lowercase the label and check whether it contains lowercase "wikipedia". But the query service doesn't have any optimization for this kind of search, so it would just have to go through all the labels of all the items ever and check for each of them whether it contains the string "Wikipedia", which would be horrendously inefficient. That's what Elasticsearch, or CirrusSearch, is much better at, which is why I'm using that. And then I thought, if I'm using that already, I might as well use it for instance of scholarly article and haswbstatement, what is this, I think main subject COVID-19, because it should be an article on COVID-19. And then the Wikidata Query Service actually does nothing more than build an item ID out of the title and add the label. And that's my query for this.
"You did this by selecting the title property and CONTAINS"? Really? Okay, that's surprising to me. Interesting. And yeah, I only found six results with that. Maybe there are actually more results. But yeah, title is label. What if I just don't limit that to inlabel, but search anywhere? Do I get more results? No, same six results. And it's interesting that it worked with the title property; I wouldn't have thought that.
"A map of all the volcanoes on Earth, colored by country", and then "the least common properties". But that's not that interesting, I think. Sophox is still working. Was anything else added to the Etherpad in the meantime? Doesn't look like it.
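The search-based approach for scholarly articles can be sketched as follows. The search string combines the `inlabel:` and `haswbstatement:` keywords mentioned in the session; the IDs P31, Q13442814 (scholarly article), P921 (main subject) and Q84263196 (COVID-19) are assumptions from memory, and the exact search string used in the session is not known.

```sparql
# Let the search index do the text matching, then turn titles into items.
SELECT ?item ?itemLabel WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
                    wikibase:api "Search";
                    mwapi:srsearch "inlabel:Wikipedia haswbstatement:P31=Q13442814 haswbstatement:P921=Q84263196".
    ?title wikibase:apiOutput mwapi:title.
  }
  # The page title of an item is its Q-ID, so prepending the wd: namespace
  # yields the item IRI.
  BIND(IRI(CONCAT(STR(wd:), ?title)) AS ?item)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

All the filtering happens in the search backend; the query service only builds the item IRI and adds the label, as described above.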
We have 10 minutes left, or less than 10 minutes. This one, I guess, is kind of interesting: the query for all the shipwrecks on Earth. I looked at the Titanic item, because that was the first one I could think of, even before looking at the immediately following query, which says RMS Titanic. On RMS Titanic, I noticed that it had instance of shipwreck and also a significant event, shipwrecking. So I figured I should include both of those in the query and say that the item could be an instance of shipwreck with coordinates, or it could have a significant event... I put something wrong in the comment there, but I think I used the right item, which is shipwrecking in this case, not shipwreck... and then, as a qualifier, the coordinate location of that. Then grouping that by the item and item label and selecting any random coordinate. So if an item actually matches both of these, such as the RMS Titanic, any of its coordinates is fine.
Okay, let me look at that query for the scholarly articles on COVID-19. Instance of scholarly article. Oh right, of course, the main subject COVID-19 is going to narrow it down quite a lot. So that, yeah, okay, that works. But it's still the same six results. And this actually looks very suspicious. Is that the same paper? Quantitative Science Studies, Giovanni Colavizza; Giovanni Colavizza. Publication date 14 May; December. Maybe it's the same article published twice, or maybe it's actually something slightly different. I don't know. It's suspicious. I'll just drop it in the Telegram chat and let someone else figure it out, maybe. But yeah, okay, I didn't realize that. Of course, once you limit it to main subject COVID-19, which is probably only... SELECT COUNT as ?count WHERE that... okay, that is 69,983 items. So that's still a lot of items, but it's much less than the millions and millions of items you'd have when checking all the labels, as I had said earlier. So yeah, okay, I didn't consider that. That makes it work rather better, of course.
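The shipwreck query described at the start of this part combines two ways of finding a wreck and then picks one coordinate per item. A sketch under assumptions: Q852190 (shipwreck), Q906512 (shipwrecking), P793 (significant event) and P625 (coordinate location) are IDs recalled from memory and should be verified.

```sparql
#defaultView:Map
SELECT ?item ?itemLabel (SAMPLE(?coordinate) AS ?coordinates) WHERE {
  {
    # Items that are themselves shipwrecks and have coordinates.
    ?item wdt:P31 wd:Q852190;
          wdt:P625 ?coordinate.
  } UNION {
    # Items with a "shipwrecking" significant event whose location is
    # given as a coordinate qualifier on that statement.
    ?item p:P793 ?event.
    ?event ps:P793 wd:Q906512;
           pq:P625 ?coordinate.
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
# One row per item; SAMPLE picks an arbitrary coordinate when an item
# matches both branches or has several coordinates.
GROUP BY ?item ?itemLabel
```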
At this point we could also check for any subclasses of scholarly article, but I'm pretty sure there actually aren't any subclasses of scholarly article that are in broad use. But let me check that, maybe. ?class, COUNT as ?count. The class is a subclass of scholarly article, and the item is an instance of the class. Group by class. Okay, that is actually very, very slow. That's surprising. "One is the preprint of the other", might be. I don't know how we model preprints. 15 results. Okay, but this class, whatever that is, actually has 2 million instances, and that is review article, I see. And then this one, case report, has 100,000.
Let me do something else: whether it is also an instance of scholarly article. BIND EXISTS instance of scholarly article as "also instance of scholarly article", and group by that as well, ORDER BY DESC count. Because maybe all of those items are instance of scholarly article and of review article, and then we would still find them with our query. Or maybe they are not, and that's what I'm trying to find out here. This one, meanwhile, timed out. Okay, so you really can't include subclasses of academic article or scholarly article, because that makes it time out already, whereas without that we get results in a few seconds. Six results. And the other one, is that still going? It is still going, yes. Anything else? I also realized, when I pasted those links into the Telegram chat, that was like 30 seconds before anyone could know why I was doing that. That is a timeout. Damn. Okay, then let's just BIND review article as ?class, limit ourselves to that one and hope that that works quickly enough. Or I could just go through "What links here". We're almost out of time. I don't remember if anything was in the schedule after me, whether I need to leave now or there's still maybe five minutes. This is a review article, and it's an instance of "tourism in a region". Oh, good. Great item. "Architecture of Norway" is a review article.
Country: Norway. Counts countries. Wait, review... Are people interpreting "article" as in, that's what the Wikipedia article is? Oh, no. A literature review is a subclass of review article. This one sounds like a review: "How Much Can We Boost IQ and Scholastic Achievement?", 1969. That sounds like a great time for IQ research. So that's a review article and an academic journal article, but not a scientific article. Okay, so if you search only for instance of scientific article, you will miss some items, but you also don't really have a choice, because including subclasses probably makes the query time out. That is not the nicest note to end on, but okay.
Okay, apparently I can go until the end of the hour. Do I still have anything to talk about, other than: would someone please look at this "Tourism in Lithuania", which is not a scholarly review article? "Systematic review of..." Okay, so this one is a scholarly article and a review article. I mean, we can count them. Discard all of this. SELECT COUNT as ?count WHERE the item is an instance of scholarly article and of review article. That takes a while. I guess it has to go through those 2 million review articles and check for each one if it's also a scholarly article, and that's the more efficient way of doing it, because the other way around would be even worse, but it's going to take a while. "If you search for scholarly article you also get the lovingly miscategorized book reviews." Yeah, because they're reviews, and reviews are review articles, aren't they? Sophox... okay, now there's a timeout. I don't know what the timeout was, it doesn't tell me, but yeah, okay, this doesn't work out, I'm afraid.
We could look at the results of all the shipwrecks, which was kind of interesting, I guess. This one is still running. 24,000 results. It takes a bit to plot them on the map, but yeah. And I think I mentioned this in the GSD channel at the time: there are a lot of results around...
...specifically not even Great Britain, but Scotland, apparently. And also a very suspicious line here, which I now realize might be the line which points straight down to Greenwich Observatory. No? If we scroll down, where's Greenwich? So the line is about here. And if we scroll down, we end up right around London. Okay, I don't know where exactly Greenwich Observatory is, but I think it's pretty close to London. So this certainly looks like it could be around zero degrees longitude. Then again, it seems to be slightly slanted to the upper right, I feel like. So maybe it's... I'm wondering if this is a bug in some import or if it's actually real data this way. But anyway, it certainly looks like some project did a big import of shipwrecks around Scotland, and that's where all of this data comes from. I'm trying to open one of them... I think my screen froze again, yeah. Now we have it. "Unnamed shipwreck, Canmore 102,000-something." Does it say who imported this? Canmore is a database. QuickStatements, temporary batch, okay. I have no idea who imported this or when. But yeah, we have loads and loads of shipwrecks around the coast of Scotland, and then also plenty of them scattered around the rest of the ocean. But that definitely looks like a large concentration. Okay.
It'd be nice if we can wrap it up.
Yeah, I think with that we can be done. If you have anything else, feel free to always contact me on Twitter at @WikidataFacts. Maybe I'll also look at this Etherpad for a bit. And yeah, thanks for your attention.
Thank you. Thanks for the session.