Okay, we're live. Awesome. Hi, everyone. Welcome to the July episode of Wikimedia Tech Talks. Tech Talks are for members of the Wikimedia technical community to share their knowledge. Today we have Zbyszko Papierski, a senior software engineer at the Wikimedia Foundation. He'll be speaking today about the Wikidata Query Service. Some quick housekeeping: Zbyszko will give his talk, and then we'll open it up for questions afterwards. You can ask on the YouTube live stream or in the Wikimedia office IRC channel. And without further ado, I will hand it over to Zbyszko.

Thank you. I have to tell you up front, this is a very interesting experience for me: it's my first talk given in this manner, so there will probably be some hiccups, but bear with me. My talk today is called "Beyond Wikimedia: knowledge that even a computer can understand". Like Sarah said, we'll be focusing quite a bit on the Wikidata Query Service.
I plan to cover a range of topics, mostly related to the development side of the solution, but I will also talk a little bit about Wikidata and the Wikidata Query Service itself, SPARQL, and similar things, even if many of you know those tools already. There will be a surprise at the end, something extremely new that may pique your interest. So let's begin.

Most of my introduction has already been said; I will just add that I'm part of the Search Platform team. As the name suggests, we deal mostly with search across the community of projects, but currently mostly with maintaining WDQS, which is of course short for Wikidata Query Service. It's also my main focus in the organization. I've been with the Wikimedia Foundation for the last seven months, so not very long. A little disclaimer here: opinions will probably be presented, and those opinions are obviously my own.

Let's dive straight into it. I want to start with the topic of the Semantic Web, because it's hard to say anything about Wikidata and the Wikidata Query Service without at least mentioning that concept. Standards for the Semantic Web were set around 2001, but the idea itself was voiced at least a few years earlier. The idea was to create a worldwide web of interconnected entities: add metadata to every page that describes what the page is about, representing the content in a way digestible by a machine, and have a standard ontology on top of it that would allow knowledge to cross the boundaries between all those pages and applications. The metadata was to be provided by the author of the page. There were many different formats to describe this; we'll mostly be talking about the one that is most important to us. But let's first look at what such a description should actually look like.

This is probably something most of you know: the children's rhyme "Mary had a little lamb, its fleece was white as snow". There are many versions of this one, but they mostly boil down to the same thing. And if we boil it down to its gist, to the information presented in this children's rhyme, we get something like this. What you see is a knowledge graph that can be represented as metadata, in the sense of the Semantic Web. Basically, we have statements like "Mary has a lamb", and statements about statements: that the lamb is little and its color is white. I can also extract information about the color of the snow, because the fleece is white as snow. So we can save this information describing the color of the snow too. Obviously we know snow can be many different colors, but here it's just going to be white. This, of course, shouldn't be represented in the form of the graph itself but as the statements I mentioned before; we will dive more into this. But let's first try to understand why we would even want to do something like that. Because, as you can imagine, having a page where not only the content has to be updated but also some form of metadata is generally a problem. For the developers out there, it's like comments in the code: they tend to get outdated quite soon. But the goals of the Semantic Web are, maybe not noble, but very interesting. First of all, and this is the basis of the Semantic Web, the goal is to create an actual worldwide web that is machine-readable, so that every single page describes its content
and its knowledge in a form compatible with every other page. Then there are ontologies, which in this case means a set of agreed-upon names and identifiers that each page would use to describe its content. That means that if we get content from two different pages relating to the same concepts, like, I don't know, cars, or animals of some sort, you would be able to connect them together and combine the information from both; add another page and you get more information. Which means you'd be able to infer much more knowledge than a single page could provide you. Obviously that's a great thing for scientists, because they can infer new knowledge from this. But from a development perspective, it's also a fantastic base for AI and machine learning implementations working on this kind of linked data. Machine learning can get part of its knowledge this way; I will dive into this a little more later. The Semantic Web can provide a lot of context that may not be obvious from a simple analysis of some text, and it's something that is curated by users, so we have a higher chance of it actually being correct. There are a few applications of this that I really like, and I will be diving into them a little later.

Another interesting aspect of the Semantic Web is that it's basically language-agnostic. If you look at examples of Semantic Web connections, you can of course see labels in some specific language, and in many cases those labels are translated into a multitude of languages. Which means that if you have a graph representation of the knowledge, you can provide a flattened view, or even natural sentences, in any given language, because if you have the translation, you can provide the knowledge basically on the spot.
Obviously some work is required, and it's a huge amount of work, to present it in a human-readable, comfortable form, let's say. But it's still much closer to having knowledge that is language-agnostic than anything else. As many of you probably already know, there's a project that touches this subject quite a bit.

Okay, so those were the whys; now let's think about how it's realized in real life. There are many different formats used for the Semantic Web, but we're focusing on RDF, which is the standard recommended by the World Wide Web Consortium. We will talk about it because it's what we use as our format for Wikidata and Structured Data on Commons. RDF is basically a metadata data model. It uses URIs as a base for identification, something we should all be quite familiar with. RDF is a specification, or rather, after the latest change in version number, a set of specifications that you can all go and read. It doesn't actually prescribe a serialization format; there are a few different formats used for writing down the Resource Description Framework. The ones we mostly use are the Turtle and N-Triples formats, which I will touch upon quite soon.

What you see here is the same graph we had before, but the statements are represented as a sort of RDF, something that would translate to RDF. What we have here are your basic statements, which are triples: subject, predicate, and object. That's how we mostly describe this. And we also have things called reified statements, which I'm probably pronouncing incorrectly; those are statements about statements. We call them qualifiers.
They present some additional knowledge about the statement itself. You can imagine that, for example, if we were describing some famous person, we might say that he or she was educated at St. John's University, and one qualifier would tell you the year she started, and another qualifier would tell you when his or her education ended.

The format we use looks more or less like this; that's the Turtle format. What you would basically see is something like this: a full URI that describes, as I previously mentioned, subject, predicate, and object. But honestly, that would be quite a handful to write every time. So the format introduces something called a prefix. You can, for example, state that wd will stand for one URI base and wdt for another. What's important here is that this is purely a replacement: if you see something like this, it will be replaced with something like this. There is no magic involved, no domain or address resolution. These are URIs; they don't have to be connected to any actual physical page. They can be, and in the case of Wikidata they are.

Okay, so now we're getting to the most important subject of the presentation. RDF basically describes a graph. Each statement of this kind is basically an edge in the graph, a directional edge, and a set of those statements builds a graph. And since it is a graph, it's quite natural that we want to ask questions of it. The recommended way of doing that for the Semantic Web is SPARQL; the acronym stands for SPARQL Protocol and RDF Query Language. It was definitely invented when recursive acronyms were a thing. What it basically does is allow you to query RDF datasets.
So theoretically, all you need is some form of RDF, which can even be a file, and there are tools that let you use SPARQL on it. It has a few interesting features that support querying graphs. One of my favorites is paths: basically, you can ask for a path in the graph, which is something that would be very difficult, if not impossible, to do in standard SQL queries. Besides this, it also supports quite interesting features that you won't find, I wouldn't say anywhere else, but that are not really common. One of those features is called federation. Federation means that SPARQL allows you to query another SPARQL endpoint from inside a single query. For example, if you have a statistical SPARQL endpoint somewhere, and you want to join it with some other data that uses the same ontology, and remember, that's the thing about the Semantic Web, it is a common ontology, you can easily use federation to run a subquery inside your bigger query, but ask it of a different server. There is also a supported feature called service calls. Service calls look similar to federation, but they allow you to enrich the data with some external service. We will see a quite popular example of this soon.

So, the Semantic Web was announced about two decades ago, and there were very high hopes of it bringing some order to the chaos and enabling a lot of very nice things. It's 19 years later; how does it look? Not really great.
I mean, from one perspective I could say that the Semantic Web is better than ever, but at the same time, many of the ideas, especially the ones I just described, now basically exist outside it, in the general web. What I mean by this: it's difficult to keep up with the metadata. You have pages whose metadata you have to fill out yourself, manually; that's the idea of the process, it's done manually. And honestly, I don't think the incentives for doing that ever reached general users. Which means there are issues with finding pages that actually use RDF to describe their data with metadata. But that doesn't mean the Semantic Web is dead. The Semantic Web is currently probably better than ever and will get even better, because the front end of the Semantic Web, as I'll call it, by which I mean knowledge graphs, a very common description, powers many kinds of projects and is quite popular nowadays. The places where keeping RDF descriptions up to date was an issue were basically replaced by AI and machine learning projects that comb through the web and try to understand what pages are about, and it works very well, as you can try out for yourselves. Google, Facebook, even LinkedIn have been using the Semantic Web, or knowledge graphs in their case, to quite great success. There are quite a lot of issues with that approach, though.
First of all, this way of getting information from pages using machine learning is prone to informational bias. The problem is that underrepresented knowledge will not be handled very well, because all machine learning ultimately works on some sort of statistical data, which means that knowledge which isn't well represented will be of worse quality. On the other hand, the context of those searches will never be as good as something that a human, or a community, could have created. Fortunately, we have a way of addressing this. What didn't work in the large context of the whole web can work in the context of a project that is focused on the knowledge itself. Obviously, I'm talking about Wikidata. I won't go into much detail here; for people who don't know, Wikidata is a knowledge graph, a similar thing to what I just described, with a few additional things. What's additional relative to the general Semantic Web is our own ontology. We have Q-items in Wikidata, we have properties, we have values, we have other Q-items. We also have qualifiers, which is what I described before as statements about statements, and we have references. So this is a wiki project.
We do want to be able to trace a claim to a reference, just as we would with a Wikipedia page. Like everything else in Wikimedia projects, it's edited by the community. It works much better than the original idea did, in this smaller context, also because our Wikimedia communities are dedicated and share a vision of free knowledge accessible to anyone. So it's a perfect place to actually implement the Semantic Web. It's eight years old, and I believe I heard in January that it is the fastest growing Wikimedia project right now. And it can help with some of the issues of the automatic, machine-learned approach. It won't help with everything, though. When using Wikidata, remember that things can be biased there too, as in any kind of community-driven approach with volunteer editors. The best example is that Wikidata is not a very good place for researching things like the most common cause of death: starvation won't rank very high there, because only notable people, and that's the same condition as for a place in Wikipedia, only notable people appear in Wikidata. There will never be a database of every person on Wikidata, and notable people generally do not come from, for example, famine-stricken communities.
So this is not a very good way of representing statistical knowledge, but contextual knowledge works much better. Of course, as in every other community-driven, volunteer-editor-based project, there are some biases stemming from the distribution of editors. This is something Wikimedia is quite involved with, and the community also has many projects that try to address the underrepresentation of some knowledge. And editors are not the only ones, at least not directly, editing Wikidata: quite a lot of it is also done by bots right now. There are different kinds; some of them are used to update the data based on statistical knowledge available in other databases. Another example would be filling in the graph, like adding reverse links, which is also done by bots. The bots are of course maintained by the community; we do expose some resources that can help with maintaining them, but they are pretty much community-run. And Wikidata is mostly separate from Wikipedia. By saying mostly, I mean there are a few things that overlap: there are places that use bots to carry data from Wikidata to Wikipedia quite directly, and we do have sitelinks from Q-items, the items that represent entities in Wikidata, to pages in Wikipedia. I won't go into details; there are fantastic talks out there that tell you more about Wikidata itself, its ideology and so on. There will be links at the end; I recommend everybody check them out. They will expand your knowledge on the subject quite a bit.

Okay, so this is the fun part, because I'm going to do some demonstrations, and I just had to change the operating system three minutes before this presentation, so there are bound to be things that fail. Let's see. Okay.
The idea is to present some queries, not too many of them; I wanted to show some fun aspects of the Wikidata Query Service and the SPARQL we provide in our service. These will definitely not be exhaustive, and if you find this interesting, I highly recommend looking at the links at the end of my presentation and trying things out for yourselves. Additionally, there are examples built in. Okay, so the first question I will ask the Wikidata Query Service is: when was Isaac Asimov born? Now, I know this looks scary, so let's see how it looks in the actual service. I hope everybody can see the service itself. Let's input the query and talk about what we see here. First of all, let's start for now with something basic, a very basic query. As you can see, this is quite similar to general SQL-like languages, but it does have a character of its own. Right now I will do something that is basically a get-all query with a limit. I hope I did it correctly. Yes. So let's talk about what the Wikidata Query Service actually is. What you see here is the Wikidata Query Service GUI. It connects to a backend service that I will talk about a bit more in a few minutes. I'm sorry, I had the Polish language set; let's switch it to something more universal. Quite a few fun things here: if you want to see which prefixes are available, you can browse them here. If you select one, it will put it here; that's not necessary, you can use wd and the other prefixes without it, but if you want to see them, you can. Another fun part of the Wikidata Query Service GUI is our examples. These are a great way of exploring the service itself. The examples, like everything else, are community-driven. So I highly recommend looking at things like this, though I see it still translates
thanks to my language settings; sorry for that. So, I recommend going through the examples; they will show things much better than I can here. And what you also see is, of course, the play button, and here are the results. Okay, so let's get to the question about the date of birth of Isaac Asimov. Let me copy this one again. Generally, if I wanted to ask for somebody's date of birth, I would do something like this. wd is the prefix for general Wikidata items, and there's autocomplete here, because the identifiers can be quite unwieldy. Inside computer programs that's probably not so important, but in the query editor itself it's useful to actually see something. So let's find Asimov here. Sorry for that; this is a demo and I'm doing it on a different system, so something will break. Let's see what happens. Yes, we found something. As you can see, this query shows his date of birth to be January 1st, 1920. But, and this is not immediately visible here, there are qualifiers to this information. Let's look at Isaac Asimov on the actual page; let me copy his ID. So here's the Wikidata page on Isaac Asimov. What you see here are all the statements that we can use in the Wikidata Query Service. The interesting part I wanted to show is this one. If we ask directly for the date of birth, we get what I said we would: the first of January, because the value is basically just a year, so it's rendered as the first day of that year. But actually, nobody knows when Asimov was born; he himself never knew exactly. I think he celebrated his birthday on the 2nd of January, but it's not really certain when he was actually born. So the query I presented does things differently.
So, my query will look like this. We will use different prefixes. This is a prefix that will get us to the statement itself; it's the same property we had here, but it gives us access to the qualifiers, and the qualifiers, the information about information, we get by using the pq: prefix. So if I do this, I'm actually able to see both the earliest and the latest possible date of his birth. Okay, that's something; let's go further. Next I want to show the graph features of WDQS. We want to see all the descendants of Queen Elizabeth II. The query looks quite simple; I will explain it in a second. I probably have to change a bit here.

Zbyszko, can I interrupt you really quickly? Can you increase your screen size? Perfect, thank you. Now folks on the YouTube stream can see it a little bit better.

Okay, that's something I should have set up before. So let's look at the query and explain a few new things that happen here. First of all, I again created a variable in the query itself, but I didn't use it here; we'll get back to that in a second. The interesting part here is this P40. It is a parent-to-child type of relation. The plus is a fun one: plus basically means that there should be a path between this item, which is obviously Elizabeth II, and this item, consisting of only this kind of relation, at least one of them. We can do much more with paths; I won't dive into that here, but there's a link that describes it at the end of my presentation. But like I said, there is something different about this query: this is an example of a service call.
I mentioned service calls before; this one allows you to label the data. If this is an entity, you have information here that tells you what the actual label of that entity is, and it can be presented in different languages. As you can see, in Polish it's "Elżbieta II". Those kinds of labels can be shown here; you can enrich your data with them. If I do something like that, this is still here, but I get something additional. Okay, let's move on; I see that I've spent a little more time here than I wanted, so let's skip this one. This is also an interesting one: you can combine things and create paths of your own. For example, if you want to know, and this is like a small recommendation engine for you, what are the bands and projects of the different members of Nightwish? This is the query; let's execute it instead of just showing it to you. Okay, here we go. What you can see here is that I follow the "member of" relationship from the band I am interested in, Nightwish. This basically returns to me every single member of Nightwish, and once I have that, I could probe it differently, but going the same way back I get all their bands. Right now I will also get the band Nightwish itself, but we could exclude that; maybe I'll show it. So here you have the list of bands. Like I said, you also get Nightwish.
The fun part here is that we can also group by those bands. You may be wondering why there is a sub-select here: unfortunately, you cannot really group by data that comes from a service, so you first have to get the list of Q-items, and once you have that, you can again apply the label service and attach the labels.

Okay, the last query I will show is about querying the planets, and this one will be important later on. This one says, we're talking about the solar system basically, that the parent astronomical body is the Sun. Oh, by the way, the dots are there because every statement should end with a dot. And we are interested in those kinds of instances, where the instances are the inner planets and outer planets of our solar system. We could drop this, but it makes this easier to query because we limit the amount of data. And we can of course label this; the language chosen is probably wrong because this operating system is in Polish. And as you can see, that's the list of planets. Unfortunately, Pluto didn't get the attention it really deserves.

Okay, so what could the applications be? The academic angle we already touched upon. I can easily see, and there are some examples of, knowledge-based apps that let you, for example, show some museum items or paintings and so on based on Wikidata. There are instances of scientific data from Wikidata being used, for example, in different calculators for unit transformations. But the most interesting example for me is context in search, and that's something that Google uses.
That's something that Facebook uses, and so on. It allows you to provide more context to your searches and to match queries to a broader subject. If you can pinpoint which item or entity represents the query itself, the knowledge graph provides something that represents the subject. Quite an interesting case mentioned recently in our office hours was diversifying search results, for example for Amazon: if you search for "amazon" you will find the Amazon page, but you can also surface other entities that fit the word, like, you know, the river. It could also help with query understanding. There are not only entities in Wikidata; Wikidata also provides statements, and you could try to match your searches to those statements, which would give you a huge boost in the relevance of the potential results.

So let's talk a little bit, I'm running out of time so I will probably get through this very quickly, about the developer's perspective, and specifically what WDQS, the Wikidata Query Service, looks like under the hood. This is the repo. It's a mirror of our Gerrit repo, but I've learned that presenting Gerrit when we first meet is something people don't usually do; there's a GitHub mirror that's easy to clone. The schema of our, let's call it, architecture is quite simple. We listen to recent changes, and we can also ingest RDF dumps; Wikidata provides RDF dumps. There's a two-step process that consists of a munger and an updater. The updater communicates with Blazegraph. Blazegraph is our database, a graph database that allows us to query it using SPARQL, and the GUI you already saw communicates with Blazegraph.
The munger may sound a little strange, but we do not use the RDF data provided by Wikidata directly. There are a few reasons for this. First of all, WDQS was designed as a service that anybody can use on any RDF set, so we don't have to use Wikidata, at least that's the idea. But we also want to be able to validate that the statements provided are valid. We do some cleaning up of the data: we remove duplicated items, like labels that come from three different properties but mean exactly the same thing, and we remove some unnecessary data, like certain type statements. We also allow data filtering; we don't use that ourselves, but if you want, you can filter out only the data you're interested in. And we do a slight transformation of some data to make it more compatible with the queries themselves. So that's what the munger does, all of it based on the specific Q-items in the dump or update.

Then there is the updater. The updater is, as the name suggests, something that updates the Blazegraph entries. Each update is basically a few different update queries that, for example, remove stale entries, update actual statements, update statements about statements, update references and so on. We use Blazegraph itself to reconcile the differences. This approach has been a little bit problematic, and it's something we are working on replacing: we are currently building what we call the streaming updater, which will reconcile the differences outside of Blazegraph. That should help with scalability and also let us potentially think about databases other than Blazegraph; we depend quite a bit on what Blazegraph does right now.

The reason I'm talking about this is that the official query service is quite limited when it comes to timeouts, when it comes to throttling and so on; it's a service that basically directly accesses our database.
So the idea is that if you want to try things out, you can use query.wikidata.org, but if you want to run it offline, for example to enrich some search parameters or to use it in your own workflow, it's quite easy to set it up yourself. All the data we use is public, so you can set it up on your own infrastructure and use it quite easily. From a practical point of view, there are a few scripts that can help you once you download the repo. There's a runBlazegraph script, with a bunch of parameters you probably won't need the first time; you have to have Blazegraph running at the beginning. Then the munge script: munge requires some arguments, you have to provide a path to a dump. This is basically the way of setting up a Wikidata Query Service instance for the first time. Then you can use the script called loadData, which loads the munged data into your own WDQS instance. Beware: right now it probably takes about a week, but that's only the first time. To keep the instance running you would again use Blazegraph, obviously, but you can also use the updater just like we do, basically updating from a certain point in time; all the changes will be downloaded and applied. I haven't mentioned how to set up the GUI, but there's a Docker setup, and there will be a link to a blog post that describes this and provides more detailed information on the commands I just provided.
I'd also recommend following the blog itself, because there is heavy work under way on providing better options for data ingestion — the idea is to have something that lets you bootstrap your WDQS instance very quickly, without doing the whole loading process yourself.

When it comes to using WDQS programmatically, the /sparql endpoint is the one you should definitely remember. It's quite common that if a service has RDF data, it will expose a /sparql endpoint, and you can access it programmatically. I myself am mostly a Java developer, so I use Apache Jena for this. Jena is a whole ecosystem around RDF: it allows you to create in-memory databases, or even ones you could expose yourself, and it provides tooling that lets you query both local RDF stores and remote ones like WDQS. I know that quite a few people use Python; RDFLib is a quite popular library that allows you to do the same thing.

And I would always recommend that any of you contribute. The Wikidata Query Service isn't as popular as MediaWiki, for example, so we do not get many contributors, and we would very much like to cooperate with people from outside. If you feel up to it, there's a Phabricator board, and I would very much like to work with you.

So what are the next steps? The first thing, which is probably going to be announced quite soon, is the Wikimedia Commons Query Service. I mentioned that Wikidata uses RDF, but actually underneath Wikidata is Wikibase — a general platform that Structured Data on Commons also uses. If you've been to Commons you know what it is, but in short, it's something that gives you structured information about the media inside Wikimedia Commons — pictures, video, audio — described in a machine-digestible manner, using the ontology from Wikidata.
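Programmatic access can also be sketched with just the Python standard library, without Jena or RDFLib. The helper name and the planet query below are illustrative, not an official client; the `query` and `format` GET parameters are the ones the WDQS SPARQL endpoint accepts:

```python
import urllib.parse
# urllib.request would be used for the actual fetch; it is omitted here
# so the sketch stays self-contained and offline.

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def build_query_url(endpoint, query):
    """Build a GET URL for a SPARQL endpoint, requesting JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return f"{endpoint}?{params}"

planet_query = """
SELECT ?planet ?planetLabel WHERE {
  ?planet wdt:P31 wd:Q634 .   # instance of (P31): planet (Q634)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

url = build_query_url(WDQS_ENDPOINT, planet_query)
# In real use: json.load(urllib.request.urlopen(url))["results"]["bindings"]
print(url[:40])
```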
So, a quick example of this — I probably shouldn't show it before the launch, but we'll soon be doing this anyway, so let's go with it. WCQS is a little bit different from the Wikidata Query Service: it sits behind the Wikimedia Commons login, so you have to log in to access it. There are no other restrictions right now; the login is there to limit spamming from bots that query way too often. So let's do a quick example.

This is a query that uses federation — a federated query. Since Wikimedia Commons only has content about the media itself, we need some more data to find something interesting, and you may recognize this query: it's basically the same query we had before, about the planets. Here, wdt:P180 means "depicts" — a very common property in Wikimedia Commons, which basically means that something is shown in this picture. I needed to limit the number of pictures a bit because it strained the index, so I limited it to celestial bodies. And here is the exact query we had before, which selects all the planets, and with it I can get the entries on Wikimedia Commons that show each planet. For example, that's Jupiter — and we should see some planets here too. Ah, that's an error.

So that's a quick example. I imagine this pattern will be used quite a lot: federating out to Wikidata and the Wikidata Query Service, and enriching that information with the Wikimedia Commons Query Service. It should be launching today — I hope nothing changed in the last 15 minutes — and there will be an announcement. This is a beta, so it's quite limited, but I hope it will bring some additional value.
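The federated query from the demo has roughly this shape. This is a reconstruction in the spirit of what was shown, not the exact query from the talk — the variable names and the LIMIT are illustrative, while wdt:P180 ("depicts") and wd:Q634 ("planet") are the standard Wikidata identifiers:

```sparql
# On WCQS: find Commons files depicting a planet, fetching the planet
# list from Wikidata via a federated SERVICE call.
SELECT ?file ?planet ?planetLabel WHERE {
  ?file wdt:P180 ?planet .                   # the file depicts some entity...
  SERVICE <https://query.wikidata.org/sparql> {
    ?planet wdt:P31 wd:Q634 .                # ...and that entity is a planet
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  }
}
LIMIT 50
```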
This is definitely still a work in progress.

OK, I'm nearing the end. Something many of you have probably already heard: the Wikimedia Foundation launched a new project called Abstract Wikipedia — Wikilambda, if you will, though I think Abstract Wikipedia is currently the official name. The project was launched by Denny, who is also the creator of Wikidata, and it will be based on Wikidata. In one sentence, it's a language-agnostic approach to creating new content — that's basically what Abstract Wikipedia sets out to do. It's a very simple sentence for something that is extremely complicated, but I hope we'll see very nice things coming from it.

As promised, here's a bunch of links you may find useful: the RDF format description; a link to the blog — this post in particular, because it describes the setup guide, a guide to setting up a Docker instance of WDQS, how to load the data and so on. I'd recommend reading the blog in full; it has many interesting articles about WDQS and other things. There's also a guide to SPARQL — a roughly two-hour introduction, and it's very good. If you're not a visual person, you can go with our own official Wikidata SPARQL tutorial. And, as mentioned before, a talk about Wikidata concepts from Denny.

So yeah, I'm hoping that some of this sparked your interest in using WDQS in your applications, or maybe even in contributing to the stuff we do. So thank you very much.

Awesome, thank you. So what we'll do now is go ahead and open up for questions. There is a definite lag between our Zoom video, which we're recording right now, and the YouTube stream, so for folks on the YouTube stream it might take a little bit for your question to get answered. I don't see too many yet.
I'm going to talk for just a minute and see if a couple come up. I just wanted to ask you about the resources you have in your talk — will you share the slides with us, and then I'll post them? — Yes, that's always good. — Awesome, great. Then I'll also share those links out on our YouTube stream, so folks can access them from there if they come and watch this later. Yeah, super interesting.

OK, I'm looking at our questions right now. I'm not seeing anything on IRC. We did have one question come through on the YouTube stream, and it was answered by another community member, but I'll go ahead and ask you anyway, just in case you have a slightly different answer: is any team also using Scala for Spark-related jobs?

I'll answer what I think I heard. One thing to get clear first: from the Wikimedia Foundation perspective there are two organizations — not teams — involved. Both the Wikimedia Foundation and Wikimedia Deutschland work on Wikidata and the Wikidata Query Service. Wikimedia Deutschland handles Wikidata itself, and Wikibase, which is the base for Wikidata and also for Structured Data on Commons — you can think of it as the database. We handle the Wikidata Query Service, and not even the complete thing, because the GUI is also handled mostly by Wikimedia Deutschland. So there are at least two, and of course the heaviest users, in terms of traffic, are the community. We unfortunately don't yet know much about the profile of our use cases, but we will be investigating this in the second half of the year.

So, I'm not seeing many other questions coming in. There's been a little bit of discussion on the stream, so I'd encourage you to go look at the
YouTube stream afterwards and have a look at what folks are saying as well. Yeah, this was just really interesting. I'm actually super excited about the Wikimedia Commons Query Service — that looks really cool, so I can't wait to be able to play around with it as well.

Yeah, one word of caution though: like I said, it's a heavy database right now. There should be communication about this today — maybe it was postponed, I'm not sure. For example, not everything is set up yet, so this can get quite slow. There are a few missing functionalities; there's not 100 percent coverage of the functionality the Wikidata Query Service has. But we are definitely working on this, and it's quite high in our priorities right now.

Awesome, it was good to have the sneak peek of it. Cool — well, I don't see anything else coming in, but for anybody watching this after we wrap up: if you do have questions, feel free to send them to me and I can pass them along, and we'll make sure those get answered for you. We'll either leave them in the comments on YouTube or on the Wikimedia Technical Talks page, for you to go back and look at. And then we'll also have this list of resources for folks, because there's a lot to look through and it's super interesting.

So thank you so much for doing this talk today — it was really cool. And also thank you, Brendan, for doing the AV. And again, anybody who's interested in doing talks, or in Technical Talks in general: we have a full list of them on our Wikimedia Technical Talks page. I'd encourage you to go there and take a look, and if you're interested in doing a talk yourself, let us know.

Can I answer one question real quick?
So, there was a question about whether I said the quality is low because something was missing in the articles. That's not really true. It's only that the data only represents notable people, so judging from this set of people will not be enough to get a larger view of, for example, some of the causes behind it. And on the other hand, as in any community-driven approach, the content itself is a kind of derivative of the people who create it. So, for example, if we have a community of fish specialists, there's quite a good chance that we'll be missing some facts about ancient cultures. So it's quite important for any community-driven project of this kind to have a good representation of different kinds of knowledge, different backgrounds and so on. It's not per se a lack of quality, but rather under-representation — something we are all maybe struggling with, but that we want to address. That's it.

Awesome — no, thank you for taking that one. Yeah, I didn't know if there was... that's totally perfect. So thank you so much. Awesome — well, we are at time. And again, if anybody has more questions, go ahead and pass them on to us and we'll get back to you. Thank you so much again. Thank you. Bye-bye. Thank you.