Hello everyone and welcome to this session about Wikidata: what happened and where we're going. I'm Lydia Pintscher, the product manager of Wikidata, and we will be taking a look at what happened over the past year and where we're going. Last year hasn't been easy, to say the least, but I'm immensely proud of what we've achieved together despite those circumstances. And I hope that, just like for me, the work on Wikidata has given you some much-needed distraction, comfort and a way to do something about the state of the world throughout the year. So let's jump in.

Today we have about 11,700 amazing people who are active editors on Wikidata. As you can see, we had quite a jump during the height of the lockdowns. When we look at the content on Wikidata, we can see that we now have over 94 million items describing all kinds of concepts in the world. On those items, we now have about 1.31 billion statements that we make about the world and everything in it. About 10% of those statements are external identifiers, so links to other websites, catalogs, databases and so on, making Wikidata highly connected to all the other resources on the web. But we don't just have items on Wikidata to describe concepts in the world, we also have lexemes. These describe words, like in a dictionary. Today we have about 570,000 lexemes across several hundred languages.

The final numbers for today, I promise: let's look at reuse. How is Wikidata's data used? It's not easy to get numbers for all of this, but here are some. Right now we are getting about 11 million SPARQL queries to the Wikidata Query Service every single day. That is quite a lot for a publicly accessible endpoint, as many of you might be aware from some of the scaling problems we've been encountering. On the other side, about 71% of all Wikipedia articles make some use of data from Wikidata. Not all of that is visible to the reader of the article; some of it, for example, is used to add categories to the article automatically. And for all those yearning to travel again, Wikivoyage is the project with the highest percentage of articles using data from Wikidata, with about 95% of all their articles making some use of Wikidata's data.

So then let's take a look at what happened over the past year. There are way too many things to name them all, but here are some highlights with a development focus. One focus area was, and continues to be, data quality, because we obviously want to provide reliable and verifiable information to the world. Here we worked on understanding, improving and securing that data quality.

To help us all better understand data quality, we improved how the machine learning system measures data quality, so it is more accurate in how it assesses the quality of an item. For each item, you can now get a more accurate score that tries to tell you how high or low the quality of that item is. We also developed a small tool to measure how many constraint violations an average item has, so we can track that over time. Constraints, for those who don't know them, are the editor-defined rules that the data in Wikidata is checked against. This for example includes rules like "a person shouldn't have died before they were born". Right now we have about 0.2 relevant violations per item. This isn't too bad, but it can still be improved.
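For those who want to run this kind of check themselves, here is a minimal sketch in Python, assuming only the public query service endpoint at https://query.wikidata.org/sparql. It looks for people whose recorded date of death precedes their date of birth, the same rule mentioned above. The script name in the User-Agent is made up for illustration, and on the full set of humans a query like this can hit the endpoint's timeout, which is why the LIMIT matters.

    # Minimal sketch: a constraint-style check against the public
    # Wikidata query service. P569 = date of birth, P570 = date of
    # death, P31/Q5 = instance of human.
    import requests

    ENDPOINT = "https://query.wikidata.org/sparql"

    QUERY = """
    SELECT ?person ?born ?died WHERE {
      ?person wdt:P31 wd:Q5 ;
              wdt:P569 ?born ;
              wdt:P570 ?died .
      FILTER(?died < ?born)
    }
    LIMIT 10
    """

    # A descriptive User-Agent is good etiquette on the public endpoint.
    response = requests.get(
        ENDPOINT,
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "constraint-check-sketch/0.1 (example)"},
    )
    response.raise_for_status()

    for row in response.json()["results"]["bindings"]:
        print(row["person"]["value"], row["born"]["value"], row["died"]["value"])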
And the last thing we did here was take a closer look at the different types of issues we encounter in Wikidata's ontology. The ontology is how the different concepts that Wikidata covers are connected to each other. We wanted to better understand how these ontology issues happen, how they are introduced and what we can do about them. This is especially important because issues in the ontology make it harder to build applications using the data from Wikidata, and enabling those applications is of course something we want.

Now, in terms of improving the data quality, we developed two new tools, the Item Quality Evaluator and the Curious Facts tool. More about them in a second. We also improved which changes from Wikidata show up in the watchlist and recent changes on Wikipedia, so that all the relevant changes, and only the relevant changes, are shown there. So if you haven't tried it in a long time, please try it again.

In terms of securing the quality of the existing data in Wikidata, the Wikidata community has decided to semi-protect properties as well as highly used items, to make them less easy targets for vandalism. And we're seeing more and more that especially the large Wikimedia projects automatically compare the local data in their articles with Wikidata's data, with some special Lua code for example, and then put the pages where the local data and Wikidata's data don't match into special maintenance categories. This really helps make the whole system quite a bit more robust, thanks to the big projects.

Now let's take a quick look at the first tool I mentioned, the Item Quality Evaluator. It looks like this. It lets you enter an item ID, or a SPARQL query that produces a list of item IDs. And when you ask it to check those items, it gives you the ORES quality score for each of them. (A small sketch of fetching such a score programmatically follows at the end of this part.) Here in this sample of random superheroes, for example, you can see that apparently Deadpool is the item with the highest quality score. This tool is especially helpful if you want to find the lowest-quality items in a topic area you're interested in, so that you can then improve them. And it is, for example, a great tool for an edit-a-thon where people work together on improving items of especially low quality.

The second tool is Curious Facts. The world is fascinating and weird. There are women marrying the Eiffel Tower, there's a cat that is the mayor of a town, there are countries with two capitals, and so on: stuff that kind of defies our expectations. The Curious Facts tool analyzes Wikidata's data for outliers like that. These outliers could either be something genuinely curious happening in the world, or an issue in the data in Wikidata. The tool shows you an anomaly and then you can see what's up. If data quality interests you, you might want to come to my data quality talk tomorrow to learn more.

Now let's move on to the query service. The Search Platform team at the WMF has been working a lot on scaling the query service so it can keep up with the growth and popularity of Wikidata. They have developed a new updater, for example, that will be rolled out in the near future, and it will help the query service better keep up with the fast edit rate on Wikidata that we're seeing now. They also did a lot of research to better understand the content composition of Wikidata, to learn how we might use that to help with scaling. There's a survey going on right now to better understand the priorities of everyone who's using the query service. If you are using the query service, it would be really great if you took the time to fill it out. It's very short.
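Here is the promised sketch of how a quality score like the one the Item Quality Evaluator shows can be fetched programmatically. It is a sketch under the assumption that the ORES "itemquality" model for wikidatawiki is what sits behind these scores and that the ORES v3 API has the response shape shown below; the script name in the User-Agent is made up.

    # Sketch: look up an item's latest revision and ask ORES for its
    # predicted quality class (A, the highest, through E, the lowest).
    import requests

    HEADERS = {"User-Agent": "item-quality-sketch/0.1 (example)"}
    ITEM = "Q42"  # Douglas Adams, just as an example item

    # Step 1: the item's latest revision ID via the MediaWiki API.
    pages = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={
            "action": "query",
            "titles": ITEM,
            "prop": "revisions",
            "rvprop": "ids",
            "format": "json",
            "formatversion": "2",
        },
        headers=HEADERS,
    ).json()["query"]["pages"]
    revid = pages[0]["revisions"][0]["revid"]

    # Step 2: score that revision with the itemquality model
    # (assuming the ORES v3 API shape).
    scores = requests.get(
        "https://ores.wikimedia.org/v3/scores/wikidatawiki",
        params={"models": "itemquality", "revids": revid},
        headers=HEADERS,
    ).json()

    score = scores["wikidatawiki"]["scores"][str(revid)]["itemquality"]["score"]
    print(score["prediction"], score["probability"])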
The link to that survey, and everything else I'm going to link to from now on, is on the session page for this talk. Please do fill out that survey; it's really helpful. And last but not least, one small improvement that will hopefully make many of you very happy: you can now add a title to your query by using "#title", and it will be used in the results visualization, so that it's easier for someone else to understand what they're seeing when they look at your query result. (You'll see it used in a small sketch later on.)

Now let's look at lexicographical data. Lexicographical data has definitely gotten a lot of attention over the last year and has grown a lot, and with Abstract Wikipedia on the way even more so, because lexicographical data from Wikidata will be a very fundamental building block of Abstract Wikipedia. To help us better focus, the Abstract Wikipedia team and the Wikidata team asked for language communities to come forward who want to work more closely with us during the development. And I'm very happy to say that we were able to select the Hausa, Igbo, Dagbani, Malayalam and Bengali communities to help us and be trailblazers for the future of lexicographical data on Wikidata and for Abstract Wikipedia.

There's still a lot of missing data in this part of Wikidata, and to help figure out which words are still missing and needed the most, we now have a lexicographical coverage dashboard created by Denny. It looks at a Wikipedia and checks how many of the words used in that language's Wikipedia are still missing from Wikidata's lexemes. You can then get a list of, for example, the most frequent missing words, so you can add them and have a very high impact with just a few edits.

Another way to easily contribute to lexicographical data is the Lexeme Forms tool. It gives you a simple form, like the one you can see here. You enter all the different forms of a word in your language and it correctly adds them to Wikidata for you. It already covers a lot of languages, but if yours isn't covered yet, you can provide all the necessary information and get your language added to the tool as well.

And finally, we had a blast with the Lexicodays. It was three days of online sessions that we did together, all focused on language and words. We did intro sessions, we edited, we did data modeling, we improved documentation, we triaged bugs and we improved tools, and it was a lot of fun. You can see some of the results and videos from those sessions on the page that is linked here.
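To tie two of the threads above together, here is the small sketch I mentioned: a lexeme query that also uses the #title hint. It assumes the standard prefixes the query service predefines (ontolex: and dct:); the #title line is read by the query service UI and is just a comment when you call the endpoint programmatically, as here. The script name in the User-Agent is again made up.

    # Sketch: count lexemes per language, using the #title hint.
    import requests

    QUERY = """
    #title: Number of lexemes per language
    SELECT ?language (COUNT(?lexeme) AS ?lexemes) WHERE {
      ?lexeme a ontolex:LexicalEntry ;
              dct:language ?language .
    }
    GROUP BY ?language
    ORDER BY DESC(?lexemes)
    LIMIT 20
    """

    rows = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "lexeme-count-sketch/0.1 (example)"},
    ).json()["results"]["bindings"]

    for row in rows:
        print(row["language"]["value"], row["lexemes"]["value"])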
And the final thing: we now have Wikibase and the Wikibase ecosystem. Wikibase, the software underlying Wikidata, is used more and more outside Wikimedia as well, not just to run Wikidata but also other great knowledge bases. That will have pretty far-reaching consequences for Wikidata. For example, some of those projects are going to make use of Wikidata's properties, so we now need to be much more mindful about making changes to them so that we're not breaking other projects. It also means that, over the next years, we will have an opportunity to move some niche or large specialized datasets into their own Wikibase instances and connect them to Wikidata.

Now, let's take a quick look at some of the amazing things people are building on top of Wikidata's data. Here, for example, is a scientific paper by our very own Dariusz and others that uses Wikidata's data to figure out whether prayer actually has an effect on longevity. I'm not going to spoil the results of the research for you: go and read the paper. Or here we have the election tracker by Datastory. It gives you a timeline of upcoming national elections so you can see what's going to be happening around the world.

Or here, this is Entity Tree. It's a cool educational tool that visualizes family trees, like this one for J. R. R. Tolkien. Then here we have Entity Explosion. There was a session about it at Wikimania earlier today; watch the video if you haven't seen it, it's a great tool. What it is, is a browser extension that opens up the web for you. So for example, what you can see here is the Spotify page for Walk off the Earth, the band. And it shows me links to many other places on the web related to that band, like the Wikipedia article or their Twitter account. This works not just for Spotify pages, but for many of the over 6,000 websites, catalogs and so on that we link to through external identifiers from Wikidata.

And the final thing I want to highlight is the best thing. Tom Scott is a popular science YouTuber and he wanted to figure out what the best thing is. To do that, you obviously need a list of all the things that could potentially be the best. So where do you get a list of all the things? Of course, here Wikidata comes to the rescue, because it knows all the things that could potentially be the best thing in the world. So he took all the items from Wikidata as the things to vote on, threw out a number of items that can't legitimately be the best thing, and then presented his viewers with two items at a time and asked them which of the two is better. He repeated that many thousands of times, and today we know what the best thing ever is. Again, I'm not going to spoil it for you. Go and watch the video yourself and you will find out what the best thing is.

Now let's take a look into the future. What's coming? Most importantly, WikidataCon is coming. We will be celebrating Wikidata's 9th birthday in October and I hope to see many of you there. It will be online and it will be a blast, I'm sure.

On the development side, we will have the Query Builder. Writing SPARQL queries is really hard for many people right now, so we worked on the Query Builder, which makes it easier by letting you build a query in a visual way. Hopefully this will make querying accessible to many more people. You can already try it out on the test system that I linked here, and it will be properly released in the coming days.

Another thing my team is working on right now is the Mismatch Finder. The idea is that we compare Wikidata's data against other databases and find mismatches between them, which are potentially errors. For example, if Wikidata and the German National Library have a different date of birth for Goethe, the author, someone should probably have a look and see what's going on there. The Mismatch Finder lets you review those mismatches and then determine whether a change to Wikidata or to the other database is needed. (A small sketch of this idea follows below.)

And finally, we will soon start coding on a new page for creating lexemes. This will make it easier for people to create new entries for words in Wikidata and hopefully make lexicographical data a bit less intimidating.
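And here is the small sketch of the Mismatch Finder idea: compare a value in Wikidata against a record from another database. The external record below is hard-coded purely for illustration; a real check would of course fetch it from the other database's API. Q5879 is Goethe and P569 is date of birth; the script name in the User-Agent is made up.

    # Sketch: spot a potential mismatch for one item and one property.
    import requests

    ITEM = "Q5879"  # Johann Wolfgang von Goethe
    PROP = "P569"   # date of birth

    entity = requests.get(
        f"https://www.wikidata.org/wiki/Special:EntityData/{ITEM}.json",
        headers={"User-Agent": "mismatch-sketch/0.1 (example)"},
    ).json()["entities"][ITEM]

    # Taking the first statement for simplicity; real items can have
    # several statements with different ranks.
    wikidata_value = entity["claims"][PROP][0]["mainsnak"]["datavalue"]["value"]["time"]

    # Hypothetical value as it might come from another database.
    external_value = "+1749-08-28T00:00:00Z"

    if wikidata_value != external_value:
        print(f"Potential mismatch for {ITEM} {PROP}:")
        print(f"  Wikidata: {wikidata_value}")
        print(f"  External: {external_value}")
    else:
        print("Values match.")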
I wanted to leave you today with a lovely quote from the UK Parliament data team that I think reflects the view of many of the people and organizations that use Wikidata's data. They say: "Our admiration for Wikidata and for the people who work with it knows no bounds. Having a single source of well-modeled, massively interlinked, well-managed data that anyone can query at the press of a button is a real thing of wonder."

Thank you all for making that happen and for working on Wikidata. If you'd like to talk more about any of the topics I covered, please feel free to reach out to me. My contact details are here, and I will also be in the space where you can talk to the speakers again later on.

I will quickly have a look at the Etherpad to see if there are questions I can answer right away. There's a question: how many lexemes have senses, and how many of those are interconnected? We have statistics for that and I will add them to the Etherpad. Next question: we've been hearing for ages that Wikidata is not sustainable at this point and that things like bot imports have to stop; is there any truth to that? Yes and no. I would ask you to come to my data quality session tomorrow to learn more about how we think about improving and sustaining data quality on Wikidata. I will take one more question here: how many of those queries come from things like the constraint checks? The constraint checks have so far been using the query service to some extent, but as of a few days ago they no longer should, so that should no longer be a problem.

And with that, I think I'm out of time. Thank you so much for coming. I will answer more questions in the unconference space. Thank you so much.