Wow, so many people. Many more than I would have expected, and many more who unfortunately couldn't be here, but we have streaming so that they can also take part. Thank you very much everyone for coming. It's really great to see you all here. I want to give a bit of an update on the state of the project. That is obviously a very limited view, because Wikidata by now is so big that even I don't know everything that is happening. But still, I hope it is useful for you to hear. First of all, five years. We're here for five years. Can I get a show of hands: who has been here since the first year? Wow, a round of applause for you. Whoops, I should not do that. Who of you has been here for only a year or less? Yay. Thank you very much for coming. And yes, everyone else, if you're sitting next to a person who has only been here for a year or less, introduce yourself, welcome them. So let's take a look. Why are we actually doing this? We're doing this to give more people more access to more knowledge. But of course also because geeking out on structured data is kind of fun. I know. A lot has happened over the last year. I'm only going to give a short overview. By now, we have around 17,600 people who make at least one edit a month. That is really cool. And we're going to talk about this spike a bit more. About 8,000 people make five or more edits a month, and the ones I love the most, people who make 100 edits or more, of which we have about 1,400. Who of you is among those? Very good. Very nice. We've also gotten a lot more content over the last year, especially over the last month or two. By now, we have about 37.8 million items. That's a lot. A lot of concepts in the world we're describing. And we're doing this with about 3,900 properties. I was talking to someone the other day who was mind-boggled by that number. But about 2,000 of those are actually external identifiers linking us to other databases.
So it's actually a bit less scary than that. What we can also see is that we've massively increased how much we know about every concept we have. A year ago, we were at about five statements per item, meaning we knew about five things about everything we have in Wikidata. By now, we're at eight, over just one year. And again, you can see most of that has happened over the past month or two. We've also made a lot of progress in terms of how multilingual Wikidata is. But through all the imports that have been happening lately, we're kind of losing out again. Unfortunately, the blue line is all the items that have only one label. That means there are a lot of concepts on Wikidata where we only know the name in one language. And that is kind of sad. I think that's something we want to improve over the next year. Another thing is this. This is the number of links to other Wikimedia projects that our items have. And as you can see, the number of items that don't have a link to any other Wikimedia project is increasing a lot as well. I think that's actually a good thing, because we are growing our scope: we're not only talking about the things that are covered by Wikipedia, but of course we are also covering all of these other things, which is good. Over the past year, the usage of Wikidata's data on other Wikimedia projects has also increased. Here you can see some examples of infoboxes on different language versions that are coming completely or almost completely from Wikidata. Is anyone here who is coding on these infoboxes? Nice. Thank you. So over the last year, the usage of our data on Wikimedia projects increased by 60%. And we see the most usage on Chinese Wikipedia, Commons, Catalan Wikipedia, and Russian Wikipedia. We are also seeing a lot of queries to our SPARQL endpoint. By now, 8.5 million queries per day. Thanks to the team who is making that not explode and keeping up with all your demands.
But what was even more impressive for me over the last year, when I heard it, is that by now one third of all edits in Wikimedia projects happen on Wikidata. One third. But what's even better: as you all know, Wikimedia projects were not growing anymore. We were not very good at recruiting new editors and retaining them and so on. By now, apparently, this trend has stopped. We are back to about 2% growth of non-bot edits in the Wikimedia projects, and almost all of that is happening on Wikidata. On the technical side, a lot has been happening. For example, we started experimenting with support for Citoid to make it easier to add references to data, because having references is important. Statements in an item now have a meaningful order, so you can easily find what you're looking for. We made sharing on social networks prettier. We've also done a lot of work on data quality. ORES, for example, the machine learning system that judges edits for how likely they are to be vandalism, has been improved a lot for Wikidata. And it's been expanded to try to judge the quality of an item automatically, so that in the future you will be able to say: give me all items in a specific area and order them by quality, so I can work on the worst items in those areas, for example, or give me the best items in that area so I can showcase them. And the biggest one, I think, is the constraint checks. They have been around for a long time, but not a lot of people were actually able to use them because they were very hidden. We started developing a gadget that now shows you constraint violations right next to the statement when you look at it, so it's much easier for everyone to see where the issues are. If you haven't tried it, give it a try. There were also a lot of improvements around documentation, which has always been a bit of a sore spot around Wikidata.
There is now a portal that helps you around the query service and explains everything. There is a portal for how data donations should work: how people can contact the right WikiProjects, for example, what the right tools to use are, and so on. And then there are two that are in progress, which it would be great if you joined and helped improve. One is a page for Wikidata in the Wikimedia projects, which started out as a page that addresses some of the fears and issues that people in other Wikimedia projects raise again and again, so that we can address them and have good answers and pointers to how to fix issues and so on. And the second one is documentation to make it easier to actually use Wikidata's data in the other Wikimedia projects. The query service has seen a lot of improvements. For example, it now has support for linked data fragments. You can write queries that take other SPARQL endpoints into consideration with federation. Unit conversion was added, so you can actually meaningfully query for values in different units. And we added a bunch of new visualizations. But the best thing, I think, which still needs some work but is already helpful, is a new query builder which lets you click through a query, for people who don't know SPARQL or who just want to change a part of a SPARQL query. There's also going to be a booth where you can test a prototype for a similar system, so that we can use it in the future to make it easier for Wikipedians, for example, to create queries that will generate list articles on Wikipedia. So if you are interested in that, please go to the booth. And of course, you're all doing amazing queries. Thank you for sharing some of those on Twitter, for example. Here, plants that have an emoji; very important, useful information to have in Wikidata, clearly. Or here, spacecraft and missions and what they've been named after. Or paintings by Mimir that have a map in them.
Or here, the most gender-neutral given names according to Wikidata. Kim apparently is high up on the list. And this was very interesting for me: the most disputed properties, meaning which properties most often have a "disputed by" qualifier. Apparently, country is only third. And I think this has to be my favorite. People flew seeds around the moon, planted them in all kinds of places in the United States and, I think, Brazil. And of course, we have them in Wikidata. All right. The next thing I have is the article placeholder. The article placeholder is one of the things that's very close to my heart, because it is one of the tools that helps us support smaller languages and give them more support through Wikidata, which is what Wikidata is all about. We've improved it, for example, by making it show up in search engines, and by allowing you to translate an article from another language if it is available, and so on. Then we worked on improving the integration in Wikipedia. Specifically, we have this prototype that should, in the future, help people edit Wikidata directly from Wikipedia. If you want to try that out, we're looking for feedback. Talk to Charlie. Where's Charlie? Yes, there she is. Talk to her. We love your feedback, because it's really not trivial to do this in a way that is both understandable for the Wikipedians and does justice to our data model and everything around it. Another thing we've done for Wikipedia support is making our changes show up in recent changes and watchlists. In the past, this was only available in one of the two versions of those, and now it is available for everyone. And we show more information about which items are used on a given Wikipedia article, for example in the page information, to make it more transparent where the data in that article is actually coming from. A big project that's coming up is Structured Data on Commons. Sandra will talk more about that in another session. Where's Sandra?
Here, yes, her. But to make it short: Wikimedia Commons is a treasure. There are so many amazing images, videos, and so on in it, but it's so hard to find anything. Just yesterday, I was looking for pictures for my presentation, and it was so frustrating. With the support of Wikibase and Wikidata, I hope we can make that much better over the coming years and make it much easier to find things on Commons, also for people who do not speak English. And the other big venture we're going into is lexicographical data. So data like you would find on Wiktionary, for example. I will talk more about that tomorrow in a session, and I would love to see many of you there, because I think it is another important point: how Wikidata can support more projects inside Wikimedia, but also open up a lot of opportunities for projects outside Wikimedia. Wins and challenges of the past year, as I see them. We had, actually, I believe, quite a lot of struggle with coping with the growth of the content and the community. And this is why I'm so happy to have many of you here, so that we can talk about some of those things in person and see which processes, for example, we need to change and adapt now that Wikidata is much bigger than it was a year or two ago. Another win was that Freebase's API was shut down, and with that Freebase is gone. Yes, I know. Sorry. No, I'm very grateful for everything Freebase has done before us, but it's also good to sunset projects when others can serve that need better. And I think it is a great thing that the Freebase people decided to say: yes, Wikidata is the place that can now serve what Freebase has done before. On Wikipedia, the Persondata template and many others like it, on English Wikipedia and other Wikipedias, are deprecated or in part gone completely. So people on Wikipedia decided that Wikidata serves that use much better than they can right now. And we're here for WikidataCon.
Some of the partnership things that I thought were really great over the last year were, for example, the work Mapbox and OpenStreetMap have done on Wikidata, the work that mySociety has done around politicians, the work of the Gene Wiki team and everything around it, and of course ContentMine, who are doing great work. Thank you very much. We've also seen quite a bit more use in Quora, for example, who are using our ontology to improve their topic tree, or Eurowings, who are using Wikidata's data in their inflight app. Yes, it's very cool. And unexpected. Or Yle, the Finnish broadcasting agency, which is using Wikidata's IDs to tag their content in a language-independent way. People have been building new tools. For example, lib.reviews, which is a website where you can publish open, freely licensed reviews of pretty much anything: books, newspapers, movies, what have you. And you can use Wikidata concepts to describe or identify what you're reviewing. There were games. And I think the coolest one is probably Gesser, which shows you a picture and a map and asks you where on this map the picture was taken, and then gives you points based on how close you were. It's actually really hard. Or Stadt-Land-Wikidata. There's a German game called Stadt, Land, Fluss: city, country, river. It gives you a letter, and then you have to find a city, a country, and a river starting with that letter. For some letters, this is really hard. The Stadt-Land-Wikidata game does this and checks your answers based on the "instance of" statements in Wikidata. Or here, Scholia, which shows you information about institutions, scientists, publications, what have you, based on the data that is in Wikidata. And there will be a presentation about that tomorrow as well, by Finn. Very good.
Or here, Monumental, which is an app that takes the data in Wikidata about monuments and presents it to you in a nice way, with links to Wikipedia; it puts the monuments on a map, and you can see pictures if there are pictures available on Wikimedia Commons, and so on, which is very nice for planning your trip to a new city, for example. Or here, a tool that helps us better understand the gender gap on Wikidata, but also on Wikipedia. Based on the data that is in Wikidata, it will give you statistics on the gender of the people Wikipedia, for example, is writing about. What you can see here is that we have 5.9% articles about females who were born between 1700 and 1709. And there are a lot more of those statistics, and it really helps to see where exactly our gaps in the content are, in Wikidata but also on Wikipedia, making it easier to counter the bias we have there. And of course, this was the point where I said: okay, we made it. BuzzFeed used our data to write an article that 2016 wasn't actually that terrible. All right, what's next? Data quality is an important topic. It comes up again and again in discussions, from Wikipedia, for example, which needs many more references for the data we have before, for example, English Wikipedia is able to use it. And I think we have to increase the data quality in many, many ways. One of them is getting a lot more use for our data, because only if our data is used is it actually going to be kept in good shape. So if you're working on tools that use Wikidata's data: awesome, please do more of that. I also think we need better tools and processes, especially processes around importing data: making sure it fits with what we have, figuring out how that data is probably going to be used, and things like that. The primary sources tool is getting an uplift right now to make it a really viable tool for importing data into Wikidata with a human in the loop.
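The human-in-the-loop import flow just described can be sketched roughly like this. This is a minimal illustration, not the primary sources tool's actual API; the class and method names here are all hypothetical:

```python
# Sketch of a human-in-the-loop import queue: suggested statements from an
# external source are held in a pending list until a reviewer approves them.

class ImportQueue:
    def __init__(self):
        self.pending = []   # suggested statements awaiting human review
        self.accepted = []  # statements a reviewer approved

    def suggest(self, item, prop, value, source):
        """An external dataset proposes a statement; nothing is imported yet."""
        self.pending.append({"item": item, "property": prop,
                             "value": value, "source": source})

    def review(self, decide):
        """Ask a human (the `decide` callback) about every suggestion;
        only approved statements would actually be written to Wikidata."""
        still_pending = []
        for stmt in self.pending:
            if decide(stmt):
                self.accepted.append(stmt)
            else:
                still_pending.append(stmt)  # kept around for later review
        self.pending = still_pending

queue = ImportQueue()
# hypothetical suggestions about item Q42's date of birth (P569)
queue.suggest("Q42", "P569", "1952-03-11", "external database A")
queue.suggest("Q42", "P569", "1952-01-01", "less reliable source B")
# a reviewer approves only the first suggestion
queue.review(lambda s: s["value"] == "1952-03-11")
assert len(queue.accepted) == 1
assert len(queue.pending) == 1
```

The point of the design is that the import is never fully automatic: every statement passes through a reviewer, but the tooling does the bookkeeping.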
I think we need to work on better feedback mechanisms, both with people who give us data and with people who use our data, so that they can tell us where they find issues in our data and we can work on that. Yes, but with the amount of data we have now, we really need to do more automated checks, like Fabian has been talking about for YAGO. There's no way we are going to be able to grow Wikidata without more of that, simply because there's too much change happening on Wikidata. I was talking about the constraint checks that we have improved already, and I think there's a lot more we can do. And another thing I have been thinking about for a while is signed statements. So that institutions, for example, who give us data can actually cryptographically sign a statement and say: yes, this is the data we provided to you. And then it's easy for us to see when that value was changed and the reference is no longer correct. Another big area of work that I think we need to focus on is making Wikidata more useful inside Wikimedia. That's, first of all, working on better relations with Wikipedia editors, but also editors on all the other projects. Second, making it possible to edit Wikidata directly from Wikipedia, and then support for Wikimedia Commons and Wiktionary as the big projects that still need a lot of support from Wikidata. And then there's making Wikidata more useful outside Wikimedia. There, I think we need to work on closer relations with the people who give us data and the people who use our data and, as I said, get feedback from them on our data. But also making Wikibase, the software powering Wikidata, easier to use and install, and with that growing the ecosystem around Wikidata. For a simple reason: I don't believe that we want Wikidata to be this lone tree in the desert. I don't believe we should be the only ones doing open data while everything around us is nothing.
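The signed-statements idea mentioned a moment ago could look roughly like this. A real deployment would presumably use asymmetric signatures, with an institution signing with its private key; this sketch uses a standard-library HMAC instead so it stays self-contained, and all the item, property, and key values are made up:

```python
import hashlib
import hmac
import json

def sign_statement(statement: dict, secret: bytes) -> str:
    """Sign a canonical JSON serialization of a statement, so that any
    later change to the value breaks the signature."""
    canonical = json.dumps(statement, sort_keys=True).encode()
    return hmac.new(secret, canonical, hashlib.sha256).hexdigest()

def verify_statement(statement: dict, secret: bytes, signature: str) -> bool:
    """True only if the statement is byte-for-byte what was signed."""
    return hmac.compare_digest(sign_statement(statement, secret), signature)

secret = b"national-library-signing-key"  # hypothetical institutional key
stmt = {"item": "Q42", "property": "P569", "value": "1952-03-11"}
sig = sign_statement(stmt, secret)
assert verify_statement(stmt, secret, sig)

stmt["value"] = "1952-03-12"  # a later edit changes the value...
assert not verify_statement(stmt, secret, sig)  # ...and verification fails
```

That failure is exactly the signal the talk describes: the value no longer matches what the institution vouched for, so the reference attached to it should be re-checked.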
Instead, I think Wikidata should be a very central and very big player in an ecosystem that links many, many similar projects together. And I hope we can see more of that over the next year. This was Wikidata one year ago, and this is where we are today. Thank you very much. Are there questions? Yes, there.

Okay, thank you very much. Geert Van Pamel, Wikimedia Belgium. Actually, I have two questions; I will put them together. Did you think about making constraint checking mandatory, proactively? I mean, if a constraint is violated, why do you allow creating the statement? And the second question is: do you plan to make it possible to mark statements as doubtful, as being not true, just by clicking? So that a user can indicate: I believe that this statement might be wrong or could have a data quality problem.

Right. So your first question: do I think we should prevent input that violates constraints? No, for the simple reason that there's weird stuff in this world, like the woman who married the Eiffel Tower. Yes, exactly. Things like that exist in the world, and Wikidata is built to be able to handle that complexity. Or, earlier, Fabian was talking about this check that says, okay, a person probably shouldn't have died before they were born. That makes sense. Unless of course they are a time traveler. And I'm sure we have someone like that in Wikidata. So in that sense, no, I don't think so. But where I want to go is to make it easier for people to see when they enter data that would violate a constraint, to make them think: is this really what I want to put in, or is there a mistake? And I hope that will already get us very far. And your second question was about indicating that a statement is wrong. Or wrong or doubtful, yes. A large part of that is handled, I would say, by ranks, so that you can mark a statement as deprecated, for example, for something that is no longer considered true in general.
There's also the "disputed by" qualifier, where you can say: okay, this statement is disputed by this person, this organization, this body, and so on. And I think that already gets us pretty far in modeling.

I'm curious about the massive increases this past year, and specifically the last few months. Could you point at some known or suspected culprits?

Yes, I believe I can. It's a lot of imports around scientific papers. Do the people who do that want to say something about it? Wait, wait, they can't hear you.

Okay, well, that is part of the WikiCite project, and we're happy to answer questions right now if they are short. We're going to give a talk on this later today, in room five, I believe. So you're welcome to come and bring more questions there. And you're also all welcome to contribute, whether today or later on.

Yes, so please come to the WikiCite session. A large part of these spikes is coming from there. And the spike in labels, you asked about that: that is the same. A lot of those scientific papers, for example, have a label in only one language. That's why we're seeing the spike in labels, or items with only one label. More questions? Yes.

Well, I want to add on why we should not restrict the constraint violations. On one of your slides, one of the properties I proposed appeared, and it was shown as a disputed property, "disputed by". In this specific project, we use "disputed by" on statements that have both citations that agree with them and citations that disagree. So in that case, the "disputed by" is not there to say this statement is incorrect. It's only that there are multiple views, that there is disagreement within the community. So if you were to be restrictive and not allow it, you would, in my opinion, go against the neutral point of view, because you would stick to one view.
I think that's, yeah, I want to make the point here that we should have in Wikidata the different possibilities and views of the data. Just a point.

There was another question here. What kind of transparency is there on automatic checks, so that people can see what the rules are that are being applied?

Right. So all the rules are public. If you go to a property, for example "instance of", then at the bottom of it you will have a constraint section that has all the rules that are being taken into account. So anyone can view those and, unless they're heavily protected, also edit those properties.

I have a question about Wikibase. Yes. When it comes to encouraging people to use Wikibase, do you think that the need to install MediaWiki on top, not having the SPARQL endpoint, and the need to use Lua to have full functionality are a major constraint for the adoption of Wikibase by other people?

Yes, I believe so. Which is why this is something I want us to tackle in the next year. Because right now it's way too hard to set up your own Wikibase installation. And if you want the query service on top of that, it's even harder, and it shouldn't be like that. Yes.

Excuse me, I have a question. Are there investigations being done to increase the performance of Wikidata? We have a lot of timeouts when we want to get query results. And I think...

Are you talking about the query service specifically, or? No, just about increasing the performance, to solve a problem like that. There's always work on that. But tell me what specifically you mean: performance of the query service, or performance of...? The query service, for example. Okay, then the person behind you is probably the right person to answer that.

So, yes. Basically, we have hardware that is not changing too much, at least not very fast, and we have a growing database. Some queries will always be slow.
And given that this is a shared service, we have to put limits on it. Otherwise, other people would be hurt by it. What we are looking into right now is to basically have a public and an internal service, and split queries generated by internal Wikidata mechanisms, like constraint checks, bots and so on, from user queries, which will probably give both more room to play and run every query. But I cannot really promise that all queries will be served in time. Some queries are just too expensive. Right. The new split service is probably half a year to a year away. And in general, if you have some queries that you believe are slower than they should be, talk to me and we'll see: maybe the query is bad, maybe the service is bad, maybe there is a bug that can be worked around. Talk to me.

All right. Someone was quicker. So, hi. There are two questions from remote; I'm reading them from the Etherpad. Is there a plan for a better mobile interface, or at least for a responsive interface, in the future?

Right. Unfortunately, no concrete plans at the moment. My hope is that when we work on integrating Wikidata editing into Wikipedia, we redo a lot of our input widgets, which are the big part that's holding back better mobile support, and go from there.

Okay. And the second question is: can we find a way to separate the identifiers from the actual properties? This is a question from remote, eh? Separate how? They are already separated. Yeah, in the sense that they get another letter identifying them, like C or I instead of P for property: C for catalog or I for identifier. Okay. That's the first time I hear this request. If the person who asked for it would explain a bit more, then we can talk about it. Yep.

Just out of curiosity, does this map represent the number of items? The lighter a part of the map is, the more items on Wikidata have a geo-coordinate in that area. Okay. Is this published anywhere? Yes, on Commons. Okay.
There is a category called Wikidata visualizations, or something like that.

Hello. You made the point that we have a very large number, I think it was 20 million items, with a label in only one language. Yes. We could reduce that number greatly if we used a bot to copy the existing label to other languages for things where the label is common, like people's names, taxon names, and the names of rock bands, particularly where they use the same alphabet, such as the Western alphabet. Obviously that doesn't apply to Arabic or Japanese or whatever. Right. There is some resistance to doing this. People say, oh, the software should handle that, but I don't see any indication that the software is going to handle that. It is. So do you think it will, or do you think we should use a bot, or do we still have to wait for people to edit them manually?

So Wikibase already has language fallbacks. If you look at an item in Swiss German, for example, and there is a German label, then it will show you the German label when you look at it in Swiss German. There's a whole chain of how these languages fall back, and everything at the end of that chain is English. So if we have an English label, that at least helps some. So for the case of copying: if the label is the same, I don't think that is necessary, because language fallback should handle it if it's put into the English label. Where it makes sense, right? Yeah. Because with the stats I showed, on the one hand it is sad, but I also don't want to encourage you to game this, in the sense of just copying labels to drive up those statistics; that's maybe not the best thing to do. But where people really don't know what an item is about, because there is no label in their language or any fallback language that they understand, those are the cases that we need to fix. Yes. Another question, over here.

I think trust is important in the new internet world. And my feeling is that we get a lot of external sources.
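The fallback chain described in the answer above (Swiss German falls back to German, and every chain ends in English) can be sketched like this. The chains here are illustrative examples, not Wikibase's actual fallback configuration:

```python
# Hypothetical fallback chains; every chain ends in English ("en").
FALLBACKS = {
    "gsw": ["gsw", "de", "en"],  # Swiss German -> German -> English
    "de": ["de", "en"],
    "en": ["en"],
}

def resolve_label(labels, lang):
    """Return the label shown to a user of `lang`: walk the fallback
    chain and use the first language that actually has a label."""
    for candidate in FALLBACKS.get(lang, [lang, "en"]):
        if candidate in labels:
            return labels[candidate]
    return None  # no label in any fallback language

labels = {"de": "Erde"}  # an item labeled only in German
assert resolve_label(labels, "gsw") == "Erde"       # Swiss German reader sees the German label
assert resolve_label({"en": "Earth"}, "gsw") == "Earth"  # English is the last resort
assert resolve_label({}, "gsw") is None             # the case that actually needs fixing
```

This is why copying identical labels between closely related languages adds little: the fallback already surfaces them. The real gap is the last case, where no language in the chain has a label.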
Some are good, some are less good. Shouldn't we have some kind of ecosystem where we regularly check some facts in our data against the sources we trust most, and then document the differences, or see if a change has been made that we don't like? We totally should. Yeah, and then I would like to see it so that if you have an infobox in Wikipedia, you can see that this is from a trusted source. As a reader, you shouldn't have to understand all those external identifiers. I think this is better than...

Right. So there are some bot runs already, by Magnus I think and potentially also other people, who are doing these checks and then adding references for those external sources. And I think that's one area. But the other area is automating the finding of differences. So you take Wikidata and you take another database, like that of the German National Library, for example, and you compare the data and then tell Wikidata's editors and the German National Library about the differences. Then people can look into it: okay, who is wrong here, or is this a legitimate difference, and can fix it. Yes, that is definitely something we should have and that we will be working on. I don't know yet when we will get to it. All right.

Charlie, you mentioned in one of your answers input widgets as a thing, but didn't describe them. Could you explain what that is? That sounds really interesting and important. What input widgets are? Yeah. Okay. Basically just the little thing you see when you put in a date or a URL or a link to another item: that thing. It has a box. Yeah. Sorry for using jargon.

Hi, I'm Harmonia Amanda, and I wanted to answer the question about labels, because we actually have a project about names, and it's a really, really complicated matter. We can't copy labels from one language to another without a clear understanding of what the difficulties are, and we are actually doing it.
We are a team doing label-copying work slowly, because we are doing it cleanly, and please do not ever gamify that, because there are not enough humans with an understanding of the complexity of the situation to clean up after edits made through a game that misunderstands it. So please do not do that.

All right. I wanted to ask about scale. I'm from the cultural heritage sector, and virtually, Wikidata is the perfect platform to document all our cultural heritage, with all the context and so on, statements about history and so on. If we guess that we have maybe, I don't know, 40 million works of art which are relevant, like in museums and so on, there might be even 10 times more if we start declaring a single chair as a work of art, or whatever. So that's a really large field. What are the perspectives on this, or what are your thoughts about it? I can tell you my thoughts. Yes, okay. How far Wikidata could be such a platform, which can then be a wonderful thing for everybody to reference, so we all talk about the same things and make our statements about them.

So I can tell you what I think, right? There are two things to take into consideration for me. The first thing is: just pushing the data into Wikidata is all nice and good, but what is it going to be used for? Who is going to use that data to build something: an app, a visualization, a website, what have you? Or who is going to learn something about the world through that data? And the other thing is the technical and social scalability of Wikidata. Technical is something I and my team are working on, and always have to work on, because you're all pushing us to those limits. But what I'm much more worried about is the social scalability of Wikidata. With the people we have and the amount of data we have: if we want to grow the amount of data, we also have to grow the number of people we have taking care of that data, or give them better tools to do that.
And I think that is something we really need to work on. All right, thank you very much for all your questions. We are running out of time so if you have more questions, please ask after this session in private. Thank you very much. Thank you.