Hey everybody, I'm Dario. I'm the director of research at the Wikimedia Foundation, and I want to give you an overview of something we're doing that I think not many of you may be familiar with, with the unambitious goal of building a Wikipedia-scale, CC0-licensed, collaborative, open source knowledge graph to support open science and a bunch of other fun things.

So first of all, how many of you have heard of Wikidata? Good, okay, awesome. I'm going to give a short intro to what Wikidata is trying to do and what it's about. I'm sure you've all seen infoboxes on Wikipedia. They're great. They include a ton of machine-readable, structured information about anything you can name, but they're really, really hard to query, to export, and to translate to other languages. This is the infobox for Marie Curie in the English version of the article, and you'll see an example here of all the tuples that have been associated with this infobox, the provenance for these statements, et cetera. And we have one of these infoboxes in each of the nearly 300 languages we have a Wikipedia in. It's a freaking nightmare to maintain and curate, as you can imagine. So Wikidata was born with the idea of centralizing the maintenance and collaborative creation of these tuples about pretty much any entity you can name: not limited to authors or human beings, but pretty much anything you can find in a Wikipedia article and beyond. This is the entry in Wikidata about Marie Curie, and it looks roughly like a Wikipedia article, but you'll see that it's all structured content.
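To make the "machine-readable" part concrete, here is a rough Python sketch of the shape of a Wikidata entity record. The real record for Marie Curie is served at `Special:EntityData/Q7186.json` and contains much more (claims, sitelinks, aliases); this fragment is hand-written and heavily simplified for illustration.

```python
import json

# A hand-written fragment mimicking the shape of a Wikidata entity record.
# The real one is fetched from
# https://www.wikidata.org/wiki/Special:EntityData/Q7186.json
# and is wrapped in an {"entities": {...}} envelope; this is simplified.
entity_json = """
{
  "id": "Q7186",
  "labels": {
    "en": {"language": "en", "value": "Marie Curie"},
    "fr": {"language": "fr", "value": "Marie Curie"},
    "pl": {"language": "pl", "value": "Maria Skłodowska-Curie"}
  }
}
"""

entity = json.loads(entity_json)
# Labels are keyed by language code, so picking a language is a dict lookup.
print(entity["id"], entity["labels"]["pl"]["value"])
```

Because every label is a `(language, value)` pair keyed by language code, the same record serves every language edition of Wikipedia at once, which is exactly what the per-language infoboxes could not do.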
So it's really like a gigantic tuple store that allows you to collaboratively edit properties for this entity. In a nutshell, Wikidata is to data what Wikipedia is to text. All the data is CC0; anyone, anybody, any machine can contribute to it. It covers, or at least aspires to cover, all domains of knowledge. It's fully version-controlled and collaborative. It has a tight integration with the semantic web via RDF dumps and open APIs. It has a (hopefully) high-performance SPARQL query engine. It is stable, in the sense that it's not tied to any short-term funding cycle; it's there for the long term. It's being actively developed by the Wikimedia Deutschland chapter, there's a very active community, and it's currently the fastest growing of the Wikimedia projects.

And just to give you a sense: every item gets full support for multilingual labels. So this is the entry again for Marie Curie. You can see that we have labels and aliases and descriptions across a variety of languages, all of these contributed by community members. The atom of Wikidata is a statement, which typically represents a tuple, and it can represent provenance through a variety of properties, including conflicting provenance statements. We have massive coverage of identifiers, and in fact, to me this is the most exciting meaning of Wikidata: it acts as the universal glue for knowledge bases by providing a growing body of identifier mappings across all kinds of catalogs. So if you do have a catalog or an open data resource you want to plug into Wikidata, please come and talk to me. We have many datasets that are complete and very well curated.
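The "statement with provenance" idea can be sketched as a tiny Python data structure. This is not Wikidata's actual internal model or API, just an illustration. Q7186 (Marie Curie), P166 ("award received"), and P248 ("stated in") are real identifiers; the award QID is quoted from memory and the reference target is a made-up placeholder.

```python
from dataclasses import dataclass, field

@dataclass
class Reference:
    # Provenance attached to a statement, e.g. "stated in" (P248)
    # pointing at the item for a source.
    property_id: str
    value: str

@dataclass
class Statement:
    # One Wikidata-style statement: item, property, value, plus references.
    item: str
    property_id: str
    value: str
    references: list = field(default_factory=list)

# Marie Curie (Q7186) received (P166) the Nobel Prize in Physics
# (Q38104, QID from memory; verify on Wikidata), stated in (P248)
# some source item (placeholder QID below).
s = Statement(
    item="Q7186",
    property_id="P166",
    value="Q38104",
    references=[Reference("P248", "Q00000000")],  # placeholder source QID
)
print(f"{s.item} --{s.property_id}--> {s.value} (refs: {len(s.references)})")
```

Because references are a list, two statements can carry conflicting provenance side by side, which is how Wikidata records disagreement between sources rather than forcing a single truth.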
This is just an example of what you can find in Wikidata in the area of biomedical content; all of this has been created by communities of biomedical experts and biocurators.

The second part of this talk is about a subset of the items you can find today in Wikidata, which are about creative works. You can find a ton of stuff in Wikidata about people, buildings, artworks, events, molecules, and whatnot, but a sizable part of Wikidata is about creative works, and I'll talk about that in the context of WikiCite, which is an initiative built on top of Wikidata to create a structured bibliographic repository of citable sources as structured data. The immediate goal is to create this as a way of supporting the work that volunteer communities are doing in Wikimedia projects, but we think it has some broader applications that I want to talk to you about.

Here's an example of how we can represent, in Wikidata, the fact that Zika virus, identified by that Q number up there, has as its natural reservoir a specific species of mosquito, and that this information is stated in this paper. You'll see there's additional metadata for this paper: who funded this piece of research, where it's been published, who the publisher is, and so on and so forth. So this gives an idea of what kind of power the system has once it reaches a large scale, allowing you to slice and dice when you want to know, say, what the institutional provenance of a very specific piece of research is. A system like this, with its APIs and SPARQL queries, will allow you to answer these questions, something that today is fairly difficult to do outside of the narrow domain of bibliographic metadata. I think I'm going to skip this.

So this is an overview of where things are in terms of size and growth. We currently have 19 million items in Wikidata that represent creative works.
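Questions like "who funded the research behind this statement" go to the Wikidata Query Service as SPARQL. As a hedged sketch, here is Python that just assembles such a query as a string; P921 ("main subject") and P859 ("sponsor") are real properties, but the Zika virus QID below is a placeholder I have not verified, and actually sending the query over HTTP is left out to keep the sketch self-contained.

```python
# Assemble a SPARQL query for the Wikidata Query Service
# (https://query.wikidata.org). This only builds the query string.
ZIKA_QID = "Q202864"  # placeholder; look up the actual Zika virus item

query = f"""
SELECT ?paper ?paperLabel ?funder ?funderLabel WHERE {{
  ?paper wdt:P921 wd:{ZIKA_QID} .           # main subject: Zika virus
  OPTIONAL {{ ?paper wdt:P859 ?funder . }}    # sponsor/funder, when recorded
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT 20
""".strip()

print(query)
```

The same pattern generalizes: swap P859 for the publication venue or publisher properties and you are slicing the corpus along a different axis of provenance.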
That's about 40 percent of all of Wikidata, so a pretty scary number. We also have a growing number of bibliographic properties being used: for example, 75 million author name strings extracted from papers, and 2.6 million individual items about authors, many of them with ORCID and other types of identifiers. We also have a growing number of descriptions of what a paper is about, generated both by machines and by humans, which I think is super exciting.

Of course, the core use case was creating a database of sources for Wikipedia, so we currently have all scholarly articles cited across language versions of Wikipedia represented in Wikidata, queryable and analyzable in the system. There's broad coverage of scholarly journals: we have about 42,000 journal items in Wikidata. We have a growing citation graph; right now we have about 90 million citation links. Retracted literature is represented in the same way, using statements. We also have curated corpora of the bibliographic literature: as an example, what we think is a complete, annotated scholarly publication corpus on Zika virus. It's a fairly small corpus, and it allows you to explore, like I said, the topics, authors, and institutions contributing to research on Zika virus on a basically daily basis. And for those of you who are into nanopublications, this is going to give you a sense of what you can do with Wikidata: you can track the impact of a scholarly paper in a very granular way. These are all examples of statements from a specific paper, linked down there, that represent basically the bits of information included in that paper.

And finally, there are many potential applications for other open source tools.
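A citation graph stored as "cites work" (P2860) statements is just a set of directed edges between paper items, so a citation count is an in-degree. A minimal sketch with made-up QIDs, assuming the edges have already been fetched:

```python
from collections import Counter

# Toy citation graph: (citing_paper, cited_paper) edges, as in Wikidata's
# "cites work" (P2860) statements. The QIDs are invented for illustration.
edges = [
    ("Q1", "Q3"), ("Q2", "Q3"), ("Q4", "Q3"),
    ("Q2", "Q5"), ("Q4", "Q5"),
]

# In-degree = number of incoming citations per paper.
citation_counts = Counter(cited for _, cited in edges)
most_cited, n = citation_counts.most_common(1)[0]
print(most_cited, n)  # Q3, cited 3 times
```

At 90 million edges you would run this kind of aggregation inside the query service rather than in Python, but the underlying model, plain subject-property-object edges, is the same.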
This is the way in which, for example, a Hypothesis annotation is embedded in the provenance statement for a Wikidata item about a paper, to represent the specific part of the paper where that statement was found. And finally, there are open source applications being built on this, like Scholia; I think we're going to have a demo by Daniel later today, so I'm going to leave this for now.

How am I going to wrap this up? We have a parent initiative at the Wikimedia Foundation called Knowledge Integrity, which is trying to build the citation foundation and infrastructure for open knowledge, and we've been doing a bunch of open data releases recently in that direction. We're having a bit of a roadmap discussion at the moment, trying to figure out exactly what we're trying to build and who it's for. So there's a document that I really encourage you to go and check out, where we're trying to see whether we're building a database of sources for Wikipedia, a platform for custom bibliographic corpora, or maybe something beyond that, a bibliographic commons. We haven't figured it out yet, and we could use some help.

And finally, WikiCite is also an event that happens every year. It turns out the next conference is going to be in the Bay Area, actually here in Berkeley, between the 27th and the 29th of November, and the application form should go live today, if I manage to do it during the lunch break. So, thank you.