Hello everybody. My name is Dario Taraborelli. I'm the Wikimedia Foundation's Director of Research, and I'm here today with Daniel Mietchen and Lydia Pintscher to talk about initiatives that we've been involved in. We all deeply care about citations and verifiability, and we've been busy for almost three years building an initiative to make citations suck less in Wikimedia projects. That's pretty much what it's about. So the primary reason why open knowledge works is that it's verifiable. Every single statement in a Wikipedia article fulfills its function because it's backed by vetted, reliable sources that our communities have contributed. Wikipedia wouldn't be what it is today without its extensive coverage of citations and references. And that doesn't only apply to encyclopedic topics. It also applies to current news: we have extensive coverage of news articles to make everything we read on current events as verifiable as possible. And the same of course applies to topics of scientific relevance. We have entire communities of volunteers and researchers who are carefully annotating Wikipedia articles, on any topic you can think of, with the best possible literature, review articles, et cetera. So basically, for Wikipedia and Wikidata to fulfill their missions, every single entity you can think of, every single topic you can think of, needs the best possible sources to back its information. Now, there's a mounting debate on how to fight misinformation. As of today there are apparently five million hits for solutions to fight fake news. Wikipedia cracked this problem fifteen years ago: a combination of policies around sourcing, and processes to vet information, represents today the best solution we have for producing information that people can trust. So you'd imagine that building tools and a repository of sources is something that we as a movement started fifteen years ago. The story is slightly more complicated, and I'm going to give you a quick history of the efforts we've put together over the years. As of 2017, this is mostly the state of how we do references and citations in Wikipedia: what's possibly the most important ingredient of open knowledge is still served by this efficient but fairly rudimentary mechanism of a template with semi-structured metadata about each source. And if you think about it for a second, every other ingredient of a Wikipedia article, be it media files, categories, or links, has been supported over the years by dedicated technology. We basically built infrastructure and tools around every other type of element of a Wikipedia article. When it comes to citations, with the possible exception of Citoid and some exciting new developments, there hasn't been much of a technology breakthrough over the years. The idea of building some kind of centralized solution to support the sourcing efforts of the community, representing sources in a structured format, actually goes back to 2005. There's a long history of attempts to build a central repository of sources to serve the Wikimedia projects at large, but the reason these efforts didn't really take off is probably that, until recently, the social and technical infrastructure we needed to build them wasn't quite there yet. Now, fast forward to 2017: we believe we have an answer to these problems, thanks to Wikidata.
So Wikidata today has the vision, the technology, the community, the scale, the licensing model, and the independence to build this notion of a central repository of sources to serve all human knowledge. Wikidata doesn't only provide an infrastructure for a repository of human knowledge; it provides an infrastructure of structured knowledge that humans and machines alike can contribute to. And citations and source metadata are by definition structured data. Every time we represent a source, we think about its bibliographic record: its author, its title, its identifiers. All of this data is by its very nature structured, so there's a natural fit between what we're trying to do here and Wikidata as an infrastructure. So in 2016 we started WikiCite. WikiCite is an initiative that is basically trying to make this happen. The idea is to build a universal repository of sources to serve human knowledge, leveraging Wikidata as its infrastructure. And the vision of WikiCite is pretty much the same as the original 2005 idea: if we build this infrastructure, we'll be able to design better workflows and better processes that allow us to discuss, analyze, curate, vet, and ultimately reuse all these sources across Wikimedia projects. What's really powerful about Wikidata is that it allows us to represent the granular relationship between a statement, a piece of knowledge, and a source. And by doing that, it also allows us to connect each source to all the information existing in Wikidata about that source, its authors, and the outlets where it's been published, and to connect the statement itself to all the corresponding entities. This lets us build a network of how sources and knowledge relate to each other, which I think is really the powerful value proposition of WikiCite in this context. So think about it for a second. Being able to represent the connection between a source and a knowledge statement will allow us to answer questions such as: what are all the statements citing a New York Times article, or what are all the statements citing a journal article that was retracted? Doing this at scale is something Wikidata allows that has not been possible in the past. And similarly, if we apply the same approach to Wikipedia, we can start thinking about how to improve the quality of, and the processes around, sourcing information in Wikipedia, and also study the ways in which Wikipedians contribute sources to Wikipedia articles. I really like this quote by Egon. Hey Egon, I don't know where you are, you should be here. He's been using this notion for a while that Wikidata is really the provenance engine of information, and he tweeted at some point two years ago that in five years' time the verb "to wikidata" will mean to look up a fact with literature provenance. I think it's a very powerful way of capturing what we're trying to talk about here. Think about a specific statement in Wikidata, for example the fact that the Zika virus has a specific species, in this case Aedes hensilli, as its natural reservoir. That specific piece of knowledge can now be traced back to its provenance. It can be traced back to the article where it was published, to whoever funded that piece of research, to the authors, the outlet, the publisher. So in a nutshell, Wikidata allows you to represent the entire genealogy of a very specific statement by connecting that statement to its sources.
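A minimal sketch of what such a provenance query can look like on the Wikidata Query Service (query.wikidata.org, which predefines all prefixes used below). It lists statements whose reference points, via "stated in" (P248), to a source published in a given outlet. The QID Q9684 for The New York Times is an assumption here; substitute any outlet or journal.

```sparql
# Statements whose reference is a source published in a given outlet.
# Q9684 is assumed to be The New York Times; swap in any venue QID.
SELECT ?entity ?entityLabel ?sourceLabel WHERE {
  ?entity ?p ?statement .                  # any statement node on any item
  ?statement prov:wasDerivedFrom ?refNode .
  ?refNode pr:P248 ?source .               # reference: stated in -> source item
  ?source wdt:P1433 wd:Q9684 .             # source published in: the outlet
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
```

The retracted-article question mentioned above has the same shape: constrain ?source by whatever marks a retraction instead of by the publication venue.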
And a few people have also suggested that once we have this extensive coverage of sources and how they relate to statements, we can start thinking about a notion of computable trust. We can start analyzing whether there are biases or quality gaps in citations, and get to a point where we can give humans and curators of knowledge better ways of understanding whether the sources they are using are accurate, or whether they should maybe be looking for different types of sources. So in 2016 we brought together a pretty small group of enthusiasts, including librarians, Wikimedians, Wikidata contributors, researchers, and software developers, to figure out what this thing could look like. It was a very preliminary effort, even to figure out the scope of the initiative. And this year we hosted a much larger three-day event. We had nearly 100 attendees from 22 countries, 16 formal presentations, 17 working groups at our summit, 38 lightning talks, and 20 hackathon demos. A community started to take off around this effort, and it's really exciting to see how much this group has produced. So a lot has happened, and I want to pass it on to Daniel to give you an overview of the many highlights of what has been happening in the movement over the course of the year.

Okay, thanks Dario. So one way to look at it is just by the numbers, plotting them over time; some version of this was already in Lydia's talk. About one quarter of Wikidata is now publications. Among those there are lots of different kinds of publications, but the majority is essentially articles. And what does the WikiCite community do? Well, lots of things, but the very first step for most of those things is to think in terms of data models. For a book like The Origin of Species by Darwin, we think about how we could actually model it in Wikidata terms. It has an author, it has a publisher, it has a number of pages, it belongs to a genre, and all these kinds of things. And then we first, like everybody here, experiment on a small scale, and for books this is still happening; we don't really have a fixed data model. Once we have done our own experiments, we talk to others. There is a WikiProject Books on Wikidata, for instance, where the different aspects of the data model are being discussed. And we always keep the other Wikimedia projects in mind. What you see here are actually screenshots from Wikisource, where Darwin's text is available, and he cites Goethe over there, and Humboldt. And I had to bring this up because they both appeared in our opening keynote, together with Friedrich Schiller and Wilhelm von Humboldt. So yes, Darwin cited Alexander von Humboldt, which is this one. And I have to mention this because it's in Jena, where I live. But the point here is that for an image we can say it depicts something, for instance those four people. And they can also be related to literature, because they're all authors of things, many of which we have in Wikidata. And Darwin cites them. We called the initiative WikiCite because citing is one of those things you can do with bibliographic items, and that bibliographic items do amongst themselves, so we can represent that within the Wikiverse. We don't have an equivalent for texts that we could use to say that a text is mentioning something, the way Darwin is mentioning Goethe and Humboldt.
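Returning to the book data model mentioned above, here is a minimal sketch, runnable on the Wikidata Query Service, of how those properties read back for Darwin's works. The property choices (author P50, publisher P123, number of pages P1104, genre P136) follow the model just described, which is still in flux.

```sparql
# Works authored by Charles Darwin (Q1035), with the book-model fields
# where they exist. The data model for books is not yet settled.
SELECT ?work ?workLabel ?publisherLabel ?pages ?genreLabel WHERE {
  ?work wdt:P50 wd:Q1035 .                    # author: Charles Darwin
  OPTIONAL { ?work wdt:P123 ?publisher . }    # publisher
  OPTIONAL { ?work wdt:P1104 ?pages . }       # number of pages
  OPTIONAL { ?work wdt:P136 ?genre . }        # genre
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```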
There are also things like the fact that he of course refers to them in his English text, in English translation, but isn't very specific as to which translation of which German edition it actually was. So, some basic stats. We have a very detailed template that lists many dozens of properties falling within the scope of WikiCite. Just to get an overview: we have 800,000 statements for author (P50), and 40 million statements for author name string (P2093). That is confusing for people who are not used to working with it, but if you are just looking at a book, what you see there is an author string. If you go to a typical database of references, what you get there is an author name string. If you are lucky they might also have an identifier, and for Wikidata the identifier would be the Wikidata ID; in other contexts it might be an ORCID or a VIAF ID and all these other kinds of things. But since we usually get them from other places as just a string, we then need to disambiguate them: "Smith J", "John Smith", "Smith, J" into that particular John Smith, or Josephine Smith, or so. And so the conversion from those 40 million P2093 statements to P50 statements is ongoing. This will probably take a while, and we will almost certainly never be complete. On the other hand, Wikidata might actually be the mechanism that brings us closest to completeness in terms of author disambiguation, because it encompasses in scope many of the other literature databases, which are more focused on specific topics. So integration with other initiatives also centered around author disambiguation is very important. Also, as we heard a number of times, Wikidata is an identifier hub. In the space of citations and bibliographic metadata, we have identifiers for books, like the ISBN, which actually comes in two flavors, around 29,000 of them here, and identifiers for scholarly articles, like the DOI, the digital object identifier, of which we have almost 7 million. And then, as Dario showed in the Zika example, we try to link those things all together. For instance, we try to annotate articles as to what subject they are about, so articles about the Zika virus are tagged as being about the Zika virus, which then constitutes the Zika corpus, with which we can then do certain things. It's now becoming a guinea pig. It's interesting that a virus can become a guinea pig, but on Wikidata it can. So how does that relate to the other Wikimedia projects? One thing we've done is gone systematically through the Wikipedias, starting with English but then also reaching out to the other Wikipedias, looking at which things being cited on the Wikipedias have a persistent identifier: those strings at the end of a citation that don't make sense to any human but are very useful for automated processing. So we looked at which of these are actually cited from Wikipedia, and then for those articles we imported them, or tried to set up a Wikidata item. In most cases this actually worked, but there are still some problems to be worked out, and that's within the scope of the initiatives we were pursuing at the hackathon and so on. All these demos had focuses like this.
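One of those focus areas, the author-disambiguation backlog described above, is easy to see in a query. A minimal sketch: pull articles still carrying a given raw name string (P2093), which editors or tools would then turn into proper author (P50) links. The literal "Smith J" is a hypothetical example value.

```sparql
# Articles whose authorship is still recorded only as a raw name string.
# "Smith J" is a hypothetical example value for P2093.
SELECT ?article ?articleLabel WHERE {
  ?article wdt:P2093 "Smith J" .                   # author name string
  FILTER NOT EXISTS { ?article wdt:P50 ?author . } # no item-based author yet
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 100
```

The FILTER is only a rough proxy: articles with a mix of disambiguated and raw authors exist and would be excluded here.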
This is from a slide earlier this week, at another conference in Berlin, from, let's call it, an initiative that is doing something similar: Scopus, one of the largest scholarly databases. They also provide citation information, and they have information about the journals in which the scientific articles are being published. And there's also Web of Science, which is a competitor of Scopus. So there are other initiatives; it's not just Wikidata doing this. For scale: they are now indexing 32,000 journals; Wikidata has 42,000. They are focused on English; so is Wikidata, but less so. Now, there is a property that can express that Darwin cites Goethe, or rather that a specific work by Darwin cites a specific work by Goethe. This property has been used 36 million times, and from it we can build a citation graph, of which you have a very small example here. And by bringing all these things together, this is again the Zika corpus; we actually have a WikiProject Zika Corpus that's linked from here. The slides have been shared and tweeted, by the way. And then, simply by looking at the topics: any individual article item is annotated as to which topics it is about, and if it's more than one, you can look at whether any of those come in pairs more often across a number of articles, which gives you a co-occurring topics matrix. And from that you can actually discover relationships. Maybe you're interested in reproductive medicine, and then of course you have the mosquito. What about breast milk? What's the effect of the virus on breastfeeding? Things like that. So this is the kind of network of topics related to the Zika virus. You can also slice the information in different ways, and the last two things were screenshots from a tool that is being presented in more detail in the next session. Here you can slice the information by co-authorships: just the same way we did it with the topics, you can look at which people are actually publishing together on more than one article. Or you can look at the awards people are getting, and then at where the institutions are located that the recipients of a particular award are affiliated with. Or, and this comes back more or less to the very reason for Wikidata, or WikiCite, to exist: the referencing, that is, which statements in Wikidata are actually supported by a given reference. You can plot that, and whereas in academic circles the impact of a particular publication is usually measured by means that don't make any sense if you look at them scientifically, here you see that this paper has lent support to eight statements in Wikidata. That is actually a very useful indicator of impact. Of course the inverse doesn't work: if a paper has given no support to any statement in Wikidata, it may well mean that Wikidata is just incomplete, which it still is. And if there is a paper that supports thousands of statements on Wikidata, it might actually mean it's really impactful. But it doesn't mean it's the only paper that could support those statements, just the one that was used by this particular editor.
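A sketch of the indicator just described: count how many Wikidata statements each scholarly article supports through its references. On the full graph a query like this may hit the service's time limits, so narrowing it, for instance to the Zika corpus, may be necessary.

```sparql
# For each scholarly article, count the statements it is cited to support.
SELECT ?source (COUNT(?statement) AS ?supportedStatements) WHERE {
  ?statement prov:wasDerivedFrom ?refNode .
  ?refNode pr:P248 ?source .             # reference: stated in -> article
  ?source wdt:P31 wd:Q13442814 .         # instance of: scholarly article
}
GROUP BY ?source
ORDER BY DESC(?supportedStatements)
LIMIT 20
```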
Another tool that we haven't spoken about too much is annotations. Here is an example: if you click on that, you come to a paper that has been annotated by way of Hypothesis. Someone has annotated human-readable text and then put the link, since the annotation has a persistent URL, in as a source for a statement. The statement here, we're on the item for reelin, is that in schizophrenia the expression of reelin is reduced. If you don't fully understand that, it doesn't matter too much. The important point is that we can actually harvest the reading process. If you're reading literature, you're probably making annotations. If you read digitally, you should make those annotations digitally, and if you make them digitally, you can make them open. Then they have a URL and you can use them as references for statements in Wikidata. And not just you: all the others who are watching us live as well. And in order to support all this, we are reaching out to a number of other communities. Let's focus on one initiative here, the Initiative for Open Citations, which builds on work that many others have been doing in this space but has been catalyzed in part by Wikidata and WikiCite. The idea here is that there's a lot of information contained in the network of citations between, let's say, scholarly articles and other publications, and so far information about these links between the different publications wasn't available under an open license. So there is now an initiative that focuses on making those relationships available under an open license, and also in machine-readable form; it was already machine-readable, just not openly licensed. We're talking about this to different stakeholder groups, quite successfully, although we're still far from completion; especially some of the big players are not yet part of the initiative. And we're also working on technical collaborations. For instance, apart from annotating what you read, you probably keep the metadata somewhere in some sort of reference manager. If you're using Zotero, you can basically translate your library into Wikidata terms. And we're collaborating with a number of other initiatives, ContentMine among them; there will be an additional talk in this room later on. I can't go into all of them, but they're all valuable, and we do different things with them; that was part of those, what was it, 40 or so demos we had at WikiCite. Not everyone could be at WikiCite. And this work wouldn't happen without the sponsors, so we also have to write proposals, and reports about the events that we do, and reach out to sponsors. In the meantime, we have a lot of fun. But there are challenges, and I leave the challenges to Lydia.

I get to be the one to talk about the challenges, awesome. All right, so what Dario and Daniel talked about is pretty amazing and important work, and I think Dario made very clear how fundamental citations and everything around them are to what we do. Still, there are some challenges that I want to talk about. One of them is how we represent all the different publication types that are out there, and how we model them. Just to give you some idea: we're only scratching the surface on properly modeling all of those that we probably want to, because all of them are important for citing knowledge. Then the next one. I talked this morning a bit about a steep increase in the number of items we have, coming from WikiCite, and in order to be able to deal with that we also need tools and processes. One of the challenges we have is figuring out what those should look like. What kind of tools do we need? What kind of changes to existing tools do we need? How do we need to adapt our processes to make that work? Here's just one of those tools, called SourceMD, which is being used to add source data to Wikidata, if I understand it correctly.
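For a sense of the scale behind that challenge, here is a one-line count of the scholarly-article items (Q13442814). Re-running it over time tracks the growth Lydia mentions; swapping the P31 pattern for an identifier pattern such as `?article wdt:P356 ?doi` gives identifier coverage instead, which is relevant to the gap analysis discussed next.

```sparql
# How many items are scholarly articles? A number to watch over time.
SELECT (COUNT(?article) AS ?articles) WHERE {
  ?article wdt:P31 wd:Q13442814 .   # instance of: scholarly article
}
```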
Number three: we need citation data, but really understanding what kind of data we need, and what kind of data we already have, is a challenge. We need to better understand what kind of citation data Wikipedia, for example, needs, how we can serve that, and where we go beyond it. For example: English Wikipedia has three million citations with an ISBN, the Dutch Wikipedia has 90,000, and in Wikidata right now we're at 30,000 items that have an ISBN. So there's clearly a gap, and analyses like that are what we need to do much more of, and in much more depth. And number four, something that popped up recently, is better understanding of, and better collaboration with, the other Wikimedia projects on how they're going to use this data. It's great if we have that data in Wikidata and make it available, but if Wikipedia doesn't use it, then we have a problem, because we lose a lot of the benefit the work we do should have. A recent discussion on English Wikipedia, for example, was about the template that makes citations from Wikidata items available in English Wikipedia. All right, and I think with that we open the floor for discussion.

Yeah, so we have a few final pointers about how to get involved and what we're going to do next, but basically we wanted to hear your thoughts. This is the first time we've presented this at an event with a larger part of the Wikidata contributor population. So we want to hear from you whether this makes any sense at all, whether there's some burning question you have, and we'll of course continue the conversation after this talk. I will hand around the microphone.

Is there a notion of statement that's larger and more complicated than a Wikidata statement? Like for the fact-checking use case at the start: "the moon is made of blue cheese". You could break it out into a triple and put it into Wikidata as a fact, but that's not really the core of the problem.

Let me repeat the question: is the current granularity of a Wikidata statement sufficient to represent what, for example, fact-checkers need to do their job? That's a great question. Personally I feel we should get to a point where the combination of properties and qualifiers allows us to represent anything, but we're not there yet. So probably the answer is: it depends on the domain. In some areas, thanks to the people who've been doing biocuration, you can do fact-checking on proteins and genes in a very granular way; for topics in current news, maybe not. So maybe we need more properties and more qualifiers.

I guess I wasn't always sure, when you said statement, whether you meant statement in the colloquial sense or in the pure Wikidata triple sense. I meant it in the formal Wikidata sense, but maybe that's me seeing the scope of statements in Wikidata as eventually covering 100% of all human knowledge. We'll get there at some point. I don't want to hold onto the microphone and talk about this all day.

All right. I have a question on the publication types, because that's something very interesting. For example, many recent research papers use different articles, sorry, different software, and that software is based on other algorithms. So these are different categories of publications that they cite in their work. And my second question relates to what Daniel Mietchen was saying about annotations.
Annotations are subjective, for me. If I put something in, saying this is my point of view that this is correct, and I add a reference URL, then for me annotations are a bit more subjective. I don't know.

Let me start with the annotations. Yes, they are subjective, but so are most of the sources they are being made on. And the point here is making the route the information took more transparent. If we just say it came from that article, then you're lost trying to find the source of that statement in those ten or however many pages there are. If you have an annotation, it's much easier to locate that information and actually verify whether it can be translated into that Wikidata statement. So that's a short response on the annotations, I guess.

It may be good for scientific articles, I think. It may be good for scientific articles, but think about political articles... This is actually being done. Hypothesis, which is one of the annotation tools and the one used in the example, has a sub-community that is systematically annotating news articles about climate change. That has led to a number of those news articles being retracted, because the annotations, which you can also comment on, clearly pointed out that some of those statements were just plain wrong. That speaks to your subjectivity point, but it also speaks to the possibility of using annotations to actually do fact-checking. Now, by now I've actually forgotten your first question. Different types of publications, yes, how the different types of publications speak to each other. Well, technically they can all, for instance, cite each other; you saw that along large matrices of things being cited, including poems and things like that. Yes, they do cite each other, and we don't have good data models for most of them. So right now the best data model we have applies to a subset of scientific articles. We have emerging data models for certain kinds of books, for conference proceedings, patents, court cases, and things like that. But many of the other things we haven't even touched yet; we're just trying to figure out that these things exist. And they come from different people who may not have heard of WikiCite; they just want to annotate their poems or psalms or whatever they are working with, or movies. Yes, they cite each other. We just try to take a holistic view on this, so that we don't just go for physics or whatever; we really want to handle references in a more general way. And the more structured they already are, the easier it is; that's why we start with those areas where we have somewhat structured databases. And this is, by the way, why we're trying to engage with the librarian community. It's so important; we don't need to reinvent the wheel. There are decades of effort around modeling sources, and we're trying to see how we can map those efforts to what exists in Wikidata, to its properties and data models.

Okay, so this might be kind of a stupid question, but how does WikiCite overlap with the notion of a reference in Wikipedia? It seems like WikiCite may have a particular bent towards scientific journals, but that may just be what's happening now.

Yeah, that's a good question, and I think it's important to clarify the scope of the project. There's been pretty significant growth, as all of you saw, in the area of scientific references.
That's not what we thought would be the scope of WikiCite, and I'm speaking for the organizers; the community may have a different view of this. But the idea of WikiCite was to be agnostic about different types of sources. Scholarly papers found a community, a group of people really excited about experimenting with data models, but they're also a fairly simple case, because modeling the bibliographic record of a scholarly paper is relatively easier than modeling a news source or a book. It turns out we had entire groups of people at WikiCite trying to come up with a pragmatic way of representing books in Wikidata, and it's a very complicated question. So papers are easy: they have consistent APIs, so it's easy to retrieve data and do some cleanup and deduplication. Books, news articles, poems, patents, all the other types of works, are much harder. But the scope of WikiCite, and this is something I need to emphasize, is not limited to scholarly papers. Thank you.

Thank you. I'd just like to give a bit more background on some of the issues around citations in Wikipedia and their relationship to WikiCite. Lydia put up a slide about Cite Q, which is the template I built as a prototype for how we could pull metadata from Wikidata for citations in Wikipedia. And Lydia also pointed out this morning that the persondata template has been removed from Wikipedia completely, because the data is now in Wikidata, and Wikipedia pulls all of its authority control data about people from Wikidata; so it's the same model. But while we were discussing the Cite Q template recently, I did a little bit of research on the most cited work in English Wikipedia. I'm afraid I can't remember the title, but it's basically a catalogue of astronomical bodies. I took a semi-random sample of the instances of it being cited, averaged the number of characters, and multiplied that by the number of citations: that one work alone takes up about a third of a gigabyte of text, by being cited repeatedly across hundreds of Wikipedia articles. And if any of us were given the job by our employer of writing a database and told, well, you just put this piece of text into the database 3,000 times, we'd get the sack, because it's a stupid way to design a database. In Wikipedia it's grown organically over the years, and WikiCite is the mechanism by which we can improve the structure of that data and prevent needless duplication.

Yeah, that's a fair point. There are technical questions that WikiCite can solve, on top of storage: the reuse of source metadata across articles. Right now, as you said, if you have a given source cited in article A, it's impossible to reuse it in another article in the same language, let alone a different language. So we could optimize the processes enormously. However, and I think this is the message Lydia was emphasizing: there are technical problems, but the social ones, as we know from having been there, are as complex, and probably more complex, than the technical ones. And more fun to solve. Yes.
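To illustrate that store-once-reuse-everywhere idea, here is a minimal sketch of the lookup a template like Cite Q effectively performs: resolve an identifier to a single Wikidata item and render the citation from its fields, instead of copying the same text into thousands of articles. The DOI string below is hypothetical.

```sparql
# Given an identifier (hypothetical DOI), fetch the fields a citation
# template would render: title, authors, venue, publication date.
SELECT ?item ?title ?authorLabel ?venueLabel ?date WHERE {
  ?item wdt:P356 "10.1234/EXAMPLE.DOI" .   # DOI (hypothetical value)
  OPTIONAL { ?item wdt:P1476 ?title . }    # title
  OPTIONAL { ?item wdt:P50 ?author . }     # author
  OPTIONAL { ?item wdt:P1433 ?venue . }    # published in
  OPTIONAL { ?item wdt:P577 ?date . }      # publication date
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
```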
There's a question here. I'm in New York City, and we had an event recently where people were translating articles using Content Translation. What they told me is that one of the biggest problems is that they can copy things over and translate them, but the references don't copy over easily, because they're using different templates and all that. So I think that in the future, with Wikidata holding the references and maybe a more standardized template, that will be easier. But right now it's one of the biggest problems they have, so the references get lost or something, which is not great. Good point.

Early on you showed a slide where you had "financed by". Very exciting. I was wondering, do you have a ready-made set of queries, graphs, things like that, that one could show to people so that they'd see the utility right away? I'm especially thinking of investigative journalists.

Right. We do have queries; I think the main blocker is coverage. We don't yet have extensive coverage of funder information in Wikidata. I might be wrong, and Daniel is now running a live query on his computer. Yeah, I'm very excited about funder information too. Reconstructing who's paying for specific information, whether we should trust the funder, or what a funder is specializing in, is something that right now is really hard to even visualize and study. There are some cases where this is fairly easy, at least in principle. Scientific papers should come with funder information: if I am a grantee and I publish an article, I need to acknowledge the funder, and in principle that funder information should be part of the metadata deposited with the bibliographic record. When that happens, in principle, we should be able to ingest this information and represent it. For other cases, such as news articles and who funds specific types of work, it's harder: we don't have consistent APIs, and we don't have identifiers for sources. So it's challenging, but I'm as excited as you are about the idea of reconstructing this provenance of knowledge.

Let me just add the things I was trying to show. First, on the Wikidata Query Service we have a number of examples, and some of those examples are about WikiCite material. For instance, you can get a list of papers published by people who got the Nobel Prize, or something like that. And about the funding: in Scholia there is a feature that exposes this information if it is in Wikidata, which it typically is not, but in some cases it is; the information as to who financed the research reported in a given article, as indicated in the publication. Some publications actually make the grant numbers available, and if the information is there and harvestable, it can end up in Wikidata. Sometimes it does, but it's not systematic yet.
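A hedged sketch of such a funding query, counting articles per funding organization. Modeling funders through P859 ("sponsor") is an assumption about current practice rather than a settled convention, and as Daniel says, coverage is sparse.

```sparql
# Count scholarly articles per funding organization.
# P859 ("sponsor") as the funder property is an assumption; coverage is thin.
SELECT ?funder (COUNT(?article) AS ?papers) WHERE {
  ?article wdt:P31 wd:Q13442814 ;   # instance of: scholarly article
           wdt:P859 ?funder .       # sponsor
}
GROUP BY ?funder
ORDER BY DESC(?papers)
LIMIT 20
```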
So one thing: I don't think Wikidata itself is very consistent in the way we use references. Of course every statement can have a reference, but sometimes we use "stated in", sometimes it's reference URLs, and sometimes it's the entire reference information with the title and the author. How can we get Wikidata editors to be WikiCite-aware and follow the practice of adding their citations as actual entities?

I think that's a challenge for us, definitely. And I think two things will help quite a bit with that: on the one hand the new constraint reports, where constraints help make things more consistent, and on the other hand support for Citoid, so that people put in a URL, for example, and it automatically fetches all the other important information. One more thing: I think you're totally right. The WikiCite community, as far as I know, has put more effort into thinking about modeling citations in Wikipedia and how they might map to Wikidata. We don't have a good citation model for Wikidata itself, which is ironic. So I think you have a point there. Yes. I think there was a question there, and then Ben.

Thank you. I want to ask about access to the full text of the articles that have been ingested: are there any plans to try to link to open access copies where the paper behind the DOI is paywalled?

Yes, in capital letters. So one of the foci of the current import is actually articles that have a PubMed Central ID, which means full text, in biomedicine. Not all full texts, though: a lot of DOIs link out to a publisher where the paper is behind a paywall, whereas PubMed Central is a full-text archive. Then we also have a property named something like "full work available at", and there is a bot going around on Wikipedia, not yet on Wikidata, but I guess some of its people are here, that could basically do the same thing on Wikidata, and then that information would in principle be harvestable by the other Wikipedias and so on. What that bot does is take the identifiers from a citation on any Wikipedia, look up full-text options on the web, including the free-floating web, and then provide the user with a link to something the tool thinks is a copy of the article. And then we have some tools that help human editors verify that it actually is a copy of that article, so you can insert it into Wikipedia articles for the moment, and hopefully soon into Wikidata items.

Just to add to that: Jake Orlowitz from the Wikipedia Library project has just put out an appeal for somebody to help with some coding on OABot, which is getting a bit sluggish because of the volume of work that it's doing. So if anybody fancies doing a bit of open-source coding, I've tweeted a link to it, and I'm sure other people will as well. Thanks.
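A sketch of the full-text angle just discussed: articles that have a PubMed Central ID (P932) and, where someone or some bot has added it, a direct full-text link via P953 ("full work available at URL").

```sparql
# Articles with a PubMed Central ID and, where present, a full-text URL.
SELECT ?article ?pmcid ?url WHERE {
  ?article wdt:P932 ?pmcid .            # PubMed Central ID
  OPTIONAL { ?article wdt:P953 ?url . } # full work available at URL
}
LIMIT 100
```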
Lydia, at the end of your year-in-review this morning there was a beautiful image of a federated ecosystem of projects. And I'm wondering, I mean, this is incredibly exciting and incredibly important, and it's obviously at the very, very, very beginning of what it could be. Should this be in Wikidata, or should this become a federated project? I know this is an ongoing debate and I'm new to it, but I'm curious to hear what thoughts are in the room and from you.

That is indeed an important question, right? And I think at this point we're not set up to do it outside Wikidata. So from that side the question, at least for now, is answered, but in the future that might be different. But maybe you want to talk a bit more about... Okay. All the work that Harej has done... Yeah, so James Hare, I don't know if he's here, has done a lot of work on a project called LibraryBase, to try to extract granular information not just about sources but also about their instances, when they happen, specifically down to the revision ID in the context of a Wikipedia article. Obviously Wikidata today cannot support that level of granularity, so I think he made the right call to build this in a separate Wikibase instance. Now, the trouble is that, being a separate instance, it doesn't have the richness of the other entities that we can now use to cross-link all this information with everything else. So the question is, in the long term, is this going to be served by a dedicated instance with more granular information, or is it going to be more useful to integrate this into Wikidata? I don't have the answer either, so we'll need to see what happens.

We already have another project, Inventaire.io, present here, which also refers to Wikidata items about publications and enriches that information with far more editions of books than we would want in Wikidata itself. So maybe this could also be a model for other bibliographic databases. Absolutely. And we know that the Italian National Library, thank you, has also experimented with the idea of storing bibliographic records natively in a Wikibase instance, which is also amazing. So there are many experiments. We need to figure out what works best for the ecosystem, and what kinds of tools we need to make that work in a more federated way, so that we don't have to have everything in one place like we do now.

All right. We have five minutes left, so maybe we have time for one more question there. Yes. And then we can move on to the wrap-up. Final question? We answered all the questions. Amazing. Then let's go to getting involved.

All right, how to get involved. So we believe that WikiCite's vision can actually benefit a wide range of stakeholders and groups. This is a very incomplete list of people who benefit from this; for example, we didn't mention reporters and investigative journalists, and I do believe they should be part of the picture. And obviously we need help from all of these groups to understand their needs and, in some cases, to tap their expertise. Librarians, metadata providers, and curators of digital collections are obvious fits for this project. And there are many ways to get involved. WikiCite as an initiative has so far been mostly a series of events, some online activity, and a set of discussions. We have a page on Meta called WikiCite where you can find all the information on the project. There's a fairly active Twitter handle, if you're on Twitter, that basically tweets every single thing related to citations and Wikipedia and Wikidata. We have a mailing list called wikicite-discuss, where most of the participants discuss these issues in more detail. And we're also hosting events. Daniel mentioned our funders before: we're fortunate to have financial support from several organizations, and thanks to their support we've been running an event in each of the past two years. We were also able to fly people in from all over the world thanks to generous fellowships, and we're currently fundraising and organizing the next event. Tentatively the destination should be Barcelona at the end of May 2018, but this is still subject to confirmation of logistics and funds. So save the dates, and if you're interested, we hope to see you there. And with that, I think it's a wrap. We have a ton of people to thank: the people who gave us feedback on the deck, all the participants at the two events, the funders of course, and everybody who has contributed to WikiProject Source MetaData, the project on Wikidata that predates WikiCite. And I'm sure there are many more that we're forgetting here. So thanks a lot, and get in touch if you're interested.