All right, we're set. Hi everyone, my name is Sandra. As a volunteer, I'm known as user Spinster. I have worked for the Wikimedia Foundation since the beginning of July of this year, both as community liaison for structured data and, for 50% of my time, in the GLAM team, working specifically on strategy for structured data in the GLAM space. I'm going to tell you a little bit more about a project that is kicking off this year and will run until 2019, in which we will add structured data to Wikimedia Commons, the project you have all been waiting for.
Who here in the room doesn't really know Wikimedia Commons well? A few people. So I do have a little introduction for you. For the rest, you can still get into the groove. Can the people who work in the team raise their hands? Yay. I'm actually not a coder myself, so if you have very specific, more technical questions, I can pass the mic to them, and that's really reassuring. Who here in the room has signed up for the amazing focus group that we have? Yes, yes, yes, yes. Thank you, thank you. Yes, we'll get to that. Okay, so now I know what the baseline of this presentation will be.
Wikimedia Commons is one of the oldest projects in the Wikimedia ecosystem. It's a media repository, and at this moment it has around 42 million files on it. These files are images, videos, stuff. Oh my god. Media used as illustrations in other Wikimedia projects, but also media that can be downloaded and reused by people anywhere in the world for any purpose. We have roughly, and this is a very large generalization, three categories of media that live there. A lot of them are own work by Wikimedia volunteers. So you can imagine photographers making photos of people doing presentations and uploading them to Commons. People creating infographics, people creating illustrations, videos, et cetera.
We also have lots of media coming from other platforms, free media that are, for instance, taken from Flickr. We have many volunteers who look on Flickr for interesting images that can be used as illustrations for Wikipedia articles, et cetera. They look on YouTube for freely licensed videos and upload them to Wikimedia Commons. And we also have many files, an increasingly large number of files, that have been uploaded to Wikimedia Commons in the context of partnerships with cultural organizations and knowledge organizations. So these are the three rough categories.
I'll give an example of each of these. This is a very good example of own work, a beautiful example by one of our most prolific photographers, Diego Delso. He's one of the people who make very high-quality photographs of places where he lives and where he goes on vacation. This image is a featured picture. Wikimedia Commons also has a social system for rating the pictures that are on there, so we have a specific rating system for highlighting the most valued images. Second example, oh, switching around. Oh yes, this is where it comes. We also have quite amazing community-driven campaigns to encourage the upload of these kinds of images. One notable one is Wiki Loves Monuments. Actually, Jean-Fred gave a great presentation about that yesterday. It's a competition that has been going on for many years already, where people worldwide make photographs of built heritage and these are uploaded to Commons. So these campaigns are also very crucial in making sure that we have these high-quality images on there.
The second example is video. It's still kind of a new area on Commons, but I hope it will grow. This is an example of a very nice video that I found that was actually uploaded from YouTube. It's a one hour and 20 minute silent film. It's actually a film classic by Sergei Eisenstein and, let me see, Grigori Alexandrov.
It's actually a re-enactment of the 1917 October Revolution in Russia, which they re-enacted for film 10 years later. Film classic, silent film, and it's available on Commons. So this is an example of high-quality media that people find on other platforms and upload there.
This is an example of an image that came from a partner donation. The Brazilian chapter, the Brazilian user group, has worked together with a local museum and uploaded images from their collections. This is a skeleton of a pig. I think after the goats, we have to go towards pigs because they are really cool animals.
Because Wikimedia Commons is one of the earliest sister projects in the Wikimedia universe, it's based on MediaWiki, the same software that powers Wikipedia, so it's actually text-based. If you go to a typical media file on Wikimedia Commons, and this is the second example that I gave, the Eisenstein film, you have a page with templates and categories that looks a little bit like Wikipedia would look. Volunteers over the years have worked towards a system to give this some kind of structure, to give some kind of metadata structuring to these files. So what we have is templates, description templates. This is, in fact, an information template, the most basic information template, and it tells a little bit about what you see there. So in English and in Polish, I don't know why, you have a little bit of description about the film, and then you have a very large licensing template. It's about copyright in Russia because it's a Russian movie, and it says, for instance: the author of this work died between January 1, 1942 and January 1, 1946, did not work during the Great Patriotic War and did not participate in it. So that's apparently one of the criteria within Russian copyright law to determine whether a work is in the public domain or not. All this kind of information is now in wikitext, in the form of templates embedded into Commons.
If you go to the source, and this is actually the new function that you have for syntax highlighting, you see that it's, in fact, wikitext that is behind that.
We also have a beautiful category system. I honestly appreciate it. I use it. It's the best thing that we have to tag what is in the media at this moment. So these are the categories for this film specifically. It's a tree-like category structure, a very large tree, you would say. We have millions of categories. You see that it's a mixed bag. They are organized by technique, but you also have a category for Saving Private Ryan. I don't know. So it's a crowdsourced, hierarchical tagging system. But we also have excesses in the category system. And yes, I won't read this one out loud, but it's one of the classics, you know. We all know this one. There are even subcategories, which are more specific.
Does this help with finding media? Well, I challenge everyone here in this room to step into the shoes of someone who doesn't know Commons and has to find images using the search on Commons. I actually have friends who are journalists and bloggers, and they tell me they use Commons all the time to find free media for their articles, but they never use Commons search. They always use Google search because that at least works.
I'm also jumping onto the issue of languages on Commons. I speak Dutch myself. I'm a Dutch-speaking Belgian. Suppose I did not know English, and I think quite a lot of people on this planet don't speak that language. If I search in my own language for a painting of a pheasant, schilderij van een fazant, schilderij fazant, I find one painting and I find this, I don't know, old publication. That's a bit suspect. I mean, there are probably more paintings of pheasants on Commons.
So if you then switch to searching for that in English, if you know English, of course, you do find a lot more. The search results are still confusing and not optimal, so search can be better.
So how can we make Commons even more awesome? We can make Commons more awesome with, of course, the magic of structured data. And it's actually something that the community has been requesting for a long time: multilingual categories, better structuring, better search, better ability to edit the files, to curate the files. So it's happening. In the upcoming three years, we're going to add structured data to Commons, also converting the metadata to a machine-readable format. That makes sure that not only humans can easily find and reuse stuff, but also that software and applications can be built on top of Commons at a large scale via APIs, and that they can be consistent. Thanks to this lovely project being like five years old already, it's kind of maturing, so we can confidently use Wikidata, to a certain extent, to power this. And that's a very great thing.
So what's going to happen is this. This is a very old visualization, but the basic principle is that you will have the existing file pages that you have now, and we are not going to take anything away. In addition to that, we will also insert structured data. You will have a data section in there. That's what we're developing in the next two and a half years. So for instance, if you go back to the film, which I really like, now it looks like this. In addition to that, you will be able to use the power of Wikidata. If you go to the Wikidata item for that specific film, it has a lot of extra information about it, which is really valuable to be able to find it, to reuse it, to do interesting things with it, to visualize it: the duration, the aspect ratio, the color, black and white, et cetera, et cetera.
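As a rough illustration of why structured, multilingual data could help the pheasant search described above, here is a minimal Python sketch. It is not the project's actual design: the concept IDs (`Q_PHEASANT`, `Q_PAINTING`), the property names (`depicts`, `instance_of`), and the file titles are all hypothetical placeholders. The idea it shows is that a file is tagged with a language-neutral concept once, and every label of that concept, in any language, then finds it.

```python
from dataclasses import dataclass, field

# Hypothetical concepts: one language-neutral ID, labels in several languages.
# These IDs are placeholders, not real Wikidata item numbers.
CONCEPT_LABELS = {
    "Q_PHEASANT": {"en": "pheasant", "nl": "fazant", "fr": "faisan"},
    "Q_PAINTING": {"en": "painting", "nl": "schilderij", "fr": "peinture"},
}


@dataclass
class MediaFile:
    title: str
    # Structured statements: property name -> set of concept IDs (illustrative names).
    statements: dict = field(default_factory=dict)


FILES = [
    MediaFile("Pheasant still life.jpg",
              {"depicts": {"Q_PHEASANT"}, "instance_of": {"Q_PAINTING"}}),
    MediaFile("Pheasant in a field.jpg", {"depicts": {"Q_PHEASANT"}}),
    MediaFile("Harbour view.jpg", {"instance_of": {"Q_PAINTING"}}),
]


def concepts_for(term: str) -> set:
    """Return the IDs of all concepts that carry `term` as a label in any language."""
    wanted = term.casefold()
    return {
        concept_id
        for concept_id, labels in CONCEPT_LABELS.items()
        if wanted in (label.casefold() for label in labels.values())
    }


def search(depicts_term: str, type_term: str) -> list:
    """Find files whose 'depicts' and 'instance_of' statements match the two terms."""
    depicts_ids = concepts_for(depicts_term)
    type_ids = concepts_for(type_term)
    return [
        f.title
        for f in FILES
        if f.statements.get("depicts", set()) & depicts_ids
        and f.statements.get("instance_of", set()) & type_ids
    ]


print(search("fazant", "schilderij"))  # ['Pheasant still life.jpg']
print(search("pheasant", "painting"))  # ['Pheasant still life.jpg']
```

In this toy model the Dutch and the English query return the same file, because both resolve to the same concept IDs before any file is examined.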
So that kind of power will be inserted into Commons in addition to what we already have. And as I said, it will be machine-readable and reusable, so people will be able to build applications on top of that. And I'll just show you Crotos, it's by Shonagon. Yes, I told you I would include it. Here it is. This is in fact an interface built on Wikidata, but you will be able to build these same kinds of interfaces on Commons in a reliable way, because we will have structured, well-formed APIs that won't break and change.
So this is the basic architecture model. It's very simplified, but what will happen is that the software that powers Wikidata, Wikibase, will be integrated into Wikimedia Commons. We will build a new search functionality on top of that to be able to search through the structured data. There will be APIs that are better than the current ones. And there will also be possibilities, of course, for the community to build tools with that.
And these are our target groups. These are all the kinds of people who will potentially either edit Commons or use and reuse images from Commons. That goes from the Wikimedia community itself, of course, but we also think about companies, journalists who want to reuse materials, and researchers. Commons will become a corpus of structured data, which is also quite interesting for research purposes. So we do it for many people.
And we can do this thanks to a grant that was announced at the end of last year. My wage is paid from that, and some other people's wages as well. We received a quite generous grant from the Sloan Foundation, who really believe in this project. So that's what actually makes it happen now.
So what are we going to do in the next three years? We're still at the beginning. What has happened in the last months is that people like me have been hired. So at this moment we are building the groundwork. Our team is now complete.
We have a really good team together and I personally really trust the people who will do this. So I feel very confident that we can now start this. We have, in effect, a broad division into three years. This year is really about the groundwork. Next year we will do really lots of development and you will see the first things being rolled out. In the third year we will probably focus more on making sure that the new features get integrated and get used, and that more structured data gets added to Commons. So that's the large overview.
What we are doing now, or have done now, is work on the technology groundwork, really the foundations. The invisible foundations beneath it have been worked on a lot. For the geeks among you, the multimedia team is now working on the MediaInfo entities integration so that that actually works. If you have questions about that, you can ask Ramsey. Federation is also being worked on, and, I will explain this in the next slides, so are multi-content revisions.
We are doing design research, so we are actually doing surveys and interviews with the users of Commons to find out what their needs are. Both GLAMs, cultural institutions, and heavy Commons users are being researched so that we know what we need to change, right? We are trying to develop metrics: how can we measure whether what we develop will be successful? For instance, we want to know how difficult it is to search now and have numbers about that, so that we can compare it with the future search function. First design sketches are being made, and it's really more about explorations of what the interface could look like. And my work is starting community engagement: making sure that we have a good system in place so that you guys are really informed about this, that we can talk about this together in a very good and nice way, and that we can make this project happen in a way that makes everyone happy.
So federation is, in fact, the idea that you can have several Wikibase instances and integrate data from one instance into the other, right? I'm not a technical person myself and it's really my task to be able to explain this, so I hope I do it right. That was already started earlier this year. You already have a test version that you can try at the URLs here, a demo version, but we are going to work on that further to make sure that we can also integrate it into Commons.
Multi-content revisions, that's really groundwork. It's something that you as a regular user will not see, but it's a crucial component of the whole. The principle is that you will be able to split pages, file pages, into several components and edit them separately, but still have the page as a whole that you can follow in your watchlist, et cetera. So yeah, it's groundwork. There's a good presentation by Daniel on Commons about it if you want to know more.
And these are really first preliminary sketches of what the interface could look like. This is by no means already an official proposal; it's really more the design team exploring what we can do. At Wikimania, two months ago, we already had a workshop together, a brainstorm about, for instance, what search could look like. There are already ideas about search and about how you will be able to filter, and that's going to be kind of powerful. So yeah, we are already starting to think about these first things.
This is where all the data will live. Actually, after this conference, I will publish this one on wiki. It's very fresh, but this is a bit of a scheme of what exactly will still be in the old wikitext, what will be in Wikibase within Commons, and what will be drawn from Wikidata. The red arrows that you see here, that's in fact the federation going on. That's where information is drawn from Wikidata items into Commons. What are we going to work on next?
And that's for the upcoming, I think, year-ish. Soon we will also publish our first roadmap on wiki for the first six months. That will be kind of precise-ish, but afterwards it becomes more difficult to predict. One of the large things that we want to do is to start with a first feature that will be smaller than the whole of structured data. We are thinking about making captions of images editable and translatable as structured data. And then afterwards all the structured data itself will be rolled out as well.
We recommend that we wait until that step before starting data modeling. At that point there will be a large need for the community to start thinking about how to model this, how to model the information that should fit into Structured Commons. For a part we will probably be able to reuse the modeling that has already happened on Wikidata, but we might need new properties. We will probably need a lot of modeling around copyright and licenses. I think it's best to really wait for the rollout of the first technology, because that will really tell us how it happens. Before that, it's really just inventing things. And of course the search and upload functions are something that we are also going to work on.
We will also have painful times upcoming, and because we're here together with friends I want to just openly discuss this. I know that people are really excited about structured data on Commons, but there will be hard moments. I just want to put that out there, and let's get through this together. It's my soft side that came out while making this presentation. Let's work together in a good way, I would say. I know that we have many people who work on both projects, but we also have lots of people who are really only familiar with Commons and who will think Wikidata is a strange project. So we will have quite a few people who are suspicious, and let's try to make that process as smooth as possible, I would say.
So this is just a call-out with some ideas of things that I think will come up, also for the Wikidata community that is interested in Commons, things that we have to work on. One is welcoming newcomers from Commons and introducing them to Wikidata. I really hope to work with some of you to think about a good way of introducing people to how structured data works at all.
Support for tool developers is another one. Tools might break. Tools might, I don't know, become obsolete. People love their tools. So how can we transition that? It's one thing I'm already starting to do, but I'm still trying to find out what the best way is. There is already a page on wiki where we can track tools that are really crucial to our ecosystem and how we can help developers there. So if there are developers here in the room who are willing to help other developers, I would really like to enlist your help for that, because I'm not a technician myself, not a coder myself, and it would be really nice to have a little community of people who are willing to help. There are also first Phabricator tasks for that, so let's figure out how we can best track that.
Modeling will be challenging, and because of my own work as a volunteer on Wikidata, I thought about many things that will be interesting. For instance, moderation of new edits that will happen to structured data. Will there be a flood of new edits, and how can people track that, and how can people keep track of the reliability of new edits? That's something that you now see on Wikipedia as well. So that's a challenge that we have. There might be conflicts with the Commons community. How will we, I don't know, deal with that? There's probably going to be a discussion about the fact that Wikidata is CC0 and the metadata on Commons is currently CC BY-SA. Just putting that out there. We will have new property proposals. We will have to model copyright and licensing.
That's kind of an interesting thing. References, what will we do with those? I don't know. And text descriptions, et cetera, et cetera. So we will have lots of work to do, I think. And then as soon as we start to get this modeling going, of course, we will want to do data conversion as well, converting unstructured file descriptions to structured data. And it would be nice to also do some pilots, maybe with projects like Wiki Loves Monuments, as soon as we possibly can, to see if we can already do upload campaigns that are really specifically targeted. Many monuments are already on Wikidata, as Jean-Fred explained yesterday, so we might be able to do interesting things there as a pilot.
And oh my God, sorry people. It's actually my son's best friend calling me now. Ha. But yes, the timing is perfect, because this is not the last slide, but it's a slide by Lukas that made me very happy. It's, oh my God. Sorry guys. Yeah, let's imagine together what we can do with this. I noticed that some really great ideas are popping up, so let's also think about the future and about the cool applications that we can build with structured data on Commons. Let's think about that together.
The two last slides: if you are interested in following the project closely and you haven't signed up for the focus group yet, please do so. And this is how you can get in touch. No questions?
Hi, thanks for the talk. Are you also planning on targeting scientific content like figures of papers and so on and so forth? And if so, are you analyzing the relationship with other platforms like Figshare or Zenodo that not only allow uploading your content but also associate a permanent identifier with it?
So you're talking about scientific publications. Yeah, I think that's a very specific area and that's something for the community to work on. As for unique identifiers, we will have them as the MediaInfo entity numbers.
So each file on Commons will have a unique identifier. It is not like the Wikidata Q number with a Q, but it will be an M with a number behind it. I didn't say that in my presentation, so that will be in there. We will have APIs to draw things back and forth, but what to do with scientific publications or whatever, that's really up to the specific groups of community members who deal with those specific files. I'm myself mostly active in the visual arts, and in the visual arts field we are definitely looking at those kinds of questions. I don't know about the people who are dealing with science, but I think they are as well.
And what about associating digital object identifiers with these kinds of pages or contents as well?
DOIs, the digital object identifiers? Yeah, I would say if that's a part of the metadata that you want to have as structured data on Commons, just go to the community and propose to have it there.
Yeah, exactly. It could just be a statement on that file with an identifier.
Yes, that's part of the modeling that the community will do. There's a question there.
We have to take the discussion offline to keep on track with the next talk. So thank you very much.
Okay.
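To make the identifier discussion in the Q&A a little more concrete, here is a minimal Python sketch under stated assumptions. The only thing taken from the talk is that Wikidata items use a Q plus a number and Commons files will use an M plus a number. The file ID `M12345`, the property name `doi`, the DOI value, and the `FileEntity` class are all hypothetical; how identifiers such as DOIs are really modeled is left to the community, as the speaker says.

```python
import re
from dataclasses import dataclass, field

# A letter prefix (Q or M) followed by a positive integer without leading zeros.
ENTITY_ID = re.compile(r"^([QM])([1-9][0-9]*)$")


def entity_kind(entity_id: str) -> str:
    """Classify an entity ID by its prefix: Q for Wikidata items, M for Commons files."""
    match = ENTITY_ID.match(entity_id)
    if match is None:
        raise ValueError(f"not a Q or M entity ID: {entity_id!r}")
    return "wikidata item" if match.group(1) == "Q" else "commons file"


@dataclass
class FileEntity:
    """Hypothetical stand-in for the structured data attached to one file."""
    entity_id: str
    statements: list = field(default_factory=list)  # list of (property, value) pairs

    def add_statement(self, prop: str, value: str) -> None:
        self.statements.append((prop, value))

    def values(self, prop: str) -> list:
        return [value for p, value in self.statements if p == prop]


# Illustrative only: a figure from a paper carrying a made-up DOI as a statement.
figure = FileEntity("M12345")
figure.add_statement("doi", "10.1234/example")

print(entity_kind(figure.entity_id))  # commons file
print(figure.values("doi"))           # ['10.1234/example']
```

The point of the sketch is only that an external identifier can sit on the file as one more statement, which is what the questioner suggests.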