That's live. Good morning, good afternoon, everyone. I'm Ariel with the Wikimedia Foundation's research team. I'd like to welcome you to the February edition of our monthly showcase. We have two exciting presentations this month. Miriam, one of the latest additions to the research team, is going to present first on enriching knowledge bases with images, presenting work in progress that she has been conducting over the last couple of months. We also have Magnus, who's joining us from Cambridge, I guess, as a special guest. The second part of the showcase is going to be devoted to discussing backlogs and how research and machine learning methods can help us address the inefficiencies in the quality control backlog. Aaron Halfaker is here, and he's going to give us that talk. He doesn't need an introduction; he's been in this space a couple of times. As a reminder, the format is the usual: 25 minutes of presentation followed by a short Q&A, and ample time at the end of the showcase for additional questions. There's an IRC channel, #wikimedia-research, on Freenode. You're welcome to join for discussion, and Baha is going to be our host there, so please relay your questions to him and he will raise them with the group during the Q&A. With that, over to the speakers.

Okay, so good morning or evening, everybody. I think it's 7:30 p.m. here in London, so it's evening. I'm very happy to be here today together with Magnus to talk about our ongoing research on visually enriching collaborative knowledge bases. I'm a big fan of this topic because my background is computer vision, which is about using machine learning to analyze and understand images at scale. And not only because of that: I love images because essentially they provide a way for us to share knowledge without language barriers. Visual language is a form of language that does not need a written form for reading or a verbal form for speaking. We learn how to read images when we are very young, so it's very easy for us to understand images; it might be less easy to understand languages that we don't know. And, anecdotally, I love images because they have saved my life as an expat many times over the past 10 years. In London it's very likely that you have 10 people in a room and none of them is a native English speaker, so if you don't know a word in English, it's pretty much useless to find its translation in English. It's much easier to find the images corresponding to that concept in your own language and show everybody the images, and everybody will understand. They'll say, oh yes, got it: a rain gutter. It's much easier to communicate through images in a multilingual setting, and sharing knowledge through images can really break some of the language barriers. And if you think about it, collaborative knowledge bases such as Wikidata are also designed to share knowledge without language barriers. I don't think I have to introduce Wikidata here, but for those of you who don't know, Wikidata is one of the largest collaborative knowledge bases, where everybody can edit and add structured data. Wikidata is structured around items, which are concepts such as London, and each item has a bunch of statements describing properties of that topic, such as, for example, its population, but also the image that depicts the concept.
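As a minimal sketch of this item-and-statement structure, one could read an item's label and its image statement (property P18) through the public Wikidata API; Q84 is the real item ID for London, and the snippet is illustrative, not part of the presented work.

```python
import requests

API = "https://www.wikidata.org/w/api.php"

def get_item(qid):
    """Fetch a Wikidata item's labels and statements via the public API."""
    params = {"action": "wbgetentities", "ids": qid,
              "props": "labels|claims", "format": "json"}
    return requests.get(API, params=params).json()["entities"][qid]

item = get_item("Q84")  # Q84 is the item for London
print(item["labels"]["en"]["value"])  # -> "London"

# P18 is the "image" property; its value is a Commons file name.
claims = item["claims"]
if "P18" in claims:
    print("Depicted by:", claims["P18"][0]["mainsnak"]["datavalue"]["value"])
else:
    print("No image statement yet")
```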
And so Wikidata is designed to be an international and multilingual project. Everybody can add knowledge in their own language, unlike Wikipedia, which is more language-specific: you have English Wikipedia, Spanish Wikipedia, et cetera. Wikidata is very international, so potentially, when Wikidata is finished, we will have all the structured knowledge from every language in the world. But while we wait for that, adding images can be a good way to increase the multilinguality of the project. And actually, images are very important and very popular in Wikidata. This is a plot from Magnus. Basically, what this plot tells us is that images in English Wikipedia have been outnumbered by images in Wikidata, and that in the past two years there has been tremendous growth in the amount of visual contribution to Wikidata. So images in Wikidata are very important. However, the 2.5 million images that we saw in the last plot are nothing but a tiny fraction of the visual knowledge needed in Wikidata. Actually, 95% of items in Wikidata do not have an image depicting them, and as much as some of them may be, say, bibliographic entities that do not need a visual form, there are categories like people or species that definitely need a pictorial representation, and we are very far from having a complete visual representation of those categories in Wikidata. So there is a lot of work to do in this direction, and what we thought is that we could bridge existing efforts from the community with our knowledge of computer vision and machine learning to design smarter tools to help the community visually enrich Wikidata. The problem with visually enriching Wikidata is that, for a user willing to add an image to a Wikidata item, this can be a tedious process, because he or she might have to go through different tools to search over a huge collection of free-licensed images; Commons alone has 40 million images. This can be a very long process, so we thought: why can't we use intelligent tools to help reduce the search space and allow editors to look for the best image depicting a Wikidata item among a smaller set? And so we envision this visual enrichment pipeline where, for a given Wikidata item without an image, we automatically discover, from a bunch of sources linked to that item, a set of images that are somehow related to it. And then, since these images are still a lot, we use smart tools to rank them according to their relevance to the item and their photographic quality. At the top of this ranking we might have good candidates for that item, images that could be a very good fit, so that we can show editors this short list of potentially very good fits, thus reducing the search space for images of that item. And so the first step towards this goal is to discover free-licensed images that could be somehow related to a given item without an image. And here the structure of Wikidata can help a lot, because a Wikidata item contains information about the Wikipedia pages that talk about that item, and those Wikipedia pages can have images on them, so we can pull those images into our set of candidate images. And then we can discover other candidate images by querying Wikimedia Commons and Flickr with the label of the Wikidata item, or with its description.
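A minimal sketch of this discovery step, assuming the public Wikidata Query Service and the Commons search API (both real endpoints); the query and the candidate function are illustrative simplifications of the actual pipeline:

```python
import requests

SPARQL = "https://query.wikidata.org/sparql"

# Find a few people items (P31 = Q5, "human") that lack an image (P18).
query = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q5 .
  FILTER NOT EXISTS { ?item wdt:P18 ?img }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
} LIMIT 10
"""
rows = requests.get(SPARQL, params={"query": query, "format": "json"},
                    headers={"User-Agent": "image-discovery-sketch/0.1"}
                    ).json()["results"]["bindings"]

def commons_candidates(label):
    """Search Wikimedia Commons (namespace 6 = File:) for the item label."""
    params = {"action": "query", "list": "search", "srsearch": label,
              "srnamespace": 6, "format": "json"}
    r = requests.get("https://commons.wikimedia.org/w/api.php", params=params)
    return [hit["title"] for hit in r.json()["query"]["search"]]

for row in rows:
    label = row["itemLabel"]["value"]
    print(label, "->", commons_candidates(label)[:3])
```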
So after this step, for a given Wikidata item we have discovered a bunch of images that could be related to it. And in some cases we can do a very good job: for example, for 75% of people items without an image, we can discover at least one image linked to them, and for most of them the number of images we discover is much higher. There are so many of these images that we still need to reduce the amount the user should select from. And so the next step towards this goal is to use smarter tools to rank these images, because obviously not all the image candidates we gather are actually good candidates. We want to rank these images along two dimensions. The first is relevance: basically, the likelihood that the image actually depicts the entity or item we are trying to illustrate. And the second is quality, which is basically a measure of how well the image depicts that item. Ranking images according to relevance is essentially a multimedia information retrieval problem. As I said, this is ongoing research; we have many possibilities, including visual-semantic embeddings, matching of keywords, et cetera. For the moment, the temporary solution we are using is to match the name of the image and the description of the image against the item label, which gives us a score for the image reflecting how well the image name matches the item label. For people items we also use some face detection, just to filter out images that do not depict people. But that's about it, so there is a lot of work to do in this direction, and I'm happy to get suggestions on how to do this part. What is a bit more complete is the ranking of images along the second dimension, which is quality. Although we might have many relevant images at this point, not all of them are actually good: not all of them depict the subject well or are of high photographic quality. So we want to understand the extent to which an image is of high photographic quality, and to score images according to quality we resort to computer vision, and more specifically to a branch of computer vision called computational aesthetics, which exploits visual information to automatically score images according to their photographic quality, aesthetic appeal, and so on, just by looking at the pixels. It uses supervised learning techniques, which means that it learns by example. So to automatically score images according to quality, we need a training set of images annotated as being either high or low quality. Now, where do we find this data? I always say that Commons is a goldmine for computer vision researchers. As unstructured as it can be, the community puts a lot of effort into curating the data, finding the best content, and categorizing the images on Commons in a very good way. And one of the categories on Commons contains 160,000 images and is called "quality images". Quality images are images selected by the Commons community as being of very high quality according to a bunch of guidelines. So we can use these as positive examples for our learning process, but we need an equal amount of examples of low-quality images.
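Before turning to the negative examples, here is a rough sketch of the name-matching relevance heuristic just described; the Jaccard token overlap used here is an illustrative stand-in, not necessarily the exact metric in the project:

```python
import re

def tokens(text):
    """Lower-case word tokens, ignoring punctuation and file extensions."""
    return set(re.findall(r"[a-z]+", text.lower()))

def relevance(item_label, image_title, image_description=""):
    """Crude relevance proxy: token overlap (Jaccard similarity) between
    the item label and the image's title plus description."""
    label = tokens(item_label)
    image = tokens(image_title) | tokens(image_description)
    if not label or not image:
        return 0.0
    return len(label & image) / len(label | image)

print(relevance("Ada Lovelace", "File:Ada Lovelace portrait.jpg"))  # higher
print(relevance("Ada Lovelace", "File:Steam engine diagram.png"))   # near zero
```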
And to do this, we randomly sample 160,000 images from Commons, because it's very likely that a random sample of Commons will contain a low number of high-quality images. So this is our training data, and then we can train our supervised model. In this case, we chose to train a convolutional neural network called Inception V3. It was originally trained to detect around 22,000 object categories, but we fine-tuned it so that it's able to score images according to their quality. So basically it takes an image and outputs a score corresponding to the quality of the image. What happens in the middle is that the network learns which pixel combinations make an image high or low quality. There are a bunch of technical details I'm glossing over; I'm happy to share them with you, probably offline. But the main point is that this network does a pretty good job of distinguishing low- and high-quality images: 72% of the time, the network's prediction of image quality is accurate. And when we add to the prediction some external features, such as image size, or simple features like the length of the image description, we boost the performance to 78%, which is absolutely in line with state-of-the-art models in computational aesthetics. So now that we have a way to rank images according to relevance and quality, we want to combine them. There are mainly two ways in which we do this; again, this is ongoing research. One is unsupervised, where we basically take just the top X relevant images and re-rank them according to quality, and the other is supervised, where we learn the weights to give to the relevance and quality scores so as to put the best images at the top of the ranking. Now, here we have some visual examples, and they look good, but as scientists we want a bit of evaluation. So we did a partial evaluation by taking advantage of the Distributed Game, a tool built by the community that helps the community select images for Wikidata items from among a bunch of image candidates. We took some data from this tool, thanks to Magnus again: we selected 66,000 items, and for each of these items we have the image candidates for that item, and also the image that was selected by the user as the best image representing that item. So we can use this data as ground truth to evaluate the goodness of our ranking. We essentially want to know how well the ranking model pushes the best image, the selected image, to the top of the ranking, and through normalized discounted cumulative gain we can measure this property. We see that this model is able to put the actual selected image at the top of the ranking 73% of the time. So this is a partial evaluation, but we already see a bit of signal here that these smart tools can actually help reduce the space in which editors look to find good images for Wikidata items.
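The fine-tuning setup described above might look roughly like the following sketch, using the Keras Inception V3 weights as a stand-in (Keras ships ImageNet-1k weights; the talk's base model was pretrained on a larger label set, so this is an approximation of the approach, not the exact model):

```python
import tensorflow as tf

# Transfer learning sketch: start from a pretrained Inception V3 and
# retrain only the head as a binary high/low quality classifier.
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3))
base.trainable = False  # freeze the pretrained feature extractor

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # quality score in [0, 1]
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# train_ds: (image, label) pairs, with label 1 = Commons "quality images"
# and label 0 = a random Commons sample, as described in the talk.
# model.fit(train_ds, epochs=5)
```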
And we are continuing this evaluation by feeding these recommendations to online tools that the community is building, and when I say community I mean mainly Magnus. I'm really grateful to Magnus, because he helped a lot with this project, and I know there is a Magnus Manske Day, because the community celebrates his existence, so I invite you all to observe the day. And I also invite Magnus to talk a bit more about the existing tools for visually enriching Wikidata, and about the future of these tools and of images in Wikidata.

Okay, so I'll try to be brief, because you've probably seen most of these things already in one form or another. This is a tool called WikiShootMe. It's originally intended for mobile, so you can walk around, look at your mobile, and you'll see Wikidata items around you with or without an image, and then you can just take a picture on your mobile and upload it. The red dots here are Wikidata items without an image. You can also see, as blue dots, images on Commons that have a coordinate, so you can see images that were taken close by and check whether they already cover what you want to add to Wikidata, and then you can do that as well. You can also overlay Flickr images, but those seem to be a bit sparse. On the next slide, this is a tool a bit more dedicated to the purpose that was discussed here, which is finding images that are already on Commons, in this case mostly via Wikipedia. You can generate a list of Wikidata items that don't have an image, and the tool will check the associated Wikipedia pages for images and present those. In this example here, this is Indian writers, and these seem quite sensible suggestions in most cases, and you can single-click upload, well, single-click add these images to Wikidata. Next, we have what we've already seen before, the distributed games. The distributed game is a framework where everyone can add their own games via an API, and one of the games I have added shows image candidates, similar to the previous tool. These candidates are pre-computed and are just displayed to the user, and you can decide to use an image as an image, or as another thing like a video, or you can decide there are no more images, and then the item is not shown again to any other user. And then finally in this big section, this is a relatively new tool, it's a few weeks old. This also has pre-computed image candidates associated with items, both from Commons and free images from Flickr. There, again, you can single-click and add an image from Commons to Wikidata. For images from Flickr, you can single-click and transfer them to Commons, guaranteed to have a free license. And one of the nice things here is that the search that's done is coded specifically for the group of items we're looking at. So, for example, there's a group with churches. What I'm using there is both the name of the church, as the label of the item, and the name of the administrative unit it's in, because there are a million St Marys, but there's probably just one St Mary in that specific village. That cuts down a lot on the noise you would get if you just searched for individual label names. In the background, this can also use the ranking information that was already discussed; right now I'm showing these images randomly, so they can in turn become a basis for judging how good the ranking tools are.
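Magnus's church example could be reproduced with a query along these lines, pairing each church label with the label of its administrative unit (property P131); Q16970 is the Wikidata class for church buildings, and the query is a simplified sketch of whatever the tool actually runs:

```python
import requests

SPARQL = "https://query.wikidata.org/sparql"

# For each church without an image, pair its label with the label of the
# administrative unit it sits in (P131), to disambiguate the many St Marys.
query = """
SELECT ?churchLabel ?placeLabel WHERE {
  ?church wdt:P31/wdt:P279* wd:Q16970 .      # church buildings
  FILTER NOT EXISTS { ?church wdt:P18 ?img }
  ?church wdt:P131 ?place .                  # located in admin unit
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
} LIMIT 20
"""
rows = requests.get(SPARQL, params={"query": query, "format": "json"},
                    headers={"User-Agent": "church-search-sketch/0.1"}
                    ).json()["results"]["bindings"]

for row in rows:
    # Compose a disambiguated search string for Commons or Flickr.
    print(f'{row["churchLabel"]["value"]} {row["placeLabel"]["value"]}')
```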
And eventually, once we know the ranking tools are good, we can sort the images here and elsewhere by these ranks and present the best images first, which makes it easier for people to choose one. Yeah, I think those are the big ones, the ones you've probably seen. Should I go on with the next one? So there are a few other tools I have; I'll mention these quickly. You probably haven't seen those, because they're a bit obscure. One thing is: once the images are associated with Wikidata items, what do we do with them? Well, one thing would be to use them on Wikipedia. And it's nice because this is all very central: all the associated Wikipedia articles can find the image via the item. And there's a tool that I made that can do just that. As you can see, there are about 43,000 pages on German Wikipedia alone where the Wikidata item has an image but the article doesn't. So there's massive potential for improving Wikipedia through this, and it's very, very simple and, in quotes, cheap to do. The next one kind of goes the other way. Here I can specify a category on Commons and get a list of all the images in that category that are not used anywhere, on Wikipedia, on Wikidata, or elsewhere. That might surface a few new interesting images if you're looking at a subject area, and it might even inspire creating a new Wikidata item or Wikipedia article, because there's a great image of an object and you think, hey, we don't have an article for that. On the next one, this is the last one I'm presenting, this is kind of the Flickr fire hose. It's called Flickr Free. It shows all the latest free-licensed images that were uploaded to Flickr. There's a lot of noise in them, but if you reload it occasionally, you can find a few nice images that we would want on Commons, and maybe add to articles, or create articles, or add to Wikidata. This one we need to filter down somehow if it's going to be popular. Maybe we could combine it with the ranking tool so that we only see high-quality images, which would probably get rid of a lot of the noise there. And that's my part. Miriam?

Okay, thanks a lot. So, just a few more words on that, on exactly this topic of adding Flickr images to Commons: we did a small-scale experiment where, through computer vision, we tried to match images and understand how many Flickr Commons images are already on Wikimedia Commons, and the number is extremely low. So thanks, Magnus, for this tool, because I think it is going to be very useful for enriching Commons, and thanks also for opening up the discussion of future work. So, to wrap up, what we showed today is ongoing research on image recommendations for Wikidata items. We can definitely improve the machine learning models, use even more sophisticated models for this image recommendation, and continue with the evaluation: this is definitely future work. And, as Magnus mentioned, we are doing a lot of work on Wikidata, but the same efforts can be transferred to Wikipedia, to try to make Wikipedia more visual, especially for smaller articles or languages. And finally, we can leverage these efforts to acquire more free-licensed data from external sources such as Flickr, and, also thanks to one of Magnus's new tools, we can keep inspiring readers and editors through images, so that they can create and share more knowledge. And with that, I thank you very much, and I'm waiting for your questions.
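As one plausible way to run the Flickr-versus-Commons duplicate check Miriam mentions (the talk does not specify the method used), perceptual hashing flags images that are probably the same picture; the file names below are hypothetical:

```python
from PIL import Image
import imagehash

def phash(path):
    """Perceptual hash: near-duplicate images hash to nearby values."""
    return imagehash.phash(Image.open(path))

flickr_hash = phash("flickr_photo.jpg")    # hypothetical local files
commons_hash = phash("commons_photo.jpg")

# Subtracting two hashes gives their Hamming distance; small distances
# suggest the same underlying picture, even across resizes or re-encodes.
if flickr_hash - commons_hash <= 8:
    print("Likely already on Commons")
else:
    print("Probably a new image")
```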
Thank you so much, Miriam and Magnus, for the talk. I think we have a few questions from IRC, so I'd like to ask Baha to relay them.

Hello, we have three questions so far from IRC, all for Miriam, by the way. The first one is from Aaron, and he's asking: what kind of processing power is necessary for running the kind of image processing algorithms we would want to use to sort through Commons, and what kind of processing resources should we be investing in now?

Thanks, Aaron, for this question. So I think the processing power for, for example, reusing the quality model, if we want to embed this quality model somewhere, is not so huge, because this is just about classifying images, just about testing, so we don't need a lot of processing power for that. If we want to build new image models, then we really need to invest more in graphics processing units, definitely. Open-source libraries are available, such as TensorFlow, to process these images seamlessly using multi-threading, et cetera. So having GPUs would really help a lot if we want to train new models similar to the quality model. I'm not sure if I answered your question, but yeah.

Okay, he says great. So, moving on to the next question, this is from Giovanni, and he's asking: what weights did the learning-to-rank model find when you trained it? Did it give more power to quality or to relatedness?

Yeah, this is something I wanted to add, but for lack of time I couldn't. It was giving essentially more or less equal weight to both relevance and quality, a little bit higher for one or the other depending on the setting, but in general a little bit higher for relevance. I think that, as much as I like ranking images by quality, probably the most important thing, even in image search, is to have good images representing the item you're looking for; if you also have beautiful images, that's better, but relevance is probably more important.

Okay, great. And Giovanni says he loves WikiShootMe. So cool. That's that, and I think Dario is saving his question for later.

Okay. Yeah, we're going to have more time at the end of the showcase. So with that, I'm going to pass it on to Aaron for the second presentation.

All right. So I will get my screen share started. Okay. All right, today I'm going to talk to everybody about backlogs, with maybe a specific focus on how the internet dumps things into Wikipedia and how we deal with that: where it works, where it doesn't, and what we can do with machine learning. Before I move on with the talk, I've got to highlight one of my collaborators, Sumit. He's a member of my team, and I've been working with him for several months. When I get to the end of this presentation, I'm going to talk to you about a machine prediction model that we've been working on, and that's mostly Sumit; he's been doing the hard work to make that work. So, I'm Aaron Halfaker. I'm a principal research scientist at the Wikimedia Foundation, and I lead a team that we call Scoring Platform, where we build machine learning models to try to help wikis work better. And I always like to highlight this little bit that I have on my page: think big, measure what you can, and build better technologies. This presentation is going to be a little bit of thinking big and a little bit of building better technologies. All right, so this talk comes in three parts. First, I want to talk to you about how backlogs are everywhere in Wikipedia.
Then I'll talk to you about new content review and how subtle judgment calls are kind of a difficult part of that. And then finally I'll give you a proposal that I'm working on for rerouting page creations to help out with some of these subtle judgment calls. All right, so when I look at Wikipedia, I like to think of an iceberg. There's this beautiful thing on top that everybody gets to see, but it's really not a very big part of the story. If you just look at the articles that are produced in Wikipedia, you're not seeing quite a lot that goes into those articles' production, such as task routing, governance issues, socialization, moderation, and quality control. Together, all of these systems that are below the water, that most people don't see, allow Wikipedia to operate as a sort of living system, and they account for quite a lot of the work that goes into Wikipedia itself. And a lot of that work below the water level ends up being backlogs. So when I was preparing this talk, I reached out to a few people who I knew had a lot of experience working in wikis, and I asked them for examples of backlogs that they thought were really interesting. And I want to show you this email that Nick Wilson, that's user Quiddity by the way, got back to me with, which is just a pile of links to all sorts of backlogs across a whole bunch of different wikis. I asked him if it would be okay if I showed this screenshot in my presentation, and he told me okay, but please do indicate somehow that this is just the tip of the iceberg. So I want to make it clear that when we look at this iceberg, what's in this email is just like that little tip at the bottom down there. So I want to give you a few examples of backlogs. First, I want to look at some task routing that people do. In Wikipedia, there's this term we use called orphan articles. An orphan is an article with no links from other pages in the main article namespace. So, if we look at this little graph that I have at the bottom of the screen, there are some articles that have a lot of links coming in and some articles that have none, and I've shaded them based on the number of incoming links so it's easier to see. You can see these two articles on either side have no incoming links, and those are orphans. So if somebody is browsing around Wikipedia and clicking on links to find new and interesting stuff to read, they'll never arrive at these articles. So it's better if we find some place that we can link to these articles from. So let's say that you're interested in Brittany, France, and you're looking to clean up some of the orphan articles there. Well, there's a wonderful set of pages in French Wikipedia that list these backlogs of orphan articles needing links made to them, which you can scroll through, such as this one for the Project Brittany orphan articles, and this list happens to be maintained by Dickenspot. By the way, Dickenspot is maintained by J.R. Cortoyce; I'm sure I'm pronouncing that wrong, and I figured it was better to just Americanize it than to try something that sounded fancy but was probably still wrong. Okay, let's look at another backlog. Let's say, for example, you want an administrator to consider banning a vandal in Wikipedia. Well, there are a lot of backlogs slash notice boards for getting a hold of administrators and having them look at specific problems.
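The orphan definition Aaron gives above can be checked mechanically through the MediaWiki API; a minimal sketch using the real "backlinks" module, ignoring subtleties such as links via redirects that real reports account for:

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

def is_orphan(title):
    """An orphan has no incoming links from other main-namespace pages.
    blnamespace=0 restricts to main-namespace linkers; a single hit is
    enough to disqualify the page as an orphan."""
    params = {
        "action": "query",
        "list": "backlinks",
        "bltitle": title,
        "blnamespace": 0,
        "bllimit": 1,
        "format": "json",
    }
    links = requests.get(API, params=params).json()["query"]["backlinks"]
    return len(links) == 0

print(is_orphan("London"))  # False: plenty of incoming links
```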
This is a nav box for all of the administrator notice boards around English Wikipedia. We want to talk about vandalism, so we're looking for the vandalism notice board; they call this the administrator intervention against vandalism notice board. You can see here there are a bunch of reports of people stating that there are these likely vandals, and that administrators should look at them and consider banning these people. There are some instructions for how to engage in adding something to this backlog: you use these fancy templates, either the vandal template or the IP vandal template. And then a bot curates this backlog: HBC AIV helperbot number five goes through and cleans up the items on this backlog when an administrator takes an action or decides not to take an action. And you'll notice, too, in this edit comment that I've copied in here, that it actually makes notes of how big the backlog is at any particular point as it's cleaning it up. Oh yes, and this AIV helperbot is run by JamesR. And so you can imagine, just like the pile of backlogs I was showing you, there's actually a pile of bots that are responsible for maintaining these backlogs, and that's a really critical part of Wikipedia's infrastructure: this sort of army or legion of bots that keep the backlogs clean and make sure that people have good access to worklists they can go through. So one more that I want to show you is a backlog that you can use if you want to do some cleanup work on Wikipedia. There's a whole bunch of different types of cleanup work that you can do, such as rewriting sections to make them more clear, or adding citations. You might note these templates that appear at the top of articles, such as this one that says "this article may be too technical for most readers to understand". If you want to go find articles that are too technical for readers to understand, so that you can go rewrite those sections, well, we've got a backlog for you. There's this tool that runs on the Wikimedia Toolforge called Bambots, which is not exactly a bot, because it's not editing the wiki, although I bet at some point in its history it was editing the wiki; now it produces pages for you to scroll through. So we want to look at some cleanup work, so I'm going to click on this cleanup worklist bot, the WikiProject cleanup listing. I'm particularly interested in Africa, and I want to go clean up some articles that need it in the Africa content space, so I'll choose Africa from the WikiProject list here, and then I can get to a list of articles that need citations. You can see that the first one at the top of the list is the example I was showing us earlier. And so if you want to do this work, you can go through this list, clean up the articles, remove those little templates from the top of the page, and the tool will then remove those entries from the list, and we can continue the work of getting Wikipedia cleaned up. One more thing I want to show you that Bambots does is that it'll tell you exactly what the progress looks like for a section. This is a statement that Bambots makes about the articles within WikiProject Africa: of the 81,904 articles in this project, 25,993, or 32%, are marked for cleanup, with 93,583 issues in total. Sounds like quite a lot, right? Well, remember that iceberg we were looking at earlier, and that little tiny bit which was actually a huge list that Nick had sent to me.
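Cleanup worklists like Bambots' can be derived from template transclusions: every article carrying a cleanup banner transcludes the banner template. A minimal sketch, assuming {{Technical}} is the relevant enwiki banner template (the exact template name is an assumption):

```python
import requests

API = "https://en.wikipedia.org/w/api.php"

# list=embeddedin enumerates pages that transclude a given template,
# which is how "articles marked too technical" backlogs can be built.
params = {
    "action": "query",
    "list": "embeddedin",
    "eititle": "Template:Technical",
    "einamespace": 0,       # main/article namespace only
    "eilimit": 10,
    "format": "json",
}
pages = requests.get(API, params=params).json()["query"]["embeddedin"]
for page in pages:
    print(page["title"])
```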
Well, WikiProject Africa is just a little tiny sliver of that little tiny bit of the iceberg. And so really, I just want to give you the sense that these backlogs are absolutely enormous, and they make up a lot of the work that's going on in Wikipedia right now. Okay, so, part two. I want to talk to you about a specific type of backlog that's really important for running a wiki that's open to the internet. Specifically, I want to talk to you about new content review and the subtle judgment calls that are essential for making that work. And for this, we're going to talk about a much larger chunk of this iceberg, a huge amount of work that goes into maintaining a wiki. So essentially, we have the internet dumping, oh my God, a huge amount of new content into Wikipedia at all times. In English Wikipedia, that's 160,000 edits per day that need review. One to two thousand new articles are created every day, and this is excluding redirects and drafts and all that sort of stuff. There are about 40,000 drafts that need review at any given time, sitting in the draft namespace. There are somewhere around 330,000 articles that need cleanup, and there are 1,600 new editors per day coming in with this flow of new content who will need to be trained and socialized. So if you're a Wikipedian, I guess, you might wonder: well, what about the quality of this content that's flowing in? Generally, the strategy is to put some sort of barrier in place, assign some curators to review the content as it comes in, and their job is to make sure that the good stuff stays in Wikipedia and the bad stuff is removed and goes somewhere else. And they might form a backlog around the type of work that they're doing there. So, vandal fighters are the editors who review new content as it comes into Wikipedia to make sure that no vandalism gets saved. They're essentially looking for edits that look like this one, in all caps, "LLAMAS GROW ON TREES", and they use their human eyeballs to make judgments about it, to say, hey, that's bad, and label it as bad stuff so that it can be removed from the wiki. They've generally gotten along pretty well with tool developers, people like Magnus, who are developing tools to help Wikipedians do their work. And so there's a huge array of tools that are used to help people do this vandal fighting work in Wikipedia, such as ClueBot NG, which is a fully automated bot; Huggle and STiki, which are semi-automated tools; and the edit review improvements project that the collaboration team at the Wikimedia Foundation was working on to improve the recent changes page on Wikipedia. So essentially, here's how the world works out for people who are doing this type of edit review. You've got the internet, which is dumping a fire hose of about 160,000 edits per day. The first line of defense are robots, and these robots are using machine learning models to separate the obviously really bad stuff from the not obviously bad stuff. The second line of defense are the group of vandal fighters, who are using semi-automated tools like Huggle and STiki, which also use machine learning models, to separate the bad stuff that remains from the probably good stuff. And then the final line of defense are people who are using watchlists in Wikipedia, tagging articles that they're interested in and getting notifications when they're edited, to separate the subtle problems from the stuff that's actually good and should stick in the encyclopedia.
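These three lines of defense amount to a threshold cascade over a damage-prediction score. Here's a sketch using the ORES "damaging" model from Aaron's team (a real public endpoint); the thresholds are illustrative, not the ones the actual tools use:

```python
import requests

ORES = "https://ores.wikimedia.org/v3/scores/enwiki/"

def triage(rev_ids, hi=0.92, lo=0.50):
    """Route edits through the three lines of defense: a very high
    threshold for confident bot reverts, a lower one for the
    vandal-fighter queue; everything else is left to watchlisters."""
    r = requests.get(ORES, params={"models": "damaging",
                                   "revids": "|".join(map(str, rev_ids))})
    scores = r.json()["enwiki"]["scores"]
    for rev_id in map(str, rev_ids):
        p = scores[rev_id]["damaging"]["score"]["probability"]["true"]
        if p >= hi:
            queue = "bot revert"
        elif p >= lo:
            queue = "vandal-fighter review"
        else:
            queue = "watchlists"
        print(rev_id, round(p, 3), "->", queue)

# triage([123456789, 123456790])  # pass real enwiki revision IDs
```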
And you can see that as things progress forward, these subsequent filters deal with more and more difficult problems. And fun story, too: when you do an analysis of the time it takes an edit to be reverted in Wikipedia, you can actually see the work of these groups play out in a temporal rhythm: first the bots revert vandalism, then the vandal fighters show up, and then the watchlisters show up and do their work. So I want to talk about these watchlisters for a moment, because, as I was saying earlier, I think they have the hardest job when it comes to edit filtering. And to demonstrate the difficult job they have to do, I want to have us all play a game; I can't hear what you're saying, but if you want to type in IRC, then I can respond to that. This game is "hoax or not": is this edit good or bad, and can you tell? Okay, so we're going to look at an edit here. I'm just going to show you the diffs, because this is what vandal fighters see. This is the article on Ronnie Hazlehurst. I'm not quite sure who that is, but we can see in this edit somebody's adding a song to the pop songs section of the article. What do you think, good edit or bad edit? From a vandal fighter's point of view, this looks fine to me; I can't really say what's going on here either way. Turns out that's a hoax: he did not write that song, and the edit needs to be reverted. Okay, let's try one more: the monarch butterfly. All right, now I actually know what a monarch butterfly is. This edit is saying that it was nominated in 1989 as the national insect of the United States and is the national insect of Canada. Well, that seems likely. Monarch butterflies are pretty cool, it's the state butterfly of Minnesota, why not the national insect of Canada? No: hoax. Actually, totally not true, needs to be reverted. Okay, one more. This one we might have some expertise in. This is the article about Wikipedia, and in this edit, Tarzan ASG adds a note. So I have to read the sentence, it's a complicated sentence: "Except for a few vandalism-prone pages that can be edited only by established users, or in extreme cases only by administrators, every article may be edited anonymously or with a user account, while only registered users may create a new article." And this editor notes that that's only true in English Wikipedia. Only in English Wikipedia? Yep, actually, that's true: in all the other wikis, anonymous editors can create pages. Okay, so the way that watchlists work is, essentially, if you have some interest in a particular topic, some interest or some expertise, then you can click this little star thing. So, I'm interested in process modeling, so I can click that little star at the top of the article, and then when somebody makes an edit to this article, I get a notification, and I can go to this special page on Wikipedia called my watchlist and see who's been editing the articles that I've starred. And so essentially this works as a routing system, where the probably good edits that make it past the vandal fighters then get routed to people who have some subject interest in the edits that are happening, or in the topic of those articles. And this is insightful, because routing based on interest and expertise makes subtle judgment calls easier. Because I know Wikipedia, and I certainly know how access rights work in Wikipedia, I was able to judge that final edit as good, whereas I had no idea who sings what pop song, or, honestly, who thinks the monarch butterfly is very important. And so I really want to highlight this.
They're doing arguably the hardest work of quality control in Wikipedia, and they do it with this routing system. Okay. So when I talk to you about the edit review process, I want you to think about it as a sieve: it's essentially a series of filters that allow content to flow into Wikipedia, and each filter solves a different problem. Okay. Now I want to talk to you about another new content filter that we have in Wikipedia: the new page patrol filter. The fire hose they're dealing with is usually between 1,000 and 2,000 new articles per day. So there's a group of people we generally refer to as patrollers, and they're responsible for splitting the good stuff from all of the problematic content. And so it's one group of people that has to deal with the really bad stuff, the sort-of bad stuff, and the subtle problems, which makes it a really difficult problem for them. So essentially, what they ended up doing to make sure that no subtle problems stuck in Wikipedia is they plugged the drain: they made it so that there's a big backlog that fills up. If nobody reviews an article, the backlog just grows and grows. And so after a while, they couldn't handle filtering out much of the good stuff and couldn't really deal with the subtle problems; they could only really handle the bad stuff and the really bad stuff, and so the backlog started to grow. And after a while, it grew quite big, and then they couldn't even really handle the bad stuff; they could only handle the really bad stuff and a very small subset of the good stuff. Eventually, the backlog, in the state that it is in today, is absolutely out of control, and as you might imagine, the patrollers are yelling: ah, turn it off! And that's actually what's playing out in English Wikipedia right now. There's a trial happening to limit the fire hose itself, to reduce the rate of new article creations, by not allowing newly registered users to create articles. So English Wikipedia is now the only language encyclopedia where we've restricted article creation this way: not even newly registered users can create articles; you have to be an experienced registered user to create articles. And who could blame them? I mean, this problem is kind of out of control at this point. So really, when you look at the edit review process, I wanted you to think about it as a sieve, a series of filters; but when you think about the article draft review process, it's really a big plugged drain that's overflowing, because there's really only one stage of review that happens. All right, so on to part three, the bit that I propose for changing this. So I really think that subtle problems are the biggest issue here; I think this is really where the problems originate. As you could see earlier when we were playing hoax-or-not, subtle problems are really hard to deal with, because they look like good edits to anybody around, but we don't want hoaxes in Wikipedia, we want good information in Wikipedia. So when it comes to edit review, routing was the key strategy for dealing with these subtle problems and making sure that we could get subject matter experts to look at the edits that need their review. But for draft articles, that's really hard. When somebody creates a new article, how do we know where to route it?
You can actually add an article to your watchlist before it's been created, by adding the title to your watchlist, but that's a complicated proposition, especially because you might not be aware of all the articles that somebody might create. Specifically, if somebody is going to create a hoax article, you probably won't have thought of that title before they got there. So how the heck can we do routing here? This is where I think we can take advantage of ORES, our machine-learning-as-a-service system, and use some machine classification to address this problem. So maybe we can use ORES to help route new article creations by interest and expertise. Essentially, the idea is that we want this flow of probably good new articles to come into the ORES system, and ORES would essentially help route these articles to people who can make subtle judgment calls about the quality of these articles: whether they're about notable people, or whether they're a hoax. So here's how we formalize the problem of topic modeling in ORES. We take the first version of an article that's saved in Wikipedia. On the left-hand side, you're looking at the first version of the article about Alan Turing, a famous computer scientist. We send that to ORES, and we want it to predict the sort of topic space this article exists in. So it's a biography; it's about somebody who's a technologist, so it should be related to technology and philosophy; and Alan Turing is a European, so we should probably say something about Alan Turing being related to Europe. And you could imagine that if you were interested in reading biographies about technologists from Europe, these topics would help you find this article, and so we could use them to actually route it to people. Now, unluckily for us, the category structure in Wikipedia is not very helpful for this, and I'll leave it to outside reading to learn why that's the case. But luckily, there is a category structure that we've found to be incredibly useful for routing new page creations, and that's the category structure implemented by WikiProjects. WikiProjects are groups of subject matter experts who are interested in certain topic spaces on the wiki, and they tag articles that are within their subject interest space. And so we could maybe use their tags and a machine learning model to learn topic spaces and help route articles to these subject matter experts. We found, too, that there's this thing called the WikiProject directory in English Wikipedia that's really useful for connecting high-level topics to specific WikiProjects. As an example, if you look at the middle level of this directory structure in the WikiProject directory, these are the topics that come out. For example, under culture and the arts, we have arts music, arts performing, arts plastic, arts visual, and broadcasting. Under geography, we have bodies of water, countries, the Americas, Europe. We have history and society, military and warfare, STEM, science, biology, technology. These are really useful topic spaces that we can apply using the WikiProject categorizations that are already applied in Wikipedia. So just as an example, for Alan Turing we get culture and the arts, linguistics, biography, philosophy and religion, Europe, military and warfare, science, engineering, and information science and technology. So that's a pretty solid set of categories for Alan Turing. And here's another example, for Emily Dickinson, who's a famous writer.
We get arts visual, which is where writing is categorized, linguistics, biography, the Americas, history and society, and women's health. So, I'm going to wave my hands a little bit about this, and sorry, my machine learning friends. To build this model, we use a 300-cell vector representation based on word2vec; specifically, we use the Google News corpus. We used that just because it was handy and easy; in future work, we're probably going to build our own word2vec model based on Wikipedia content itself. And the machine learning model that we've built turns out to be very, very effective: we're getting micro and macro statistics in the range of 0.95 ROC-AUC and 0.75 to 0.77 PR-AUC. So, essentially, if you don't read model fitness statistics, this is really good. And it works really well even though we're only looking at the first revision of articles. So, this is how I envision the new article draft review filtering process of the future. There's already a model that we have deployed, in July of 2017, that we call draft quality, and it's intended to be the first line of defense. We've trained it on spam, attack, and vandalism pages, and it essentially allows patrollers to filter off the most obviously bad new article creations so that they can be deleted right away. And then everything else that's not obviously bad goes to a queue where we can use our draft topic model, the one I was just describing, to categorize articles by where they land in this WikiProject mid-level directory structure. So, say, for example, editors from WikiProject Computer Science can intersect Culture.Biography with STEM.Information science to get biographies about computer scientists, and review those new article creations as they come in: to look for hoaxes, to look for subtle issues, or maybe even to look for new users that they want to support. And hopefully we'll have that draft topic model released by the end of this month; it might be the beginning of next month, depending on some of the issues that we're working through right now. So, another thing that I think is really interesting for this topic model, or this view on new article creation review in Wikipedia, is a project that's relatively new, or at the very least it seems to me that it's been getting much more popular recently: the WikiProject Article Rescue Squadron, which is specifically tasked with digging through new article creations that are about notable topics but need help, need content work, before they'd be acceptable as Wikipedia articles. We think that they'll be very interested in using this topic model, too, to dig through the backlog for articles that they have specific interests or expertise in. And another person that I want to highlight in this talk is Rosie Stephenson-Goodknight. I had a conversation with her at Wikimania 2017 about this topic model; this is when it was just a glint in my eye, something that I wanted to build, something that I thought would be useful for Wikipedia. And I asked her what she would do if she could dig through the new article backlog and be able to look at only those new articles that she was interested in or had some expertise in. Well, Rosie is one of the organizers of WikiProject Women Writers, and so she told me that she would look for new articles about women writers, so that she could help make sure that those articles get to stick in Wikipedia and that the newcomers who are creating those articles get supported.
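Stepping back to the model itself for a moment: the representation Aaron waves his hands about, averaging 300-dimensional Google News word2vec vectors over the first revision and feeding a multi-label classifier, could be sketched as below. The classifier choice here is an illustrative stand-in, not the production model:

```python
import numpy as np
from gensim.models import KeyedVectors
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# The 300-dimensional Google News word2vec vectors mentioned in the talk.
w2v = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def embed(text):
    """Represent a draft's first revision as the mean of its word vectors."""
    vecs = [w2v[w] for w in text.split() if w in w2v]
    return np.mean(vecs, axis=0) if vecs else np.zeros(300)

# X: embedded first revisions; Y: binary indicator matrix over the
# mid-level WikiProject-directory topics (a multi-label problem).
# clf = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, Y)
# topic_probs = clf.predict_proba(embed(new_draft_text).reshape(1, -1))
```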
And so, essentially, the way that I look at this is that editors like Rosie plus new editors who are creating articles equals more new editors sticking around in Wikipedia and more of the new good content that we need. And hopefully, actually, less hard work for page patrollers, because page patrollers would then not need to task themselves with these time-consuming and difficult judgment calls. I should add a disclaimer here: Rosie has not reviewed or endorsed this presentation or the modeling project that I have described. I had one conversation with her at Wikimania 2017. Her work has been inspiring, so I thought she made for a great example; she deserves some of the credit for this, but none of the blame for any of the things that I'm talking to you about. Okay, so, moving on, in conclusion: we have these models, the draft quality and draft topic models, and we're just about to get the last one deployed in production by the end of this month. Now we need production integration, because just having these models out in the wild doesn't mean that Wikipedians are going to use them. One of the things that we're targeting is the Wikimedia hackathon that's going to be in Barcelona, May 18th through 20th. I'd really like to talk to people about experimenting with bots, or maybe some Toolforge tools, that can take advantage of these models and maybe give people a window into how they could use topic modeling to dig through the new page backlog. I think that there are a lot of good options for adding information to the page curation tool, which is one of the things the current page patrollers are using in English Wikipedia. And finally, the edit review improvements project added filters on top of recent changes; we could use these topic models to extend those filters and get a nice topic-filtered view of new page creations as they come into Wikipedia. All right, that's all I have for you today, folks. Thanks.

Thanks a lot, Aaron. [applause] And I guess we have two questions; there's one from IRC.

Yes, Tilman is asking: a system like this is already in place on English Wikipedia with article alerts and NewArtBot, and he gives an example link. And the question is: did we evaluate the improvement we gain from the new, more sophisticated machine learning approach?

No. So I wasn't familiar with this article alerts bot, and I guess it's hard for me to comment on the differences without knowing about it in advance. But just clicking on the link to the page that Tilman sent here... yeah, I'm not going to be able to comment; I can't even figure out what I'm looking at here.

So there's another follow-up question, from Dario: the draft topic model is really an article topic classifier, nothing specific to drafts, correct? And what are your thoughts about cross-language generalization issues?

So, this particular model was trained to predict the ultimate topics of an article based on its first version, and it works really well. In fact, it works slightly better at predicting the topic of an article based on a more complete version of the article rather than the first-version draft, but we didn't train it on that. We could totally train a model based on more complete articles, and we'd probably get a little more fitness out of something like that. Now, cross-language generalization, that's an interesting problem.
WikiProjects and the WikiProject directory afford us a really nice, human-readable window into the topic space of Wikipedia, but it only really works for English Wikipedia. I think that there are some interesting options we might try by cross-training a model using Wikidata sitelinks, where we can actually associate the articles in English Wikipedia with the articles in other languages. I think that there are also some really interesting options for using Wikidata itself as the source of human-understandable topics. As an example, right now there's nothing in the WikiProject directory that would separate men writers from women writers; really, the topic model would just help you find writers. But Wikidata would be very good at that. And so I think that there are some good options for using Wikidata statements themselves as the classes that the model would try to predict.

Makes sense, thank you. I think Tilman had a good follow-up comment on the question before.

Yeah, we can take it on IRC. He's just saying that their approach is based on search terms, like regexes. It's much simpler, a bit more hand-crafted, but it seems to be in place. Do we have other questions, or can I ask a follow-up question? So, I think what would be useful to look at is how much it's been taken up by editors. I imagine that if it's productionized and being used, you need to find these editors who are experts, and I guess there are some interesting questions about how exactly we achieve that.

Right. So, I mean, it's hard to say from this tool, because, in my simplistic view of the link that you sent me, I'm not seeing a nice integration; it seems like somebody would have to go out of their way to find something like this, whereas I would much rather deliver these predictions directly to the people who are doing page curation right now, or maybe have a bot that delivers these topic recommendations to WikiProject pages themselves, where they would sit next to their task lists. If something like that is already in place, then I'd like to talk to some of the people who are using the current system, to see how the delivery mechanism is working. Generally, when I'm talking about something like this, though, I'm more concerned about getting a system in place that fits into Wikipedia effectively than about the fitness of the actual prediction model. If it turns out that regexes work pretty well, then let's use regexes, but let's definitely get the system in place, because we're certainly struggling right now with these time-consuming judgment calls.

Yeah, definitely. Baha, are there other questions from IRC for Aaron, or for Miriam and Magnus?

That's all for now.

Okay, I think I had one question parked for Miriam that I wanted to ask, and that's basically: did you have any thoughts to share about the selection of the two classes that you focused on? I know we had some discussions when we set up the project in the first place, and I was wondering if you could unpack your thinking about why we selected these two classes, what's special about them, and what we learn about these two classes compared to other classes that we may have on Wikipedia.
Yeah, so in general, the fact that we selected specific classes was a suggestion from Magnus, who has been doing some work in this direction. Having specific classes that editors might want to go through helps the selection process a lot, because some editors might be more expert in one thing or another; and also the fact that you don't have 40 million items to fill, but maybe just 200,000, helps in general, and psychologically. How we selected the classes was basically according to, I would say, two criteria. The two initial classes were monuments and people. The first criterion was the availability of images: talking to Leila, she suggested monuments, also because Wiki Loves Monuments brings a lot of images to Commons, so it might be easier to fill this gap. And for people, I think we chose people because, in general, biographies are very important topics across the Wikimedia projects, and so having a well-structured and rich Wikidata item about a person can help build better biographies, I would say. So this is the reason why we chose them. And the main difference I found between the two classes is that, while monuments is a class where it's kind of easier to fill this gap, to find relevant images, because of Wiki Loves Monuments, finding relevant images of people on Wikimedia Commons is not easy. If you want to use Flickr, Flickr has a lot more images of people than Commons, so I think this is also an incentive to use Flickr more. And also, while ranking monuments according to relevance is easier, ranking people just by name is possibly not the best way, because you have many people who share the same name, so there is a bit more work to do in that direction.

Thank you. Maybe one additional note: we were discussing with Miriam at the very beginning of the project how we focus on Wikidata, even though, as has partly been shown, there are plenty of opportunities to port images back to Wikipedia when it's appropriate, and we know there are many cases of biographies that lack good free-licensed images. So I imagine this could become the beginning of a pipeline for identifying those biographies for which a free-licensed illustration is needed. That's just to say the scope of applicability of these methods is broader than just Wikidata. Sweet. That's all for me. Anything else from IRC?

Kelvin just shared the link, if you guys are interested, but no more questions.

All right, so I think we're going to close it here. Thanks a lot to our speakers and to the people participating here in San Francisco and remotely on IRC. We'll see you all on the 21st of March for the next edition, and have a great rest of your day. Bye, everyone. Bye.