Okay, we're live. Hello everybody, and welcome to the September 2019 Wikimedia Research Showcase. My name is Jonathan Morgan. I'm on the Wikimedia Research team, and I'm joined today by possibly the largest number of research team members we've ever had in the hangout during a showcase. First off, we have Isaac Johnson, a research scientist who will be monitoring IRC for questions today. We have Diego Saez-Trumper, who will also be participating more quietly in the background. We have Miriam Redi, who will be giving the first presentation today. And last but not least, we have Martin Gerlach, the newest research scientist on our team; he joined within the last month. Martin, say hi to everybody. Hi. Excellent.

So today is going to be a somewhat different sort of research showcase: we're going to have two internal presentations rather than external speakers. Miriam will speak first; the work she'll present is the development of a taxonomy and a machine learning model for citations that are needed in Wikipedia articles. Then I will give a work-in-progress presentation on a project about patrolling on Wikipedia, which also focuses on questions around content integrity, but from a slightly different angle. I think this will be a fun showcase, so let's get started. Without further ado, Miriam, do you want to take it away?

Sure, let me share my screen. Do you see the slides? I see the slides. Great. All right, thank you very much, Jonathan. Good morning, afternoon, or evening, depending on where you are. I'm a big fan of this showcase, so I'm very happy to be here giving this talk. What I'm going to talk about in the next 25 minutes is a recent project to algorithmically assess Wikipedia's verifiability, in which we developed machine learning models that can automatically assess whether a sentence in Wikipedia needs a citation or not. This work is part of an academic paper we published at The Web Conference this year together with Besnik, Jonathan, and Dario, so thanks to my amazing co-authors for the work they put into this project. Jonathan is also giving a talk afterwards. This project is part of a broader strategic direction of the research team at the Wikimedia Foundation, and Jonathan's presentation is also part of our effort in what we call the content integrity direction. There are a number of efforts in the team to study ways to identify and address threats to content integrity: coordinated attacks on information quality, disinformation spreading, and so on. Part of this effort is a set of tools and research that can help contributors and editors monitor violations of core content policies and assess information reliability. One of these core content policies is Wikipedia's verifiability. The verifiability policy is very important in the context of information integrity because it mandates that all material that has been challenged, or is likely to be challenged, must include an inline citation that directly supports the material. So basically, what this policy mandates is that sensitive information in Wikipedia should be backed by reliable sources.
So citations to reliable sources are the key mechanism we have to enforce and monitor verifiability in Wikipedia. "Reliable sources" generally means something closer to a scientific article, or a news article from an established outlet, rather than self-published research or a blog. I'm sure you've all seen these citations all around Wikipedia articles. However, I'm sure you've also seen that not all sentences that need a citation actually have one. When that happens, and editors monitoring and watching pages notice that a sentence is missing a citation, they add the "citation needed" tag. The citation needed tag is a very powerful tool for monitoring verifiability, because it sends a double message. To other editors monitoring the page it says: hey, if you know a reliable source for this sentence, please add it, because we really need one. At the same time, it's also a message to readers that says: if you read this sentence, please don't trust it 100%, because it's not currently backed by a reliable source. In practice, the citation needed tag is a template in wiki markup to which you can add a bunch of metadata, including the reason why the sentence should have a citation, for example {{Citation needed|reason=...|date=...}}.

And while editors do an amazing job of monitoring information quality in Wikipedia and enforcing the verifiability policy, the scale at which content gets ingested into Wikipedia is huge. For example, this is the recent changes feed; every new line showing up, at this real speed, is a new edit. The pace of content ingestion is enormous, and even for large editor communities, content monitoring sometimes becomes impractical. With this in mind: we are machine learning researchers, and we know that machines are dumb, but if there's one thing they can do, it's process data at scale. Putting all this together, we thought: why can't we help editors scale up fact checking and content monitoring by designing a machine learning framework that can automatically flag whether sentences need citations, and also provide a reason, essentially modeling that citation needed template? When we thought about it, we said: that is a fantastic idea. However, we have a little problem: while the verifiability policy is pretty clear to every editor in Wikipedia, the definition of what needs to be cited is not exactly systematic. And as I said, machines are dumb; they only understand systematic definitions. The definition of material "that has been challenged or is likely to be challenged" is too vague for a machine. Editors discuss and implement this definition through policies and guidelines, but we don't have a systematic definition of what needs to be cited. So our approach to building machine learning models that can automatically detect whether and why a sentence needs a citation was to first do a large-scale qualitative analysis: extract a systematic definition of what needs a citation, and produce a citation reason taxonomy that can then inform the machine learning models that will surface sentences needing citations.
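For readers who want to poke at the citation needed tag themselves, here is a minimal sketch, using the mwparserfromhell library, of how one might inventory {{Citation needed}} templates and their optional reason metadata in an article's wikitext. The sample wikitext here is made up for illustration; this is just one way to do it, not part of the project's pipeline.

```python
# Sketch: find {{Citation needed}} tags and their "reason" metadata
# in raw wikitext, using the mwparserfromhell library.
import mwparserfromhell

wikitext = """
Jupiter is the largest planet.{{Citation needed|reason=Needs an astronomy source|date=September 2019}}
"""

code = mwparserfromhell.parse(wikitext)
for template in code.filter_templates():
    # matches() normalizes case and whitespace in the template name.
    if template.name.matches("Citation needed"):
        reason = (str(template.get("reason").value).strip()
                  if template.has("reason") else "(none given)")
        print("citation needed; reason:", reason)
```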
So I'm going to start with the qualitative analysis and how we extracted the citation reason taxonomy. The idea of this part of the project was, again, to come up with a systematic definition of what needs a citation in Wikipedia and what doesn't. But we didn't want to pull this out of thin air: we wanted it to be grounded in the rules, policies, and needs of the Wikipedia communities. We also wanted, by design, the taxonomy to be as multilingual as possible, so not only considering English Wikipedia but including more language communities. One of the major problems is that the citation guidelines are not exactly ready to be systematized into a taxonomy. Among the verifiability-related pages, you have two different essays: one is "You don't need to cite that the sky is blue," and the other is "You do need to cite that the sky is blue." And both are valid, right? This is the material we were confronted with when trying to find a systematic definition of what needs a citation. So there was a lot of qualitative analysis to do first.

In terms of methods, we first read through the citation guidelines in three languages and tried to understand them; we also asked for direct input from editors of those three language communities through a data annotation task; and then we wrapped up all this information into the final taxonomy. The first part was to study in detail the citation policies from three different Wikipedia language editions, in languages that we speak. What we did was extract common themes shared among the three languages' citation policies, find a set of general rules, summarize them, and group them into macro-categories that reflect the broad groups of reasons why people add citations in Wikipedia; for example, everything related to technical knowledge was grouped together. The second step: studying policies was one way to understand what needs a citation, but we also wanted direct input from editors of the English, French, and Italian communities. We did this by designing a data annotation task on the Wiki Labels platform. Wiki Labels was built by Aaron Halfaker a few years ago, and it's a sort of wiki-specific Mechanical Turk: it allows Wikipedia editors to annotate Wikipedia-specific data. We designed a task where we asked editors to look at a sentence highlighted in an article, tell us whether the sentence needs a citation or not, and provide a reason for their choice, expressed as free text. We took this input from individual editors, about 100 of whom gave responses across the three language communities, combined it with the learnings from the policy analysis, and came up with our final taxonomy, which contains seven reasons for adding a citation and four reasons for not adding one.
What we found is that, in general, you should add a citation for: a direct quotation of someone saying something; statistics about some fact; controversial claims; an opinion a person gave about something; the private life of a person; and scientific and historical facts. All of these require a citation in Wikipedia. As for reasons not to add a citation: in general, if a sentence is about common knowledge, or if it's about the plot or a character of a book, you don't need to cite the book every time. Also, to avoid citation clutter, you shouldn't cite a sentence that is already referenced elsewhere in the article. This is especially important in the lead section: you find very few citations in the lead, because all the information there is expanded in the rest of the article, where the citations appear.

Okay. Now that we had a much better idea of what we were actually looking at, we were much more ready to design machine learning algorithms to detect sentences needing citations. We divided the problem into two tasks. The first is detecting whether a sentence needs a citation; the second builds on top of that to detect the reason why. The first task is conceptually very simple: a binary classification problem that should distinguish between sentences needing a citation and sentences not needing one. The workflow for this model is pretty standard: collect some data, find the best way to model it, and then look at the results. So the first step is to collect data; machines need to be fed a lot of data. Our initial idea was to annotate data manually, even more of it. But then we realized that a lot of the data we were looking for, we already had, because it was already present in Wikipedia articles. If you think about it, we can consider as positive examples for our binary classification task the statements that already have a citation in Wikipedia, and as negative examples, statements without a citation. The model will learn to distinguish between sentences with citations and without; for a new sentence, it will say whether it looks like a sentence that should have a citation or not. In order to have the cleanest data possible, we focused on a specific subset of articles that exist in many Wikipedia editions: featured articles. These are the best articles in Wikipedia; they have to pass a lot of checks in order to be promoted to featured status, and they definitely comply with the verifiability policy. Taking data directly from these articles ensures very clean data and well-separated positive and negative examples. We also took examples from other article sets to check the generalizability of models trained on featured articles. In the paper, and in this presentation, we focus on English Wikipedia, just because it's our common language, but we already have models ready for French and Italian too.
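To make the weak-labeling idea concrete, here is a rough sketch of it in Python. This is my own simplification for illustration, not the paper's exact pipeline: it treats sentences carrying a <ref> tag in featured-article wikitext as positive examples and the rest as negatives, with a deliberately naive regex-based sentence split.

```python
# Sketch of weak labeling: sentences with a <ref> become positive
# examples ("has/needs a citation"), sentences without become negatives.
import re

REF_PATTERN = re.compile(r"<ref[^>/]*(?:/>|>.*?</ref>)", flags=re.DOTALL)

def label_sentences(wikitext: str):
    """Yield (sentence_text, needs_citation) pairs from raw wikitext."""
    # Naive sentence split; a real pipeline needs a proper tokenizer
    # and full markup handling.
    for sentence in re.split(r"(?<=[.!?])\s+", wikitext):
        has_ref = bool(REF_PATTERN.search(sentence))
        # Strip the ref markup so the model never sees the label itself.
        clean = REF_PATTERN.sub("", sentence).strip()
        if clean:
            yield clean, int(has_ref)

for text, label in label_sentences(
        "The sky is blue. Water boils at 100 C.<ref>Chem handbook</ref>"):
    print(label, text)
```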
And in theory, the same data could be collected for any language edition that has a notion of featured or quality articles. So now that we had a way to extract data at scale, we could find the best way to model it. We decided to implement the citation need prediction task using a recurrent neural network that takes as input the sequence of words in the sentence, represented using word embeddings, which are numerical representations of words. For the word embedding geeks out there: we used GloVe embeddings, but in the newest models we are using fastText, because fastText is trained for most languages in Wikipedia. So this architecture is by nature adaptable to other languages. We don't only use the sentence text as input; we also use the words in the section title, because we saw in our qualitative analysis that the section plays an important role in whether a sentence needs a citation. The output is a label that says whether the sentence needs a citation or not.

Now, does this model work? The answer is yes, pretty well. The two spaces of citation-needed and not-needed are very much separable, especially in the cleanest dataset, the featured article dataset, where we reach up to 90% accuracy when we also add the section information. So we've seen the importance of the section both from a qualitative and a quantitative perspective. Because we designed the network with attention mechanisms, we can also see what the network is looking at when assigning a citation-needed score. A lot of the words the network focuses on are factual or reporting verbs, like "estimated" or "said," and domain-specific terms like scientific terminology. And very briefly: this model is not overfitting on featured articles, because when we apply it to other sets of articles of lower quality, the accuracy remains comparable to models trained directly on those same articles. So we are pretty confident that we can apply this model to other sets of articles.
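Here is a minimal sketch of an architecture in this spirit, written with tf.keras: a bidirectional GRU over the sentence tokens with a simple additive-attention pooling, plus an embedded section-title input. This is my reconstruction for illustration, not the authors' released code; the vocabulary size, sequence lengths, and layer sizes are arbitrary assumptions, and padding/masking handling is omitted for brevity.

```python
# Sketch of a citation-need classifier: sentence BiGRU + attention
# pooling, concatenated with an averaged section-title embedding.
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 50_000   # assumption: token ids come from some tokenizer
EMBED_DIM = 300       # matches typical GloVe / fastText dimensionality
SENT_LEN = 50         # fixed sentence length (padded/truncated)
SEC_LEN = 5           # fixed section-title length

# Sentence branch: embeddings -> BiGRU -> attention pooling.
sent_in = layers.Input(shape=(SENT_LEN,), dtype="int32", name="sentence")
emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(sent_in)
states = layers.Bidirectional(layers.GRU(64, return_sequences=True))(emb)

# Simple additive attention: score each timestep, softmax, weighted sum.
scores = layers.Dense(1, activation="tanh")(states)   # (batch, T, 1)
weights = layers.Softmax(axis=1)(scores)              # attention weights
context = layers.Lambda(
    lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([states, weights])

# Section-title branch: embed and average the title tokens.
sec_in = layers.Input(shape=(SEC_LEN,), dtype="int32", name="section")
sec_emb = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(sec_in)
sec_vec = layers.GlobalAveragePooling1D()(sec_emb)

merged = layers.concatenate([context, sec_vec])
out = layers.Dense(1, activation="sigmoid", name="citation_needed")(merged)

model = Model(inputs=[sent_in, sec_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```

The attention weights (the `weights` tensor) are what let you inspect which words the network attends to when scoring a sentence, as described above.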
Okay, so now we have a model that, given a sentence, can say whether it needs a citation or not. But as we saw with the citation needed template, editors often also add a reason when tagging. And while the attention mechanism can report the words that were most important to the decision, that's not exactly human-readable, and not exactly comparable to a real reason why a sentence needs a citation. So we decided to design a second classification task: given a sentence with a citation, automatically classify the reason it needs one, choosing from the reasons in the citation reason taxonomy we developed. This becomes a fairly simple multi-class classification task: given a sentence with a citation, say "this needs a citation because it's a historical fact," or a scientific fact, or the private life of a person, et cetera. And because the task becomes simple, the data collection also becomes simple.

Obviously, to feed such a multi-class classifier we again need a lot of data, and we can't really ask expert editors to annotate that much data with citation reasons. So we performed a large-scale data collection task on Mechanical Turk. We simplified the task so that the crowdworkers are told up front that the sentence needs a citation; they just have to categorize it with one of the seven categories from the citation reason taxonomy. They see a sentence and say, "Oh, this is the private life of a person." Thanks to this simplified version of the task, we could annotate about 4,000 sentences with citations, coming from featured articles. Just to double-check that we weren't doing something strange, we checked whether crowdworkers would agree with experts: we asked both experts and non-experts to perform the same task on the same set of sentences, and the agreement is pretty high, around 0.7. However, we also discovered that when experts and non-experts disagreed, it was because these citation reason categories are not mutually exclusive: a sentence can be about statistics of the life of a person, or an opinion about a scientific fact, or an opinion about a historical fact. More than one category can apply to the same sentence, and thanks to this comparison we discovered that property of our citation reason categories.

I'll keep it brief on how we modeled the data. In a nutshell, we fine-tuned the existing citation need model by replacing the last layer, which before had two neurons (citation needed and not needed), with a layer of seven neurons, one for each citation reason category. We had to do this because we had very little data: 4,000 sentences spread over seven categories, plus an "other" category, is not enough to train a neural network from scratch. And the results: does this work? Yes, though it could work better with more data. What we saw is that citation reason prediction works very well for the categories for which we collected enough data. So one piece of future work in this direction is to collect more data, so that we can have a really usable citation reason classifier.

Okay, so what we have seen today is, I think, a very successful combination of qualitative and quantitative methods: we could never have developed the machine learning models to detect whether a sentence needs a citation, and why, without having done a substantial qualitative analysis to understand the context we're operating in. As for what's next for this project, we're essentially doing two things. The first is to apply this model at scale, to understand the proportion of well-sourced or badly-sourced content across different segments of Wikipedia: different language editions, certainly, but also different topics. For example, in early experiments we see that articles about medicine and biology are among the most well-sourced in English Wikipedia. The other thing is that we have this code available, and you will have all the links at the end.
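As a sketch of that fine-tuning step (again an illustration under assumptions, not the released code: the saved-model path is hypothetical, and which layer holds the penultimate representation depends on the actual architecture), the idea is to load the trained citation-need model, freeze its body, and swap the two-neuron output for a seven-way softmax over the citation reason categories.

```python
# Sketch: fine-tune the binary citation-need model into a
# seven-way citation-reason classifier.
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_REASONS = 7  # one neuron per citation-reason category

# Hypothetical path to the trained binary model.
base = tf.keras.models.load_model("citation_needed_model.h5")

# Freeze the body: ~4,000 labeled sentences is too little to retrain
# the whole network, so only the new head will learn.
for layer in base.layers:
    layer.trainable = False

# Grab the representation feeding the old binary output layer
# (assumes the penultimate layer holds the merged features).
features = base.layers[-2].output

reason_out = layers.Dense(NUM_REASONS, activation="softmax",
                          name="citation_reason")(features)
reason_model = Model(inputs=base.inputs, outputs=reason_out)
reason_model.compile(optimizer="adam",
                     loss="sparse_categorical_crossentropy",
                     metrics=["accuracy"])
```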
But obviously, if you're not very familiar with TensorFlow, you may not be able to reuse this code. So we are in the process of finding the best design for an API around this model, so that everybody can submit an article or a sentence and the API will return a citation needed score. Jonathan is leading this project, and he is interviewing a bunch of people, so if you're interested in being part of this research and being interviewed to give your opinion on the project, please reach out to him. And with that, thank you very much; I can answer questions now, or later, or tomorrow. I'm always around, and you can reach me in many ways.

Thank you very much, Miriam. Isaac, any questions so far on IRC or YouTube?

From YouTube there is a question from Sam Walton: essentially, is there the possibility that the classification is biased toward the types of articles which tend to be featured articles? And on top of that, I would be curious: for those of us who use featured articles as a training dataset for machine learning models, what sorts of things should we be wary of?

Yes. Sam, thanks a lot for this question. It is possible. We haven't tested for intrinsic biases related to the topic of the articles. What we have found is that these models can be applied successfully to random articles, specifically sampled from various topics in Wikipedia beyond featured articles, and the accuracy in detecting citation need is comparable to, or in the same range as, models trained directly on those random articles. But a good point to look at is which kinds of biases we introduce by looking at featured articles only, definitely in terms of topics, but also in terms of article structure: we sample by randomly sampling articles, but maybe these articles have some sections that are more or less expanded, so we might be sampling from specific sections. So this is a very good point, and if we are using featured articles more and more, it's fairly simple to check and much needed. Thanks a lot for raising it.

I have a question: could you talk a little more about the differences across the languages? I always think that's interesting: French versus Italian versus English, and whether you saw different patterns in the data.

Yeah, that is a good question. I don't have a full answer at the moment, but I have some links for you. We have applied these models at scale to exactly these three languages, Italian, French, and English, and you will see that the distribution of topics that are more or less well-sourced changes slightly across the three languages; I can share the link to this analysis on the page. Yeah, thanks.

Awesome. I also have a question. Could you say a little about the potential use cases for these models, that you've heard from people or that you're thinking of yourself?

Yes, there are a few use cases. One is specific to tools that are trying to solve citation needed problems, like Citation Hunt, for example. Citation Hunt is a tool that surfaces sentences in Wikipedia marked with the citation needed tag and asks people to resolve them by finding sources.
However, we know that not all sentences missing citations are actually tagged with the citation needed tag. So these kinds of tools can be enhanced by being powered by these machine learning models, which surface sentences needing citations beyond those already tagged. Another type of use case: editors, or certain WikiProjects, focus on certain topics, so we could suggest which articles they should pay particular attention to because those articles are heavily missing citations. Just in terms of recommending areas of content that editors should look at, this is a useful tool for that. Yeah.

Wonderful. Well, I'm excited about the project, but I was already excited. Thank you. So my task now is to introduce the next speaker. Right. Yes, please do. I will start sharing my slides.

All right. Speaking of content integrity: Jonathan Morgan, senior design researcher at the Wikimedia Foundation and member of the research team, is now going to present his ongoing work on understanding patrolling on Wikipedia. We are going to see very interesting patterns in how editors perform patrolling. So unless we have further questions, Jonathan, the floor is yours.

Thank you very much, Miriam. Can everybody see my slides? Yes. Perfect. So this presentation is a bit of a departure from what we usually do at the showcase, which is presenting research that's reached some stage of completion. In this case, I'm going to present a smaller piece of research that's part of a larger project, focused on patrolling on Wikipedia. To give you the briefest of overviews: Miriam already called out that content integrity is a big focus for the research team right now. We're looking at ways we can help support editors in addressing current and future challenges related to the quality and integrity of Wikipedia content. For this project, I'm focusing on patrolling and vandalism. Basically, what I did was a large literature review, reading a bunch of papers about Wikipedia vandalism, and then I interviewed editors who do anti-vandalism patrolling work and/or build tools that other editors use to detect, report, and address vandalism. The general goal is to understand how patrolling tools, and the workflows of editors who patrol, differ across small and large Wikipedias; the differences between fast and slow patrolling, and I'll go into what that means in a minute; and how patrolling differs depending on whether you're patrolling strictly on one wiki or monitoring incoming changes and responding to vandalism across multiple Wikimedia projects. The overall focus is on identifying limitations in workflows, or in the resources available to editors who do this work, that create vulnerabilities. One thing I'll call out in terms of scope is that I did not talk to people about workflows specific to Commons or Wikidata; that's something I think we're going to follow up on. This project focused primarily on people patrolling Wikipedias. Why is patrolling important? Well, it's an activity that many of us who edit Wikipedia perform on a regular basis.
It's one of the best ways that we as editors can ensure that Wikipedia projects maintain quality as new content comes in. It's also one of Wikipedia's first lines of defense against different kinds of vandalism and other undesirable edits. There are the basic cases of vandalism: people goofing around, blanking pages, writing in funny things. But more seriously, there are also threats related to disinformation; copyright infringement is a perennial problem; libel and slander are a problem; personal threats; and a variety of other ways in which people can edit Wikipedia to degrade the content or disrupt the community. Patrolling is supported by a whole bunch of different tools: built-in tools that are part of the MediaWiki software infrastructure, as well as tools that community members, and in some cases Foundation staff, have built on top of that for special purposes — bots, gadgets, noticeboards, dashboards, et cetera. And finally, there's no one way that people patrol: the tools available and the activities vary from project to project. So this is something it's important for us to learn more about if we, as the research team, the Foundation, or the movement in general, are going to focus our energies on improving our ability to triage and address content as it comes in.

Very quickly, on terminology: throughout this presentation I'll use some terms borrowed from cybersecurity, like threat model, attack vector, and structural vulnerability. This is jargon, but I think it's relatively useful jargon, because the kinds of problems we're dealing with here in content integrity are in some ways very similar to the problems that people working in information security or cybersecurity deal with. I also use the terms patrolling and anti-vandalism work somewhat interchangeably. Patrolling is used in a variety of ways. In the specific sense, it means somebody with a patroller user right marking things as patrolled on the recent changes feed. But it's also a term, as you can see in this screenshot on the right, used to describe a variety of edit review and anti-vandalism activities. So when you hear "patrolling," think of the general meaning unless I specifically call out the user right.

As I alluded to before, there are a lot of different tools used by patrollers. There's a certain default toolset used in these activities, and all Wikimedia wikis, to my knowledge, have these tools: special pages like Special:RecentChanges; elevated user rights such as patroller, rollbacker, administrator, and CheckUser; diffs, history, and discussion pages; and standard MediaWiki extensions like the AbuseFilter extension. In general, these are tools that are used by patrollers, or that assist the activity of patrolling, and that are available on all of our projects. But there are also extended toolsets: tools generally built by a community for their particular Wikipedia, which may or may not be available on other wikis. Bots are a big one; gadgets; user scripts; and custom extensions like the Page Curation extension, which I believe is only available on English Wikipedia. Assisted editing programs — this is a term borrowed from Stuart Geiger and Aaron Halfaker.
These are generally desktop applications, like Huggle or Vandal Fighter, that editors use to quickly review a lot of edits. I'm also considering as tools things like database reports generated by bots; triage boards, often maintained by WikiProjects; noticeboards like the administrators' noticeboards on English Wikipedia; external communication channels, IRC channels and mailing lists, public and private; and web applications, like the XTools application hosted on Wikimedia Labs. These are tools that fill in gaps in patrolling workflows, or help people do things that are important but under-supported by the default toolset. They may or may not be available across different Wikipedias, and generally they're volunteer-maintained.

The first aspect of patrolling I want to highlight, as we think about how to improve content integrity on Wikipedia and support patrollers, is that there's a rough difference between what I'm calling fast and slow patrolling workflows. This came out of the interviews and the literature review, so I want to take a minute to describe what I mean. Fast patrolling is generally the kind of patrolling you perform when reviewing the recent changes feed as edits come in. Often you have a particular user right, and often you're dedicating a session to patrolling a bunch of incoming edits. It's generally a more instinctive, heuristic-based decision-making process: you're working fast, trying to quickly separate the wheat from the chaff, to use a colloquialism that probably doesn't translate across cultures. You're trying to quickly get rid of the worst stuff and let other things through. It's generally an individual activity, often performed by people who do this work a lot and self-select into it, and they may have a patroller user right or some other advanced right that gives them access. The workflows are fairly well defined: the purpose is to review most or all of the new changes as they come in, get rid of the obvious vandalism, and, if there are attacks happening, stop them in real time.

There's also a kind of patrolling I'm calling slow patrolling, which covers a wider variety of activities and is performed by a wider variety of editors. This is basically reviewing content that is not coming in right away: reviewing historical edits, or a batch of edits from, say, the last week that were not patrolled as they came in, which you're going to go through and check. Here you're often dealing with more time-consuming judgment calls — things that aren't obvious vandalism but may be problematic in other ways. This can be done individually, but it can also be more collaborative. By collaborative, I'm thinking particularly of cases where you see a pattern of behavior that looks suspicious and you need to report it to an administrator. As I said, it's done to fill in gaps, and it's supported by a different set of tools. People doing slow patrolling are often using their watchlists, right?
The articles we're already interested in, where we're checking for changes. This is also a situation where you may need to bring in a CheckUser to examine whether a series of edits that look like they're coming from different people may actually be a sock puppet. You're using things like the related changes feeds; you're looking at editor and page logs; and you may be interacting with other people on noticeboards, IRC channels, and mailing lists to figure out what's happening, or to deal with something more sophisticated or complicated than simply making a binary judgment about whether a single edit is vandalism.

Here, and throughout the rest of the presentation, I'm going to talk about threat models. This is terminology borrowed from cybersecurity that describes the ways malicious actors could take advantage of how our socio-technical systems around patrolling are designed, in order to subvert those systems and get around the procedures we have in place. There are different threat models depending on the kind of patrolling work being done. Fast patrolling is often mediated by having a special user right: patroller, rollbacker, or administrator. If it's too easy to obtain some of these rights, it's possible that vandals could sneak in, act like legitimate users, and then, once they've gained the patroller right, use it to give cover to other vandals. It's also true that if a user right is too hard to obtain — if it requires too high an edit count, or too much review — there may not be enough people applying for it to patrol the edits coming in in real time. With slow patrolling, the primary threat model is that it tends to be a fairly serendipitous, ad hoc activity, which means it's not systematic. You're not trying to patrol everything; you're generally patrolling things you notice in the context of your work. Therefore, whether any given edit gets reviewed in the time after it was added to the wiki depends a lot on whether there are people watching those pages, watching those editors, or watching for that particular kind of activity. So it's possible for things to fall through the cracks.

I also want to talk a little about patrolling on a single Wikipedia versus patrolling activities that span multiple projects. There are a lot of ways patrolling looks different depending on the Wikipedia you're on, and on other factors. Some of the factors that influence your patrolling workflow as an editor, and the types of tools you use, are: the type of vandalism you're looking for; the size of your project in terms of its content; the size of the project in terms of the number of registered active editors; the number of editors with permissions like patroller or administrator; and the local availability of specialized patrolling tools. A lot of tools are very wiki-specific: only available on, say, English Wikipedia or Dutch Wikipedia, but not on others. So the key tools involved in your patrolling activities on a single Wikipedia might be bots.
The local WikiProjects or noticeboards that are available; whichever gadgets, user scripts, or specialized extensions exist on that Wikipedia; whether there are assisted editing programs — for example, I think Vandal Fighter works for French and Italian Wikipedia but is not available in English, whereas Huggle is available in English but I don't believe for other Wikipedias; whether there are external communication channels, IRC channels or listservs, that you can access; and whether there are particular web applications available that support the kind of patrolling you do.

Thinking about the threat models for patrolling a single Wikipedia, you can roughly divide them into threat models for large Wikipedias — large in the sense of having many active editors, and maybe also many page views and a lot of content — versus threat models for small Wikipedias. For large Wikipedias, a lot of us will be familiar with these: sock puppets; people using proxies or VPNs to hop between IPs, which makes their vandalism harder to track; sleeper accounts, where people create an account, do some legitimate work, and wait a while before they start vandalizing, in order to deflect suspicion from themselves; people hacking accounts, whether administrator accounts or accounts that are no longer active but have been in the past; and, of course, different individuals coordinating their vandalism, either pretending that multiple accounts belong to different users when they're really the same person, or, in the case of meat puppetry or brigading — the term I'm using for a bunch of people coordinating their activities offline, on Reddit or a Discord channel or similar — coming en masse to a Wikipedia with the goal of making nefarious content changes together as a big group.

Small Wikipedias have many more vulnerabilities. They are subject to all of the threats that large Wikipedias face, but they also often lack the best available tools: there may not be assisted editing programs, anti-vandalism bots, or even incident noticeboards available in your language on your wiki. You may have fewer people capable of building and maintaining these tools. You may have fewer admins, fewer patrollers, or simply fewer editors with expertise in particular subject matter. If your wiki is subject to a sustained, high-volume attack, you may have fewer people able to respond in real time. There's also the possibility of your wiki being hijacked by people who are already members of the community — by which I mean a group of established editors, potentially with elevated user rights, deciding among themselves to skew or misrepresent the content on that wiki, in violation of our policies around, say, reliable sources or neutrality. You may have fewer abuse filters available in your language, because developing new regular expressions is challenging. And your wiki may be a target of opportunity for vandals who have been blocked on other wikis.
And you may lack places to go if you, as a regular editor, see some sort of issue and need to report it to somebody who can do something about it. So, to underscore: small Wikipedias are in many ways much more vulnerable to different kinds of vandalism, especially more sophisticated approaches, than large Wikipedias are.

In addition to this, one thing that came out as a really major issue in both the literature review and the interviews is that if you are trying to patrol activity that occurs across multiple Wikimedia projects — whether that's multiple Wikipedias, or a Wikipedia and Commons — there are a host of other issues to deal with. This is where there are some real gaps in our patrolling tools and workflows that I think it's important for us to address, and where the research team and the Foundation can hopefully provide some technical support. Why is cross-wiki patrolling difficult? First of all, it's trivially easy for somebody — a reader, a new editor, or a vandal — to jump from the English version of an article to the French version of that same article; jumping between projects is easy. In addition, there's content like images hosted on Commons, and values of Wikidata items, that is hosted on one project but transcluded into another. And finally, new micro-contribution workflows, such as adding image captions or article descriptions — I'm thinking in particular of the app-based workflows — allow people who are viewing content on, say, English Wikipedia to add content that then appears on another project. They might not even know they're adding content to Wikidata at the time, and the editors who watch that Wikipedia article may not be aware that related content is being added which shows up for readers but is hosted on another project.

We do have some tools here: global logs for IP range blocks; CentralAuth logs; people with user rights that aren't specific to one Wikipedia, such as global CheckUsers and global admins. There are also community-created tools: IRC channels and IRC bots that help people — especially global CheckUsers, stewards, and administrators — monitor content changes happening across multiple wikis, in public and private IRC channels and mailing lists. There are global support-request noticeboards, like the stewards' noticeboard on Meta; global contribution viewing tools hosted on Labs; and a global spam URL blacklist. So there are some tools specifically designed to support cross-wiki patrolling. However, there are a lot of gaps. For example, if I'm on the English version of a Wikipedia article and changes are happening to the French or Tagalog or Chinese or Dutch versions of the same article, those related changes are invisible to me; there's no easy way for me to monitor when that activity is happening. In addition, most of the blocks and bans we use to stop people who are actively vandalizing are local, and it's trivially easy for somebody who's locally banned to go make mischief somewhere else.
Our global noticeboards, like the stewards' request noticeboards, aren't fully international: the instructions on how to participate in those boards, and the language fluency of the stewards maintaining and monitoring them, aren't universal. There are two hundred and something language editions of Wikipedia, but we don't have stewards who speak all those languages, or instructions on how to submit an incident report in all of them. And in general, the workflows we have for addressing cross-project vandalism, like the stewards' noticeboard, are high-touch and time-consuming for all parties involved: it's difficult and takes time to report an issue, and it's also a very manual, high-touch process for stewards to investigate and address these issues. So in general, the speed and proficiency with which people can vandalize across projects is on a different scale from the speed and efficiency with which we as a community can address cross-project vandalism.

I also want to call out that Wikidata and Commons face some additional threats when it comes to cross-project vandalism. For example, there are what I'm calling unpredictable attack vectors: when you have, say, imagery hosted on Commons that is surfaced within a Wikipedia, somebody patrolling that content on the Wikipedia may not know what's happening with that image on Commons, and vice versa — people on Commons may not know how that image is being used across Wikipedias. This can result in, say, somebody uploading a new version of an image that is offensive or against policy, which would then become visible in the Wikipedia articles but leave no local edit trace, so Wikipedia editors wouldn't necessarily know that content has changed. And finally, with the new micro-contribution workflows we've introduced, we have seen an increase in edit volume on Commons and Wikidata, but many of these contributions are being made by people who don't necessarily work on those projects and may never have visited them. It's not clear yet how those edits are being patrolled, or how the communities have adapted to this increase in contribution volume.

This leads to a couple of recommendations for things the Foundation could explore, and I'd love to get feedback on them. The link to the project is on the research showcase page; if you have feedback, please provide it on-wiki, or reach out to me directly at jmorgan@wikimedia.org. So, a small set of technological interventions that I think are probably fairly uncontroversial and worth exploring. First, something editors have been asking for for a long time: a cross-wiki watchlist, and potentially also cross-wiki related changes — I want to know what changes are happening to the articles on this topic in other languages; a rough sketch of what such a lookup involves follows below.
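As a concrete illustration of the cross-wiki watchlist idea (a hedged sketch only, not a proposed design): with today's standard MediaWiki APIs you can already resolve an article's interlanguage links and pull the latest revisions of each language version, which is roughly the lookup such a feature would automate and surface. The article title here is just an example.

```python
# Sketch: for an English Wikipedia article, list recent revisions of
# the same article in other language editions via the MediaWiki API.
import requests

HEADERS = {"User-Agent": "cross-wiki-watchlist-sketch/0.1 (demo)"}

def language_links(title: str):
    """Map language code -> local title for an English Wikipedia article."""
    r = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "query", "prop": "langlinks", "titles": title,
                "lllimit": "max", "format": "json"},
        headers=HEADERS, timeout=10)
    pages = r.json()["query"]["pages"]
    links = next(iter(pages.values())).get("langlinks", [])
    return {link["lang"]: link["*"] for link in links}

def latest_revisions(lang: str, title: str, limit: int = 5):
    """Fetch the most recent revisions of `title` on that language's wiki."""
    r = requests.get(
        f"https://{lang}.wikipedia.org/w/api.php",
        params={"action": "query", "prop": "revisions", "titles": title,
                "rvprop": "timestamp|user|comment", "rvlimit": limit,
                "format": "json"},
        headers=HEADERS, timeout=10)
    pages = r.json()["query"]["pages"]
    return next(iter(pages.values())).get("revisions", [])

title = "Jupiter"
for lang, local_title in list(language_links(title).items())[:3]:
    for rev in latest_revisions(lang, local_title):
        print(lang, rev["timestamp"], rev["user"], rev.get("comment", ""))
```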
Second, providing a structured incident database for vandalism: a database where people with the right permissions can look up historical incidents of vandalism and record incidents as they happen, so that we can develop more of an institutional memory around who's performing what kinds of vandalism, when, and what the results were. Current workflows for this include private wikis and mailing-list archives, which don't make it easy to look up things that happened in the past that may resemble incidents happening now.

And finally, some sort of social-media inbound traffic report. We know that a lot of serious vandalism, especially disinformation or disruption, is coordinated in spaces other than Wikipedia, particularly on social media sites. We also know that sites like YouTube and Facebook use Wikipedia articles for fact-checking controversial user-generated content, which probably has the undesirable effect, from our perspective, of sending a bunch of people who believe conspiracy theories to the Wikipedia article being used to debunk those theories. So it would be useful to know which articles are in the crosshairs at any given time — where traffic is coming from on social media, especially if there's a lot of traffic to particular articles from particular platforms — and to make that information available to editors so they can watch those articles.

And finally, because I'm a researcher, any time I do a project my first thought is: what do we research next? So, putting out some ideas for further research — I'll just highlight a couple of them, because I want to leave a few minutes for questions before we wrap up. First, there is ongoing work on sock puppet detection led by Srijan Kumar with the support of Leila Zia, our head of research. There's also ongoing work on the impact of micro-contributions on Wikidata and Commons being led by a colleague on the product analytics team; one potential extension of that work might be looking at how these micro-contributions, which have increased the edit volume on those projects, are being patrolled. And finally: I talked a lot about cross-wiki vandalism, and it came up in my interviews as a phenomenon people were concerned about, but I haven't found any research on the extent of cross-wiki vandalism. To what extent is this something vandals actually do? Do they jump around between different Wikipedias? And if so, what are the characteristics of cross-wiki vandals and their activities? I'll leave the rest of this here. The slides are currently linked from the research showcase page; I'm going to upload them to Commons and put the Commons link there after this presentation. And that is it.

All right, thank you very much, Jonathan. Isaac, can you tell us if there are questions from the usual channels?

Nothing from YouTube or IRC. I have my own question, but I'm willing to step aside if other people have questions. I think you can ask your question. I also have one, but you can go first. OK, that works. My question is maybe more of a researchy one. As English Wikipedia, for instance, and many of these wikis grow in size, there tends to be a trend toward locking down — increasing the edit count and other requirements to get special permissions. And this kind of work makes me wonder to what degree better patrolling tools would make that unnecessary.
That is: as a wiki grows, does it need to lock down, or, if there were better patrolling tools, do you think that alone would be sufficient, such that wikis could stay largely as open as they are when they're much smaller?

I think so, and I think a lot of people think so. There's a lively and, in my opinion, generally productive discussion happening on the talk page of the research project on IP masking, which is — and I want to stress this, as far as I know — very much in the early exploratory stages. I know there's been some anxiety around the impact of this change. But in that discussion, as I read it, I'm seeing a lot of people coming up with good ideas along the lines of: we want the wikis to stay open, this is core to what Wikipedia is and to the mission of Wikimedia projects, but for a variety of reasons, potentially including legal compliance, we may need to stop showing IP addresses by default in the interface to everybody. How can we make it easy for the people who need IP addresses as an important source of signal for patrolling to still perform their work? These kinds of discussions need to happen in order for us to move forward, both in that particular case and in general. Aaron Halfaker and others have done a great deal of work on the impact of some of the changes, particularly on English Wikipedia, that have reduced the ability of people who are not currently core members of the community to make contributions. And often the finding is that you end up losing more — you end up doing more harm by preventing those peripheral contributions from even taking place than you gain from reductions in, say, the workload of vandal fighters. But it's a tricky thing, and so I think it's really important that this work be done in close collaboration with communities, and that we support communities in coming up with solutions that work for them, rather than the Wikimedia Foundation just building one global thing. I don't think that's ever going to be the entire answer. Thanks.

Maybe a final question from me, mainly to connect to our friends and colleagues in computational social science. A lot of them work on disinformation, mainly focusing on social media. What advice would you give them if they were interested in moving to understanding these dynamics on Wikipedia? For example, where would they start? Among the projects you listed, some are very quant-heavy; what are the most basic things we need to do to understand these behaviors better from a quantitative perspective?

I think one area where people who've been doing work on disinformation and social media can really help us is around understanding reliable and unreliable sources.
People looking at disinformation networks on Twitter, or disinformation on Facebook, are finding that people often post links to websites that look reliable, or that they assert are reliable, but that really, either intentionally or unintentionally, contain wrong or misleading information. People working in this space know the signatures of those websites; they have large lists of websites that commonly host this kind of information. So an open question I'd like answered is: to what extent are people actually trying to insert links to these websites, citing them as sources, in Wikipedia articles? And in particular, I would love to see a multilingual approach to this, because, as I mentioned, I feel like many of our smaller language editions are at greater risk here than, say, the top five Wikipedias, just because there are fewer people there to vet every source that comes in.

That makes a lot of sense. And there is more in the presentation you shared — because there has actually not been much work on understanding these dynamics specifically in Wikipedia, that deck is full of ideas on where to start. This is definitely an important problem to tackle. Isaac, if we don't have any further questions, Jonathan, I'll let you conclude the showcase with the final announcements.

Thank you very much, Miriam. I want to thank Emerald Ross and Janet Layton for helping with the showcase this week. Thanks again to Isaac Johnson on IRC, and to our presenter, Miriam Redi. And once again, welcome Martin Gerlach to the Wikimedia research team; I'm sure you will see Martin's work presented here in the not-too-distant future. Our next showcase is going to be October 16th, about a month from now, and its focus will be similar to today's: disinformation. So be prepared to hear a lot more about disinformation on Wikipedia and the web next month as well. Thank you all very much, and have a wonderful rest of your day.