Right, so hello everyone. I'm Chlod Alejandro. For those of you who haven't heard of me, I'm a volunteer software developer and also an English Wikipedia editor. Funnily enough, to get to Singapore I had to go through an airport that had a power outage, which I wrote an article about earlier this year. I mostly dabble in user script and tool development. Some of you might know me as the current lead developer working on RedWarn and Ultraviolet.

I'm sure that many of you know the slogan of Wikipedia: "Wikipedia, the free encyclopedia." The "free" in the slogan, of course, means free as in free speech. But there are some cases where users who edit Wikipedia copy things that aren't free, things online like news, books, and research. When those get imported into the English Wikipedia, we call them copyvios, or copyright violations.

On average, every day there are around a hundred new cases of automatically detected copyright violations on the English Wikipedia. On some very bad days we get around a hundred forty, but on some good ones we get around twenty-four per day. But those are mostly the automatically detected copyright violations. So what happens when some of those violations somehow fly under the radar?
Well, whenever a contributor has a long history of placing copyright violations on Wikipedia, they get sent to Contributor Copyright Investigations, or CCI, as we call it. Every time a new case is opened on CCI, it brings dread to almost everyone participating in this space, and that's because we have to crawl through every single edit made by that editor to confirm whether or not it's a copyright violation. As of now there are 206 open cases, and among all of those cases we still have to check 500,000 diffs, and that's after removing all of the minor edits and the reverts that they've made.

Here's a visualization of all of the currently open diffs. As you can see here, the largest case actually has 48,000 diffs in it. With just that size, you can imagine how many things there still are to check. CCI is one of the largest administrative backlogs on the English Wikipedia, and it's really hard to get into since there aren't a lot of editors and there's so much to do every day. Not only do we have to fix new cases of copyright violations that we find, but we also have to fix some very old ones. The oldest case on CCI is around a decade old.

But that's not the only backlog that we have. There's also Copyright Problems. Copyright Problems is where cases of suspected copyright violations go. Unlike what the name suggests, it's also a noticeboard. What happens is, when we find a suspected copyright violation, we put this {{copyvio}} template onto the page.
That temporarily removes the text and hides it from public view. The notice is pretty old, but there are currently plans to redesign it so that it's easier to read, easier to follow for newer editors, and so that it looks better. And unlike new page patrol or counter-vandalism, copyright work is mostly manual. But thankfully there are tools that help the process. Those are for our copyright editors who are working hard on trying to fix all of these cases. So I want to give an introduction and a shout-out to some of the tools in the space, and I want to introduce something that I've been working on in the past year to help copyright editors with their work.

First on the list, we have Earwig's Copyvio Detector. This is one of the most important and most used tools in the space. When you give it an article or a revision, it will search the internet for existing copies of that text, so we're able to find text that's been copied from a website without attribution. In some cases, if the source that they happened to copy from is linked in the page, Earwig's Copyvio Detector can also find that. There's also an option to use Turnitin, but we barely use that since we don't want to use too much of the resources provided to us. And of course, if you use the Google search option too much, the tool currently gives up, because we make too many of these requests every day given the amount of work that we have to do.

Next is Who Wrote That? This is one of the tools made by the rock stars at the Community Tech team at the Wikimedia Foundation. It finds all of the changes that an editor has made on the page and highlights them in yellow, and clicking on a specific part of the text will show you the specific diff that introduced that change. Thankfully, this tool was recently expanded to other wikis as part of the Community Wishlist Survey this year. So it's currently in the
process of being expanded and opened up for more wikis to use.

Next is the contribution survey. This is made by one of the long-time copyright editors that we have on Wikipedia, MER-C. It finds the substantial edits made by a specific user, and this is what we use to generate all of the pages we have on CCI. Using this, we're able to find the substantial edits so that we can check them and have them all in a centralized location.

The last one I wanted to point out is CopyPatrol. CopyPatrol is also made by the Community Tech team. The top three contributors, when I checked this morning, were Niharika, who happens to be in the room, MusikAnimal, and Sam Wilson. It shows a feed of recent changes which may have been taken from somewhere else. It uses Turnitin, so every single diff is sent through Turnitin for checking. Recently it's been getting some big updates: the back end was updated to Python 3 from Python 2.7, which was end-of-life like two years ago, and the front end is also being rewritten in Symfony. So hopefully we'll have that newer version in production very soon.

Now I want to talk about the specific tool that I made for helping copyright editors. I called it Deputy. It's named after, well, maybe you have a main investigator, and of course you have a deputy to help you with it. Initially, somebody came up on the user script requests page on the English Wikipedia, and they were asking whether there could be a CCI user script that could help us in doing cases. It really made me think: how do we really do that? I mean, there are so many things to do in CCI, and so many different things that you have to check. So what part of that could we help with? So I went through the process.
I did a few cases, and then I noticed that something that was taking quite a while was having to go through all of the diffs: opening them in new tabs, loading them in, going through to the next one, and then erasing them from the page manually, because you have to edit the actual on-wiki page itself. So I was trying to think of a way to turn that from a single Wikipedia page, which is just a page made out of wikitext, into a usable interface.

So I made this mock-up. This was originally made around February 2022, and as soon as I finished it I sent it over to the CCI channel on the Wikimedia Discord. That's where we usually hang out. There were some people who were excited about this, because this was actually one of the first times that a user script was made specifically for the CCI process. There's currently no other user script that does this. A friend came up to me and told me, "You should request a grant for this." So I filed for a grant, and I had a talk with the Wikimedia Foundation senior program officer Jacqueline Chen, who handles the grants in the ESEAP region, and thankfully that grant was approved. So that's where the development of the tool begins. And of course, like every development of a tool, there are obviously going to be roadblocks.
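To make the idea of turning a wikitext page into an interface more concrete, here is a minimal TypeScript sketch that reads the rows of a case page into structured data a user script could then render. The row format shown in the comment, the `CaseRow` and `DiffLink` types, and the `parseCaseRows` function are all assumptions made up for illustration. This is not Deputy's actual parser, and real case pages may be formatted differently.

```typescript
// Hypothetical structures for one row of a case page (illustrative only).
interface DiffLink {
  revid: number; // revision ID taken from the Special:Diff link
  bytes: number; // size change shown in the link text, e.g. +5120
}

interface CaseRow {
  title: string;
  diffs: DiffLink[];
}

// Assumed row format, one page per bullet:
// * [[:Page title]]: (2 edits, 1 major, +5120) [[Special:Diff/1001|(+5120)]] ...
const ROW_PATTERN = /^\*\s*\[\[:(.+?)\]\]/;

function parseCaseRows(wikitext: string): CaseRow[] {
  const rows: CaseRow[] = [];
  for (const line of wikitext.split("\n")) {
    const rowMatch = ROW_PATTERN.exec(line);
    if (rowMatch === null) {
      continue; // headings, prose, and blank lines are skipped
    }
    // A fresh global regex per line, so lastIndex always starts at 0.
    const diffPattern = /\[\[Special:Diff\/(\d+)\|\(([+-]\d+)\)\]\]/g;
    const diffs: DiffLink[] = [];
    let diffMatch: RegExpExecArray | null;
    while ((diffMatch = diffPattern.exec(line)) !== null) {
      diffs.push({ revid: Number(diffMatch[1]), bytes: Number(diffMatch[2]) });
    }
    rows.push({ title: rowMatch[1], diffs });
  }
  return rows;
}
```

Once the rows are structured like this, a script can draw a checkbox and a status drop-down per diff and write the result back to the page, instead of asking the editor to change the wikitext by hand.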
I mean, this is a tech tool we're talking about. One of the main issues that I wanted to point out today, especially to the developers here in the room, as a warning in advance for some tools that you might be making, is that there was a lack of TypeScript typings for OOUI back when I first wrote the script. There were no official TypeScript declaration files, because it's mostly written in JavaScript, and we also can't automatically generate those types, for two reasons. First, it uses JSDuck instead of JSDoc, and JSDuck was abandoned over a decade ago. Second, it doesn't use imports or exports; it instead modifies an OO.ui global, and automatic type generation doesn't like that. Luckily, nowadays this isn't much of an issue, because a volunteer [name unclear] created a type library for this. It's now available on npm under @types/oojs-ui.

So, I mean, "Chlod, if you wanted to use types, then why couldn't you just use Codex?" Well, Codex wasn't actually mature at the time that I started this project. Version 0.1 only came out after the project started, so of course it still lacked a lot of the features that were already available in OOUI, so I couldn't use that. And the support it has for user scripts is still a bit iffy. I could do that, but it'd be a bit of a problem.

And of course, there was also interference from the real world.
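The typing problem with OOUI mentioned above, a library that attaches its API to a global instead of using imports and exports, can be sketched in TypeScript as follows. `DemoUI` and its `ButtonWidget` are hypothetical stand-ins, not the real OOUI API. The point is only that the types have to be written by hand as an ambient declaration, which is the kind of gap a package such as @types/oojs-ui fills.

```typescript
// Stand-in for a plain JavaScript library that publishes its API by attaching
// it to a global object. `DemoUI` is hypothetical, not the real OOUI API.
(globalThis as any).DemoUI = {
  ButtonWidget: class {
    label: string;
    constructor(config: { label: string }) {
      this.label = config.label;
    }
    getLabel(): string {
      return this.label;
    }
  },
};

// Nothing is imported or exported, so TypeScript has no module to read types
// from. A hand-written ambient declaration describes the global instead.
declare namespace DemoUI {
  interface ButtonConfig {
    label: string;
  }
  class ButtonWidget {
    constructor(config: ButtonConfig);
    getLabel(): string;
  }
}

// With the declaration in place, calls against the global are type-checked.
const button = new DemoUI.ButtonWidget({ label: "Mark as reviewed" });
const buttonLabel: string = button.getLabel();
```

Passing a config without `label`, or calling a method that the declaration does not list, would now be reported at compile time rather than failing in the browser.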
I mean, after all, we have real lives as well. I had to graduate high school at the start of the grant period, and at the end of the grant period I had to move to Manila for college. But in the middle of all of this I was still giving out alpha releases, and I was giving out standalone versions of some of the modules that I'll talk about later. I released the first bit of the tool this September, and I've been continually working on it ever since.

So here's just a quick summary of the features. This is what a normal CCI page looks like. As you can see, there are a few pages that the user worked on, and a long list of diffs. Every single one of these edits has to be checked. Each has content that was added in, and editors have to check every single one of these. So from having a page like this, where you have to manually edit it, it now looks like this when you have the tool installed. Over here you have a drop-down to indicate the status, whether you found a copyright violation or not, then you have a checkbox to remove the diff from the list once you've checked it, and then some comments. You can also click this button if you want to mark all of it as read, if you think that it's completely clean.

Aside from this, I also imported two of my previous user scripts into Deputy. This used to be the Copied Template Editor, and that was because it was able to edit the {{copied}} template. So this is the template right here, and you can edit that template using this interface. This was recently expanded when I imported it into Deputy, and now it's able to edit more than just the {{copied}} template, but also {{split article}}, {{backwards copy}}, and a bunch of other templates along with it. Aside from this, this is another imported part.
This one on the left, specifically, is Infringement Assistant. I made that back in the day, and it has the ability to blank an entire page or a specific section of the page so that it can be reported to Copyright Problems with the template that I showed earlier. Aside from this, when I put it into Deputy it got another feature, specifically for copyright clerks: they can immediately respond to existing cases on the Copyright Problems noticeboard. There's a predetermined list of responses that they can make, so that makes their life much easier. And aside from that, editors who are currently working on fixing one of those pages can see the content without having to edit the page. They can just click a drop-down and it'll show the diff for them.

So now let's go into how effective this was. Of course, when you're making a tool, you want to make sure that the editors are satisfied with the work. So I ran a survey around June to find out how people liked the tool. It was posted on a few of the copyright cleanup project pages, talk pages, and the like. Unluckily, the sample size is pretty small, and that's not because people didn't want to answer the survey, but actually because the copyright space is extremely small. There are only like 10 to 15 people active at a single time. Thankfully, they were positive on four metrics. First is stability, so how buggy it is. This is a bit of a problem. Well, Deputy really was in beta when I first started with it.
So that was pretty much expected. Tool ability: it can pretty much do a lot of things, so thankfully this is a bit high. Speed is pretty slow for large cases. When you're working on a page with a lot of diffs, it's pretty much the same thing as viewing a page history with like 500 revisions on it. It's obviously going to slow down your browser because there are so many elements on it. And for user experience, I'm pretty proud of this one. It was pretty high. As a front-end developer, I'm really happy that users were happy with the user experience.

Six of the respondents, out of the eight people who use the tool, by the way, said that it made the work faster. They strongly agreed, rating that a five out of five, and then two of them rated it a four out of five. So that's still pretty good. Most of the editors use the CCI capabilities. Some of them also used ANTE, the Attribution Notice Template Editor, which is what I renamed it to, and that's mostly for fixing attribution issues. There were also a few people using Infringement Assistant, but not as much, since you only get to use it sometimes, when you're actively working on cases or if you're a clerk who's actively participating in the copyright spaces.

Aside from this, I also provided a list of features that I thought we could probably prioritize. One of the top ones here was integration with Earwig's tool. Some editors wanted to see the results, like the percentage of copied text for a specific page, from within the tool. A lot of people also wanted a guided process for requesting CCIs. Currently they have to read through a bunch of instructions and then file the case manually, and they have to add that section themselves, so that takes a while. So people wanted a guided experience for that. And then they also wanted article talk page tagging. This was actually one of the features I originally planned, but I couldn't do it because a lot of
the features of this were already present in Twinkle and a bunch of other tools. There were also some requests for InternetArchiveBot integration, and that's because when we're dealing with a case, particularly old ones, you're going to hit some dead links, and Earwig's Copyvio Detector will not find those dead links because, well, the pages are dead. So what you do is go to the Internet Archive, or go to InternetArchiveBot, to find the archives and add them into the page, and then Earwig's Copyvio Detector can check those archives to see if something's been copied from them. And then there are a bunch of other features here, but I won't get into that. I have a limited amount of time here, of course.

A bunch of other honorable mentions from the survey: there were a total of ten respondents, actually. Eight of them were using Deputy, one of them didn't use it, and another one used to use it, or specifically the CCI part of it. All of the respondents agreed that it was at least helpful, or were at least a bit apathetic about how helpful it is to the CCI space. Thankfully, no one said that it was not helpful. A few of the users requested image CCI, so that's handling possibly copyrighted images that happen to make their way onto Commons or onto the English Wikipedia. Many editors like the UI, so again, I'm very happy about that, because I'm very proud of the user interfaces that I make. Sometimes it's buggy. That is expected since it's a beta, but I'm actively working on it. Just a few days ago in the hackathon room I was fixing a bunch of bugs that it had as well.

It's also good to note that all of the responses were from the English Wikipedia, and that's because right now it caters specifically to English Wikipedia processes. But I do know that there are many other wikis that are experiencing copyright issues, so if anyone wants to reach out, then sure, go ahead and talk to me after
the session. I'm still in the middle of compiling all of the survey responses, but eventually I'll have a summary and a bit of the details in a post on Meta-Wiki about the total research data, along with a bit of analysis on that.

Of course, there are still things that are needed, because the work isn't over. Number one is following up on what they've requested. First, I plan to integrate Earwig's Copyvio Detector into the tool so it makes their job easier. Then the CCI request, well, actually, I've been working on that already. You can see the starting look of it. It looks pretty much like one of the wizards you'd have on an old computer, but I actually found this to be very helpful, specifically for users who are new to the space, because if you give them a guided experience they'll have a much better experience trying to do it. And of course, a lot of bug fixing and features. Developers will always spend their time doing bug fixing because, well, you don't want a buggy program, do you?

And then what about the copyright space in general? Well, currently there's an active RfC that's being planned for the copyright space. One of the things that we're trying to get is a bot to automate the Copyright Problems noticeboard. Currently all of the archiving and all of the case filing is done manually, so we need to work on that. Next is the ability to scan old pages, so pages that aren't newly edited, for copyright violations, especially from books and newspapers, since usually Earwig's Copyvio Detector doesn't have access to those, and especially with paywalled sources, which we also have no access to, and sometimes text is copied from those as well. And most importantly, to the few people in this room and hopefully some other people watching virtually: we need more copyright editors to help out.
We have just a small group of editors working, so if you could come in, learn the process, and then maybe help out in this space, because there are a lot of backlogs that we have to chip away at, we'd very much appreciate having you, and we'd be very happy to help you with getting started.

So that's all from me, and that's all for this part. Thank you very much for listening. Have a wonderful rest of your Wikimania. Oh, and if you have questions, yes, feel free to just reach out to me after this. Thank you.