I'm Yuvi, and I'm not going to repeat the introduction. I work as an operations engineer at the Wikimedia Foundation, so we keep the site up. And I specifically work in a department called Wikimedia Labs. We provide free, as in free of cost as well as free software slash open source, computational resources to volunteers for them to run scripts, run queries, or publish a web page or a web-based tool related to Wikimedia stuff. So that's where my primary interest comes from. And this talk is titled "Stealing Some of Wikimedia's Principles to Democratize Programming." And I have an agenda slide, which should come up when I click things. No? Where is my mouse cursor? Here is my mouse cursor. OK. Agenda slide.
So I'm going to try to define what I mean by democratizing programming. That probably has a million definitions, but I'm going to pick one and stick with it. I'll also talk about which particular Wikimedia principles I mean. There are a million of those to choose from too, depending on who you ask, when you ask them, and in what context. And then we're going to talk about two case studies. They're not academically peer-reviewed, published studies. It's more that I'm going to talk about the experiences we had designing these systems: why we designed them, what problems we were trying to solve, and the emergent properties that have evolved from them that we did not think of when we designed them. We only designed them thinking, these are the principles we stand for, and we think this is how access should work, so we will put it in there and see what happens. And then we're going to talk a bit about what did happen. And then, as you said, there are going to be questions at the end. But if something is not legible on screen and I should increase the font size, or if you don't understand what I said, or if I'm speaking too fast, just feel free to pipe up immediately.
So this is a dictionary definition, or at least one of the dictionary definitions, of the word democratization. It's, I think, fairly simple. It might be controversial, I don't know, but I'm going to stick to this one. It just says: to make something available to all people. It's a very inclusive definition. It just says all people. It doesn't restrict it to anyone in particular.
And this is a nice definition of programming that I had never heard before. It says that to program is just to provide a computer or other machine with instructions for the performance of specific tasks. It doesn't talk about software engineering at all. And that's a distinction I really want to draw: I'm not talking about software engineering. Software engineering is a complex field, and it is going to get more complex over time, because the things you are building are getting more complex over time. So people will build abstractions that help with that, and there are whole fields of study in making that better. But that's not what I'm going to talk about.
I like this definition a lot. It basically says programming is the ability to bend a digital computer to your will, in the same way you would bend words to your will when you're writing a note to your mother, or a letter to someone, or filling in an online form, or tweeting. I like the analogy to writing. Not everyone needs to know how to write a novel or a thesis or a project plan or a poem. But if you cannot write anything or read anything, it will significantly affect your life. If you cannot write a shopping list, or read what a political candidate stands for when you are voting, it is going to significantly affect your life. And I think programming is very similar. And if it is not right now, it will be very soon.
It is already the case in a lot of fields that if you're like, "oh, I need to go find a programmer," you've already lost. You've already given up power. And I think this is a very important facet of it: programming is power in a very intense sense of the word, which is that it allows you to do things that you would otherwise not be able to do. It gives you an agency that doesn't exist if you're like, "oh, I'm not a programmer, and so I cannot do this." That means there are several things that you then decide are OK for you to not do. You've given up agency. And I think that is the default now. You need to be able to program, or there are a lot of things that you just cannot do. And if it is not the default now, it will be very soon.
I don't think I need to dwell too much on that. This is BIDS; I'm sure everyone here has heard this. I'm not the first person making these points. I will not be the last person making these points. I'm also pretty sure I'm not the most eloquent person making these points. So yes, I think we should take it for granted that everyone should learn to program.
The only additional point I would like to make is that you should be able to program without it becoming your identity. You shouldn't have to be "a programmer," with all the baggage that comes with that label, to be able to program. You shouldn't have to sacrifice two years of your life paying the command line tax: figuring out how Git works, learning the history of UNIX, and forming opinions on language wars. That whole thing should not be required for you to be able to program. It should be much more low cost, much more open, and you should be able to do it without having to become a clergyman.
And so with that, with some hopefully clearish definition of what I mean by democratizing programming, I'm going to move on to the Wikimedia principles. These are not written down anywhere.
Wikimedia in general is not particularly like, "OK, we will write these principles, and then we will follow them." It's more like, we have some core values that we all share, we make decisions based on what we feel is right and based on consensus, and over time we distill them into things that we can call principles. So these are very fluid. These are not set in stone. These are not written anywhere. This is just me discussing with the people who were around when we were building these systems why we made specific decisions, where the answer was, this is the culture we are, this is our identity as Wikimedians, and this is why we made those decisions. And then me trying to distill that into sentences and words that I can use to explain it to people who are not in that culture. A lot of this would be self-evident if you had been a Wikimedian for 10 years. But thankfully, not everyone has.
So the first one is: be as open as possible. Open has multiple meanings. In this specific case, I'm talking about access. You shouldn't have to go through multiple steps to get access to a resource. In Wikipedia's case (everyone knows Wikipedia, I assume, that's a given now) anyone can sign up. You don't need to ask for permission. And in fact, you can edit without having to sign up. We basically keep it as open as possible. No approvals required. No proposal needs to be submitted. And the projects that I'm going to talk about later, which provide you access to computational resources, are all open to anyone. You just need to create an account and you have access. You don't have to be in someone's inner circle. You don't have to write a proposal that gets approved. You don't have to make your case. It's just low cost. It's just there for you to play around with as you see fit. But abuse does happen. And that's the "as possible" part of it.
And when abuse does happen... Obviously, the original Wikipedia case was very naive, right? They were like, everyone's great, abuse will never happen. And then after six months you realize, actually, abuse does happen, because you have run into specific cases of it. So then you have two choices. One is you build the system to be locked down from the beginning, so that you know you will not run into abuse. The other is you keep it as open as possible, and as you run into specific abuses, you build systems to fix those. I'm only talking about technical abuses here, because these have far less of a cost than social abuses. This is not a free pass for designing your system without any anti-trolling or anti-doxing features, or for basically being Twitter from 10 years ago, right? This is about technical ones. I think it should be clearer soon.
So this is "anyone can edit." Here I'm not logged in. This is the wiki article for Wi-Fi, and anyone can just hit Edit and make their changes. You can log in if you want to, but it's not necessary. But talking about the abuse stuff: if you want to make a change to the Barack Obama article, then you do need to be logged in. And I think right now you also need to have made at least 15 or 20 edits to other pages, and not have been banned because of those edits, before you can do this. And if it gets really extreme, Wikimedians will lock it down so that only administrators can edit it. But this didn't exist in the beginning. When Wikipedia started up, it was just a free-for-all. Just come at it. And then of course, after the fifth time you've reverted the same thing, you're like, OK, I think we're going to put some technical limitations in place now. But I think this is a very important distinction: we didn't have this from the start.
We didn't say, "well, if you open it up, people are just going to keep saying stuff that's not true, or just completely troll." No, we were like, it's OK. When people troll, we will build specific defenses against them, and not throw the baby out with the bathwater, because the rest of the stuff is very useful. So that's principle one.
The second one is: be as public as possible. This is different from as open as possible. Again, public and open have many different meanings. In this case, what I mean is that as soon as you create something, it's public. There is no separate publish step. There is no polishing. There is no "I will do this now, and once I feel it is ready for the world to see, I will give it to the world." There are many, many reasons for this. The common one is that you are not the best judge of the value of your work. Something that to you is completely "oh, this is just something I did, I'm going to throw it away" is actually probably very useful to someone else, and you have no idea. This is also, I think, the most important of these principles, because it leads to some really fun emergent behavior that we'll talk about later.
And again, there needs to be a process for dealing with abuse in place. Wikipedia example again: in the beginning, there was no way to delete anything. If you deleted something, it would still be in the archives. You could go back in history and see what was there. And then people started doxing people on the wiki, or adding child pornography. So then we were like, OK, we do need to have this in place, and we need to have social systems and technical systems in place to fix this. So again, this is not a free pass. "But we are immutable, this is just our principle, so screw you" is not a valid answer. You have to be accountable for the abuse that happens on your system. But again, that doesn't mean you have to go all the way out. You don't, like, say no.
Requiring a real name, for example, is one way to combat abuse. But then again, this is the throwing-the-baby-out-with-the-bathwater setup, where you're not going to get the advantages of this.
Again, examples. This is the first ever version of the Barack Obama page. It was in 2004, and someone had just written this. It's useful as is in 2004, but I am sure that if I was writing this from scratch, I wouldn't have put it out like this. It feels natural to me now, but that's because I've been in this culture for six years. But this is totally OK. This is great. People built on this. Look where it is now. It's a very well-written, very well-sourced, really large article with many thousands of people editing it. But this is how it started. And we can still go see this. And again, the emergent properties that come out of this, which we'll see later, are very useful.
And this is actually a work log from one of the researchers who works at Wikimedia, and it has references in it as well. This is from 2012. It's just someone's work log of what they were working on. And I find these very fascinating to read. If you don't have this culture already, there is no reason you would make this public. Even if you write it, it's in your Google Docs somewhere, or in your home directory, and you just leave it there. But because the culture is so much "everything is public," you do this. And then I found this in 2015, and these were very useful for me to read through. And I think for the people who wrote them as well: in 2015, it's very useful for them to read what they were thinking three or four years ago. And that comes out of this public-by-default setup. Because if it's public, it's not going away anywhere. You're not going to lose it. You're not like, "oh, I don't know where that is."
And again, you don't know what utility it has to other people. You only have a guess of what utility it has to yourself at that moment.
Right, so the third one. I think the third and fourth are kind of derivatives of the first two principles, but they are important enough that I must call them out. The third is attribution, not ownership. This is an important thing. If you look at this, there is no byline. This article does not belong to anyone. It is everyone's. But if you hit the history page, you can see what edits each individual made. So there is attribution, but there is no ownership. This means you can't say, "you can't use it like this, you can't do it in this way, you can't have this being done in this context." It's kind of a gift economy setup, without going into the other characteristics that label carries. You're basically gifting your work to the world so that other people can use it. And this is a two-way street. It also means that you can use other people's work.
In the Wikipedia case, the end goal is that you want to build a great encyclopedia, and this is a very useful property for doing that. If your end goal at all times is "I want to build this thing which is really nice and really complete," then, at least in the Wikipedia universe, not attaching the work to your ego is important. You shouldn't be able to control it. You shouldn't have to worry, "oh, but what if someone else does it better?" That's a good thing. If someone else takes an article you started and makes it 10 times better, then that's good, because the collective aim is to have an article that's 10 times better. And you still have your attribution. You go back to the history and it's there. And there are lots of tools that display that in very wonderful fashion.
And maybe that other person wouldn't have done this if you hadn't started it the way you did. You never know. Maybe they would have been like, "oh, maybe nobody's interested in it," or "oh, I don't feel like starting." Editing something to make it slightly better is always much easier than starting from a clean slate. So that's why this is very important. And ownership is also complicated, because if you have to ask for permission to do anything, the mental effort of doing that is so high that you are just like, "I'm not going to bother," and let it die.
And that leads into the next point, which is low cost. Everything should be really low cost. You shouldn't have to think twice before you do anything. You shouldn't have to be like, "I really want to do this, but it involves these 40 steps," and then go do something else. And this becomes internalized after a while. You're like, "I'm not even going to attempt to think of the things that I can do with this, because I know that the steps I need to take to actually use it, or to get permission to do it, are too high for me to do anything with it." I think it definitely morphs the way you think. And I think that is the real negative of having individual ownership of things.
Unlimited undo is better than careful pre-vetting. And "be bold" is, I think, the closest to a universal Wikipedia motto we can find. We have lots of mugs with the words "be bold" on them, lots of t-shirts. Of the many values we agree and disagree on, "be bold" is the one thing that I think has fairly common agreement. And it just means: just do it, it's OK. If someone doesn't like it, they will tell you, or they will revert you, because undo is cheap as well. Creation is cheap, undo is cheap. And then you can have a discussion as to what's going on and call each other bad names. So yeah, this is undo. Undo is really cheap.
So if you are like, "I don't think this is a good idea," then you can go undo it. But if there's a community consensus that says no, it was a good idea, then you can undo the undo. And it's the same with creating a new page. The work log you saw earlier was a page. If you had a quota of a hundred pages you could create, and after that you had to pay for it, or if it was "oh no, the system is filling up," then that work log would never have existed. And maybe Barack Obama's page in 2004 would never have existed, because it's just some guy who's running for office. Why would I bother doing that if space is precious? So having a low cost for your actions is very important. And it is also very important for the emergent properties that come out of it.
So right, those are the values that I want to talk about. There are clearly a lot more things that could be called Wikimedia values that I'm not going to go into. Stuart, who's there, has done a lot of research around these, so if you're interested in talking about those, he would be a very good person to talk to.
I'm going to move on to the two systems that we built that were focused on giving people more programmatic power, and how we built them using these principles. They're both live. They're both accessible to whoever wants them, so you can try them out as we go if you want. For the first one I'm going to use slides; I have screenshots to show what it's like. For the second one I'm going to try to do a live demo. And if that fails, I have GIFs.
So right, Quarry is the first of the projects I'm going to talk about. I'll explain what it is in a while. This is the problem it is trying to solve: run queries against Wikimedia metadata. Why would you do that? I'm going to show some example queries that I found just by going through what people have created. Someone wants to find articles that have no categories.
Maybe they should be deleted, or maybe they should have categories added to them. Someone wants to find blocked editors with the most edits. Maybe these people are redeemable, who knows. Articles with titles containing an opening parenthesis but not a closing parenthesis, which is really annoying. I can totally see myself running that query and making sure that it is always zero. Same thing with quotes as well. And then the fourth one is a real query. Wikipedia has WikiProjects, which are sub-organizations, like working groups, that worry about a specific subset of articles. So someone has written a query that finds articles that should be in WikiProject Women, so that the project can care about them and monitor them, but are not. And then the last one, I have no idea what that is. It's just the title of one of the queries on Quarry, and I copy-pasted it. I don't know if it even renders correctly. But I think that is the good part of this. I don't have to know. This is all done by the users themselves, as you'll see.
The ideal way this would work is that all of this is built into MediaWiki itself, right? MediaWiki is the software that powers Wikipedia. So you would just have a user interface that does these things. But that's practically impossible, because you can think of so many such things, and if you want to engineer them in such a way that they run properly on Wikipedia, which is a top five website, you need to think of scale, you need to think of all of these things. And then you're doing software engineering, which is a lot more difficult than just trying to find something out. I just wanted to know which articles don't have a closing parenthesis. I don't want to basically complete a university course to do that. I should be able to do that much more simply. So this has been a common theme for many years.
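To make the example queries above concrete, here is a minimal sketch of two of them (titles with an unbalanced opening parenthesis, and articles with no categories). It runs against a tiny in-memory SQLite database rather than the real replicas, which are MySQL. The table and column names (`page`, `page_title`, `page_namespace`, `categorylinks`, `cl_from`) follow the MediaWiki schema as I understand it, but the rows are made up for illustration and a real query would likely need more care.

```python
import sqlite3

# Toy stand-in for the wiki database; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (
    page_id INTEGER PRIMARY KEY,
    page_namespace INTEGER,
    page_title TEXT
);
CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT);
INSERT INTO page VALUES
    (1, 0, 'Wi-Fi'),
    (2, 0, 'Mercury_(planet)'),
    (3, 0, 'Mercury_(element'),
    (4, 0, 'Orphaned_stub');
INSERT INTO categorylinks VALUES
    (1, 'Networking'), (2, 'Planets'), (3, 'Chemical_elements');
""")

# Article titles (namespace 0) with an opening parenthesis but no closing one.
UNBALANCED_TITLES = """
SELECT page_title FROM page
WHERE page_namespace = 0
  AND page_title LIKE '%(%'
  AND page_title NOT LIKE '%)%'
ORDER BY page_title
"""

# Articles that have no row at all in categorylinks.
UNCATEGORIZED = """
SELECT page_title FROM page
LEFT JOIN categorylinks ON cl_from = page_id
WHERE page_namespace = 0 AND cl_from IS NULL
ORDER BY page_title
"""

unbalanced = [row[0] for row in conn.execute(UNBALANCED_TITLES)]
uncategorized = [row[0] for row in conn.execute(UNCATEGORIZED)]
print(unbalanced)
print(uncategorized)
```

The only part of this that is about the actual question is the two SQL strings; everything around them is setup, which is the point the talk is making.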
And we solved this a decade ago with this thing called Tool Labs. It used to be called Toolserver. What we basically did was set up computers and provide access to whoever wants it. This is run very similarly to how, I think, university computing departments ran things in the beginning. You get SSH access, and there is a grid engine type setup. You have screen, and then you can run SQL queries. We also provided people direct access to our MySQL data. We had a live replica of our production MySQL data, so you could run these queries against it.
Wikipedia has been developed by volunteers from the beginning, right? Until 2007 or so, there were, what, 15 people paid to run it, so most of it was volunteers. Nobody really had the time to develop individual things that would answer these individual questions. So they were just like, here is the SQL for all of this, go have fun yourself. And people did. There were lots of volunteers who did not want to spend the time and effort learning the entire stack and trying to get something deployed live, but would be very happy to just run a query on this.
And we still have this. You can go to tools.wmflabs.org and sign up, and then you can set up SSH, and then you have access to a large amount of computing resources that are completely free. You don't pay any money for it. There are requirements, of course: it needs to be related to Wikimedia stuff, and it needs to be open source. Which again is the whole principles thing. This has worked well for several years, but at some point it's just too much. If you look at this slide, which is the steps to query SQL before Quarry: this is a very simplified setup, right? I'm sure the actual list is far longer.
I've sat down with people and tried to walk them through this, and it is not 10 steps. It's much more, but I wanted to put it on a slide. And this is an ideal setup, right? Except for number six. It's all like, OK, if this works, it would only take you six hours to do this. And then you will forget it, because you're not using SSH in your life for literally anything else, because you're an editor. And you don't know what SSH keys are, because there is literally nothing else in your life that uses PKI to validate credentials except SSH. And this is the whole command line tax. People who are used to this are used to this, and they're like, "of course SSH is easy." Sure, if you have already spent the two years of your life that are required in sacrifice to the command line gods to do this. Five, right? Any X number of years, where X is a large integer.
But if you look at this, only number nine is an essentially complex problem. That is the only thing you need to know to actually answer your question. And that itself is hard. SQL is not a real programming language, as the church says, but it's still fairly complex. But its complexity is directly related to what you are doing, while none of these other things are. Figuring out how to use screen is not directly related to finding out which articles don't have a closing parenthesis at all. It's just a hurdle for you on the way. So what we are trying to do is get rid of all of these accidental complexities, so that you can increase the number of people who can do this. I think it's a truism that if you reduce the number of accidental complexities that people have to jump over to do something, then the number of people who do it will increase. I think that's just true by definition, because each of these steps is going to lose some number of people.
And once this has been around for enough time, the whole thing gets a reputation of, "oh yeah, you need to be a programmer." Your identity must be subsumed by "I am a programmer" before you can even attempt it. Otherwise people just look at this, glaze over, say "that's not for me," and move on. So that's also what we are trying to combat. This is for everyone.
So we built Quarry. You can go to it. It's at quarry.wmflabs.org. It's open to anyone. So this is the first principle in action, right? Open to anyone. You can log in with a Wikimedia account, and Wikimedia accounts are free. And then you instantly get here. It's just a text box that lets you write SQL. You hit run, and you get the output. So you've completely removed all of the rest of the steps from this.
But this is not very new. There are already a lot of similar things in use privately, right? I don't think business analysts in most places, hopefully not, are still using SSH to access SQL tables. And there are lots of products that do similar things. But the difference here, again, is the application of the Wikimedia principles. If I go here, the first arrow points at a URL, and that's public. I can pass this URL to anyone. At the top right it says "log in," I don't know if you can see it. That means this is a logged-out view. This is public. So this is principle two: it's public by default. It has my name, so I have attribution. But I can't say, "no, no, no, I don't want people to see this yet." It's public from the start. This is a very useless query, but it's already public. And it has the results and it has the query in it, all automatically. I don't have to do anything at all. And this is very useful. So I think I've demonstrated pretty much all of the principles.
Low cost: it's super low cost. You can create a new query very easily. There are, I think, 13,000 queries now. Because it's not "oh, I get 10 queries, I have to manage them properly," or "I have to file a form and wait for it to come back before I can run a query." No, you just click a button and it's there. So these are the decisions we made when designing it. This has been around for almost two or three years now, and I want to look at what effects it's had.
I like this conversation a lot. This happened on English Wikipedia a few months ago. Oh no, almost exactly a year ago. That's suspicious. It's between three people, in the middle of a much larger conversation about admin tools. I don't actually know what they were talking about. But the first person was trying to answer a question they had about administrator use. They had tried to download the data, they had tried to run it, and then they had given up, because it was too complex for them to do, and it also probably requires a lot more resources than they have on their laptop. I think that's a fairly common thing. And I think that person actually went above and beyond what most people would do, which is, "oh, I can't do that, that's sad," and then maybe file a request and give up. This person tried, and then they failed.
And then the second person was like, "oh, we can do this with Quarry." And then they were like, "I'll ask someone technical." VPT is Village Pump Technical. It's where the people who are technical are supposed to hang out. So he's like, "I'll ask someone at VPT who's good at SQL to make these queries, and then maybe we can do this." But then the very next day, this same person was like, "I've put this together now, and we can use this." And then a few hours later, there's another query that is also very useful.
And then a completely different third person is like, "oh wow, these two are very useful. This actually answers what we're trying to do." And I think these are all possible only because of these four principles. This person was like, "oh no, it's technical," and then, "I'll try it, because it's already there." And then they were actually able to do it. They had to learn SQL. We'll come to how that's also made easier later. But still, they were able to do this only because it was open to everyone. They didn't have to ask permission. The turnaround time between saying "I'll try to find someone" and actually doing it was less than a day. And most of that, I bet, was actually writing the query. And it's also public, so they could just link to it instead of, "OK, I will email it to you." Because they linked to it, this third person was able to read it too. And me, looking at it a year later, was also able to see what they were actually talking about. And there were also countless people who were able to read it and just didn't leave a comment. I think that's very core to the Wikimedia principles. And again, it's low cost, so they were like, "I'll just make another query that does this other thing I want," instead of, "oh, I already have this, I'm going to let that be."
So I think this is a very good outcome for what we were trying to do. We have this person who doesn't consider themselves a programmer, who doesn't consider themselves technical, but they've just written SQL. So they are programming, but that's not their identity. They're still an editor, they're still a Wikimedia admin, but they've used programming as a tool to do something that they wanted to do. And I think that is ultimately what democratizing programming is. I feel very happy every time I see this screenshot. The one caveat in this is that writing SQL is still a complex activity, right?
It's not particularly easy. Even though we've taken away all the accidental complexities, it's still fundamentally complex. And the one thing that does help with that is, again, the attribution-not-ownership principle and the public-by-default principle. If I go back here and you look at the person's second response (which you can see wonderfully in our wiki's way of discussion, where everything is indented, because that's so obvious), it says you can adapt that to any time period you want by creating a new query and adjusting the timestamps.
This is not something we thought of in the beginning. What we found was that people would share a query with someone else, and that person would just copy-paste the query into a new query and then fiddle with it to do what they want. This is super common, and it's super easy as well. Because if you look at a query and you can see a timestamp, then it's obvious to you: "oh, I just need to change that." And then that's good. You've written SQL. Or you see a username and you're like, "oh, OK." And I feel like once you've made that first edit, the second one is easier. Then you're like, "oh, OK, I want to change this WHERE clause." And it's much easier for you to go ask someone, "hey, how do I change this WHERE clause?" than to be like...
So we then provided infrastructure for this. And this is very common. If you can see that, there is an original query called "orphan talk pages on Commons" (Commons is a Wikimedia project) by someone. And then this entirely different person forked it and made it "orphan talk pages on TR wiki." I don't know what language code TR is, but they were just like, "oh, this person has done this for their wiki, I want it for mine." And they were very easily able to just fork it.
And before we implemented the fork functionality, they would have had to copy-paste it and we would have lost the provenance, right? We wouldn't have known where this came from. And I think adding fork as an infrastructure-supported thing is good for UX, but it's also good for provenance. So you know where someone is getting their stuff from. And it also provides attribution in a really nice format. And this is very useful. Lots of people do this now. So there's now a generic query that someone has written for finding new pages created during an editing workshop they held. And then when someone wants it, they just fork it, change the dates, and they're done, and they have programmed. And they can also change it. Like, oh, we want things to match only this criterion. And then you have a very specific question: I have this SQL query and I want to change this. And that's much easier to ask, and people do that a lot. And this is super empowering. Take this previous conversation, right? If Quarry did not exist, this person's options are: find someone who is technical, and convince them that, of the 400 million things they could be doing, this is the thing they should be doing. Because if you are technical, doing this is kind of boring after the fifth time. It's just an SQL query. And I'm like, well, yeah, but there's all this surrounding stuff, right? That's partially my selfish motivation for writing it, because I got bored of people asking me to do this. And I'm like, okay, now you do it. And this also gives them a lot of power. They are no longer beholden to me. Power is more distributed. And I think that's very important as more and more power derives from the ability to change things in a computer. So yeah, this has been going on for three years. I have a bunch of statistics.
So we have 1,500 users who have written at least one query, or forked at least one query, or at least run one query. We have 13,000 queries and 119,000 results. Each time you run a query, you get a result, and we save all of that. It's all versioned. And we didn't really do any advertisement. We've maybe given a couple of talks, but mostly it's just people doing this and then sharing it with someone. And then people are like, oh yeah, I want that. And if you go on wiki and look for links to this, there's a bunch of links from people discussing something completely random and using Quarry to support their position or oppose someone's position. Or like, oh yeah, we're gonna work from this list. And it's completely organic. There is a community discussion page for people to go to. You know, I just created a page saying this is the discussion page, and now it's full of people asking SQL questions and other people answering them. And I think we have gone full circle, in that there is now someone there answering questions who was asking them a year ago. And I think that's really good. And I didn't have to do anything. We just created a space for it and made sure that this is something you can openly discuss and do all this stuff with. And then it just happened. So that's the end of Quarry. Quarry is the first tool that we built. And the next one I'm gonna talk about is PAWS. Let me see if I got the ordering of this right. Right. Quarry only lets you analyze the wiki. It's just a read-only view. You can get information out of it, but you can't really change it. PAWS lets you change it. This is a very famous quote. I like this a lot. I only found this out yesterday. But Quarry doesn't let you change anything, right? I think power comes from the ability to change things. Knowing things is important, but you should also be able to change them.
And bots allow you to do that on wiki. I'll explain what bots on wiki are and what they do in a minute. And the people who write them are crucial and have a lot of social capital and power. This is the technical person I was talking about earlier, right? If I'm writing a bot that does anti-vandalism stuff, I get the power to decide what anti-vandalism is. If I'm writing a bot that welcomes newcomers, I get to decide what welcome means, what newcomer means. I have a lot of power. And outside of the Wikimedia context, Facebook has a ton of power and it's all completely non-transparent, because it seems to design its systems with the opposite of these four principles in many cases. And being able to do it yourself is very empowering. It goes from, this is for these other people, to, I can do this too. And I've seen this shift happen in people in person, and it's really nice to watch. So this is the dictionary definition of bot. I'm gonna assume everyone knows what it is, but the primary thing I wanna talk about is that it's a power multiplier. If you could do something at the rate of one per minute, or one every five minutes, or one every day, a bot lets you do that much more efficiently. And that's the part of it that I wanna talk about. If you are able to write a bot, then your ideas, what you want to do, you just have so much more power. It's much easier to go, I wanna do this, and I did it, than, I wanna do this, and now I have to convince this person who holds slightly different views than me to do it. And I think eliminating that step is just so powerful. Right, again, tools. You could run bots before, and I'm pretty sure this list goes on a lot further than that. Some of it is similar, and again, only number 10 is what you actually need to do, and that itself is complex.
Wiki things are complex. Finding out what anti-vandalism is is complex, defining what welcoming means is complex, defining what newcomer means is complex. It's an essentially complex activity, and if we add all of this additional complexity, the accidental complexity, what we do is we only allow people who are good at the accidental complexity to take a crack at the essential complexity. There are a lot of good people who would probably do a much better job at what welcoming means, at what this means, and then they're just like, but I don't wanna deal with all of this bullshit. So it's artificially limiting people, and it's bad for the health of the community as well. So this tries to get rid of all of that. Everyone knows Jupyter Notebooks, I guess. IPython Notebooks, Jupyter Notebooks. So this is based entirely on that, and this is also possible now because of advances in Linux containers and large-scale use of them. We have, I think, at least 200 users now on only 20 machines, and they have a lot of autonomy in them, as I'll demo in a little bit, because of advances in this container technology that I am happy to talk about later while I'm here. So, right, I was going to show this as a set of slides, but because things seem to be going okay, I am going to show it as a live demo. So this is the website. You can actually go there and log in yourself. This is a very new website. We've been working on this only for maybe three to six months in our spare time, and you can see that it's very raw, but I can sign in with MediaWiki. It's open to everyone, again, principle one. And I can hit allow, and then that's it. I now have a Jupyter Notebook server that's running in the cloud with a lot of power in it. As you can see, I've been using it a fair bit. So one of the most important things is principle two, right? Everything must be public, and in this case, everything is public.
I should have queued up the URL as well, but it's called PAWS public, and if I go to my user page, which as you can see is very well designed, this is the entire contents of my home directory, and it's public from the start. You can see all the things that I've been doing, all the crap that's in there and all the not-so-crap that's in there. And, you know, things like this, it's just a half-complete thing that I was just playing around with, and it's still there. Or this, which I don't even remember what it was, but it was me playing with the Wikidata Query Service trying to find a list of asteroids, apparently. I don't know, maybe it's not useful to me, but it's probably useful to someone else. So this is rendering notebooks, and we'll add more rendering to it, and this is true for everyone. As soon as I create a notebook, as soon as I hit save, it's public by default, and you can download it and you can do things with it, and that's very important. Going back, one of the things I said was important is the ability to change the wiki, and giving people the ability to do this. So Jupyter notebooks have a terminal that I think is not very well used, not many people use it, but I think it's super crucial for the things we wanna do. Because we have had this tool environment, we have a large history of scripts that are bots that already exist, that have a big community around them of people who are self-identified software developers, who have built all these things, and they have written documentation for them. And we have tried to get more people to use them, but it's really difficult, as anyone knows who's tried to, like, oh yeah, I'll just install a virtualenv on your laptop, oh, it's Windows, well, we'll spend a week on it and then it'll break after a month, right? And then if you come to a workshop and we install it for you and then it's gone in a month, what are you gonna do?
You're stuck, and it also doesn't empower people. It's still this thing this person did for me that I can use until it breaks, and then I have to go back to this person. Whereas this is a lot simpler. It still has the accidental complexity of the fact that it's on a terminal, but getting to this terminal is easy, and we make a lot of things easy. We have pwb, which is Pywikibot, which is the most common Wikipedia bot framework, and that is fully installed and fully authenticated. You don't need to type your password. We basically integrate that whole thing for you and we keep it up to date. We make sure all the libraries you need are in there and that you're logged in by default, and then we have lots of documentation for this. So I'm gonna just do this. I don't know if you can, you probably can't see it, but, oops, oops, oops, I did something bad. Okay, it just runs a script called add text, where I'm trying to add the text hello to my own user talk page, and then it asks me, do you wanna do this, and I can say yes, and then it does it, and then it's done. This seems normal, but if I was doing this the old-school way, with that large list of steps, this is a day's worth of work. And the scripts are super powerful. This is just the simplest one, because I didn't want to run any of the more complex ones, but you can have, for example, mass search and replace across the whole thing. An example I like to give, that also makes me very happy, is that we did a workshop for this in the Tamil Wikipedia community, and Tamil is a nice language in that it has very strong rules. You cannot have certain characters follow certain characters, and so they wanted to have a bot that looked through everything, made sure this is never the case, and if it is the case, there is always a correct replacement for it.
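The kind of rule-based fix described for the Tamil Wikipedia, where a forbidden character sequence always has one correct replacement, might look roughly like this in plain Python. It is a sketch under assumptions, not the script the community runs: the rule table below uses made-up ASCII placeholders rather than real Tamil orthography rules, the pages are an in-memory dictionary rather than a live wiki, and a real bot would show each change and ask for confirmation before saving.

```python
def fix_text(text, rules):
    """Apply every (forbidden -> correct) rule and count the replacements.

    Rules are applied one after another, so this assumes that no
    replacement creates a sequence forbidden by a later rule.
    """
    changes = 0
    for bad, good in rules.items():
        changes += text.count(bad)
        text = text.replace(bad, good)
    return text, changes


def fix_pages(pages, rules):
    """Return only the pages that need an edit, with their corrected text."""
    edits = {}
    for title, text in pages.items():
        fixed, changed = fix_text(text, rules)
        if changed:
            edits[title] = fixed
    return edits


# Placeholder rules: "qk" and "xz" stand in for character sequences that
# are never allowed; the values stand in for their correct forms.
RULES = {"qk": "kk", "xz": "z"}

# Placeholder pages standing in for wiki articles.
PAGES = {
    "Page one": "aqkb and xza",
    "Page two": "nothing to fix",
}

edits = fix_pages(PAGES, RULES)
```

Going through pages by hand with Ctrl-F does the same job as `fix_pages`, one page at a time; the bot just removes the repetition.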
And they had one person who they considered technical enough to be a programmer, and this person also has a day job. So this person is the single point of failure for all the things that are technical in that community. So what people were doing is they were going through page by page, Ctrl-F-ing for this, and changing it manually. This is a complete waste of someone's time, and especially since Tamil is not a particularly big language. Say that it's 60 million people, but it's not a big language on the internet. And if you have a small number of people who are interested in making the wiki happen, we shouldn't be wasting their time like this. And when I showed this to them, well, they do not really speak very good English, but they read and understand enough English to be able to do this. And now they're actually running bots off of terminals doing all of this work, and the person who is technical gets to spend their time doing more meaningful work, doing more novel work, and not burning out because they've been writing search-and-replace scripts for the last two years. And I think those are the two important things. You don't want to burn them out, and you also don't want to give them all the power. Again, you know, if you are the one writing the script, you get to decide. In a language like Tamil, where there are controversies as to which characters are real, if you're the person enforcing this, you have a lot of power. And in English Wikipedia as well, color and colour and how you spell it, for example: well, maybe there's community consensus to do this, but if there's only one developer and he's like, well, colour is of course spelled color, then what are you gonna do? You'd have to go find another technical person. But this just gives you the power to do that yourself.
And what we're trying to do, slowly, so with Quarry this took two years to happen, where people are like, I just used it, I'm not a programmer. And with PAWS it's not happened yet, because we need a lot more polish, a lot more documentation, and it's still a new project. But the hope is that it will, if we keep doing this, and we keep doing workshops, and we keep making this better. Like, for example, until a few months ago you couldn't type Tamil characters into this terminal because of a JavaScript bug, and nobody had ever run into it before because nobody had tried to type Tamil characters into this JavaScript terminal. And so we're trying to fix all of that, and this will just make this better and empower all of these people to use this as a tool to do the things they want to do without wasting their time, and act as a force multiplier, so that you're not like, oh, we want to do this, but let's just not think about it, because the one technical person doesn't like it and I don't want to argue with them about this. And it's a notebook, so you can do all the notebook things, so you can also use this for research. But again, I think the more powerful thing comes from actually being able to make modifications to the wiki from it, and I'm going to just show a couple of things here. One is, where is examples, examples, right? You can see that I'm very well organized in my home directory. So this lets you access the MySQL things that you saw in Quarry directly from Python, and this gives you a lot more power, because you can mix and match now. You can be like, I'm going to get this result from here, and then I'm going to hit the page view API, and we are in the process of building easier helpers for all of those, and then I'm going to find out which of the WikiProject Women articles that are the smallest have the highest page views, so that we can focus on them, right?
Or you might decide, oh, I want to find out which of the articles on women scientists who are Iranian have the highest page views, or whatever it is you want to do. And eventually we'll make this easier and easier with the process I'll show next, but the fact that these fundamental things are already available to you makes this whole process much, much easier to just get going on. We've reduced the cost of doing this so much. You don't have to figure out SSH, you don't have to figure out all of this. So this is available publicly, right? And we don't have a fork button yet, but we will soon. So you can just be like, oh yeah, I saw this link and it is doing this for, you know, women, and I want to add an Iranian thing. So I'm just going to hit fork and then I'm going to fiddle with it. And then, like, oh no, no, I'm not programming. I'm just fiddling with this thing to make it work. Then they have it working, and I think that's very powerful. The other very powerful thing that we are trying to do now is, let me open both of them. This, I think, is very, very powerful. So what I'm doing is I'm importing a function from another notebook to do something for me. So what this means then is, if one person writes a function that queries SQL properly, then everyone can just use that and you don't have to keep reinventing it yourself. And if one person writes a thing where, you know, you give it a country name and it lists, I don't know, the scientists in it or whatever, then you can use it as a composable building block in many things. So if you're listing, I don't know, pages with an opening parenthesis, you know, that might be a notebook that uses the SQL function from there to do that, right?
Or if you're looking for banned users who have the most edits, then you can plug in a notebook that gives you a list of banned users, and one that gives you the number of edits they've made in a namespace, and then you're basically doing a function call. And I think that's a lot easier to understand, if you already know what namespaces are and you already know what blocked editors are. And because it's a notebook, you can document this inline, and then you can just be like, I'm just gonna pull these two together and that's my answer. And then someone else can pull those two together to do what they want. And this is possible because it's all public, right? If I write something that gives me the list of blocked users, I'm like, ah, who'd want that? But people might, and people would, and you will come back in two years and go, whoa, what has this become? I changed this and it broke some 400 people's scripts. Which is what happens on wiki now. So the wiki, if you look at it, has lots of templates. You can include templates in places. That's how infoboxes all look the same, and if you change one, everything will break. So that's why this is not publicly available yet, because I'm working on versioning, because we don't wanna repeat the mistakes we made. You only wanna learn the good things, and the mistake was definitely that the lack of versioning means it's very hard to move, because you might break something. And if we have versioning, this will not break. So we're working on versioning for this composability bit. And it also has a lot of other uses. If you look at this, this is ipywidgets. I'm sure people have seen this. So I have this function called names, and I've just wrapped it with a function that basically gives me this UI, and I can hit run and it works, and I can change the user to someone else and it will work.
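The composability idea from a moment ago, one notebook providing the list of blocked users, another providing edit counts per namespace, and a third pulling the two together with a function call, can be illustrated with a small sketch. Everything here is assumed for illustration: the function names, the in-memory data, and the idea that each function would live in its own public notebook are hypothetical, and the real building blocks would query the wiki databases instead.

```python
from collections import Counter

# Toy data standing in for the wiki database (all names are made up).
BLOCKED = {"UserA", "UserC"}
EDITS = [  # (username, namespace) for each edit
    ("UserA", 0), ("UserA", 0), ("UserB", 0),
    ("UserC", 0), ("UserC", 1), ("UserA", 1),
]


def blocked_users():
    """Building block 1: imagine this lives in one person's notebook."""
    return set(BLOCKED)


def edit_counts(namespace):
    """Building block 2: edits per user in one namespace."""
    return Counter(user for user, ns in EDITS if ns == namespace)


def most_active_blocked_users(namespace, limit=10):
    """A third notebook that only composes the two blocks above."""
    counts = edit_counts(namespace)
    ranked = sorted(
        ((user, counts[user]) for user in blocked_users()),
        key=lambda pair: (-pair[1], pair[0]),  # most edits first, then by name
    )
    return ranked[:limit]


top = most_active_blocked_users(namespace=0)
```

The versioning concern raised in the talk applies directly: if the author of `edit_counts` changed what it returns, every notebook built on it would break unless callers could pin the version they depend on.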
And we're working to find a way to actually deploy this eventually as a web application, so that people wouldn't even need to know that it's running off of a notebook. I'm slightly over time, but I'm almost done. Right, oops, what do I do? WFM Labs, well, if only it worked for me. There we go. So this is a prototype of one of these things we're working on. If I look at this, this is backed by a notebook. This eventually calls a function that I think is the same as this one. If you look at this function, it takes users, start, end, and a database list. And then we have code that basically uses naming conventions, or you can put this in a docstring, and it extracts that out and generates this UI. And when I fill it in and hit submit, it basically says, oh, I know this notebook, because it's a public notebook and it's a versioned notebook. And I'm gonna just get that .ipynb file, execute it, give it my input, find out what output it gives me, and then show this. This is a much more powerful way of working: people who can write a notebook can now write a web application. You can just expose this to whoever. Maybe you just made a new metric for efficiency in a wiki edit-a-thon, or whatever you wanna think of. And then it's suddenly usable by anyone who can use a web interface, and they don't have to see a single line of Python code in their lives. Or if they want to, they can. I mean, it's not linked from here, but because it's backed by a notebook, you can just click through to the notebook, fork it, make the changes you want, and then see what that looks like. And it's super transparent and also super empowering, because you're not like, oh, this is magic that these people decided.
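One way to picture the step where a UI is generated from a function by naming conventions or a docstring is the sketch below, which uses Python's standard `inspect` module. It is an assumption-laden illustration, not the prototype's code: the conventions in `guess_widget`, the `name: description` docstring format, the `edit_metric` function, and its `dblist` parameter name are all invented here.

```python
import inspect


def guess_widget(name):
    """Pick a form widget from the parameter name (made-up conventions)."""
    if name in ("start", "end"):
        return "date"
    if name.endswith("s") or name.endswith("list"):
        return "multi-text"
    return "text"


def form_spec(func):
    """Describe a web form for func: one field per parameter.

    Help text comes from docstring lines of the form 'name: description'.
    """
    doc = inspect.getdoc(func) or ""
    help_text = {}
    for line in doc.splitlines():
        key, sep, description = line.partition(":")
        if sep:
            help_text[key.strip()] = description.strip()
    return [
        {"name": name, "widget": guess_widget(name), "help": help_text.get(name, "")}
        for name in inspect.signature(func).parameters
    ]


def submit(func, values):
    """What happens on submit: call the function with the form values."""
    return func(**values)


def edit_metric(users, start, end, dblist):
    """Toy stand-in for a function defined in a public notebook.

    users: usernames to include
    start: first day of the period
    end: last day of the period
    dblist: wiki databases to search
    """
    return {user: 0 for user in users}


fields = form_spec(edit_metric)
result = submit(
    edit_metric,
    {"users": ["UserA"], "start": "2016-01-01", "end": "2016-01-31", "dblist": ["trwiki"]},
)
```

In the prototype described in the talk, the submit step fetches and executes the versioned notebook file rather than calling a local function; the sketch only covers the part where the form is derived from the function itself.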
Metrics are really important, because humans tend to optimize for what the metrics are defined as, and when they are defined in non-transparent ways that you can't play around with, it doesn't end well, usually for the people who are bound by the metrics, and the people who define the metrics usually don't know; they have good intentions, hopefully. So this makes that much easier, and this is again emergent behavior from the fact that all notebooks are public, and forking exists, and you don't have to ask the owner for this, and you can edit it because it's open to everyone, you can just click through, and it's low cost. So you don't have to go, oh no, I can't create a new notebook because I'm out of space or whatever. It just happens. So, okay. I had screenshots here for the same demo I just showed, just in case that didn't work out. Right, and then there is lots of low-hanging fruit, all of these things that we're gonna work on. Bots can have cron jobs, so you're like, run this notebook every day, update this. So now you have a dashboard that updates every day that you wrote as a notebook, and you didn't have to learn distributed systems processing and cron and stuff like that. Web services I just showed. All of these things are stuff that we are gonna do. And the takeaway, as I would like to think of it, is that this can work: if you want to eliminate accidental complexity, then using these principles is a very good way to do that. And it also gives you lots of emergent stuff that you would not have thought of in the beginning. It's a gift that will keep on giving as long as you have people doing it. And it is viable right now. It's not something that's far off in the future. If people want it, you can have it.
None of these things, I mean, they're Wikimedia-specific in the sense that they're all running on Wikimedia right now, but the principles are not tied to Wikimedia. I could have a Quarry that queries the PLOS database, or a genomics database, or whatever, and the same thing for PAWS. There is no reason it has to be tied to the wiki. The concept is the same, and we are developing this in a completely open and redeployable fashion, and hopefully there will be more of these deployed in more places. I think the Data 8 stuff is somewhat similar to PAWS, but obviously doesn't have as much openness, because there are students and you don't want people copying homework, but I think having an open PAWS-style setup is definitely useful in the long run for a lot of things. And that's it. You can go look at these. So that's the Quarry URL, that's the PAWS URL, and Aaron Halfaker, Stuart and Alice Koy helped me set up this presentation. Ori Livneh is a programmer at Wikimedia; we talked about IPython notebooks three, four years ago, continuously, and that's where PAWS was born from, and Aaron Halfaker was very useful in the birth of Quarry. So I wanted to credit them there, and that's it, I'm good. Thank you for listening.