Hey everyone, we're going to go ahead and get started in about 10 minutes, so this is the 10-minute mark. If you want to go ahead and start finding your seats, that would be awesome. Thank you. Hey all, this is the 5-minute mark, so we're going to get started in 5 minutes, so if you want to find your seats, that would be awesome. Thank you. Get a little closer. Now don't be shy. Hi there. Thank you, thank you. So FYI, for anyone getting on the Wi-Fi, the network is WMFS — I'll go over it shortly, but the password is freeopenknowledge. It'll be on a slide as well. Thank you. Here it is. I feel like it's been 5 minutes. Shall I begin? Okay. Well, thank you all for coming. I know you have choices for what to do with your evening, and you chose to be here. We appreciate that very much, so thank you. And also thank you for tolerating our elevator situation. I had a premonition that was going to happen, and it did. Probably my fault. I owe you a beer. There's some beer back there. Okay, so I am Amy Elder, Director of Recruiting for the Wikimedia Foundation as of about six months ago. And I wanted to go over a few logistics, just kind of housekeeping stuff, so everyone's comfortable and settled. But before I do, I'd like to thank Maria O'Neill for planning this — concepting, execution, extraordinaire. Thank you very much. We ran it with the support of Sarah Rudlin; thank you for the guidance and support. Aaron Halfaker and Moriel Schottlender are our speakers tonight. We thank them as well. Brendan Campbell in the back — DJ, AV — for his AV expertise, and of course our well-loved Lee, May-Lee, and Janet in the facilities group, so thank you very much. All right, going backwards here, okay, so here we are. We are at 149 New Montgomery Street on the fifth floor. It took you 25 minutes to walk up the stairs, but you got here. If you need the stairs again, which I don't anticipate, they are back there — spiral staircase to the right — and also to the left by the elevators. Oxygen masks — I'm just kidding. Okay, pizza in the back, trash, recycling, all that stuff. The bathrooms are down the hallway to the right. And here's the map. All right, the agenda is definitely off, but that's life. Changes happen and we roll with it. So we are rolling with it; it'll be good. This will give you an idea of the flow. So you know, we are photographing the event, and it is streaming live on YouTube, if you'd like to join in on that fun. The Wi-Fi password is freeopenknowledge; the hashtag is #WikimediaTechTalks. Feel free, while you're at it, to like us on Facebook as well. And if you have any questions, come to myself or anyone else in a red name tag or an "I edit Wikipedia" shirt. Thanks again for coming. Next up is Aaron Halfaker. Thank you. All right, let me just get plugged in here. All right, okay. Well, so there was a little bit of a lie in the setup for this. I thought that I could show you detecting vandalism in Wikipedia in three easy steps, and it turns out it's going to be four, regretfully. And with no further ado, let's get started. So I'm Aaron Halfaker. I'm actually a Wikipedia editor. I started building tools for Wikipedians — gosh, it's been almost 10 years now since I started doing that. EpochFail is my Wikipedia persona. I'm also a senior research scientist at the Wikimedia Foundation. And so today I want to talk to you about this interface between Wikipedia and the Internet. This has worked out pretty well. The Internet basically spills human attention into Wikipedia, and that turns into encyclopedia articles.
You know, it's great that anybody can edit Wikipedia, but of course this brings up questions about quality. We might remember from around, I don't know, 2006, 2007, this was a big question about Wikipedia: how could there be any quality in an encyclopedia that anybody can edit? We don't really ask this that much anymore — at least I don't — because it seems that Wikipedia is high quality. And so now the question is: how did it become high quality? And that's actually something that I'm going to talk to you about today — one of the core technologies that enabled Wikipedia to be high quality even though anybody can edit it. So it's important to note that there are a lot of initiatives that Wikipedians run in Wikipedia. Wikipedians are the people who edit Wikipedia — that's a bit of jargon; I'm going to give you a ton of that, and I'll try my best to define it. Anyway, I wanted to tell you about two projects that are really important to quality in Wikipedia. The Counter-Vandalism Unit, which is a group of Wikipedians who make sure that no vandalism that gets added to Wikipedia sticks around for very long. And the Wikipedia 1.0 initiative, which was all about rating the quality of articles in Wikipedia so that at some point maybe we could cut a version of Wikipedia, call it 1.0, and say: here are all the articles that are great. It turns out that that didn't work out very well. But a whole bunch of the subgroups inside of Wikipedia that were interested in medicine or Denmark or PlayStations took over this article quality rating system and used it to help them organize their own work. So it's really important for me in this talk that you get a sense for how much there is behind the scenes in Wikipedia. I really like to use an iceberg to talk about this. The tip of the iceberg is adding new content. This is the first thing that you think of when you think of "anybody can edit Wikipedia" — you might add some content to it. But it turns out the stuff hidden beneath the water — quality control and maintenance of what's already in Wikipedia — is a huge amount of the work that people do. And it turns out that Wikipedia is a fire hose of new content, so there's a lot of quality control and maintenance that actually has to happen. So here are just some numbers that I can throw at you. There are about 160,000 edits saved to English Wikipedia a day. That's enwiki, by the way — those of us who do a lot of development for English Wikipedia call it enwiki, because enwiki is the database name. I might say that later; I just want you to know that's what it means. We get 50,000 new article creations per day, and we get a ton of new people on the site — about 1,400 new editors per day. So the trick with a machine learning model, which is one of the things I'm going to talk to you about today, is to take this fire hose of 160,000 edits and split it into the edits that need review, that might possibly be vandalism, and the edits that are probably okay and are almost certainly not vandalism, so that we can spend less time reviewing edits and hopefully actually spend time contributing new content to the encyclopedia. So how do we build an AI that flags the edits that need review? All right, time to hop over to my iPython notebook. There we go. All right, so in this talk I'm going to show you the actual code and the overall strategy for basic damage detection in Wikipedia.
Damage is a word that I'd rather use instead of vandalism, because there's all sorts of damage that people do, on purpose or not on purpose. When you do damage on purpose, that's vandalism, but sometimes you make a mistake, and we still need to review that as it's added to the wiki. Okay, so actually, who here is familiar with iPython notebooks or Jupyter notebooks? Okay, so we've got about half the audience. How many people have written any Python at all? All right, so we've got most of the audience. So this is going to be mostly familiar to you. If you haven't picked up Python, I would recommend it; I like working in Python. This guide will actually show you the commands that you can run to replicate all the work that I do around the systems that I'm building to detect vandalism in Wikipedia. But also, a wonderful thing about Python is that it's relatively easy to read if you're not familiar with the language. We can talk about any of the things that are kind of confusing after the talk — I believe there's some question and answer afterwards. Okay, so the basic process goes like this. In step one, we're going to gather some examples of human judgment — humans judging whether an edit is problematic or not, damaging or not. In this case, we're going to take advantage of reverts. Reverts are what Wikipedians do when they find an edit that's problematic in the wiki: they revert that edit to a previous state of the article, removing the change that was causing a problem. In the second step, we're going to split this human judgment data into a training set and a test set. In the third step, we're going to train our machine learning model on the training part of that set. And then finally, in the fourth step, we're going to test the model. And so the fourth step is very important. We get a machine learning model in the third step, but we don't know if it actually works until the fourth — so that's four steps. Okay, and then once we get to the end, we'll have some fun applying this model to some edits that are actually happening live in Wikipedia. And so — whoops, of course this down arrow is not doing what I would hope it would do — this is a general overview of the process. We're going to get our labeled observations. We're going to make the training and test sets. We're going to train our machine learning model to get a prediction model. And then finally, we'll run our test set through that prediction model to get the statistics that tell us whether the model actually works. We'll be taking chunks of this diagram for each part as I move forward. Okay. So, regretfully, I can't run a SQL query directly against MediaWiki's production databases right from my Jupyter notebook. Although it's actually quite close to being possible — there's a system that we're standing up right now where you can just log on to a website that we have, run queries against our database servers, and run your own Jupyter notebooks. That's called PAWS. Regretfully, it's not online quite yet, but if you're really interested in playing around with Wikipedia data, I'd like to talk to you about that, and I can let you know when it's online. So, in the meantime, I'm going to use a separate service called Quarry, which does let us query the production databases directly. And I'm going to run this query. This query connects to English Wikipedia, and we're going to select a random sample of revision IDs — which each represent an edit in Wikipedia — that happened between February 2015 and February 2016.
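For readers following along at home, here is a minimal sketch of that sampling step in Python. The SQL mirrors the query as described (using standard MediaWiki schema names — the talk's exact query isn't reproduced in the transcript), and the Quarry output URL is a placeholder, not the real one from the demo:

```python
# The query, roughly as described. MediaWiki's `revision` table stores one
# row per edit; rev_timestamp is a YYYYMMDDHHMMSS string.
#
#   SELECT rev_id
#   FROM revision
#   WHERE rev_timestamp BETWEEN '20150201000000' AND '20160201000000'
#   ORDER BY RAND()
#   LIMIT 20000;

import requests

TSV_URL = "https://quarry.wmflabs.org/run/<run-id>/output/0/tsv"  # placeholder

def load_rev_ids(url):
    """Download Quarry's TSV output and parse it into a list of rev IDs."""
    lines = requests.get(url).text.splitlines()
    return [int(line) for line in lines[1:] if line.strip()]  # skip the header

# rev_ids = load_rev_ids(TSV_URL)  # ~20,000 revision IDs
```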
We'll randomize them and limit it to about 20,000, which is about enough data to train and test our classifier. So, if you actually want to look at this query — which I kind of recommend — we're going to pull up this page. This is actually our Quarry interface. This is totally public: if you have an account on Wikipedia, you can just show up here and run your queries against our data. Great. You can see that it runs this query, the same query that I just showed in the iPython notebook. And here is an example of some of the revision IDs that are returned by it. I can click on this "download data" link and go to TSV, and that will actually download my data for me. So, if I hop back to the iPython notebook, this link right here is the one that I actually get from that TSV download. So, now I'm going to use a little bit of iPython notebook to actually get this data from that URL and throw it into this rev_ids file, extract it with a little bit of Python list comprehension, and now we have our 20,000 revision IDs. Excellent. Okay. So, now that we have a set of revisions that we want to work with — these revisions, again, represent edits in Wikipedia — we want to label them by whether they were reverted or not, whether somebody saw fit to remove them from an article. Or sorry — we also want to exclude a couple of types of reverts, the kinds of reverts that are probably not related to damage or vandalism. Sometimes you'll revert yourself: you'll make a change and then you'll see that the change actually wasn't good, and you'll throw it away. Other times, somebody will revert you, but then somebody else will show up and revert back to your edit and say, actually, no, this did belong in the article. And so both of those we want to exclude from the set, because they're probably actually good edits in the end. So, this chunk of code here, which is a little bit complex, is going to get that for us. We're going to use two libraries that I've written for extracting data from the MediaWiki API that's behind Wikipedia: mwapi and mwreverts. I'm going to walk you through this really quickly, because we don't have a ton of time, but if you want to dig into this later, this iPython notebook is online, and we can definitely talk about it afterwards. Okay. So, the first thing we're going to do is set up a session for talking to the API. You can actually see the URL here; this connects to English Wikipedia. If you throw it into your browser, it'll show you the English Wikipedia main page. We're also going to send a user agent along with this, so that if I do something bad, the operations people at the Wikimedia Foundation can say, "Hey, ahalfaker@wikimedia.org, stop hitting the API so hard, you're causing problems." So, anyway, that's what that's about. Okay. So, now we're going to go through a loop over the revision IDs that we have and actually check to see if they were reverted. These few lines here run this check, using the mwreverts library and the session that we just set up, in order to see what the reverted status of each edit is. So, if the edit was reverted, then we're going to check whether it was reverted by the same user who saved the edit in the first place. We're also going to check whether it was reverted back to by somebody who's not the same user who saved it in the first place.
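Here's a compact sketch of the loop Aaron describes, using the mwapi and mwreverts libraries he names, plus the "neither self-reverted nor reverted-back-to" rule that follows. The exact field access on mwreverts' status objects (.reverting, .reverteds, the 'user'/'revid' keys) is my reading of that library's documentation, not code copied from the talk:

```python
import mwapi
import mwreverts.api

session = mwapi.Session(
    "https://en.wikipedia.org",
    user_agent="Revert-labeling demo <your-email@example.org>")  # be identifiable!

def label_rev(rev_id):
    """Return True if rev_id looks like a damaging (reverted) edit."""
    # check() looks both backward and forward in the page history:
    # was this revision reverted, and was it later reverted *back to*?
    _, reverted, reverted_to = mwreverts.api.check(
        session, rev_id, radius=5, window=48 * 60 * 60,
        rvprop={'user', 'ids'})
    if reverted is None:
        return False                      # never reverted: probably fine
    # The revision doc for our edit, so we know who saved it.
    mine = [d for d in reverted.reverteds if d['revid'] == rev_id][0]
    if reverted.reverting.get('user') == mine.get('user'):
        return False                      # self-revert: exclude
    if reverted_to is not None and \
            reverted_to.reverting.get('user') != mine.get('user'):
        return False                      # someone restored it: exclude
    return True                           # reverted, and it stuck: damaging
```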
And if it was reverted, and neither of those two things is true, then we're going to mark it as probably a damaging edit — likely to be vandalism. And otherwise, we're going to mark it as false. Okay. So, those are true and false labels for our set of 20,000 edits in Wikipedia. And if we actually try to run this — because I have a little bit of logging at the bottom — we'll get some dots, and we'll see an "R" for each edit that was actually reverted. This actually takes too long to run across 20,000 revisions, so I've uploaded a data set that we'll just load into place. These next few lines of code get us our pairs of revision ID and whether the revision was reverted or not, true or false. So, it turns out that after running this, we only actually get 19,868 revisions back. This is because, between the time that I ran the query to get the edits and when I ran the query to check whether the edits were reverted, some of the pages associated with those edits were deleted, and so we can't check whether those were reverted. A lot of pages are created in Wikipedia that get deleted pretty quickly — it turns out that if you create a page about your favorite garage band, it might not survive. But anyway, losing 132 edits is probably an acceptable loss, so we'll move on. One thing before we do: I just want to spot-check to make sure that the edits we marked as reverted are actually probably damaging in Wikipedia. Again, we don't have a ton of time to go through this, so I just want to give you a quick overview of what I found when I looked at it. So, this is a bunch of links that will actually take us to diffs in Wikipedia. Why don't we load up the first one? That'll be kind of nice. Maybe. All right. So, in this example of a reverted edit, we can see a massive removal of content from the article. On the left-hand side, we're seeing the previous version of the article, and on the right-hand side we're seeing the current version. And we can see that a ton of content was totally removed — huge sections of stuff in the article about Chris Jericho. My keyboard shortcut's down, so I have to keep popping out of full screen. But anyway, I looked through each of these edits — again, these are edits that were actually marked as reverted — and I generally saw damage here. So: section blanking, unexplained addition of nonsense, vandalism with a comment, adds a non-existing category, a test edit that removes some punctuation, adds a spam link to the article, adds nonsense special characters to the article, unconstructive link changes, more vandalism, and of course some other nonsense added to the article. So, it looks like when we label things as reverted with this strategy, we're generally catching damage — it looks like we're pretty good. All right. Part two. Oh — I did, yes. I added these comments based on what I saw in these edits. We could go through them one by one, but that might take the rest of the 15 minutes that I have. So, all right, on to part two. So now we're going to take this data set that we have — these revision IDs and whether they were reverted or not — and we're going to split them into a training set and a testing set.
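The spot-check links are easy to regenerate: MediaWiki's Special:Diff page shows the change any revision made. Assuming the labels from the previous step are (rev_id, was_reverted) pairs, something like this prints a reviewable link per reverted edit:

```python
# Print a diff link for each edit we labeled as reverted, for manual review.
# (The revision IDs here are illustrative placeholders only.)
labels = [(696546887, True), (695894545, False)]
for rev_id, was_reverted in labels:
    if was_reverted:
        print("https://en.wikipedia.org/wiki/Special:Diff/%d" % rev_id)
```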
This is really important, because when we train a machine learning model, we don't want to train it on the same data that we test it on afterwards. The machine learning model could do something very stupid and learn something that has nothing to do with vandalism in Wikipedia, but just has something to do with the particular training set that we gave it. So it's really important that when we test it afterwards, we give it observations of edits that it's never seen before, so we actually test its ability to generalize to new data. So the first thing that we're going to do here is split this data set of about 20,000 edits into a training set and a testing set. Because this generally works okay, I'm going to split it into 15,000 edits for our training set and about 5,000 edits for our testing set — so it's about an 80/20 split, which is relatively common in machine learning; see the sketch after this part. All right. So the next thing that we need to do is actually extract the features for these edits. Features are what a machine learning model uses as signal to make its predictions. Features are numerical statistics about what happened in the edit that might tell us something about whether the edit was damaging or not, and it'll make a lot of sense when I actually go through some of the examples. So for example, the first feature that we're going to look for is the longest repeated character added. It's actually quite common in Wikipedia to add vandalism that's just "KKKKKKKKKKK" — I'm not quite sure why — but anyway, we can catch these characters repeated over and over again and flag that as potential vandalism. The longer the repeated character sequence, the more likely it is to be vandalism. We also have some basic diff features: the number of words that were added in the edit, and the number of words that were removed. We're going to use some information theoretic measures — these prop delta sums are generally just measuring how many bad words and how many informal words, like "hello" and "haha", these sorts of things, were added to the article. And "non-stop words" is really just a bit of jargon for words that actually mean something. So we're going to use these information theoretic measures to look for the addition of curse words, informal words, or words that mean something that didn't exist in the article beforehand. All right, three more features. We're going to look at whether the user who saved the edit was anonymous. We're going to look at whether the user was in a group that is generally trusted in Wikipedia — so our system administrators, the sysops, who are the people who generally have advanced rights in Wikipedia and actually go through a substantial vetting process, and the bots, which are non-human editors that have also gone through a substantial vetting process. Finally, we'll look at the time since the user registered their account, because newcomers are usually the people who do vandalism in Wikipedia. Okay, so those are our features, and you can kind of see how all of those will turn into a numerical value. But I don't want you to just trust me — I actually want to extract some of these features and show you what they look like. And so this next chunk of code is going to demonstrate how we can use this revscoring system. By the way, I forgot to mention this: almost this entire iPython notebook pulls from this revscoring library that I built in Python specifically to make this work of building machine learning models for Wikipedia easier.
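Going back to the split at the top of this part: with the labels in a list, the 80/20 split is just a shuffle and two slices. A sketch — the notebook's exact variable names aren't in the transcript:

```python
import random

random.seed(0)          # make the split reproducible
random.shuffle(labels)  # labels: (rev_id, was_reverted) pairs from part one

train_set = labels[:15000]  # ~80%, used to train the model
test_set = labels[15000:]   # ~20%, held out for part four's testing
```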
So anyway, that's what I mean when I say revscoring. Okay, so we're going to import a feature extractor from revscoring. We're going to give it the same session that we used earlier to detect whether edits were reverted — this is an API session for accessing Wikipedia's back-end API, which, by the way, is open to everybody, so you can run this iPython notebook yourself too. And we'll check a couple of the edits that we saw in the reverted case to see what kind of features actually get extracted. And so you can see we're calling the API extractor; we're asking it to extract features for a particular revision. This "features" value is actually the list that we defined just up here. Okay. And so for these two edits, we can see how this turns into a list of numerical statistics. Of course, some of these are Booleans — Booleans, of course, are numerical as zero and one, so we can still use those as predictors just like any numerical value. So, any questions about how we got the features here? Yeah, so the question is: if we make all of this stuff public, does that make it easier for people to program something that might help them vandalize Wikipedia? This is a question that I get a lot. The really cool thing — or at least the maybe cool thing — about vandalism in Wikipedia is that vandals generally aren't very clever. They want to put the same sort of nonsense over and over again into Wikipedia, and that makes them easier to catch. If there's somebody who's actually reading through my iPython notebook, figuring out how the machine learning models work, and then using that to vandalize Wikipedia, I don't really have much hope for catching them. And so I'm kind of betting that vandals really aren't going to know that this exists or go through the trouble of using it. We have other mechanisms in Wikipedia for catching more nuanced vandalism; I'm really just trying to target the people who are not very clever. Okay, so, on to actually extracting features for this whole training set. Again, this is something that takes some time — it actually has to gather data from a few different parts of Wikipedia for each edit that it looks at. And so I limited this one to just the first 20, and I uploaded a data set that contains the entire extracted feature set for the training set. This block of code just pulls in that already-extracted data. And again, we've lost a few edits here: we've dropped from our original 15,000 training set edits down to 14,979. This is because sometimes we don't just delete the page that an edit appeared on — we'll also delete individual edits within a page. Sometimes somebody will put something in an edit that's so egregiously damaging that we can't even have it in the history of Wikipedia — such as "I learned your mailing address and I want to cause harm to you." Those are the edits that we delete, and this is about the rate at which that happens. So we lost about 21 edits here out of 15,000, probably because they were deleted out of the articles they originally appeared in. We're pretty close to the 15,000 number, so I think we're good to go. On to part three: actually using this data to train the machine learning model.
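For reference, feature extraction with revscoring looks roughly like this — the api.Extractor wraps the same mwapi session used in the labeling step. Treat the module path and call shape as approximate: it's reconstructed from revscoring's documented usage, not copied from the notebook:

```python
from revscoring.extractors import api

extractor = api.Extractor(session)  # the mwapi session from the labeling step

def extract_row(rev_id, features):
    """Return the numeric/boolean feature values for one edit."""
    return list(extractor.extract(rev_id, features))

# extract_row(some_rev_id, features)
# -> e.g. [4, 37, 0, 0.0, ..., True, False, 86400.0]
```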
So now that we have a set of features extracted for our training set, we can give this to our machine learning algorithm and end up with a prediction model that we can actually use to make predictions about new data. The revscoring library provides a lot of algorithms that we can use for training a machine learning model, but from past experience I know that a gradient boosting classifier works quite well for predicting reverted edits, so we'll just use that. And so the next few lines of code are really just showing how to construct a gradient boosting model with some of the parameters that I've learned actually work quite well for this particular context and this particular model. We used a hyperparameter tuning strategy for this, which is a really fancy name for "try a lot of parameters and see which ones work out best." If you want to see how that actually works, we can get into that in the question and answer period. But really, the most important line is right here. We construct our model, called is_reverted — so this is going to predict whether an edit is likely to be reverted — and then we give it our training features-and-reverted set. This is our features, plus whether each edit was reverted or not, and the model is going to build correlations between those features and the thing that we want to predict. It turns out that this takes about 16 seconds, which is what my output reads. And now we have a trained model that we can play around with. So let's try running it against a few edits from our test set. In this block of code, I'm going to grab a few edits that were reverted and a few edits that were not reverted, and we'll extract feature values and then run our prediction model against those feature values to find out what it says. So I'll just run this chunk of code real quick. And we can see it churning through each of the edits. First we go through the edits that actually were reverted, and we can see the predictions coming out. It keeps predicting true — so it's actually catching these edits and figuring out, yes, in fact, this should be reverted. The number on the far right is the confidence that we have in the prediction. And as we get to the false values — the rows that start with false — we're getting into the set of edits that actually weren't reverted. And you can see that our prediction model is also predicting that these edits are probably not going to be reverted, with relatively high confidence. I actually didn't print the false prediction confidence here — this is the true prediction confidence again, so lower is better. Let's see. Oh, and we have a false positive down there. So let's take a look at this false positive and see what the heck is going on. All right. So we can see, in this article about Craig Phillips — maybe he's a TV personality... yeah, there we go, Phillips has presented as a resident expert on a large number of TV shows — it looks like this editor removed Sex and the City, series four, episode seven, from the set of episodes that Phillips has appeared on, with the comment "Craig was never on SATC," which I'm guessing means Sex and the City. So I'm not sure this edit is actually vandalism — we'd probably have to do some research to find out whether it was actually a damaging edit. But it seems like it's probably worth review, so I don't feel too bad about it. It seems like this is something somebody should probably look at, to see whether it was actually causing damage to the article.
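revscoring's gradient boosting model wraps scikit-learn, so a rough stand-in for the training step looks like this, with illustrative hyperparameters (not the tuned values from the talk):

```python
from sklearn.ensemble import GradientBoostingClassifier

# training_rows: (feature_vector, was_reverted) pairs from the extraction step
X_train = [row for row, label in training_rows]
y_train = [label for row, label in training_rows]

is_reverted = GradientBoostingClassifier(
    n_estimators=700, learning_rate=0.01, max_depth=5)
is_reverted.fit(X_train, y_train)  # the ~16-second step from the demo

# Confidence that a new edit will be reverted:
# is_reverted.predict_proba([feature_row])[0][1]  # P(True)
```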
So it seems like we're doing okay. Generally, we're predicting that the reverted edits should be reverted, and that the non-reverted edits probably shouldn't be. So it looks good. All right, on to part four. So the analysis that we did above is great in that it gives us an intuitive sense for whether this model is doing anything useful at all. But it's not great in that it's hard to compare two models this way. We really just took a random sample of edits that we think it should predict one way, and edits that we think it should predict another way, and checked whether it generally was doing what we suspected. What we really want is a statistic — something that lets us say "this model does better than this other model" by comparing two numbers. And so that's what we're going to do here. We're going to run our test set through the prediction model and use that to generate some statistics about how this prediction model is doing. Again, generating the feature set for this test set takes a long time, so I pre-generated it in advance, and we're just going to load it into memory here — these few lines do that for us. We have these 4,862 observations that we're going to test our model against. We know whether they were reverted or not; the model doesn't. Remember, we withheld this test set so the model couldn't see it while we were training it. We're going to rely on five test statistics: the accuracy; the precision, which is the proportion of correct positive predictions — the proportion of times where we say "this is vandalism" and it is; the recall, which is the proportion of vandalism that we actually catch; the ROC AUC — the area under the receiver operating characteristic curve — which is hard to describe but also hard to game, a measure based on the true positive rate and the false positive rate; and finally this filter rate at 90% recall, which is a measure of how much of the recent changes feed — those 160,000 edits per day — we can mark as not needing review, and therefore how much of the time of the people patrolling Wikipedia we can save. Higher numbers are better in all of these cases. So in order to do that, again, we're going to draw from the revscoring library and pull in some test statistics: accuracy, precision — maybe we can scroll to the right — yeah, recall, ROC, and filter rate at recall. We'll pass those to the test function for our model, along with our testing features-and-reverted set, and ask it what we get. And we see that we get an accuracy of about 80% — so about 80% of the predictions are right. We get a precision of 20% — so about 20% of the time that it predicts something will need to be reverted, it actually will be. We get a recall of about 82% — that means we catch about 82% of the vandalism in Wikipedia. ROC AUC — I'm just going to tell you this is good. It's not great; we can do better in our production systems. But this isn't bad; this is useful. And then finally, looking at our filter rate: we can filter out about 63% of the edits in Wikipedia and say these don't need review — they're actually pretty good. All right, so we have built a machine learning model. We've checked whether it works in a qualitative way, by looking at individual predictions, and in a quantitative way, by actually generating some statistics. So now let's actually try to do something useful with it. So, the bonus round: let's listen to Wikipedia's vandalism.
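The four standard statistics come straight out of scikit-learn. "Filter rate at 90% recall" isn't a stock metric, so the sketch below computes it one plausible way: find the lowest score threshold that still catches 90% of the truly bad edits, then measure what fraction of all edits scores below it and can be skipped. X_test and y_test stand for the held-out features and labels:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score)

scores = is_reverted.predict_proba(X_test)[:, 1]  # P(reverted) per edit
preds = scores >= 0.5

print("accuracy: ", accuracy_score(y_test, preds))
print("precision:", precision_score(y_test, preds))
print("recall:   ", recall_score(y_test, preds))
print("roc auc:  ", roc_auc_score(y_test, scores))

def filter_rate_at_recall(y_true, scores, min_recall=0.90):
    y_true = np.asarray(y_true, dtype=bool)
    positive_scores = np.sort(scores[y_true])[::-1]  # bad edits, best first
    # Lowest threshold that still flags >= min_recall of the bad edits:
    k = int(np.ceil(min_recall * len(positive_scores)))
    threshold = positive_scores[k - 1]
    return float(np.mean(scores < threshold))  # fraction we never need to review

print("filter rate @ 0.90 recall:", filter_rate_at_recall(y_test, scores))
```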
You might have noticed, on the screen when you came in, we had the Listen to Wikipedia thing running, which makes a little noise every time Wikipedia gets edited. I'm actually going to connect to the same feed of edits that that system uses, called RCStream. If you want to read more about it, there are all sorts of links in the iPython notebook. This code is really just lifted from the RCStream documentation on how to connect to it. We're going to import the socketIO client. We're going to define this namespace thing. We're going to tell it what to do when it sees a change come through. And when it sees a change come through, it's going to check whether it's an edit; it's going to get the revision ID, which represents the edit in Wikipedia; we're going to extract features for that edit; and we're going to ask the prediction model to score it. If the prediction is bad, we're going to print that it needs review, and if the prediction is good, we're just going to print the edit and move on. All right. We'll run this for 120 seconds and see what we get. All right. Oh, wow, right away we get a couple. Not very strong predictions — 62 and 64%. Let's see. I'm going to open some of these in tabs, and we can just look and see what we get. Okay. So here we see an anonymous editor who's making a change to the name of a player, it looks like, on a sports team. And it doesn't look like the name has changed substantially — it might be an alternate transliteration. This seems like it's worth reviewing; it's hard for me to tell whether it's vandalism. Somebody who's more familiar with this might be able to help with that. But it seems like it was good that this got flagged — this is something that definitely needs review. Cool. Okay. Let's try to find one with a higher probability; maybe we can find some egregious vandalism. Ah, there we go. Okay. So this looks awful. This is definitely not good. So I'm just going to hit undo here. I'm going to say that this is vandalism — the Wikipedia shorthand for that is "rvv", revert vandalism. Save the page. And we've reverted some vandalism. All right. So it turns out that with this model, we can filter out about 64% of the edits that are coming into Wikipedia — so we can reduce the workload of somebody who's patrolling for vandalism by about 64%. It turns out that in our production system, where I use a lot more features and we do a lot more tuning work, we can get that up to 75%: we can reduce the patrolling workload in Wikipedia by about 75%. And now comes time for the pitch. We're a volunteer organization — well, I'm not a volunteer, I'm actually paid — but I need your help anyway. Y'all are technologists, and we have a lot of technological work that we need done to support this system, to make the work of our real volunteers — the people who write Wikipedia — easier. So we want your support. We need people to help us label edits in Wikipedia; if you're a Wikipedia editor, we could really use your help there. We need people who speak more than English, because I'm only an English speaker, and we're expanding this to as many wikis as we can — we're right now up to 16 different languages, and it would be great if we were at 100. We need people to help us write code. This is big data; we need to be horizontally scalable. Right now, we can review an edit in about half a second; it would be great if we could get that down to a quarter of a second. And of course, modeling — the things that I was just showing you: features, optimization, and evaluation.
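For completeness, here's the shape of the live demo described at the top of this part, following the socketIO_client example from the old RCStream documentation. RCStream has since been superseded by EventStreams, and extract_row, features, and is_reverted are the sketches from earlier — so treat this as a period reconstruction, not the notebook's exact code:

```python
import socketIO_client

class RCNamespace(socketIO_client.BaseNamespace):
    def on_connect(self):
        self.emit('subscribe', 'en.wikipedia.org')  # follow English Wikipedia

    def on_change(self, change):
        if change.get('type') != 'edit':
            return
        rev_id = change['revision']['new']        # the edit's revision ID
        row = extract_row(rev_id, features)       # features for this edit
        p_bad = is_reverted.predict_proba([row])[0][1]
        flag = "NEEDS REVIEW" if p_bad > 0.5 else "ok"
        print(flag, round(p_bad, 2), change.get('title'))

socketIO = socketIO_client.SocketIO('stream.wikimedia.org', 80)
socketIO.define(RCNamespace, '/rc')
socketIO.wait(seconds=120)  # listen for two minutes, like the demo
```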
We're completely online; we're a completely open working group. In fact, most of the people who program this system with me — the system that actually catches vandalism in Wikipedia — are volunteers. We have an open GitHub organization that we call wiki-ai, where we're building a lot of technologies around detecting vandalism in Wikipedia and predicting the quality of edits in Wikipedia. And so you're very welcome to get involved. Please let me know afterwards if you're interested in checking it out. Thank you. Thank you very much, Aaron. Looks like we have some raffle winners. All right. Okay. Pull out your raffle tickets. I believe Maria has the goods. Thank you. Yes. Okay. So we've got 812015. 812017. 812020. 81205. 812022. Lots of lucky winners — hopefully those of you who walked up the stairs. Very good. Let's see where we're at. Okay. Looks like we have a little bit of a tiny break. If anyone needs to get up, get a slice of pizza. Otherwise, say hi to Terry. Shall we? Oh, yes. Thank you. Thank you. Thank you. Hello — oh my, that was way louder than I expected. We're going to start back in like a minute. So, hello. My name is Moriel. I work for the Collaboration team at the Wikimedia Foundation, and I'll talk to you about OOjs UI, which is our UI library. It's an object-oriented JavaScript user interface library, which is where the abbreviation comes from. So how did things really start, and why did we need a new library like that? Basically, it all started from VisualEditor. If you don't know VisualEditor, this is more or less the way it looks today. This is the way for you to edit Wikipedia articles without knowing wikitext — you just edit, what-you-see-is-what-you-get. So this project, VisualEditor, started in 2011, and very quickly it was clear that we needed to do something about the UI. This is a system that's starting to get really big; it's JavaScript; we need a lot of buttons and dialogs and widgets — things that we didn't really have so many of in any of our other interfaces. So we started developing a lot of that inside VisualEditor. And then towards 2013 — actually in the late summer of 2013 — there was a decision to split the product into three different repositories. Two of them were the VisualEditor product itself: one is the core product, which can work in any piece of software, on any blank HTML page; on top of it came VisualEditor-MediaWiki, which is a layer that is specific to MediaWiki stuff. And then, out of both of those, we took the library that we used to create all those widgets and buttons and everything, called it OOjs UI, and split it out as its own library, so that we can actually use it in other places in our code. So why did we need to build anything new? Why aren't we, you know, using something existing? There was jQuery UI in 2011 — why didn't we use that? Well, you know, we're engineers, so I guess the first answer is "why not," but more seriously: MediaWiki is very different from many other programs out there and many other platforms. MediaWiki is super, super extensible. What do I mean by that? Well, we allow for extensions — what many other products call plugins. Okay, that's fairly usual. But we also allow for user scripts. So if you go to Wikipedia and you sign up, you can go to one of these two pages in your user namespace, common.js and common.css — those are actually, you know, pages under your name.
And if you go to common.js and you write JavaScript, that will actually affect most of the pages that you see from your user account. So you can actually add scripts that change stuff in Wikipedia for your user space. We also have gadgets, which are very similar, except these you can actually share between you. So a lot of people, if they have a really cool thing that they created for their own user space and people start getting interested in it, release it as a gadget, and other people can add it to their user space and have a script that does something on Wikipedia. And it can be something like, you know, mark certain words in an article, or give me links to definitions in Wiktionary — whatever it is, if it can be done in JavaScript, you can do it in gadgets and user scripts. And we support about 280 languages, which makes it about 895 wikis that we support. That's a lot. So these things really create a lot of challenge when we try to figure out what kind of libraries we actually need to use for, you know, front-end stuff. So what are the constraints? Well, first of all, we can't really trust the DOM for data storage. We don't know what happened to the DOM between us reading it and us analyzing it for whatever data it holds. Some of that is because we have user scripts, and some users can have something that manipulates the DOM and does something different. We don't know. So we don't really want to trust the DOM. On top of that, we don't think it's a good idea in general to trust the DOM for data, but that's a different issue. We have to make sure that everything is cross-platform and cross-browser. Wikipedia has to work in as many places in the world as possible, right? So even if you have a really weird phone from 10 years ago, you can open Wikipedia — at the very least, you'll be able to read it. And on top of that, we would like you to also be able to do a lot of other things. And there, we kind of have to figure out, okay, where is the limit between what we can do with the technology that we support and what we can do with the devices that are out there. But at the very, very least, we want to make sure that people can actually read Wikipedia. So it has to be cross-platform, it has to be cross-browser, it has to work everywhere. It must be theme-able. And we need to have one place for UI widgets and components, so that users who create gadgets and, you know, plugins and extensions have one consistent bank of UI elements to use. So these are kind of the constraints that we have. And then we looked at a lot of things like jQuery UI, and we ran into a couple of issues. With jQuery UI, it's not that it's bad — it's very good — it just doesn't answer the problems that we run into. So what are the existing problems with the things out there? Well, first of all, for the most part, data is stored in the DOM. This is an example from Bootstrap: you have, you know, a div, and it grabs some link, and then you have things like data-toggle and data-this and data-that. And a lot of the time the JavaScript will then reach into that DOM element, check its status by whatever data attributes it has, change those data attributes, and keep on, you know, preserving the state in the DOM. And those are the kinds of things that we really don't want to do. So that is the first problem.
The second problem is that DOM events in general can be really messy. So here's an example. We have an input field. Let's say I want to do something once the text changes. So I need to listen to an event that tells me the data has changed. Okay — there are about five events that I need to listen to. Keydown is not bad, but it doesn't cover all cases, as I will point out. And on top of not covering all cases, I also need to make sure, when I intercept it, that we're not talking about arrow keys or enter, because those do not change the content, right? The change event is great, except it requires a focus change — so I need to get out of that input field first, at least for the most part. If I want updates while I'm typing, that's a little bit of a problem. The input event is fairly new. And the cut and paste events I have to listen to, because what happens if I right-click? I select, then I right-click the mouse and I cut. No keystroke ever happened, right? And I did not change the focus, so there's no change event. I have to listen to that event; the same thing with paste. So I have at least five events to listen to. But the plot thickens, because some of those events are emitted before the content actually changes, right? So the events are really messy. So what we actually did is say: okay, we're going to take these events and we're going to wrap all of that behavior in a widget — we call it a text input widget in this case. And really, what the widget does is intercept all those events, and then it says: okay, I'm waiting for the stack to clear; that way I'm actually going to see what the actual situation is with that input. Even if I got the event before the text changed, I'm going to wait for the stack to clear and then check whether the content changed. If it didn't, it ignores the event; if it did, it emits a change event. So then your code can just listen to that widget and ignore what happens in the actual DOM — this event, that event, the widget just abstracts it out. That is something that we really wanted to make sure we could do, so that nobody has to deal with all these events — and events get added and removed in some browsers; the plot thickens even more, because some browsers treat one event better than another; it can get really messy. We wanted to keep that messy part away from the users who want to use these things in their interfaces. We wanted to abstract it out. So, what do we want out of a library? We wanted it object-oriented. We wanted event emitter support — and I know I get this reaction of "jQuery has event emission," but that's very, very basic, I think, right? We wanted proper event emitter support. We wanted to make sure that it's componentized: basically, we wanted to build bigger widgets based on smaller widgets, so that everybody can build better components. We wanted to abstract the DOM — you just saw why; we don't want to mess with all the stuff under the hood. And we wanted to make sure that we have a library of mixins that everybody can use when creating a new component. And the one thing that I mentioned in the beginning: we wanted a separation of data and UI. We wanted to make sure that our data is not in the UI component; it's outside of it. And the last thing is that we wanted to make sure we have no-JavaScript support, for those cases where we have devices that don't support JavaScript. They're still supposed to be able to read stuff; they're still supposed to interact with stuff.
They might not get the full JavaScript experience, but they have to have the same kind of experience. So everything is OOUI: the top layers are OOUI in JavaScript, and the absolute last one, the no-JavaScript support, is obviously not JavaScript — it's PHP. So we have a layer of PHP that we also created on top of that — or actually, below it. So what does it look like? OOjs UI and OOjs. It's actually kind of a split, two-library system. OOjs is the basic one; it's the one in charge of the object-oriented stuff, like inheritance, mixins, the event emitter, and all the utilities that we need. And then, on top of that, we have OOjs UI as the widgets, the abstraction of the DOM, the themes, all of that. And then we also have OOUI PHP, and that is basically widgets, and the DOM structure for the widgets, that fit both the themes and the behavior layer of the JavaScript. So what that lets us do is start with OOUI PHP. If the user doesn't have support for JavaScript, it stops there and just shows the user whatever inputs or whatever we need to show them. And if the user does have JavaScript support, we have something called infusion, where the JavaScript takes whatever already exists and infuses it — basically uses it as if the JavaScript had created those components. So I'll show you an example. This is a screenshot from Echo, our notification system. It's just a popup with some notifications. If you read here, you'll see I've been experimenting a little bit, so one of my users notifies another user of mine — there's a notification, that's great. What's nice is that this popup — if we only look at this list — this list is an existing OOUI widget, the SelectWidget, because it has items in it and you can select stuff, all right? So I have the basic behavior of a list of stuff, some items that I can select. This single item — with that humongous name, which is basically just in the namespace of the Echo extension — is extending OOUI's OptionWidget, and the OptionWidget is basically in charge of all the events around which item is being chosen inside the SelectWidget. So I have a select with options. But it doesn't just extend OptionWidget; it also has inside it a LabelWidget and an IconWidget and a ButtonWidget. So it's really easy to construct something that looks relatively advanced from just the existing components. And that's what we basically said before: building bigger stuff based on smaller widgets. Which is great — but for the entire widget, we said that the data should be separated, and that's exactly what we're doing. So we actually have this DM, which is the data model object, and it keeps its own state and all the logic that has to do with its state. So it is the data object, and the widget accepts it and listens to its events. Okay, so the data is completely, completely separate from the DOM, or from the UI. We can, theoretically and practically, just swap out the UI, and all the logic, everything else, exists somewhere else. And in the code — so this is of course just a snippet; it doesn't look exactly like that, although more or less — the widget accepts a model, then stores the model and listens to it. So you can see this model — aha, there, this.model — okay, .connect, we're connecting to the events, and then we'll react to the events. And we're creating a button widget here, and everything is event-based.
So we're creating a button widget, and we're attaching it, and then we're saying: okay, listen to the click event; when we click, mark things as read — and this is our internal method here that changes the model. So everything — this is basically a view model, which you probably all know — everything is done with a separate model and events, which is part of the principles that we wanted to keep. So we have tons and tons of widgets — tons and tons of widgets, really. I actually updated this presentation yesterday, adding more screenshots, because I was surprised that about five more widgets had been added, which was cool. So these are just very general examples of stuff we have, from simple stuff like buttons to fairly advanced stuff like draggable elements, a search widget — this entire thing, this entire thing is one big widget. So you can basically just use that, replace the logic, and you're done. It's really cool stuff — at least that's what we think — and we would love to share it with everybody. So it's MIT-licensed. You're all welcome to use it, contribute to it, and give us feedback, of course. I'm very disappointed about the colors here; apparently it's not very visible. Sorry about that, but I will share this PDF. There's no way we can increase the... is there? Maybe. Okay. Well, I will share this PDF. Oh — select the text, where anyone has access. Where's the computer? Behind me. Okay, let's see if I can... oh, the mouse is moving. Oh, you're doing it, okay. That's not better. Oh well — I will definitely share this. I'm very disappointed, I'm so sorry. All right, I'll definitely share this. We have the official documentation; we have the code documentation; the demos are up; and we would love to see you join us in the repository — pull requests are welcome, or in our case, patches via Gerrit. Thank you. What's next? Sorry — so now we're going to transition into a Q&A session for Aaron and Moriel, if anyone has any questions. We just ask that you use the question mic over there, for the sake of the folks on the remote stream. And thank you. So, this is a question about OOjs UI, and that is: for applications which don't need to support the no-JavaScript case, do you need to use PHP, or can you just use the JavaScript? So, for applications that are strictly JavaScript — like VisualEditor, for example; VisualEditor is only JavaScript, there's no real meaning for it without JavaScript — we don't use the PHP part. The same goes for features that only exist with JavaScript: extra dialogs, stuff that will not appear for a user who doesn't have JavaScript — we don't use the PHP side for those. The PHP side is there so that the original version of the page doesn't have to be duplicated again by other code for JavaScript — so it sets up the stage, and then everything above that will not be in PHP. Thank you. Where can we go to find all of this — OOjs UI and so on? So, if this were legible, it would show you. Unfortunately, the only thing I can try to do is copy this. Can you copy it to, like, a blank page? I did not anticipate that all of them would look like they're visited, because I visited them. I'll try to put them up on a blank page, and I'll definitely send this to all the participants and share it. My question is about the presentation. Do you do anything about edits by bots? Yeah, so we generally don't worry about edits by bots as needing counter-vandalism review, but we still include edits by bots in the predictions that we make.
We actually set up our service so that you can run any edit — whether it's by a bot or a human or whatever — through the service. But one of the things that we actually struggle with quite a bit is that on a lot of our wikis — we run, gosh, 180 languages of Wikipedia and 180 languages of a bunch of other projects — many of those wikis are primarily edited by bots. And by and large, any bot edit is very unlikely to be vandalism. At least it's not intentional damage — it's maybe accidental damage at a large scale. And so when we have people reviewing edits by bots, it's just overwhelming, and it's pretty much all good. So usually we ignore those in practice, but the system that we built could potentially be used to detect damage caused by bots. I have not seen that actually happen in practice — I've not seen people find it useful — but that doesn't mean it can't be done. So are the bots doing the edits run by just anyone, or are they by vetted, legit contributors? So, most of our large wikis — English, German, Japanese, Italian, Spanish, French, Chinese, the big wikis — have robust processes around vetting people who are given a bot flag, a flag that marks their account as a bot, and they have policies against anybody else running an automated tool that edits for them. And it's relatively consistently enforced. So most of the time, when you see somebody running an official bot, they've gone through a substantial vetting process before they got to that point. Thank you. I had a question about the NPM packages. My understanding — I don't know much about NPM, but it's a server-side JavaScript framework, is that right? So you can use it in Node, but NPM might also help you with things like Grunt or... With what? With automated — I don't know how to define it — automated JavaScript tools, like Grunt, for example. So you can include it in your... I see, for unit tests and stuff. But NPM is also Node.js, so you can also use it there. Especially since we publish both OOjs and OOjs UI — because you can also use OOjs separately and just get the mixins, extends, all the utilities and stuff like that, without all the UI. I see, yeah. Have you tried running your server code with Node.js on the server side, or is PHP better for that, or does it not matter all that much for what you're doing? Well, MediaWiki is PHP, so we're not going to switch to Node.js now. We do have Parsoid, which works on Node.js, but it doesn't do the UI part. So for the moment, as far as I know, we don't really have things that need the UI part in Node.js. Everything works in PHP in MediaWiki, and we can't really replace that. That's why OOUI PHP was born. I see, thanks. Hi, my question is for Aaron. And forgive me, I don't write in Python, but I'm just trying to wrap my head around some of the logic. When you were in that for loop there, when you were looping over all those revisions — what stood out to me, from what I think I understand, is that when you're evaluating an edit, it's almost the frequency of the edit and the user, and then the string value; it almost seems like those are more important than the value itself. Is that fair to say — if you understand my question? So I think I'm understanding it as: the history of the user is important when you're trying to figure out whether an edit is vandalism.
Yeah — that's even more important than the content or the string value itself. Is that true? So, there's actually a really substantial history of vandalism detection literature for Wikipedia. And the vast majority of the predictive signal, we can get from the basic features that I showed you — this is really the most basic feature set that I could possibly show you. Right now, in production, we don't use historical features about a user for predicting whether their next edit is going to be vandalism. But the research literature suggests that we can get a substantial amount of additional signal from that. The thing that I'm really concerned about is having somebody get into a feedback cycle where, let's say, your first edit on Wikipedia was a mistake, so it got reverted. And now we're super skeptical of your next edit, so maybe we're more likely to revert that one too, and by the third edit we hate you and we want you to leave. So instead, what I've been looking at is not how people reacted to your past edits, but applying the per-edit vandalism detection historically. Rather than asking "are you the kind of editor who gets reverted," we ask "are you the kind of editor who makes edits that look like vandalism?" And I've done some work — I actually have a little bit of published work in that area — that suggests this works, but I haven't tried it to see how much additional signal you can get on the "is this edit actually vandalism" use case. But I think there's a lot in that direction. Okay, yeah — that's what I wanted to ask, I think. That's a good question. Cool, yeah. Thanks. Continuing with the machine learning questions: do you ever see a point where you automatically reject an edit — if the model is, say, 90% confident? And if so, do you spit back an error message saying it's not only been rejected, but here's what tripped it up? I know you're worried about the feedback cycle and all that. Those seem to be the machine learning applications I think about the most: giving feedback to the user, rather than just being told "the algorithm hates you." Yeah — so, the first question: have we looked at automatically reverting edits with an algorithm? This is actually something that's happening on the English Wikipedia right now. As far as I know, it's the only wiki that has a system that does this. It's called ClueBot NG, and only for those edits that it's extremely confident about will it automatically revert. And they claim a very low false positive rate — I'm a little bit skeptical of their analysis there — but anyway, they're really only targeting the egregious edits. Really, you have to put in a lot of racial slurs; otherwise ClueBot is mostly going to ignore you, and then it goes to a different set of tools that are more on the human computation side of things, where the tool does most of the work of showing you a diff, and you just have to click good or bad and it takes care of the rest. Sorry, I forgot the second part of your question. Do you see giving feedback about why? And maybe, to add to that: do you see the community involved in the definition of the features and the thresholds? Yeah, so — okay, so talking about the features, and being able to say why an edit was scored as likely to be vandalism: that's extremely difficult. These algorithms are generally black boxes.
Sorry, I forgot the second part of your question.

Do you see giving feedback on why? And, I didn't ask this before: do you see the community being involved in defining the features and the thresholds?

Yeah, so, okay. Talking about the features, and being able to say why an edit was scored as likely to be vandalism: that's extremely difficult. These algorithms are generally black boxes. There's some hidden correlation happening inside there that tells us something that's often surprising about an edit that might be damaging, surprisingly right or surprisingly wrong. There's an active field of research around algorithmic strategies to make these models easier to explain. But on the other side, you asked about community involvement in this sort of stuff. Until about three months ago, I was not paid to work on this project; for a year, this was my volunteer work. Most of the people who work with me are volunteers too, because we just thought this was important, so we started working on it. So it's hard to wave my hand and say "the community," but there are a lot of people who aren't paid to do this who work with me. Most of us speak one, at most two, languages, so to get the language breadth that we have, we need collaborators who work on those wikis, who are familiar with them, and who will actually tell people what the service does. And they don't just do that: they also give us feedback, they make suggestions, and they help us figure out what the model is doing right and what it's doing wrong. So I'd say we have relatively high community involvement. Of course, we can always do more.

Hello, I was wondering: do you moderate pages with controversial content? Like pages that get edited really heavily? How do you keep those under control?

Yeah, so the question was about dealing with vandalism around controversial content, right? One of the things we do, and this is a relatively recent development: we used to just count the number of bad words that were added in an edit. So let's say you're editing the article about a curse word, or about a sexual position, something like that. Then every single edit to that page would be flagged as potentially damaging, and that's problematic. I kind of waved my hand earlier about this prop_delta_sum thing, but that's a different way of thinking about how you actually change the content in an edit. Say you're editing the article about a curse word. If you add another instance of that particular curse word to the page, you'll get a really low score for prop_delta_sum, because it takes into account how often that word already appears in the page. But if you add a different curse word to that page about this curse word, I don't want to offend anybody by using examples, but I think you understand what I mean, then you'll actually get a very high score. If the word didn't already exist somewhere in the page, the model says: hey, this is a bad word and it's unusual, I didn't expect to see it here. As opposed to: it's a bad word, but I see it happen a bunch here, so it's not a big deal. That gave us a substantial boost in fitness, and I think we're getting it exactly around pages that tend to have content that shows up as potentially bad. Pushing further, say we were going to work on catching damage in the article on the Israeli-Palestinian conflict: that would be much more tricky. Even catching damage there is tricky, so we're probably not getting a lot of false positives there yet.
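A toy version of the idea behind prop_delta_sum might look like the following; the formula is an illustrative approximation of the behavior Aaron describes, not the exact feature revscoring computes.

```python
from collections import Counter

def prop_delta_sum(old_text: str, new_text: str, bad_words: set) -> float:
    """Weight each added bad word by how novel it is on this page: a word
    already common in the old revision scores low, a word the page has
    never seen scores high. Illustrative approximation only."""
    old_counts = Counter(old_text.lower().split())
    new_counts = Counter(new_text.lower().split())
    score = 0.0
    for word in bad_words:
        added = max(new_counts[word] - old_counts[word], 0)
        # Dividing by (1 + existing count) discounts words that were
        # already frequent in the article.
        score += added / (1 + old_counts[word])
    return score

old = "darn is a word. darn appears often here. darn darn"
print(prop_delta_sum(old, old + " darn", {"darn", "heck"}))  # low: already common
print(prop_delta_sum(old, old + " heck", {"darn", "heck"}))  # high: novel here
```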
Hi, this is a question from Ariel. You mentioned jQuery, and I was wondering if OOjs UI uses any other libraries or frameworks, and kind of why, and what makes those suitable to work with?

So yeah, OOjs UI uses jQuery. OOjs actually has two versions, one with jQuery and one without. You have to remember that when we started this, it was 2011; a lot of the libraries that people now ask "why aren't you using that?" about didn't exist yet. I think jQuery has a lot of power in terms of communicating with the DOM. There are also a lot of things it's missing, which is why we wanted to build something on top of it, but we're definitely using it as much as we can to simplify our work within the library. It definitely depends on jQuery, not on jQuery UI; those are different things. Does that answer the question?

Yeah, I was wondering if there was anything else you're using, and why or why not.

I don't think so, not anymore. I know we used to have things that dealt with some language stuff, but not anymore; we tend to do a lot of that ourselves, especially since other libraries tend not to give us exactly what we need. We had something for Unicode, if I remember correctly, an external library that we ended up replacing with our own thing. I don't think we use anything other than jQuery externally nowadays.

Awesome, thank you.

Okay, so the question is: I kept saying we can't trust the DOM, so what is the alternative? The alternative is to use a data structure that keeps the data for you. I don't trust the DOM for the data itself. Obviously I need to build stuff on the DOM, so I trust it at that level, but to give me the data that represents the state of my interface or my product, I don't want to trust it. So I use other objects that are not part of the interface: either data objects like I showed before, or a container where the data is stored. The UI will then either listen to events or query the data object directly to get information, rather than querying the DOM. That's right, the data objects are not part of the DOM. That's correct.

This is a question for Aaron. How do you validate and curate language-specific features?

So most of our language-specific features are two lists: bad words and informals. We also have lists of stop words, which are usually words that don't carry meaning, like articles, prepositions, that sort of stuff. Let me think what else. Oh, we also have stemmers, which usually come from the Natural Language Toolkit (NLTK) library for Python, but only some languages have those. If a language doesn't have a stemmer, we just don't use features that have anything to do with stemming. So, using these data structures that we can get for a language, we then use the features that derive from them.
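That conditional use of stemmers can be sketched roughly as follows, assuming NLTK's Snowball stemmers; this is an illustration, not the actual revscoring language-asset code.

```python
from nltk.stem.snowball import SnowballStemmer

def get_stemmer(language: str):
    """Return an NLTK Snowball stemmer if one exists for this language,
    else None, in which case stemming-based features are simply skipped."""
    if language in SnowballStemmer.languages:  # e.g. 'english', 'spanish'
        return SnowballStemmer(language)
    return None  # no stemmer: drop features that depend on stemming

stemmer = get_stemmer("english")
if stemmer is not None:
    print(stemmer.stem("running"))  # -> 'run'
```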
We also have a few features that are not exactly language-specific but are wiki-specific. For example, we don't just have "is this edit damaging?" as a model; we also have "what quality level is this article?" And one of the things that's really useful in figuring out how high-quality an article is, is how its references are formatted. Wikipedians use templates to format references, so you need somebody local to that wiki who knows which templates, which are probably not going to be in English if it's not English Wikipedia, are actually used for this sort of stuff. So we'll have local collaborators from that wiki help us understand what those templates look like, and then we'll turn the presence of those templates, the count of them that appear, that sort of thing, into features. So we have a hierarchical structure for how we generate features. We have language-independent features, and there's actually another level of the hierarchy there, because there's wikitext and there's Wikibase, which is our structured Wikidata stuff. Then we have language-specific features, and finally the wiki-specific features, because there's English Wikipedia, English Wiktionary, Wikidata has a lot of English in it, Commons has a lot of English in it, and so it gets specific at that level too. That seems to be working out pretty well for us. It makes it easy to add new wikis, especially new wikis that can draw from language assets we already have.

Thank you.

Yeah, so: why use Python instead of Lua? Lua, as an example, is a language that we have implemented on top of MediaWiki before, for building these things called modules. Python is really a nice environment to do data science in, and that's really what brought me to it. I'm sort of a computer scientist who does behavioral science work, who just happened to realize that building machine learning models was important. Luckily, I had been doing my work in Python up to the point where I realized I needed to build machine learning models, and Python has a really, really nice library for building them called sklearn. So this was a nice opportunity to draw from a lot of work that people had already done, building libraries that make these machine learning models available. Really, the revscoring library that I was demonstrating here is, in many cases, just a thin wrapper around models that are already made available through sklearn. As far as I know, there's nothing like that in Lua. But I think an important part of your question is: why not Lua? Why not make this inside of MediaWiki? And I think that's a really, really good question. I think it's really important that the technologies we develop are under the power and control of the people working in the social space that the algorithms are affecting. I don't actually have a good answer to that question, but I agree with you that it's important. It would be great if Wikipedians, the people working on Wikipedia, could directly affect the algorithms that are part of this machine learning model. Right now they'd have to learn Python; maybe 20 years from now, if we do good planning, they'll do it in the same language they use on the wiki.

Any other questions? All right, yeah, thank you so much.

It looks like we have a little bit of a mixer after this. Okay, yeah, so this will end the Q&A section; feel free to hang out and ask a couple more questions.
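As a final illustration of the wiki-specific template features discussed above, here is a rough sketch; the template names are English Wikipedia examples standing in for whatever a local wiki actually uses, and the real feature extraction in revscoring is more careful than this.

```python
import re

# Citation templates a local collaborator might supply for their wiki
# (these are English Wikipedia examples, for illustration only).
CITATION_TEMPLATES = ["cite web", "cite book", "cite journal", "cite news"]

def citation_template_count(wikitext: str) -> int:
    """Count citation-template invocations in raw wikitext, the kind of
    wiki-specific signal fed into the article-quality models."""
    count = 0
    for name in CITATION_TEMPLATES:
        # Template calls look like {{cite web | url=...}}; allow varying
        # whitespace and capitalization after the opening braces.
        pattern = r"\{\{\s*" + re.escape(name) + r"\s*[|}]"
        count += len(re.findall(pattern, wikitext, flags=re.IGNORECASE))
    return count

print(citation_template_count("{{Cite web|url=https://example.com}} text"))  # -> 1
```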