All right. Good morning, everyone. Welcome to the November edition of the Wikimedia Foundation's research showcase. I'm joined today by Jigsaw's chief scientist, Lucas Dixon, who's visiting from New York City, and a bunch of awesome people here in the office and remotely. So let me briefly introduce Lucas. We've been collaborating with Jigsaw for quite a while on the study of personal attacks, toxicity, and negative interactions at large on talk pages on Wikimedia projects. Some of you may remember the work on understanding personal attacks that was presented at the showcase last year. We've been continuing those collaborations with Jigsaw and with a team at Cornell, and we're really excited to have Lucas here to present some of the work that Jigsaw has been driving on their end, including the methodological challenges in studying the nature of online interaction on talk pages. As usual, we have a live stream on YouTube, and our discussion is on IRC, where Jonathan Morgan is our host today; he will be relaying questions at the end of the showcase. We have about a 45-minute presentation, so we'll keep questions until the end of the talk. And with that, I think we're ready to start. Thank you so much.

Thanks, Dario. It's a pleasure to be here and to present to you again. I'll be talking about a broad research project called Conversation AI. And I'll start with the image on the first slide, which is a slightly utopian, science-fiction vision of what the internet makes me think of. I'm not the first person to have such a vision; people have had it before. This is a fresco by Raphael from 1511 showing the School of Athens: a beautiful picture full of mathematicians, scientists, and philosophers, all held together under one roof. And there's something a little interesting about this picture: these people were never actually in the same place at the same time. It's a kind of fictional dream. In that light, it captures one of the promises of the internet: that we can all be together, breaking through the barriers of space and time, sharing ideas. There are a couple more things that are surprising about this picture. One of them, you might argue, is that this isn't a picture of the internet today; it's more like the internet of 20 years ago, when it was full of academics, the racial diversity wasn't high, and there weren't many women. So maybe it's not a good representation of the internet. And there's another really surprising thing about this picture, which is that everyone seems to be getting on really well. They're all sharing ideas, and it looks great. If I tell you this is a portrait of the dream of conversations on the internet, you might laugh at me, because when you think of conversations on the internet, you probably think of something that looks a bit more like this. And it actually can be darker than that too. The reality, unfortunately, is often that journalists are threatened into silence; ethnic violence can be normalized, promoted, and even organized online; harassment can push people into suicide.
So there's a very much darker image at some end of the internet. And if we look at some of the research on this, it's quite shocking. A quarter of women have been stalked, sexually harassed, or physically threatened online. That's pretty terrible. We can look at some more stats from a more recent report from 2017 — another Pew Internet report, a very good one — which highlights something very interesting, particularly around the debate about free expression and the challenges of speaking freely online: after witnessing the harassment of others, 27% of people refrain from posting online. 13% have actually stopped using an online service altogether. And 41% of people said that they were personally subjected to online harassment — they were the target of it. So it's a pretty prevalent problem. In fact, it's sufficiently prevalent that many organizations give up on online discussion — organizations whose reason to exist is, at some level, to have a global discussion; very often these are news organizations. That seems pretty terrible as well.

So we might flip it around and ask ourselves: what might we do about this? And we return to an image like this. The notion of conversations being difficult to have is not new. The utopian vision of having better conversations is not new either; people have wanted this for a long time. So what is new? Well, one piece of technology I'd like to talk about today, and that we'll dive into in a little more depth, is machine learning. Machine learning broadly comes out of computer science, and it is the discipline of learning to identify patterns in data. A classic example: given millions of pictures, each labeled with whether or not there's a cat in the picture, you can train a machine learning algorithm to slowly find the patterns in the pixels that correlate with the label. Over time, the model learns to recognize images of cats. That's machine learning, at some level. Then there's been a kind of revolution recently in what's called deep learning. Deep learning is basically the branch of machine learning that can recognize much more complicated patterns. It's quite recent: it's come about with the wide-scale availability of large amounts of computing, large amounts of data, and some breakthroughs in our understanding of how to train these models. So we might ask: what does all this mean for conversations? Might it help us? Maybe machines can help journalists do a better job of curating the conversations they want to have on these topics. Maybe they can help authors understand the emotional impact of their language. We could imagine it a bit like a friend reading your email or your message before you send it and saying: hey, that bit is likely to come across as a little heavy; or, maybe you're making too light of this — it's a rather serious issue, and if you write it like that, people may be insulted.
Maybe machines can help us have some deeper understanding of the emotional aspects of language. At this point, you might say: Lucas, you're crazy. In science fiction — think of Data from Star Trek, the super-intelligent android, people's science-fiction imagination of what machines might be — even he didn't have emotions; he couldn't understand emotions. So why might we think that machines would be able to understand emotions? I'm going to give you, first of all, an argument for it; I'll leave you with it, and then we'll look a little at some data. Here's an interesting argument: animals and people, at some level, develop an emotional understanding before they develop the ability to solve logic problems. They seem to have an ability to recognize emotions without necessarily being able to understand the meaning of language. That's kind of interesting — why, then, would we think machines would develop differently? To get a little more insight into that, we can ask where this expectation came from. If we look back at the history of programming, it's very deeply related to logic — things like the lambda calculus, which links logic to programming. And programming has historically largely been specification: this is exactly what I want my machine to do. It's very specific, very logical; you don't see much emotion there. That's perhaps why science fiction has gone down this route — it's been extrapolating from what traditional programming does. But machine learning is very different in character: it's much more evolutionary. So maybe that's why we can entertain this new view: maybe machines will actually develop some ability to recognize emotions before they can understand the semantic aspects of language. That's an interesting thought to hold on to. Another counter-argument that I think is important to bring up: wait — we've had machines beat people at chess and beat people at Go; surely that means they're already super smart? Well, they're super smart in an extremely narrow way. They can do a small number of things with amazing computational capacity, but the generality of that intelligence is very limited. So the argument here is that maybe we can have a broader understanding of emotional characteristics before we have broadly smart robots. It's a really fun idea, and I won't touch on it much more now, but I'll leave you with it, because it will be important later too.

OK — so I've said something about machine learning, and I've teased you by saying that maybe machines will be able to understand some aspects of emotionality. How is this going to help conversations? We've talked a little about it already — it might help authors, and so on — but what does it actually take to build machine learning that might help us? I'm now going to go through four of the challenges, and I'll dive more deeply into machine learning bias — unintended bias in particular. The first thing you need to do when you want to build machine learning is ask yourself: what exactly is it you want?
So I broadly waved my hands and said we want to help conversations be better, which is the broad goal of Conversation AI — and it's a broad effort with many people collaborating, as Dario mentioned at the beginning: we're working with Cornell, with the Wikimedia Foundation, and many others. But what is it we want to do? Our starting point was: if we're going to try to have good conversations, we need there to actually be conversations. If people are leaving the discussion because the conversation is so unpleasant — emotionally toxic, at some level — then we are failing. So a reasonable first step, we thought, was to see whether we can identify the kinds of things in language that are likely to make people leave the discussion. We call that toxicity. Can we identify toxic comments?

The next problem is that people don't agree on what's offensive, so maybe this endeavor is impossible from the start. This is actually a very common situation in machine learning: you don't have to have a single golden truth. There doesn't have to be one correct answer. What you do need is for there to be places where there is consensus. And in conversations, luckily, that is the case: there are many things that are widely agreed to be offensive, many things that are widely agreed not to be offensive — and yes, there are things in the middle. What does that mean for the machine learning? It means the model is going to produce a broad range of scores in the middle that basically indicate that it does not know. So our first observation about the challenge of building machine learning here is that the model needs to be able to say: I think this one's fine, and most people will think it's fine; I think this one is problematic; and these ones I'm not sure about.

Now, here is what looks like an example of an insulting comment — except it isn't. It's a description of a pig, and it really is a very fat pig. Context matters: where a comment is happening is important. This highlights the second problem, which is that you need to know where a comment is happening to know whether it's likely to be perceived as toxic or not. It's actually a very similar situation to the lack of agreement on a comment: you never have all the context, and that means you're going to make errors in the situations where the missing context matters. If you build machine learning models without much context, they're going to fail in the cases where context matters. We'll come back to that at the end, because it's a really interesting question: how often does context actually matter, and can you still do something? What this tells us in the short term — and in the models we've been exploring as a starting point — is that these models have only the language in the comment to look at, and this means that in certain contexts they will get it wrong. So now we know that the machine learning is sometimes going to be confused and not know the answer, especially where people don't agree on the answer, and it's going to fail in situations where it lacks the necessary context. But there are still many situations where the context doesn't matter very much.
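One way to fold that disagreement into the training data, rather than forcing a single gold answer, is to aggregate many annotator judgments per comment into a soft label. A minimal sketch of that idea, using a hypothetical record format rather than the project's actual schema:

```python
from collections import defaultdict

# Hypothetical annotation records: (comment_id, judged_toxic) pairs,
# one per crowd annotator. The real Wikipedia annotation data is
# published with the project's datasets in a different format.
annotations = [
    ("comment_1", 1), ("comment_1", 1), ("comment_1", 0),
    ("comment_2", 0), ("comment_2", 0), ("comment_2", 0),
]

votes = defaultdict(list)
for comment_id, judged_toxic in annotations:
    votes[comment_id].append(judged_toxic)

# The soft label is the fraction of annotators who judged the comment
# toxic: near 0 or 1 means consensus; near 0.5 means "I don't know" --
# which is exactly what the trained model should report there too.
soft_labels = {cid: sum(js) / len(js) for cid, js in votes.items()}
print(soft_labels)  # roughly {'comment_1': 0.67, 'comment_2': 0.0}
```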
So then I want to jump to the next challenge, which is that, while that picture of the internet as a giant garbage dump matches many people's initial perception of comments on the internet, it's actually surprisingly hard to find sources of horrible comments. That was a funny starting point for this work: everyone said, oh, just go here, or just go there, and you'll find whole loads of horrible comments. But actually, the ratio of horrible comments is typically between 0.1% and 1% of overall comments. That is almost a paradox: why do people have this perception? We think they have this perception of the toxicity of comments because you can read quite a lot of comments, and if you read one horrible comment, you can be left with a pretty unpleasant feeling. The other reason is that in reality there are many gradations: something is not simply toxic or not toxic; it's sometimes just a little unpleasant, and there's a whole emotional texture. But really toxic comments do not occur very often. This poses a really big challenge, because deep machine learning, which I mentioned earlier, needs a lot of data to train models. If you go looking at random comments, you're going to have to look at a lot of them before you find a large set of toxic ones. So what do you do? One of the classic solutions is crowdsourcing: you have a lot of people working together to identify them. Even so, we're not going to have as many examples — particularly of toxic comments — as we would like for training deep machine learning, but we'll try to do this anyway.

Then on to the next challenge: a machine learning model is intended to be biased. In the case of a toxic-comment model, it's intended to be biased so that when a comment is toxic, it gives that comment a high score. But it's not intended to be biased toward other things. The problem of unintended bias is that when you train a machine learning model, very often there are correlations in the data that lead it to give high scores for other reasons. In the case of maple leaves, if most of your maple-leaf pictures are red autumn leaves, then you may never learn that a maple leaf can be green. When we get to learning about toxic comments, things get rather more problematic than maple leaves. If you naively train a model on comments on Wikipedia — I'll share a link to this work a little later; it's all on GitHub, built on Wikipedia datasets — you start seeing some funny false positives. False positives here are examples that get surprisingly high scores, and it's important to understand what surprisingly high means. These scores roughly approximate probabilities, so a score in the range of 0.5 is basically the model telling you it has no idea. But why does the model say it has no idea about a sentence like "A Muslim is someone who follows the practices of Islam"? There really doesn't seem to be any problem with that.
By far most things get scores of less than 0.01, so that's definitely a surprisingly high score. I'm going to call these false positives in the sense that they have scores higher than we would expect; here are some examples from a naively trained model. So the question is: how did this happen? Why does it happen? Fundamentally, in the case of building models on Wikipedia comments, it happens for two reasons. One, which I mentioned, is insufficient data. The other is what we call a skewed distribution. If you look at comments that contain, for example, the words on the right-hand side of this table, you see the fraction of them that are labeled as toxic. Remember that the real fraction, over a real distribution, is about 0.1% to 1%; in the dataset we collected, which was intentionally sampled to include more toxic examples, it's about 22%. But for comments that contain the word queer or gay, it's 70% or 67% respectively — a much higher percentage. If you don't have enough data, the model ends up over-generalizing. It thinks: most comments with these words in them are toxic, so if a comment has this word in it, it's probably toxic. That's the wrong inference, and it's called over-generalization. And of course it's not just identity terms — those are the ones we care most about, in some sense, but there are other unintended biases in these models too. That picture of a pig earlier actually illustrated one: pigs get a very bad reputation on the internet, and so do donkeys. But those we don't care about as much. What we're doing here is creating a small set of words that are frequently used to refer to people's identities and looking at the toxic fractions, which gives you some indication of how this can happen. It's not the only way unintended bias can occur in a model — it can also happen because annotators have systematic biases, or for other reasons, such as leading questions — but this is the one we've found most problematic so far.

At this point, I'm going to make a slightly technical distinction between unintended bias and fairness, and then dig into what we mean by unintended bias. We distinguish what is a characteristic of a model from what is a characteristic of an application. We use "unintended bias" to refer to a model giving scores that don't seem right — for us, particularly with respect to identity terms, but more generally with respect to some subset of examples. Unfairness is an impact on society of an application. The reason this distinction is important is that a biased model can be used in a way that increases the visibility of the affected group, or decreases it, or has no effect at all, depending on the nature of the application.
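Going back to the skewed-distribution table for a moment: computing that kind of per-term breakdown is straightforward. A minimal sketch on a tiny hypothetical dataset (the real labeled Wikipedia data is linked from the project's GitHub page):

```python
import pandas as pd

# Tiny hypothetical stand-in for a labeled comment corpus.
df = pd.DataFrame({
    "text": [
        "you are an idiot",
        "thanks for the revert",
        "the article on gay marriage needs better sources",
        "please stop vandalizing pages",
    ],
    "toxic": [1, 0, 0, 0],  # label aggregated from crowd annotators
})

identity_terms = ["gay", "queer", "feminist", "muslim", "transgender"]

overall = df["toxic"].mean()
print(f"overall: {overall:.1%} toxic")
for term in identity_terms:
    subset = df[df["text"].str.contains(rf"\b{term}\b", case=False, regex=True)]
    if len(subset) > 0:
        # A fraction far above `overall` is the skew that pushes a model
        # to over-generalize from the term itself to a toxic score.
        print(f"{term!r}: {subset['toxic'].mean():.1%} toxic (n={len(subset)})")
```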
A good example of this model-versus-application distinction is comment moderation. As we've seen, a naively trained model gives unreasonably high scores to comments that contain the word feminist. If you do pre-moderation — you review comments before publishing them, reviewing the highest-toxicity-scoring ones first and publishing incrementally — then the good comments containing the word feminist get reviewed, and therefore published, first: you get higher visibility for comments containing the word feminist. If, on the other hand, you automatically hold back the high-scoring comments while the rest are published immediately, and then do post-moderation — a human reviews the held-back comments later — then you've immediately held back all the comments containing the word feminist while the other comments got published, so the visibility of comments containing the word feminist is lower, not higher. And if you do your moderation in batch — review all your comments at once and then publish them together — it makes little difference either way. So this distinction between model and application is important because it means this is not just a problem in the model: you have to think about the application too. You have to think about what the actual effect of this particular application of machine learning is going to be. That, I think, is a requirement on every application of it. And for those of us developing models, that's the space where we have to work hard on figuring out how to remove unintended biases.

Now let's dig into what we mean a little more technically — this is about as technical as I'll get in this talk. AUC is a common measurement; it stands for the area under the receiver operating characteristic curve, which is a long technical term, but what it really measures is how well we distinguish in-class from out-of-class examples — here, toxic from non-toxic. As a number, AUC tells you: given two examples, one in class and one not — a toxic comment and a non-toxic comment — the probability that the model gives the higher score to the in-class, toxic example. AUC is used very generally to measure model quality, and it's particularly useful when you have a very unbalanced dataset, where accuracy would be misleading. Pinned AUC is a concept we introduced specifically for measuring unintended bias. For a particular term T, you take the examples that contain T and measure AUC on that subset paired with an equal number of other examples. For instance, we can ask for the pinned AUC for comments that contain the word feminist. The point is that pinned AUC will come out lower — the model will have a harder time distinguishing — when you mix the examples about a particular identity group with other examples, if the model gives unusually high or unusually low scores to comments containing that term. Either way, it becomes harder for the model to correctly distinguish the in-class from the out-of-class, the toxic from the non-toxic.
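A minimal sketch of pinned AUC as just described, assuming scikit-learn and NumPy (the paper's exact sampling procedure may differ in detail):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def pinned_auc(labels, scores, has_term, seed=0):
    """AUC over the comments containing a term, 'pinned' to the overall
    distribution by mixing in an equal-sized random sample of all comments.

    labels: 0/1 toxicity labels; scores: model scores in [0, 1];
    has_term: boolean mask marking comments that contain the term.
    Note: both classes must appear in the mixed subset for AUC to be defined.
    """
    labels, scores = np.asarray(labels), np.asarray(scores)
    term_idx = np.flatnonzero(np.asarray(has_term))
    rng = np.random.default_rng(seed)
    sample_idx = rng.choice(len(labels), size=len(term_idx), replace=False)
    idx = np.concatenate([term_idx, sample_idx])
    return roc_auc_score(labels[idx], scores[idx])

# Per-term bias can then be summarized as the gap from the overall AUC,
# summed over a set of identity terms (mask_for is a hypothetical helper):
#   bias = sum(abs(overall_auc - pinned_auc(labels, scores, mask_for(t)))
#              for t in identity_terms)
```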
What this means is that if the AUC is higher than the pinned AUC — which is almost always the case empirically — then AUC minus pinned AUC for a term tells you something about the bias toward that term. So now we can ask: what is the unintended bias over a whole set of identity terms? You run pinned AUC for each of the terms, on a balanced test set of examples containing the different identity terms, and sum the differences. That basically tells us how equally we are treating all the different identity groups. And this is interesting: it means we can provide a metric for a model's unintended bias with respect to identity terms. It's also a different kind of approach. Other approaches typically ask whether you can change a decision threshold given knowledge of demographics; the challenge is that in real applications of text classifiers, you typically don't know the demographics of the people involved. This method you can apply anyway, so you can measure the unintended bias.

Then the question is: great, we've found a way to measure it, but how do we make it better? The approach we've taken is to fix the original dataset, because the original dataset was the problem: it has a really unfair number of toxic examples containing each word. So let's look for examples that are not toxic and rebalance the dataset. The way we did this: you need a reviewed source of comments or text where you have a very strong belief that it is not toxic, because we already know that the terms we're trying to fix are exactly the ones that occur, in reality, in the most toxic comments — you can't just take more talk page comments. In our case, we took Wikipedia article pages, but the same approach works surprisingly well for other reviewed sources, like reviewed comments or text from published articles. You add those to your original dataset with the claim that they are not toxic. You don't have to be absolutely right here — you just have to be right the vast majority of the time, which is a common situation in machine learning. There is a potential issue, of course: the text on article pages is not the same kind of text as on talk pages, so that's an open question for all this work. And there's another question: do you have enough data to balance the examples? The answer to both is broadly yes, and this is a little of what it looks like. On the left-hand side is the original distribution, from the table I showed earlier. These three charts plot comment length along the x-axis against the number of comments on the y-axis. Most comments are around 10 or 15 words, and when a comment is shorter, a higher fraction of them are toxic — you can see that in the overall distribution at the top. The next distribution down takes comments that contain the word gay or transgender, for example, and you can see they're vastly more red: those comments are much more often toxic. And actually, the skew is even stronger than that:
Short comments that contain the word gay or transgender are even more frequently toxic than short comments in general. This means comment length really matters. On the right-hand side is an example of how you can fix the distribution: you pour in additional comments — in this case from the New York Times and from Wikipedia article text — to rebalance it, and then you retrain your model. The good news is that this actually works. When I say it works: it doesn't solve the problem completely, but it mitigates it — it makes things a lot better. So this is a practical method that makes the models substantially better. What's especially good is that it also seems not to damage the overall quality of the model. That's a pretty nice result.

But this is only the beginning of our work on text classification, and there are lots of open questions. One is: where do you get your test set from? I mentioned that we measure pinned AUC against a test set; we're actually doing this against a templated set. We made up a bunch of sentence templates into which we swap different identity terms, and we make each template clearly toxic or clearly non-toxic. That gives a test set that is balanced across all the identity terms — but it's not necessarily very representative of real toxic examples. Another question: we're taking a sum of differences here, and normally when you sum differences you square the errors; we could do that too. This is the very beginning of this research, and it's genuinely unclear what the right metrics are. And of course we know this doesn't make the model perfect. In fact, it's important to dispel that myth right at the outset: it looks to me like these models will not be perfect. That's why I was emphasizing looking at applications — we are going to have to keep looking hard at applications of machine learning, and of other algorithmic methods, and at their impact on inclusivity, for quite a long time still. Models will certainly keep giving different scores to different identity terms for a long time too, and the real goal is to keep those differences within a margin small enough not to have a negative societal impact. You can find the GitHub link down at the bottom, and we very much welcome collaboration and contributions — we want to see more datasets and more solutions to these problems.
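To make the templated test set concrete, a minimal sketch with hypothetical templates (the real templated bias-evaluation data lives in the Conversation AI GitHub repositories):

```python
# Hypothetical sentence templates with intended labels; every identity
# term is swapped into exactly the same frames.
templates = [
    ("I am a proud {} person.", 0),        # intended non-toxic
    ("All {} people are disgusting.", 1),  # intended toxic
]
identity_terms = ["gay", "straight", "muslim", "christian", "feminist"]

test_set = [
    {"text": tpl.format(term), "label": label, "term": term}
    for tpl, label in templates
    for term in identity_terms
]
# Because each term appears in identical sentences with identical intended
# labels, differences in model scores across terms isolate unintended
# bias -- at the cost, noted above, of not being very representative of
# real comments.
```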
OK — so I've shown a bunch of challenges in building machine learning, and at this point we might worry about whether it can be any use at all. So let's look in a bit more detail at what it looks like in practice. We've built classifiers for toxicity, we've built APIs that let people play with them, and we've got open-source machine learning models you can build yourself. What does that look like? I'm going to show you a bunch of examples from Wikipedia talk pages, which are one of my favorite places; this comes from a blog post we published a little while ago. We took one day — the 4th of September — on which 11,000 talk page comments happened on Wikipedia. You can imagine trying to go through these comments to find the toxic ones, to revert them and help Wikipedia be a little less horrible. It would take a long time, because there are a lot of comments and not really very many that are toxic. Now, the next slide I'm going to show does contain toxic comments. So if you're not interested in seeing horrible comments, I'd advise you to wait until I've gone past the next slide — look away, turn off your screen, whatever you care to do. If you're ready for some horrible comments: this is the same set of comments, sorted by toxicity score. On any given day, there are probably between 20 and 100 comments that people would consider toxic — really not very many — but they all come to the top when you sort by toxicity. You can go and look at lots more of them in that blog post if you care to. What this seems to say is that something interesting is going on: we are actually picking out something about comments that are particularly unpleasant. (If you were looking away, you can return to your screen now; I've gone past that slide.) So maybe there is some promise here.

What I'll go over next is a bunch of different applications we're experimenting with building, trying to understand whether we can make better conversations on the internet. This next screen illustrates what we did with the New York Times. This is the comment review tool they use. You can select different machine learning models and get a histogram of a model's scores. You can then select which comments to look at — for example, the ones most likely to be rejected — and it shows the reason: the bit of text the comment is most likely to be rejected for. You can quite quickly scan through them and go: yeah, we shouldn't reject those ones, those are okay; and reject the rest. This is basically using machine learning to change the user experience of a process that was already happening. The New York Times has long done pre-moderation: all comments are reviewed by professional journalists, who then decide whether they get published. They view comments more like letters to the editor. If you just say "great article," your comment will be rejected — because it's not saying very much. They require all comments to have some substance: they have to make a point, be on the topic of the article, and must not attack another commenter. They hold quite a high standard. What this has enabled them to do is turn on comments on all articles on the front page of the New York Times; previously, they were able to turn on comments on only about 10% of articles. So that's great — we're able to make progress while maintaining the same standards and curating a better conversation. Maybe there's a little hope here for actually having better conversations on the internet.

The next example is data visualization. The Times work was about curating conversations, but we can also help ourselves understand conversations better.
On the right-hand side here is a chart from a paper that Dario and I worked on last year with Ellery and Nithum. It shows the percentage of attacking comments by editor activity level, and it shows that editors who have made more than a hundred contributions are responsible for about 30% of personal attacks on Wikipedia, according to that measurement. So that's interesting — it tells us something at a broad scale about a large dataset. On the left-hand side is a visualization from Wired looking at comments on Disqus. This one is fun: it shows the time of day against the toxicity of comments. The length of each bar is how many comments were received at that time of day; the darkness of the bar reflects the fraction that are toxic. You can see that late at night, particularly in the early hours of the morning, the toxicity level rises much, much higher — up to 11% of the comments contributed were considered toxic. And when I say considered toxic, to be precise, that's at a threshold of 0.9 on the classifier. So this is interesting: it says that late at night, you may want to hold off on sending that angry comment about how wrong the internet is, even if it's true — or review it in the morning, when you're in a slightly lighter mood. It tells us something at large scale about the nature of comments.

The next visualization is something we're working on with Wikimedia: a visualization of comments on Wikipedia and when they're reverted. These are talk page contributions from October, interpreted as comments. Of the 1,000 or so that were toxic, about half were reverted: the gray dots are ones that were reverted, and the red dots are ones that are toxic and not yet reverted. This is very much work in progress. The idea is: can we develop some understanding, at scale, of the nature of this? Can we think about ways to incentivize people to go in and revert toxic comments, or to jump in and be a bit more of a diplomat in the conversation? And there's the link to the GitHub code as well.

That's large-scale viewing, but we can also put power into the hands of the person looking at a conversation. Machine learning models let us rethink the user experience of someone reviewing a conversation. Here's an example where you get a slider to choose what level of toxicity you wish to view. If you're interested in looking at the insults today, you can drag it to the right and view the comments that are more aggressive and insulting — more likely to be perceived as toxic. If you're not in the mood for that today, you can choose to view only the other comments. Machine learning lets us build different kinds of user experiences for looking at comments. It also lets us think differently about authorship. Maybe as we're typing — this is back to one of the original dreams — it can tell us something about what we're typing and give us some feedback, a moment of reflection. This is a demo of an authorship experience; you can find the open-source code there, and the demo at perspectiveapi.com. It's an exciting experiment in how we can give feedback and what people think of it. It also, incidentally, functions as a fascinating channel through which people send in feedback.
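For reference, the hosted models behind that demo are served through the Perspective API. At the time of this talk, a minimal request looked roughly like the sketch below (the endpoint, attributes, and response shape may have changed since; the key is a placeholder):

```python
import json
import requests

API_KEY = "YOUR_API_KEY"  # placeholder; issued when you register for access
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       "comments:analyze?key=" + API_KEY)

payload = {
    "comment": {"text": "you are such an idiot"},
    "languages": ["en"],
    "requestedAttributes": {"TOXICITY": {}},
}

resp = requests.post(URL, data=json.dumps(payload)).json()
score = resp["attributeScores"]["TOXICITY"]["summaryScore"]["value"]
print(f"toxicity: {score:.2f}")  # probability-like score in [0, 1]
```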
People love showing how a robot is wrong. They love putting in examples, and they tweet about them. And every time you click the feedback button, it ends up providing a bunch more examples — this has actually been a very big contribution to improving bias in these models too. So if you're inclined to play and prove a robot wrong, we'd be delighted for you to go there, put in examples, show where it's wrong, and send us feedback through that same tool.

So, going back to context. I mentioned earlier that context is important, and some of the recent work we've been doing with Cornell and with Wikimedia is about how we can learn more about that aspect of context — how we can understand where a conversation happens. In particular, we're looking at conversational context, and we're interested in how important it is. We've got a lot of experience now with crowdsourcing jobs asking people whether a comment is toxic or not — but how much does the rest of the conversation matter? How do we measure it? The first step is actually reconstructing the conversations. For that, we have to look at talk pages. Talk pages are a series of revisions, and each talk page is structured in its own ad hoc ways. A very common thing people do is write a parser for a talk page to try to interpret it as a conversation. So we went back to this problem and had another look at it. There are two broad ways we know of to do this. One is to look at a snapshot: you take a particular page at a particular time and use the indentation structure of the page to interpret who said what when. But there are problems with this. People sometimes forget to add their name at the end, so you don't always have the authorship. You don't have the history of actions — and it turns out people do quite often modify each other's comments; sometimes they fix a typo for each other, that kind of thing. The collaborative nature of Wikipedia means you're missing a lot of information: if a comment is toxic, it gets removed, so you don't see it, and then you don't really understand the evolution of the conversation. The alternative, which has higher fidelity but is much harder to do, is to look at the diff history: for every single revision, you compare the differences between the pages and ask what actually happened. That's the approach we're taking, and we have a tool for you to look at. This is what we started with: an interpretation based on the diffs that reconstructs the conversation. We then use this for crowdsourcing, asking questions like: given this context, is this comment toxic or not? You can see there are still some stray colons — there are bits of improvement still to do on the markup — but we're starting to make real progress here, which is fun because it opens up a whole set of new questions. Now we can ask how important the conversational context is to the human judgment of toxicity. In theory it's critical, because we can always make up a context in which a toxic comment is no longer toxic — for example, if it's being quoted.
But in practice, it doesn't seem to be so decisive: people seem to agree quite a lot, even without the context. Still, now we can actually vary it and study it. We can also ask questions like: are there early indicators of a conversation becoming toxic? Are there people who seem to be able to avoid having toxic conversations — and what is it they do? Can we learn from them? This opens up whole new branches of research. And can we then design ways to actually help conversations on Wikimedia? So this is ongoing work with...

Sorry to interrupt — we're having some AV issues. Folks on IRC are reporting that the YouTube stream has stopped. Oh. Oh, no. How's that? Right — are you on the call? It should be, yeah. Okay, I'm getting a ping saying that we're back online, so I'm going to disappear again.

All right, thanks for bearing with the little pause there, for those of you for whom it was working. I was just saying that we can ask lots of different questions, and we're working on ways to explore how we can make things better. This is joint work with the Wikimedia research team and Cornell — in particular Cristian Danescu-Niculescu-Mizil, Yiqing Hua, Dario, and others.

So at this point, we can go back to the question of the promise of the internet, and I'll leave you with a summary of the journey and a couple of links. It sometimes seems that having good conversations on the internet is a wicked problem — it's hard to even define what it means — but we can make progress. We can do crowdsourcing and find the areas where people do agree on what kinds of things are constructive or destructive for a conversation; there are many areas where people don't agree as well, and maybe that in itself is an interesting thing to highlight to people. We can build demos of authorship experiences that help people write better comments. We can build new kinds of UX that help people curate a conversation, or jump into one. We can build new kinds of UX that let people view a conversation, or even analyze conversations at scale to see what's actually happening in the space. Machine learning is giving us some new ways to think about this, and we're delighted to work with people who are interested. That link is to conversationai.github.io, the research page, where you can find code to build your own models. If you don't want to build your own models, we have an API that makes it really easy to use ours: perspectiveapi.com. And there are a couple of links to the Wikimedia research projects I've been involved with as part of this. At that point, I think we'll stop and open up for questions.

All right, fantastic. Thank you, Lucas, for this rich overview of the research. I'd like to open this up first to IRC, and then maybe we can open it to the room as well. Jonathan, any questions from the channel?

Yeah, we have two questions on IRC. The first one is from C. Scott Ananian. C. Scott says: it seems a bit fragile to have a hand-curated list of identity terms. It seems like you'd be vulnerable to someone discovering or abusing a new slur — i.e., what happens when a word like "milk" suddenly becomes one?
I'd be interested in seeing the single-word pinned AUC probabilities — I don't know what that means — for the entire vocabulary, to see whether the top things are all epithets, as expected, or whether there are new, unknown identity terms being mislabeled.

That's actually a very smart observation, and we've been experimenting with that too. We tried, for example, initially looking at a toxicity fraction, and we also looked at AUC scores. And indeed, what you find is that you do find some things in that list when you sort it, but not as much as you would like. In fact, I think one of the biggest challenges is how we identify identity terms here. With respect to slurs in particular — and I strongly agree — if they're used in a negative way, that tends to come out in the data. The problem is that there are lots of ways to refer to identity that are not inherently negative, which we don't pick up, but which occur frequently in toxic comments — and that's where your model ends up with an unintended bias. There, I think the answer is that we need a wider perspective on what counts as an identity term. Our current approach has been to crowdsource that question itself: we've been building jobs — and we're happy to share the data as well — asking for examples where someone refers to an identity inside a piece of text. We've also been working on models called attention models, or rationales, which tell you which part of a piece of text contributed to the decision. By combining attention and rationale models with crowdsourced data on references to identity, I think we can automatically infer the ways people refer to identity. And I think that's a much more robust approach to the notion of identity than, as you observed, hand-curated lists — which are the bit that leaves me feeling most uncomfortable with the current approach.

Thank you. And one other question, from Diego Sáez-Trumper: I'd like to know which representations of text you are using for your analysis — bag of words, embeddings? And then, with further context from Diego slightly down the page: will your system be sensitive to variations of keywords, such as "f.u.c.k," which are very usual to see in online forums?

Yeah — the usual ones actually get picked up quite easily, and the reason why is also the answer to the first question. The models we built, which you'll find on GitHub, are deep CNNs — deep convolutional neural networks — on top of word tokens. We've also been experimenting with recurrent neural networks and attention models; you'll find those in our GitHub shortly as well. And we've been experimenting with character-level models, which are best at picking up these kinds of creative misspellings. But it turns out that the most common creative misspellings get picked up by the word-level convolutional and recurrent networks anyway, because the vocabulary is based on tokens that occur frequently — and because those misspellings occur frequently, they get included in the vocabulary.

Excellent. Thank you. One more question so far, from Apergos: were you folks blindsided by these reports? And they link to an article on vice.com about the identity term issue.
And then: if so, how can researchers and developers avoid similar "didn't see it coming" mistakes in future? If not, what happened there?

That's a great question. We knew there would be unintended biases in the model from the beginning — that's quite commonly understood — so we wrote about it before we launched; we wrote about it on the WikiDetox page, and you can read a bit about that there. At that level, we weren't blindsided. What we were blindsided by is this: everyone we spoke to thought the most likely place for bias would be in the annotators' judgments. So we spent quite a bit of energy looking at the annotator judgments themselves — and we didn't actually find very much there. What did blindside us was that the actual problem was in the raw distribution of the data itself. That was a genuine surprise initially. In terms of ways to mitigate it, that's in some sense the work we're presenting here. For text classifiers, there are different kinds of biases people care about: people didn't care much, and didn't write much, about the bias towards pigs and donkeys that our models have; but they did care about identity terms — and rightly so, I think, because at some level it's the human impact that is the real worry. If you build an application on top, and you choose a threshold at which the unintended bias manifests as unfair treatment of people, that's a real problem. So those things are really important to identify, and that's where sets of identity terms, and mechanisms for even measuring bias in a text classifier — which, as far as we knew, didn't exist before this work — come in. Unfortunately, the research area is very young. But you can now develop tests: if you have a text classifier and you're worried about identity terms, you can use pinned AUC as a metric to measure how your model differs between identity groups, and I'd welcome people having a look and trying that out. There are other ways to do it too. If you know the specific threshold you're running at, you can look at error rates with respect to sets of comments containing the different terms; by looking at the confusion matrix, you can understand where your model is biased at that threshold. The challenge previously was that most methods were threshold-dependent, and typically the solution was threshold-dependent too: you modify your threshold so that you treat different groups equally in terms of error ratios. But that usually requires knowing the demographic group involved. So I think some of the advances here make it a little easier to develop these tests for models. The other thing that's important to say is that for machine learning in particular, you will always be able to cherry-pick examples and make a model look bad. At some level that's a good thing: it's the mechanism by which you help uncover a broader problem. But the reality is that the scores are going to be different between different identity groups for quite a while — maybe for a long, long time. What we have to do to really understand the impact of that is understand which thresholds are being used in which applications, and what the impact on society is. That's really important, and I think it's going to require a much more interdisciplinary approach.
In terms of measuring unintended bias in a model, you can literally do these computations — you can measure it, and that's great. But we don't know what impact a given bias has on society. A small bias in one application can have a very big impact on society — perhaps if it's in policing, or in judicial systems. If it's about removing search suggestions, or giving hints back to people as they're typing, the impact is likely to be much smaller. So we have to look carefully at what the impact actually is. Maybe I'm wrong — maybe the impact is very high in those areas too — but without actually studying the problem, it's really hard to know. Those are the methods that I think are helping us get there.

Awesome, thanks. We have another one — do we have time for another, Dario? We do, yeah — we have time at least until 12:30; usually we go until 12:45. Yeah, we definitely have time. Awesome. So this is from Adam Wight. Adam says: we, the WMF team that writes ORES — our Objective Revision Evaluation Service — think there's a lot of potential in getting direct feedback from humans about false positives. This doesn't seem to be fleshed out in your work yet; it was mentioned as something that happens ad hoc. What are the obstacles to systematically asking for more of this type of feedback?

I may have mis-presented that. In the authorship code and in the moderation tools that we build, we always build in feedback as a first-class feature, and in all the collaborations we do, we also try to get feedback that way. Adversarial examples more generally are extremely useful — they've been the other main mechanism, beyond the rebalancing work, for tackling unintended bias. So getting feedback is absolutely essential, and we do it in a few ways. One is directly in the user experiences and prototypes we develop. The other is direct crowdsourcing, and we've put a lot of work into how to do that. In fact, we have a Wiki Labels project — which hasn't had a lot of contributions yet — built on the idea that different communities have different standards and different goals, so it's really important to have feedback from each community. In our API, you'll also see that there are basically two methods. One is: get a score for a piece of text according to a model — that's AnalyzeComment. (And I should add that there are 11 different models; we haven't just built models for toxicity — there are models for whether a comment is unsubstantial, whether it's an attack on a commenter, and a bunch of other things.) The other main method is: suggest a score — SuggestCommentScore. So the two things the API does are give scores and collect feedback, and I consider getting feedback absolutely essential. So yeah — thanks for highlighting that.

If I may expand on this briefly, because I'm familiar with it, to relay on behalf of Adam and Aaron: I think part of what this question is pointing at is the distribution of effort between building models and obtaining high-quality labeled data on the one hand, and, on the other, designing not just a mechanism for feedback but also a UX and a labeling schema for feedback that make sense. It sounds like a blind spot at the moment — I'm sure I'm not as familiar with the literature as Adam and Aaron are, but it sounds like there is not, at the moment, a big research focus on how to collect structured feedback.
Most of the UX you come across on the web for reporting false positives seems fairly informal and unstructured. So I think the question of how best to design that — which is probably as critical as the system by which the original labeled data is created — is something there will probably be more research on.

Yes, I strongly agree with that. There are a couple of good examples, though, that are worth looking at. There was a nice paper at ACL last year comparing crowdsourcing approaches — and I think the same applies to feedback — between giving an absolute metric, like a five-point or seven-point scale of how toxic something is, and giving people two examples and asking which is worse. They showed a very nice method for comparative measurement and how it actually solves some of the problems: different people have different standards, but they often give the same ordering of comments. Those differences in standards — what one person considers very toxic versus just toxic — end up coming out as noise in the overall distribution if you don't use a comparative method. So that's a really nice piece of work. There's also nice work — not so much in the academic literature, but I think still very interesting — by the organization called Civil Comments, who had a very neat idea, which they've refined a good deal: every time you write a comment, you first have to review three other comments. That's a piece of this UX space which rethinks the way we do comment review and how it feeds into a moderation process. But on the broader question — yeah, I totally agree, this is a huge open space. It's very unclear how we should collect that feedback. Are drop-down menus effective? Do you want a slider? On a mobile device, should you be swiping left-right or up-down? There are a lot of different UXs we could build and a lot of different taxonomies — maybe we should collect free-form text responses, or maybe taxonomic ones, and at what level of detail, and so on. Loads and loads of great open questions, and that's a really great point.

Awesome, thank you. Those are the only questions we have on IRC at this point.

I have a question, so I'll add myself to the queue — but first, is there anything else the room wants to ask? Okay, so Jonathan, you can go next.

Sweet. So I wanted to ask — this is maybe another research-directions question, or a more philosophical one. You've defined toxicity as comments that are likely to make somebody leave a conversation. From the examples you've provided, both in the presentation and on the Perspective API website, and from some of the auditing people have done around the identity terms — and some auditing I've done myself — it looks as though, at least for things considered probably toxic, the kinds of things that are captured are uses of particular identity terms or other keywords, or other indications of something that looks like a direct insult. That strikes me as a subset of what I would consider the kind of behavior that would cause somebody to leave a conversation.
So, just as an example, I went to the Perspective API website and typed into your little tester: "I don't care what you think." And that got an estimated toxicity score of 0.25, unlikely to be perceived as toxic. This has me thinking of all the other ways people are toxic in conversations that don't necessarily fit your model's parameters or your training data. One example is sarcasm, which, last time I checked, was very, very hard to model. But there's a wide variety of potentially toxic behaviors, at least if you're defining toxic as something that's going to make somebody leave a conversation. And I'm wondering what future directions your team has for, I guess the way I think of it is, expanding your definition of toxicity, or expanding the types of behaviors that can be detected and labeled.

Yeah, that's a great question and a great area of exploration, and I have lots to say about it. I think it's important not just to look at the problems in conversation but also at the good things in conversation; that's super interesting too. And different people want different things in conversations, whether or not that's personalized. With toxicity itself, the bad news, I think, is that the questionnaire we initially developed has quite a high bar: toxic as a concept seems so bad that people don't want to call something toxic unless it's quite clearly so. We did a little bit of work breaking it down into subtypes, which gets at part of your question. We ran a crowdsourcing job where we broke toxicity down into insults, which you mentioned; obscene language generally; threats, which you didn't mention but which are actually an important class as well; and identity-based hate, which I think you also mentioned. That covered about 95% of the comments people considered toxic. The notion of toxicity itself came out of an observation we made while crowdsourcing: we were getting more rater agreement on this bigger notion than when we asked about personal attacks. It seems people can identify this kind of emotional response more consistently than they can identify exactly what the reason for it is. That's an interesting finding, and it also leaves an open challenge here.

The other thing that's particularly interesting about your example, the one that scored 0.25, "I don't care what you think": it's a great example. Nearly all comments have a score of less than 0.1, or even 0.01. The interesting thing we found is that low scores in the 0.25 range, while it's true most people would not call them toxic, perhaps because the word toxic is so strong, do seem to indicate that something is wrong in the conversation. If you go to the blog post, for example, scroll down, and look at comments in that range, you can get a pretty bad feeling from comments all the way down to around 0.1. So it seems the models are picking up some forms of language that co-occur with real toxicity. In some sense this is an unintended bias, but a positive one, in that it seems to be meaningful, useful, and to make applications better. The models seem to be picking up some aspects of passive-aggressive language, though what the right terminology is here is a big open question. (A sketch of score-based triage along these lines follows below.)
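A sketch of the triage pattern this observation suggests: rather than a single binary cutoff, the raw score routes comments for publication, human review, or holding. The thresholds here are illustrative assumptions, not recommended values.

```python
# Score-based triage: mid-range scores (~0.1-0.3) often signal something off
# even when most raters wouldn't label the comment "toxic".
def triage(scored_comments, low=0.1, high=0.8):
    """Split (comment, score) pairs into publish / review / hold buckets."""
    publish, review, hold = [], [], []
    for comment, score in scored_comments:
        if score >= high:
            hold.append(comment)     # very likely toxic: hold for moderation
        elif score >= low:
            review.append(comment)   # gray zone: surface to human moderators
        else:
            publish.append(comment)  # the vast majority of comments land here
    return publish, review, hold

batch = [("Thanks for the source!", 0.02),
         ("I don't care what you think.", 0.25),
         ("You're a worthless idiot.", 0.95)]
print(triage(batch))
```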
And that's going to be a big focus of what I look at over the next few years. How we get at those more passive kinds of toxic contribution is really interesting. I've generally been thinking about it and calling it passive toxicity, because passive aggression is a bit too specific. What exactly it is, I think, is a big question. It includes gross generalization; it includes passive aggression; it includes making claims about someone else's emotions. I've looked at a bunch of examples, so I'm starting to try to develop a taxonomy of it, but I'd be delighted to talk more about that too.

Hi, I'm Christopher James. I'm just interested: it looks like most of your judgment about whether something is toxic is coming from manual review, with somebody, I assume, basically labeling things toxic or not toxic, from the tools I've seen and played around with. How much have you thought about trying to analyze when somebody actually leaves? Given the definition, there could be a difference between when somebody says a comment would make people leave and when someone actually never comes back. Have you thought about that?

Yeah, it's a great question. Right now we're using what people think will make people leave as a proxy for what will actually make people leave. It's really hard and noisy to tell when someone is leaving. There are silences: maybe they went on holiday, or whatever; there are many, many reasons why someone stops commenting, and people don't write comments with very regular periodicity anyway. So why did someone stop taking part in this discussion? That's hard to answer, and it's the main reason we haven't done that. But yeah, we're very, very interested. (A sketch of why a naive departure rule is noisy follows below.) I also think there are other branches of UX research in this space that are super interesting. When you have an authorship experience on a website, and the Coral Project recently put one into their Talk product, where we also have the capacity for these types of feedback, people are really typing into a thing and getting feedback, and you have the capacity to do new kinds of UX research: you wrote this thing and edited it, why did you change it? Did you think about the impact it would have on other people? You just typed a comment and didn't click submit: how do you feel about this conversation? There's a huge space here where we're going to have to dig into more qualitative methods as well as the quantitative ones, and the quantitative ones are particularly hard when people actually leave the discussion.

Cool, I have two high-level questions. The first is about cross-language coverage. All of the examples you presented are in English, and we have around 300 languages in Wikipedia. So the question is: how do you see this scaling to become useful and extensible across multiple languages? Is it going to be as simple as just retraining the model, or are you expecting additional complexity there?
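On the departure-measurement point raised just above: a sketch of the naive rule and the confounds Lucas describes. The 90-day gap and the event format are assumptions for illustration only.

```python
# Why "did this person leave?" is hard to operationalize from activity logs.
# The naive rule below (no activity for `gap_days` before the end of the
# observation window) confounds holidays, irregular posting habits, and
# topic drift with genuine departure; the threshold is arbitrary.
from datetime import datetime, timedelta

def naive_departures(events, gap_days=90):
    """events: iterable of (user, timestamp) comment events."""
    last_seen = {}
    for user, ts in events:
        if user not in last_seen or ts > last_seen[user]:
            last_seen[user] = ts
    window_end = max(last_seen.values())
    cutoff = window_end - timedelta(days=gap_days)
    return {u for u, ts in last_seen.items() if ts < cutoff}

events = [("alice", datetime(2017, 1, 5)),
          ("alice", datetime(2017, 9, 1)),
          ("bob", datetime(2017, 2, 1))]  # bob: departed, or just infrequent?
print(naive_departures(events))  # {'bob'} under this heuristic
```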
And a sub-part of that first question is about how to deal with non-native speakers of a language, who may end up being unduly targeted by a model depending on the presence of non-native speakers in those spaces. And the second question is about the distinction between toxicity and disagreement, something we've discussed before. I don't want to preempt the topic of the next presentation, but maybe you'd say a few words about how you see the distinction between genuine disagreement and toxicity. Genuine disagreement is, after all, one of the engines behind Wikipedia.

Yeah, great questions. I'll start with the one about supporting languages. We really wanted to understand: is toxicity a useful concept? Can we build models with it? Can you build user experiences that make it useful? At this point, we're reasonably confident that it can be useful, and we understand more about the challenges. We have a methodology: we can translate the questionnaire and then run it on other language editions of Wikipedia. We could certainly carry out the same methodology to create a corpus of toxic conversations in other languages, and that's something I'd very much like to do. The other thing that's very useful in this space is interaction with media organizations. They typically curate and collect conversations on their platforms, and they often have some form of policy guidelines under which they remove comments that fall short. So I think working with media organizations in different countries will also let us create additional bootstrapping datasets. Once you have a dataset, you can start doing things like measuring proximity to n-grams from existing datasets.

There are definitely hard challenges in different languages; we know that already from the history of NLP. Some languages don't have full stops in the same sense; some languages don't break up words in the same ways. So even at the very, very bottom level, it can be very different. The deep learning methods, however, and in particular some of the recent advances in translation, seem to suggest that if you have a reasonable dataset to bootstrap an initial model on top of standard language models, you can actually make a lot more progress on different languages than you might think. But we still need labeled datasets, for toxicity in particular, and for all the other models we might want as well, whether that's whether comments are substantial, or whether they should or shouldn't contain personal anecdotes, et cetera. So that's languages. (A sketch of the cross-language bootstrapping idea follows below.)

The other question you asked was about...

There was a sub-question about languages first: the one about non-native speakers.

Oh yeah, that's right. We know that English Wikipedia and the other major language editions have a substantial fraction of non-native speakers. What helps us there is that a substantial fraction of the people who work on crowdsourcing platforms are also non-native speakers, so at some level the judgments made on crowdsourcing platforms probably mirror the contributors to Wikipedia more closely than one might expect. And of course, it then depends on the comments you're analyzing as well.
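A sketch of the cross-language bootstrapping idea using present-day tooling: a classifier trained on labeled English comments, applied to another language through a shared multilingual embedding space. The model name, comments, and labels are all illustrative toy choices, and this is not how the Perspective models were actually built; a real system would still want in-language labeled data.

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Multilingual encoder: semantically similar sentences in different
# languages land near each other in the same embedding space.
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

english = ["You are a complete idiot.",
           "Thanks, that edit looks good to me.",
           "Nobody wants you here, go away.",
           "Could you add a citation for that claim?"]
labels = [1, 0, 1, 0]  # 1 = toxic, 0 = not toxic (toy labels)

clf = LogisticRegression().fit(encoder.encode(english), labels)

# Zero-shot scoring of non-English comments via the shared space.
spanish = ["Eres un completo idiota.", "Gracias, la edición se ve bien."]
for text, p in zip(spanish, clf.predict_proba(encoder.encode(spanish))[:, 1]):
    print(f"{p:.2f}  {text}")
```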
It doesn't just depend on the annotators, either. The comments we're taking are from Wikipedia, so if they're written by non-native speakers, the dataset will mirror that. Generally, we've actually found that the source of the data, and whether the language is native or not, matters much less than we initially expected. The models don't pay as much attention as you might think to things being grammatically well formed; grammar is not that relevant. And words with close-enough meanings end up close together in the word embeddings, and in the backpropagation through those embeddings that happens during training. Those things mean the English-as-a-second-language issue isn't worrying me as much as it did initially.

The other question, then, was about... Toxicity and disagreement. Toxicity and disagreement, that's a great question. What we're trying to do is enable good conversations, and that means having disagreement in good ways. I think the way to make progress there is also to do crowdsourcing: when are people expressing disagreement? When is a disagreement itself toxic, and when is it not? Are they really different concepts? Anecdotally, my team and I have pretty strong beliefs that there are significant differences, but the way to make progress on that is by crowdsourcing annotations, asking these questions, and looking at how the concepts correlate and what the implications are. It's important to find out how we can have good disagreement. Especially in the current political situation in the world, there seem to be increasing numbers of people living in filter bubbles because of the difficulty of disagreeing. At some level, the core goal of Conversation AI and this research effort is how to let people effectively discuss things on which they disagree, because that's where it's most difficult and also most important. So, yes.

All right, thank you, Lucas. Are there any other questions from the channel? Jonathan?

I have one more from Adam White, if we have time.

Okay, we do have a few more minutes.

So, Adam says: to follow up on my earlier question, is it a problem if we give authors an ML tool to detect how their toxicity would be perceived ahead of time? Or is that gaming aspect offset by the huge impact on good-faith authors getting pre-moderation correction?

So I guess it's an interesting question: if you give someone a response, possibly an incorrect one, what effect does it have on them? That's the fundamental question, right? And it's important to distinguish, I guess, a few different questions here; I'm not totally sure whether this is asking about the effect of authorship feedback on people or about the effect in a pre-moderation setting.

If I can interpret this: I think Adam is asking about a UI intervention where you let people know, before they post a comment, that it might be perceived as toxic. Is that a problem? Is the potential for gaming there offset by the potential positive impact on, say, good-faith people who just didn't realize they were coming off badly?

Yeah, I see, that's a great question. Gaming more generally is a question that comes up very often, right?
So my view on this is that we should be inviting people to game us, and we should be developing the methodologies and tools to deal with it, because it happens. Right at the beginning, when we launched the Perspective API, we noticed a large number of people from certain subreddits and from 4chan who decided they should game us, and they sent us tons of contributions. And it was really helpful: we ignored their suggested scores, but we used the examples. We sent them off to crowd workers and got back better labels, and those were really good examples, precisely because they were the ones people would try to game us with. So I think we definitely shouldn't trust suggested scores directly. And yes, if someone is dedicated to writing something bad, they will be able to use the authorship feedback, and it will actually help them get around the system. However, what we've seen from the research, both our own, Cristian Danescu-Niculescu-Mizil's, and Riot Games', is that by far the majority of toxic comments come from people who are not trying to game you. By far the majority seem to be from people having a bad day; it's called the bad-day hypothesis. Maybe something about the topic aggravated them, or the way someone wrote something, and that's what led them to write the toxic comment. If that's the case, and the research all seems to be pointing in this direction right now, then it suggests we should welcome gaming and really focus on the good-faith people, which is a surprise to many.

Nice, thank you. That's all we have from IRC.

All right, thank you, Jonathan. I think we're almost at time, so I want to thank you, Lucas, for joining us today; it was a fantastic conversation. Thanks to everyone who joined us, both here and on the IRC channel, and I guess on YouTube. I hope to have you and Chris and our collaborators back sometime in the coming months for an update on the work in progress. We're really looking forward to that. And as a reminder, our next showcase will be on the 13th of December. So with that, I think we're good, and I look forward to seeing you all in a month. Bye, everybody.