Hello everyone, thanks for being here today. The first presentation is from Vaclav Natravil: toxicity monitoring and assurance of community health.

Thank you, thank you so much. If you were down at the panel discussion a few minutes ago, the words "community" and "toxicity" came up there as well, and I think this is something interesting for many open source projects. But let's start from the beginning.

A little bit about myself: my name is Vaclav, and I work as a senior marketing engineer at iXsystems. Sometimes I like to add that I am a "SpecOps" engineer, which is a combination of DevOps, DataOps, MLOps, and some other Ops as well; simply SpecOps. But that's enough about me.

Let's find out something about you. It's time for a pop quiz, but I assure you it will be easy. First question: do you use any open source project? Please raise your hand if you do. Very nice, very nice. To be honest, I expected that answer. Second question: did you ever need the documentation to solve a problem? You ran into an issue, you used Google or another search engine, and you looked for a solution. Did that ever happen to you? Please raise your hand if it did. Yeah, we have some superheroes here, but most of the audience has had the same experience as I have. And the last and, I would say, most important question: were you ever discouraged because you found a forum full of snarky comments and personal attacks, or the community just seemed hostile, so you abandoned the project? Did this ever happen to you? Yeah, we have some people who experienced this. And this was a problem for us as well. It took us a long time to realize that there was an actual issue hurting our project.

But let's start from the beginning. As I said, iXsystems is the company behind TrueNAS. Don't worry, I read the guidelines and I know you don't like advertisements here, so I'm definitely going to tell you that TrueNAS is the most popular and arguably the best open source storage system. So, never mind the ad. The company was founded more than 20 years ago. We were always focused on open source and open source solutions, and because of the bootstrap mentality, until recently we liked to call ourselves the oldest startup in the valley.

Our approach to the community was always very liberal: we expected that the community is self-organizing, the community is self-governing, the community grows organically, and essentially we don't have to do much to manage it. But just as you cannot accept a pull request someone sends your way without looking into it, a community also needs to be managed. If it isn't, it can very easily turn into an ugly community swamp where no one really wants to interact with anyone else, there is a lot of hostility, and it is very discouraging for people coming in. We finally realized this and started managing the community, and that created an opportunity for me to automate some things and make some things better.

How to make things better? That was the question we asked ourselves, and there were a few iterations. First we thought: what if we do dictionary filtering and don't let people use bad words on the forums? We considered this, and it turned out it might not be the best approach, because with dictionary filtering people just invent a newspeak. If you say "I will kill you", it's the same as saying "I will analyze you": it's just as toxic, just as bad, and it creates the same animosity. And this newspeak adds another layer of bizarreness, so when a new person comes to the forums and faces it, they think, "What's going on here? This is not for me," and they leave. So this wasn't the way we decided to go.
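For illustration only, here is a minimal sketch of the kind of word-list filter that was considered and rejected. The blocklist and example messages are made up, not from any real deployment, and the second call shows how a "newspeak" euphemism slips straight through:

```python
# Minimal sketch of a naive word-list filter (the approach rejected in the talk).
# The BLOCKLIST and example messages are illustrative assumptions.

BLOCKLIST = {"kill", "idiot", "stupid"}

def is_blocked(message: str) -> bool:
    """Return True if the message contains any blocklisted word."""
    words = {w.strip(".,!?").lower() for w in message.split()}
    return bool(words & BLOCKLIST)

print(is_blocked("I will kill you"))     # True: caught by the word list
print(is_blocked("I will analyze you"))  # False: same hostile intent, slips through
```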
The next approach, where we already started with natural language processing, was the quite commonly used task of sentiment analysis. But as you can see from the word cloud here, words like "problem" and "issue" are prominent among the others. In a community, people often come with a problem: they report bugs, something is not working. So the sentiment of their messages might not always be positive, but that doesn't hurt the community. Actually, those are exactly the messages we want to hear. So this wasn't the right way either.

The third path, and finally the good one, was toxicity analysis of comments and posts: checking whether the comments people add are hostile or vulgar, involve threats, or contain similar types of toxic content. And we came up with a pretty concise solution for this type of task. We created a toxicity model wrapped in a REST API; well, when I say "created", we didn't really create it, we use a TensorFlow toxicity model and just plugged the pieces together. And then we added monitors for certain social media: Twitter, a subreddit, and our community forums. When a new message appears, for example a tweet mentioning the word TrueNAS, it comes in through the Twitter monitor. We ask the toxicity model: is this message toxic? We get a response, and based on that we either alert our community managers and admins or just save it to the database for later evaluation. The message triage is shown on this schema here: we have a reader loop, a new message comes in, we ask whether it is toxic or not, and if it is, it triggers an alert. The approach is pretty simple; we are not trying to do anything overly complex. We are using containerized services which are just orchestrated to work nicely together.
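A minimal sketch of that monitor-plus-service flow, for illustration only: the endpoint URL, the alert threshold, the assumed response shape, and the helper functions are placeholders, not the actual containers from the talk (which wrap the TensorFlow toxicity model behind a similar REST API):

```python
# Illustrative sketch of the pipeline described above: a monitor loop polls a
# source (Twitter, a subreddit, the forums), asks a toxicity REST service to
# score each new message, and either alerts the community managers or stores
# the score. URL, threshold, and helpers are assumptions for illustration.

import time
import requests

TOXICITY_API = "http://toxicity-service:8080/score"  # hypothetical wrapped-model endpoint
ALERT_THRESHOLD = 0.8                                 # illustrative cut-off

def fetch_new_messages():
    """Placeholder for a Twitter/subreddit/forum stream reader."""
    return []  # the real readers are separate containerized services

def send_alert(message, score):
    """Placeholder for notifying community managers (e-mail, chat webhook, ...)."""
    print(f"ALERT ({score:.2f}): {message['text']}")

def store_score(message, score):
    """Placeholder for saving the score to a database for later evaluation."""
    pass

def monitor_loop(poll_seconds=60):
    while True:
        for message in fetch_new_messages():
            resp = requests.post(TOXICITY_API, json={"text": message["text"]}, timeout=10)
            score = resp.json()["toxicity"]  # assumed response shape
            if score >= ALERT_THRESHOLD:
                send_alert(message, score)
            store_score(message, score)
        time.sleep(poll_seconds)
```

In a setup like the one described, each reader, the toxicity service, and the database would run as separate containers orchestrated together rather than as one script.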
So what are the benefits? Why are we doing this? Why don't we just say, okay, we have a community manager, we have administrators, we don't need this? The big benefit of this solution is that we find out about issues sooner. People don't have to go over multiple social media sites or comb the forums all the time to catch every new post and every new comment. If the automatic system sees a toxic message, it alerts them, they can jump on the issue and mitigate the problem, and it means less manual labor, because instead of searching proactively, the toxic messages come to them. From a long-term perspective, which is also interesting and pretty useful, we can observe incidents on, let's say, a monthly basis. We can see, okay, here was a peak, what was it about? And then we can see, okay, someone was complaining about this, and we can make decisions based on that. Is it because of something we did? Something we did wrong? Something we can prevent? And if the same incident happens a first time, a second time, a third time, it's a good indicator for us: this is a persistent problem, we need to address it. For us, the toxicity trend line also shows the community health: we know whether the community is getting better or getting worse.

Indirectly, we can detect issues in our software. If people are upset and come commenting about something in our software, we know there might be something wrong. Also, in the development of any open source project there are philosophical decisions to be made; the maintainers decide, this is the path we want to go down. Again, this helps you see whether the decision was correct and how the community reacts to it. Here is a view from Grafana where we have the toxicity levels, with some rolling averages, where you can see the peaks and say, okay, this was an incident, this was something that had to be addressed.

Okay, one last thing at the end: it can all run on TrueNAS. All the containers, all the toxicity detection, it can all run within that environment. So thank you so much for your attention, and thank you to the organizers.

Hello. Well, this is about a war on words, right? But have you hit a situation where, say, some competitor tried to infiltrate or break your community?

This is a pretty interesting question. I would say hopefully not, or we were not able to discover that it was an intentional attack. We assume that most of the toxic users are either trolls, because they just like to troll, or users with strong opinions, where the toxicity somehow doesn't harm the community too much: if different language is used, or if the issue is properly addressed and de-escalated, it can still be beneficial. So I would say we haven't experienced a direct attack like that.

So you created something quite complex for something that maybe, if you hired a person who can read, they could do for you?

I'm sorry, can you go ahead?

Like, a person could actually do that, right? You could hire someone who just needs one skill, reading, and they could do it, right?

We actually have community management in place; we have people doing this. But there are lots of sources of content, and, if you know how these things work, someone can open an old thread and you will very easily miss it. If you have a tool which reads every single comment, the manual labor saved is significant.

Do you actually provide, I mean, I'm particularly interested in the "is it toxic" question, is this published somewhere so that we, as a community, can use it to detect messages?

Yes, actually. You can find the Twitter stream reader, the subreddit stream reader, and the TensorFlow toxicity model wrapped in the REST API as container images on my GitHub. I think it's also on Docker Hub, but I'm not sure it's very well documented there, so go to my GitHub repositories and find it there. It's not much; we just wrap existing solutions with a few lines of code which do the job we need.

Please, what's your model for improvement of your detection code?

I'm sorry, come again?

What's your model for improvement of the detection of the toxic language? I mean, it's not 100% reliable, and I think you detect some false positive and false negative events.

We didn't get there yet. Fine-tuning the models to really fit our situation is something we plan to do in the future. So far we are working on a data set of the answers and comments to train on, but we don't really know yet how we will do it and what the approach will be. For now we are just using the generic pre-trained model.

Do we have any more questions? Thank you, Vaclav, for being here.

Thank you for having me.
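As an aside, the rolling-average view described for the Grafana dashboard could be approximated along these lines. This is a sketch only: the file name, column names, window size, and incident heuristic are assumptions, not the actual dashboard queries.

```python
# Rough sketch of a rolling-average toxicity view: average toxicity per day,
# smoothed over a 7-day window so that peaks ("incidents") stand out.
# File name, column names, window, and the 1.5x heuristic are assumptions.

import pandas as pd

scores = pd.read_csv("toxicity_scores.csv", parse_dates=["timestamp"])

daily = scores.set_index("timestamp")["toxicity"].resample("D").mean()
rolling = daily.rolling(window=7, min_periods=1).mean()

# Days where the daily average rises well above the smoothed baseline are
# candidate incidents worth a closer look by the community managers.
incidents = daily[daily > rolling * 1.5]
print(incidents)
```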