Alrighty. Good morning, everybody. Hi. Thank you all for being here. My name is Courtney Miller. I'm a researcher at Carnegie Mellon, and today I'm going to be talking to y'all about the work that we did called Did You Miss My Comment or What? Understanding Toxicity in Open Source Discussions. This talk is going to be broken down into three segments. We're going to start by introducing online toxicity, then we're going to contextualize it in the space of open source, and finally we're going to talk about our findings on our understanding of open source toxicity and its implications for real-world practice.

So let's start with an introduction to toxicity, beginning with the definition. We define toxicity as rude, disrespectful, or unreasonable comments that are likely to make someone leave a discussion. This is based off of the definition used by Google's Perspective API, which is well established in toxicity research. Online toxicity is pervasive across the internet. According to the Pew Research Center, for example, four in ten Americans have experienced online harassment themselves. And online toxicity can be harmful. The harms of online toxicity have been documented in many different fields. For example, cyberbullying among school children has been connected to increased rates of anxiety and depression. Additionally, a study of Reddit users found that people who had been the target of intergroup subreddit conflict were less likely to contribute to their subreddits in the future.

Now, because of its widespread prevalence and negative impacts, toxicity has been studied thoroughly on various platforms like Facebook, Wikipedia, Twitter, Reddit, and even Stack Overflow. And you might be wondering, well, why do we need to study it on all these different platforms? And if we've studied it already on all these different platforms, then why do we have to study it in open source? Thankfully, we have an answer to that; otherwise, this research would be useless. That answer is that research on toxicity across different platforms shows that it tends to manifest differently on each of them. It's like each platform's toxicity is a unique baby bird in the nest of overarching toxicity.

We'll now briefly discuss what toxicity looks like on various platforms to give a sense of the diversity of forms it takes and some of the ways each platform deals with its unique toxicity. Let's start with Reddit, why don't we? The prevalent forms of toxicity on Reddit include intergroup conflict. The majority of these conflicts are initiated by a small group of subreddits, and they tend to come along with a serious hate speech issue. As a solution to this toxicity, Reddit deleted a small number of subreddits that were responsible for the majority of this conflict and hate speech, and this actually ended up being a successful intervention even in the long term, with over 80% of these toxic users not returning or simply migrating to a different subreddit.

And then Wikipedia. Some of the most prevalent kinds of toxicity on Wikipedia are caused by people who are referred to as Wikipedia trolls and Wikipedia vandals. Wikipedia trolls repeatedly and intentionally cause harm to both the community of contributors and the actual encyclopedia itself. They contribute unhelpful and non-constructive remarks, they attack users, and they tend to work alone and anonymously.
Wikipedia vandals cause damage that is time-intensive to address, including deleting articles, using inappropriate user names, and adding incorrect, irrelevant, or policy-violating text to articles. To deal with the toxicity they face, researchers found that these communities participate in collaborative vandal-fighting processes, which is actually really cool. Wikipedia contributor groups have found some really effective tools and methods for dealing with the toxicity they experience. Furthermore, many researchers have developed techniques for automated detection of Wikipedia vandalism, so when someone does something like delete a whole Wikipedia page, it's not a big deal. So congratulations to Wikipedia contributors; they did a great job.

And then Twitter. Oh, this is a juicy one. Online harassment is considered endemic to Twitter. Studies report hate speech, doxing, cyberbullying, misogyny, racism, and Islamophobia as the most prevalent forms. In addition, firestorms are a common occurrence on Twitter. A firestorm is where an account receives a sudden and intense influx of negative attention. Firestorms, mob mentality, and trial by Twitter are employed as forms of mob vigilantism. Twitter deals with some of these problems through deletion, suspension, and warning of harassing accounts. In addition, they rely on authorized reporters and trusted flaggers who have special privileges to identify and report inappropriate content on behalf of others; so, essentially, moderators.

And finally, last but not least, Stack Overflow. This is a unique one. A study of norm violations on Stack Overflow found that personal harassment, swearing, and other unwelcoming comments were the most frequently observed types of norm violations. The Stack Overflow community guidelines also request avoiding greetings and thank-yous in questions, so posts are often downvoted and deleted for violating such conventions. This actually becomes really problematic for newcomers and first-time users, because they are unaware of these norms, and they are often chastised, criticized, and chased away from communities for violating norms they didn't know existed, because they're new. Research has also found that female Stack Overflow community members especially tend to fear harsh criticism from their peers. It's so bad that a former Stack Overflow executive vice president even admitted in a blog post that too many people experienced Stack Overflow as a hostile or elitist place, especially newer coders, women, people of color, and other marginalized groups. To address violations of its code of conduct, Stack Overflow uses two main tools: first, respected community members who act as moderators, and second, bots that detect toxic language, with sufficiently severe comments then escalated to moderators as well. Additionally, mentoring mechanisms for new users have been explored to help them avoid behaviors that might receive harsh responses, like saying thank you.

So, because, as we've just established, different platforms experience toxicity in different ways, they require and deploy different strategies for dealing with that toxicity. For example, remember Reddit's toxicity: they deleted a small number of subreddits, and it worked out well for them, whereas Twitter addresses harassment in part by relying on trusted community members who act as moderators.
So, really important here: depending on the type of toxicity a platform experiences, it tends to have success with different strategies. Now that we've introduced online toxicity, discussed how toxicity manifests differently on different platforms, and discussed how different identification and intervention methods are effective for different forms of toxicity, let's talk more specifically about toxicity in open source communities. Because, unfortunately, like other platforms, open source, and GitHub in particular, also has toxicity. We know this because many open source practitioners have reported on their experiences with toxicity within open source communities and the negative impact it has had. Additionally, during the research for this project, we came across tweet after tweet after tweet after tweet after tweet from open source developers expressing their frustration with their experiences of open source toxicity.

We also know that toxicity is a major threat to diversity and inclusion. Prior work has found that it can be especially impactful for members of certain identity groups, particularly women, who are already severely underrepresented, as I'm sure pretty much all of us know. And given that many open source projects struggle to recruit and retain members in the first place, ostracizing and excluding potential contributors through toxic behavior only harms the health of these communities in the long run, right?

And although multiple studies have attempted to automatically detect open source toxicity, they by and large rely on pre-existing toxicity detectors trained on toxicity from other platforms, and they have had mixed success. Previous researchers have already hinted that toxicity in open source manifests differently than elsewhere online and that more domain adaptation for detection and intervention is needed. We argue that one of the key issues causing this mixed success is that, fundamentally, on any platform, to design and deploy effective identification and intervention strategies for toxicity, you must design those strategies based on the specific kinds of toxicity present on that platform. And since most of the pre-existing work building automated detectors for open source uses things like the Perspective API, which are trained predominantly on toxicity data from other platforms, we have a problem here. Instead, we argue that these detectors should be based on the types of toxicity present in open source.

Unfortunately, existing open source toxicity research has been limited in scope. Previous research has, for the most part, studied the impacts of specific forms of toxicity in particular parts of the open source development process or in particular communities, rather than exploring, at a high level, what open source toxicity even is. For example, a study of emails from the Linux kernel mailing list that were associated with rejected code changes found that uncivil comments frequently occur, most commonly in the form of frustration, name calling, and impatience. Another study found that potential contributors to open source projects consider the politeness of a project's communications when they decide whether to contribute to it at all, and that they tend to be less likely, surprise, surprise, to join projects with more impolite communications. So, to take a step back.
While we know open source toxicity exists because of reports from practitioners and everything else, and there have been a small number of research projects studying open source toxicity and its impact on particular components of open source projects, there has not been research on understanding open source toxicity and its characteristics as a whole. We argue that what is primarily missing from existing open source toxicity research is a deep understanding of the fundamental nature of toxicity in open source. What are its common forms? What scenarios does it occur in? Who are its originators? And how do these characteristics compare to other online platforms where toxicity occurs? Only given such understanding can we begin to design the most effective detection, prevention, and mitigation strategies. Yet this qualitative work is still missing. So, in this project, we aim to address this gap in our understanding by looking at the qualities and characteristics of open source toxicity so that, in the future, we can build and utilize more effective identification and intervention strategies for dealing with toxicity in our open source communities.

So with that, let's get to what we did. Yay! Understanding open source toxicity. We break this up into three parts, because we're academics and we like to talk about research methods a lot, so we're going to start with that. I promise we cut it down. My advisors were like, please, please cut it down, and I did, so don't worry. We're going to talk about our research methods, then we're going to talk about our findings, and then we're going to talk about the implications for real-world practice moving forward.

Research methods: let's rip off the band-aid. But before we do that, let's engage with some ethical considerations for this research, because it is a sensitive topic. While some developers openly discuss their experiences with toxicity and its negative impacts, those discussions tend to be generic; they abstract from concrete instances. In previous research, our group tried contacting developers who had already spoken publicly about toxicity, but for the most part, they either preferred not to discuss it or had already deleted the concrete instances. Since toxic comments are usually unpleasant interactions, we intentionally discarded research designs that relied on interviewing or surveying developers, especially those who had not already spoken out about their experiences with toxicity. In line with the Belmont Report's ethical principle of beneficence for human-subjects research, we considered the potential risks too high for an initial study. Instead, we decided to work entirely with public artifacts that did not require contacting participants.

To address our research question, fundamentally, we knew that we wanted to look at a bunch of open source issues and then study their characteristics. But because there hasn't been previous research studying open source toxicity as a whole, there wasn't a pre-existing data set that we could just use. So to address our research question, we had to follow two steps: we collected our sample, and then we analyzed it. Let's start with collecting our sample. Now, the first problem we faced when attempting... I took a lot of pride in these slides, so I really hope you take the humor. Thank you.
The first problem we faced when attempting to collect a sample of toxic open source comments is that there isn't yet a perfect detector for toxicity of any form in any context. Each detector picks up on different signals and is therefore inherently biased in different ways. This is a non-trivial problem that we had to address when designing our research methodology, because the problem becomes, well, how do you identify something when you don't know what that something looks like? Additionally, since toxicity in open source is relatively rare, as our group estimated in prior work, we couldn't just analyze a random sample of GitHub issues to find a sufficient number of toxic ones; that would not have been a feasible sampling strategy.

Since there wasn't yet a highly accurate and unbiased toxicity detector, we instead used five selection strategies, which we'll really briefly go over, but again, more on that in the paper. The first one was using a pre-existing toxicity detector applied to original issues; in this case, the example is the 'shitty package with no context' issue. It's cute. They're cute. Then, technically our second strategy, we also applied it to issue comments, for example: 'What's up with the hate? Why even bother posting this?' We then identified issue comments that had mentions of codes of conduct, issue threads that had been locked as too heated, and issue threads that had been deleted. Now, as you might imagine, not all issue threads that have been deleted are toxic, so what we had to do was then perform manual labeling to confirm their toxicity.

To ensure the reliability of our manual labels, we had multiple authors independently label the issues. Their inter-rater reliability was high. I'll spare you the details about the Cohen's unweighted kappa coefficients and so on and so forth, but what you need to know is that they were high, meaning that there was strong agreement between the researchers about whether the issues were actually toxic or not. The high inter-rater reliability also indicated that having researchers label these comments, rather than relying on the original users or the affected users, seemed to be okay. Once we had our sample of confirmed toxic issues, we randomly selected 20 from each of our five methods, giving us a sample of 100 confirmed toxic open source issues. Now, I know, it's very exciting. I was like, great, now I can do something. That was six months. So, cute. So once we had our sample of confirmed toxic issues... we said that. It's important to note that while our sample was diverse, it was not necessarily fully representative of open source toxicity, and we have more on that in the threats to validity section of our paper. So this is a diverse sample; this is not a fully representative sample. Now let's analyze it.

For the analysis, we used qualitative analysis. Specifically, we performed thematic analysis and card sorting; more on those methods in the paper. We chose these qualitative analysis methods because we wanted to understand and explore the toxic open source comments, we wanted to approach them with as open a mind as possible, and we wanted to be able to study many different dimensions or characteristics of toxicity. Now you might be wondering, what do you mean by dimensions of toxicity? And this is what I mean. These are the dimensions of toxicity that we identified.
We have things like, for example, the nature of the toxicity, what triggered the toxicity, who was the author of the toxicity, who was the target of the toxicity, and so forth. In our paper, again, we have more details on the definitions of all of these as well as the subcategories, like, for example, what does it even mean for the trigger to be an error? Well, that one's kind of self-explanatory. But in other words... I'll do questions at the end, if that's okay. Thank you. Actually, is it a clarification question? Yeah, we'll talk about that after. We'll talk about that after, thank you. So, yeah, that's all in the paper. For the sake of time, in the next section we will only discuss findings related to a small number of these dimensions. However, we discuss the findings for all of these dimensions in the paper, so if you're interested in learning more, please reach out or check out the paper.

So, let's talk about some of our findings. First, open source toxicity is built different, as the kids would say. The nature of open source toxicity is different from that on other platforms. The predominant forms of toxicity that we observed in our sample were entitled, arrogant, and insulting comments. Let's look at an example. You're not supposed to read this. This is an issue that was opened in an open source operating system project with over 600 stars. The user was unhappy about the latest update and the removal of the minimize button in said update, and here are a few quotes. 'The problem is your team forcing us to use the operating system the way you want us to use it, although it makes it many times harder to use it your way than what would be convenient for us.' Or, 'Don't you really see how ridiculous is what's written in the blog post why there's no minimize button?', referring to the post explaining the decision to remove the minimize button in this update. And finally, 'So maybe it's time to stop trying to be Apple or Windows and forcing people to use something the way you want it and give a simple solution to make it the way people want it. Not a major change or something, just a simple solution.' What these quotes do is help encapsulate the entitled, demanding, and often insulting nature of the toxic comments frequently found in the open source communities in our sample.

The next one: experienced developers are not necessarily non-toxic developers. That one's got to hit home. If you recall from the mind map, one of the dimensions we analyzed was who was the author of the toxicity. The four main categories we identified were, first, experienced developers, who had lots of previous experience in open source but not in the particular project where they wrote the toxicity. Second, project members: these are people who were contributors, members, or owners of the project where they wrote the toxicity. Third, we have new accounts, which had no more than three previous activities on GitHub in general. And fourth, this rather interesting group that we call repeat issuers: these are people who just wrote issue comments, oftentimes a whole lot of them, but had virtually no other activity on GitHub on those accounts. Now, despite the horror stories we heard from the Linux community, we didn't actually expect project members to be frequent perpetrators of toxicity in their own communities. And what we found was that we were wrong. Congratulations, it's not just you guys. What we found was that toxicity wasn't only done by people external to open source projects.
Project members were also frequent authors of toxic comments in their own projects, which we found surprising. In this example, a user of a project found a bug, wrote a solution for it, and created an issue. We did not categorize the original issue as toxic. Then, a project contributor responded with a toxic comment, saying, 'To start, you're all probably not using the latest version of the database. This is not my problem.' They go on to say, 'Expect a ban if you absolute idiots continue to spam me with me-too comments from this other related issue.' So, yeah, cute. This is an example highlighting the fact that toxicity wasn't only done by people external to open source projects; project members were also frequent authors of toxic comments. In most cases they were responding to someone else, and the comment they were responding to was not necessarily toxic, as in this example.

So finally, implications. What do we do with this? Here are a few of them; lots more in the paper, again. The first one: there is space, and some would say plenty of it, for open-source-specific detectors, because off-the-shelf language-based detectors of toxicity trained on data from other platforms can now be confidently expected not to generalize well to GitHub. Take the example in the title of this talk. This user wrote a comment on a different issue thread and became impatient after not hearing a response for five days. So they created an entirely new issue referring back to the comment in the previous issue thread. This blatantly passive-aggressive issue was called out by a project member and closed. Now, despite the fact that the project member and our research group established this as a toxic issue, it was just 'Did you miss my comment or what?' If you were to run that through an existing toxicity detector, hypothetically, it would not be identified as something with a high probability of actually being toxic, right? And that's because the milder language of these sorts of comments makes them more difficult to detect by looking for bad words or other aggressive language or obscenities. Entitlement and arrogance are often toxic not through the use of strong language, but rather through the message they are conveying and the context they are conveying it in. So, we will require models trained specifically on a corpus of toxic open source discussions.

Next, research into the harms of toxicity is needed. We noticed a few different things during our research exploration, and one of them was that there were clear harms coming from the toxic comments that people were having to deal with. For example, contributors would often go into genuine, detailed conversations trying to tease the actual problem out of a toxic issue, or they would express candid, clear frustration, disappointment, and confusion about the toxic comment itself. However, because the harms of toxicity were outside the scope of our project, we did not rigorously collect a lot of evidence about them, so we will not make any claims. Although open source toxicity looks different from toxicity on other platforms, it is important to note that it is still stressful, and it can cause serious negative impacts on the developers and communities who face it. As we informally observed during our analysis, the frustration and other negative feelings we saw aligned with the many reports from practitioners in blog posts, talks, and even tweet after tweet after tweet after tweet after tweet. Yup, yup, y'all got it. So, future work should investigate the harms and impacts of open source toxicity more thoroughly.
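Before wrapping up this part, to make the detector point above a bit more concrete: here is a minimal sketch of how a comment could be scored with Google's Perspective API, the kind of off-the-shelf detector most prior work relies on. This is purely illustrative, not the exact pipeline from our paper; the API key is a placeholder and the flagging threshold is an assumption.

```python
import requests

# Placeholder: a real key for the Perspective API (commentanalyzer.googleapis.com) is needed.
API_KEY = "YOUR_PERSPECTIVE_API_KEY"
URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
       f"comments:analyze?key={API_KEY}")

def toxicity_score(text: str) -> float:
    """Return Perspective's TOXICITY probability (0.0 to 1.0) for a piece of text."""
    body = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(URL, json=body, timeout=10)
    response.raise_for_status()
    return response.json()["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

# The entitled, passive-aggressive issue title from the talk. Detectors tuned to
# obscenities and overtly aggressive language tend to give comments like this a
# low score, which is exactly the generalization problem discussed above.
score = toxicity_score("Did you miss my comment or what?")
print(score, "flagged" if score > 0.5 else "not flagged")  # 0.5 is an assumed threshold
```

An open-source-specific detector would instead be trained or fine-tuned on labeled GitHub discussions, so that entitlement and arrogance expressed in mild language, like the example above, are not systematically missed.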
Alright, how about some summarization? Let's wrap this up. In summary, online toxicity is harmful and pervasive across the internet, including within open source communities. Online toxicity manifests differently on various platforms, and different detection and intervention methods are effective on different forms of toxicity. While we know open source toxicity exists, we do not have a sound understanding of its nature or characteristics. We addressed this by first collecting a diverse sample of toxic open source issues and then qualitatively analyzing them. We found that open source toxicity manifests differently than toxicity on other platforms; it tends to be centered more around entitlement, arrogance, and insults. Since the nature of open source toxicity is unique, future work should explore building more open-source-specific toxicity detectors. Additionally, we learned that project members are also frequent perpetrators of toxicity in their own projects. Finally, although we observed that there were certainly harms from toxicity, we did not study them rigorously and in depth, and future work should do so. Thank you so much.

And before we go to questions, I just want to put a little call out into the universe. We are seeking, well, we'd love to have conversations about open source dependency management and dealing with deprecated dependencies. So if you or anybody you know has some great stories or would like to talk about this, please reach out to me: either catch me at the conference or email me, my email is literally just Courtney Miller, and I would love to talk about this with you. So thank you so much, and let's go to some questions.

First off, thank you. This is awesome research, so much needed; I want to see more of it. Thank you. On the slide that had the sort of graph of where is it coming from, what kind of attributes, that one. Yeah. Maybe it's intended to be inside one of those, but I don't see demographic-based patterns called out, where, for example, a maintainer might respond differently based on the gender or race of the person who's opening a PR or a comment, and that isn't visible in just looking at one instance but across a pattern of behavior. Did you do any sort of pattern analysis like that?

Thank you for your question. That's a really good one, and the answer is that we consciously decided not to look at those, because our sample was small, and what we didn't want is for the wrong people to find this project and be like, women are bitches. So we intentionally said that's not the focus of this study and we're not going to carelessly just report that kind of data. We want to do that kind of research consciously, and this was a high-level overview, so we intentionally did not include that demographic data.

Great answer. Thank you. So I guess our next question is, how can we all help you? Because I think this entire room has been like, yes, this is awesome, how can we help?

Yeah, thank you for that question. I think one of the next steps is literally building. So the question, sorry, to repeat it for the online audience, was: how do we help? How do we move forward with this? And thank you so much for saying that, because that's exactly kind of what we want to do with this research. This is very much preliminary research, and this is a launching pad.
I think there are a lot of different things this could go into, specifically detection work and building tools. For example, one of the things that we noticed was project members being authors of toxicity in their own projects. That's really weird. Or, we thought it was weird; I shouldn't say it is, that's not an objective fact, that's an opinion. I thought it was really weird, and the reason I thought it was really weird was because I was like, why would they do that in their own projects? But the thing is, they often were doing it in response to another person, right? And so it was often done, clearly, the interpretation is, in frustration. So, in the same way that we have issue templates for creating an issue, we're thinking, what about issue templates for responding to issues? So that each time you have someone who's new, and it's not necessarily their fault, they probably should have done the work on their own before they posted, but when they put something in and it's not quite what it's supposed to be, instead of having to put the time, emotional labor, and effort into crafting a response that might inadvertently reflect some of your own frustrations, you can just have a quick template, send it off, and make things clear to them. So, stuff like that: interventions, detection methods, all sorts of stuff.

I actually want to speak a little bit to that. I'm a fairly experienced Linux developer, and I'm analyzing my own reactions to things. One of the things we sometimes see is people who use review as a weapon. So somebody puts in a lot of work to come up with a couple hundred patches for something, they send it out, and somebody just nitpicks it to death. And I don't know if I'm going to argue for this point, but a toxic response to those review comments could be seen as a positive form of toxicity, if such a thing is possible, in that we're being toxic to that particular person who is trying to derail somebody else's important work. Yeah, I don't know how I feel about this right now; I just came up with this idea during this talk. So maybe it's a topic for discussion more than something I'm going to argue for.

Yeah, no, I think that's a really interesting point, because I think that open source maintainers are the atlases of the internet; they're literally just keeping everything alive. So I think it's really hard, because I don't think that I would ever just say, oh, I condone toxicity in any form, but I think that context is important, and I think that it is almost like a flag of, we need help here, we're expressing a lot of frustration, we're under a lot of pressure. And what I glean from that is, we need help dealing with it and finding tools and interventions to more effectively deal with it. So I entirely agree with you. I think it's a really interesting discussion, and I think it helps fuel some of the future work in this direction of, how can we make it so these maintainers don't get so frustrated, so they don't feel so overloaded, and so they don't end up writing those kinds of comments.

So I have a question related to the domain of projects. I come from a background where we try to get students involved in open source software, and what we found is that the projects that have a humanitarian aspect, that have some social good associated with them, tend to have, and again, we haven't studied this, this is just observational, a lower level of toxicity, and I'm just wondering if you observed anything similar.

Yeah, thank you for that question.
Actually, one of the dimensions that did not make it to the final cut was the domain of the project. For example, my own hypothesis was like, oh, gaming? That's going to be a rough domain. You know, things like that. And within our sample, we weren't really able to identify, I don't know, we just didn't really get anything out of it. I think it would need to be done on a bigger sample, or maybe we were categorizing the wrong kinds of domains. So I definitely agree that the domain is probably going to have something to do with it, but unfortunately, in this work, we did not; we tried to, but we weren't able to glean anything on domain.

Hi, I'm in the back here. I'm sorry for being late, so maybe you already covered this, but I imagine that the respondents and repositories that you covered were primarily volunteer-driven. I'm wondering if you have any advice or guidance for those of us trying to run an OSPO in a company, beyond just a code of conduct, that could help reduce or mitigate this. Like, do you think a company, if they are running an OSPO and see this type of thing happening in their community, has a moral obligation to interject in a project that they're using?

Yeah, thank you for that question. That's a brain teaser. I think it really depends on the company and its values. I mean, I can impose my own opinions and be like, yeah, absolutely, they need to make sure it's not toxic, but, unfortunately, I don't run any companies. But I think that's a really interesting question, and I think the corporate involvement does make it more complex, right? Because then it's like, okay, well, there are policies within the corporation about guidelines on behavior and whatnot. So I think that it's a really good question, and I don't actually have an answer to it right now. But thank you for bringing it up; I'm definitely going to be mulling over that one for a while.

Building off that, I'll ask a couple of questions on that one in particular. Did your analysis cover projects that are either maintained by or contributed to by people from different countries or regions of the world? And that correlates with professional versus hobbyist, because there is a big difference between the U.S. and Europe, for example, or the U.S. and China, in the nature of employment for contributors. And I'm wondering if that shows up in toxicity.

Yeah, that's also a really good question. Unfortunately, we did not collect any demographic data on our contributors, so we do not have that information. But again, I think that would be an awesome thing for future work to do, because I think you're totally right. And again, that's a great question, and, I don't know if you're a researcher, but I would like to see that paper. I think there are a lot of these really good questions building off of this in different, more specific directions, and that should absolutely be done.

Hey, sorry. Sorry, I'm not sure if this is going to be worded properly, because I'm still formulating the question in my head.
But I was just curious if you have any thoughts on mitigation strategies, and I guess this speaks more towards what you were saying about entitlement and arrogance, when there may be a comment that individually could be identified as toxic in an open source community, but you run into a situation where the community begins to white-knight the offender. How do you mitigate when the community is sort of sheltering that person from being criticized for that abusiveness in their interactions?

So, thank you for that question. I think it's really interesting, and I think it kind of reflects a bigger question about open source communities and the cultures that they maintain and cultivate, and toxicity obviously is very closely connected to that. So I think it has a lot to do with the leadership and what they establish, and if they white-knight people, they're going to white-knight people, and it's going to continue to happen, and it's probably going to get worse. So I think that's a really good point and a really good connection that you're making between toxicity and the higher-level cultures that allow these things to permeate and continue to happen. Also, there's a question up here, oh, you got the mic, okay, cool.

So, I've definitely matured and grown in my role as a manager throughout my career, and I'm wondering how much of the project's longevity and maturity has to do with some of the responses. I'm sure people develop themselves as well as they grow with their project, and I wonder if that has any role in the way you see things.

Yeah, absolutely. We looked at the size of the project, not the maturity of the project, but going off of my own hypothesis, I think you're absolutely right. Because you make a project and it's in its young phase, it's got that scrappy start-up vibe, and then at a certain point you get a user base, you get a following, you become an establishment. And it's all about, when you're building up to that point, what are you establishing with the norms? Do you have a code of conduct? How is your leadership acting? How are the communications? What have we established?
And I think that establishing that kind of stuff can go a long way. So I would assume that maybe younger projects are still in that metamorphosis phase and haven't actually gotten to the point where they're establishing things like codes of conduct. But I can definitely think, informally, of examples of projects that are young and have established that, and I can also think of plenty that haven't. So yeah, I think it might have something to do with it, but it also could be project-specific. Who knows, maybe someone could find out.

So I just have a follow-up question to that one you just had, or the comment and answer you just had. You know, these days in an open source community we have our choice of different licenses that we can select. Is there a place for a generally community-driven consensus for codes of conduct? Because I think a lot of small communities... Awesome, I'd love to hear about this, because I think that small communities really struggle with, what's the right language, how do I enforce it, that kind of stuff. And just having something where, just as easily as I pick a license on GitHub, I could pick a code of conduct and it just applies, would be great.

So, the last couple of questions have touched on having a code of conduct. Not just the measurement, which is incredibly valuable and we need that information, but also: how do we respond to toxicity, and how do we empower communities, or members of a community, to do so, even when the project maintainers are the ones who are the cause of the toxicity? That is a really thorny problem. Yeah. And, oh, sorry, there is work ongoing; the LF is funding some of it. There are also a bunch of independent people who are working on this, who have been working with codes of conduct and what we call consent for leadership, I mean, out of the Consent Academy, for 10 years now. We're in the process right now of evolving that and trying to create, to your point, a set of guidelines: not just the code of conduct document that says here are our values, but actually, here is the process to follow to respond to an incident, to respond to a complaint, here's how to do it so that you are legally safe to focus on a restorative outcome. Stay tuned; there are going to be more announcements about this coming out of the LF over the next 12 months or so.

That's awesome, thank you so much for doing that work. I really look forward to hearing more about that.

Thanks for the talk. You put together all this data from different real-world projects. Did you use any, or develop any, tools to do that, or did you manually find these things? I wish. Okay, well, what I'm thinking is, if there was a way to detect, like, the word 'shit' in a comment or something, right, we could just auto-close issues, because I think the toxicity comes when people just react and build on top of each other.

So, there are existing bots, right? There are bots that you can add to your projects today that are based on things like the Perspective API and that will identify things like obscenities. And I mean, I couldn't tell you exactly what those bots do, but they can identify the obscenities and, I'm assuming, probably close the issues or give some sort of notification. I think our challenge is that not all toxicity has obscenities. For example, in our sample, it's not here, or actually, yes it is, severe language: only about half of our toxicity contained obscenities. So, if we use a tool that relies on that, we're going to fail half the time. So it's a good start, and those tools exist now, they exist today. Oopsie daisies.
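As a rough illustration of that limitation, here is a minimal sketch of the kind of word-list check such a bot might perform. The block list and example comments are hypothetical and purely illustrative, not how any particular bot is actually implemented.

```python
# A naive obscenity check, the kind of signal an off-the-shelf bot might rely on.
# The block list here is a made-up, illustrative stand-in.
OBSCENITIES = {"shit", "idiot", "idiots", "stupid"}

def contains_obscenity(comment: str) -> bool:
    """Flag a comment only if it contains a word from the block list."""
    words = {word.strip(".,!?\"'()").lower() for word in comment.split()}
    return not OBSCENITIES.isdisjoint(words)

examples = [
    "Expect a ban if you absolute idiots continue to spam me.",  # caught: contains "idiots"
    "Did you miss my comment or what?",                          # entitled, but missed
    "The problem is your team forcing us to use it your way.",   # entitled, but missed
]
for comment in examples:
    print(contains_obscenity(comment), comment)
```

Since only about half of the toxicity in our sample contained obscenities, a check like this misses roughly the other half, the entitled and arrogant comments in particular, which is why the context has to come into it.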
But yeah, I think that we still have to build more, and I think it becomes really complicated: a lot of it, especially the entitlement, comes from the context, and so that makes it a little more complicated. That's why we kind of advocate for additionally building open-source-specific detectors that would be able to be trained on those sorts of discussions, if that makes sense. Thank you. Alright, we're done. Thank you all so much again. We would love, love, love to hear from you if you have any discussions about dependencies, deprecation, and dealing with that; we're preparing for that. And thank you so much for your attention.