Okay, so hi, all. If you're all in here, my name is Cali Dolfi, and I'm so excited to be presenting right now and to share a little bit of my story and the work I currently do within the Fedora community. I wanted to start off with a small story from the very beginning of my career, if you could call it that, which highlights what I value in a community and has been a major inspiration for my professional choices. After my freshman year of college, I got an internship. It wasn't anything shiny or fancy, but it was something, and because of that I was so excited. I remember the very first day: we had our normal internship orientation, and then we got sent off to the floors to meet our managers. I walked onto the floor, a tech-only floor, looking for my manager, and a random man walked up to me, said, "Oh, are you one of the new interns?" and looked me up and down. I said yes, still just excited to be there, and that I was looking for my manager, and I couldn't even finish my words before he looked at me and said, "Sweetheart, you're lost. HR is on the second floor." For the rest of the summer I worked in a closet, and my manager hardly spoke to me. I wasn't really given any work, and the person I mainly reported to was counting down the days until they quit. While this obviously wasn't a perfect experience, it was formative in my career. It showed me what I prioritize in my work and in communities, and it sets the tone for this presentation: inclusion, passion, both my own for my work and that of the people I surround myself with, and meaningful work. And so with that, I'm Cali Dolfi. I'm a part-time student right now at Boston University, pursuing a joint bachelor's and master's degree in computer science, and I'm interning for the Open Source Program Office as a machine learning and data science intern. We'll get to learn more about the work I'm doing there in a little bit.
As for my background, I feel like where I came from, especially my high school years, really formed where I am now. I was born and raised, for the most part, on a very small island on the coast of Texas, and yes, Texas does in fact have islands, against all popular belief up here. My parents work in the seafood industry, and tech was very far from the focus of my town. School really didn't come easy for me growing up, academically or socially. I had a rough time learning how to cope with my learning disabilities, among other things. But I'm really thankful for the work ethic it forced me to develop at such a young age. Also during this time, I dove headfirst into competitive softball, which is a huge part of who I am. I spent most of my time as a kid either practicing softball or studying. For some reason I still don't understand, trying hard made kids in my town a target for vicious bullying. It was the cool thing not to care. And for somebody like me, who just wanted to try as hard as I could at whatever I did, it was very toxic. It was not the best environment, and I'll be forever thankful that my parents decided, for my last two years of high school, to move off the island so I could go to a high school where I could get a much better education and be in an environment I could thrive in. Transferring high schools changed my life for a lot of reasons, mainly my career trajectory. I signed up for a computer science class for no other reason than that I'd heard it was an easy advanced-credit class to boost your GPA. Little did I know I would meet people, and a teacher, who would change my life. The teacher of that course, Mark Russell, ended up being my mentor and pushed me to reach my full potential in computer science and softball. His goal as a teacher was to mentor more women and push them to go into computer science.
During my freshman year of college, he passed away unexpectedly, and I know he is looking down, so excited for me, doing this and talking about how I got here. Finishing up high school, I knew CS and this world were for me. It was the first time I'd done something in school where I didn't feel like I was climbing uphill; it just fit how my brain worked. My senior year of high school, I committed to Boston University to play softball, and I was so ready to go and experience a new world. Little did I know I'd only end up playing for one year. During my freshman year, I broke a good number of my ribs pitching and ended up medically ineligible. In the weirdest way possible, it was probably the best thing that ever happened to me. Leaving the world I'd always known was terrifying, but it allowed for personal growth I couldn't have imagined. It taught me how to love and be confident in myself, especially in academic and professional settings. I'm a woman who is going to hold a large presence in a room and be heard, and for so long I was told that was not okay. If you've seen me in meetings or anything, you know that's who I am and how I want to navigate. I won't dim my light or my ideas to fit into a box that generations prior have created for me, and I will empower the women and non-binary folks around me to radiate their own energy in the workplace: nothing less and nothing different than what is uniquely theirs. When you finally have a chance to explore your interests, your passions start to reveal themselves naturally. For me, professionally, they focus on two major things. The first, and this is a bit of a sidebar just about me, is the pursuit of safe and secure cryptographic algorithms and standards. That includes protecting people's personal data and advocating for legislation that holds companies accountable to a modern-day standard.
The other one, which I'll be speaking a lot more about today, is discovering how to use data science and machine learning to help uplift minority communities. Through different courses, I was horrified to learn how data science models and algorithms had been created without taking into account the potential biases of the data that trained them or their potential malicious uses against minorities. I feel like we can all relate to just wanting to create, to learn and figure things out, without realizing the impacts that could come from releasing that to the world. This is where I started to find where I could make an impact on inclusion, where tech and social issues collided. My passions were there; I just didn't know at the time whether there really existed a work environment where I could do this, surrounded by others who shared the same passion. I wanted to work in a place where I could feel the passion radiating off the people I work with, along with the drive to make communities better. That transitions well into the current work I'm doing with the Fedora community. This past summer, I got the incredible opportunity to intern for the Open Source Program Office under Brian Proffitt, and I'm lucky enough to have my internship extended through May. Going into it, I never would have dreamed I'd have the creative liberty to shape the focus of my project. I knew I was going to be working with sentiment analysis, but not much more than that. With that, I got the opportunity to put actual actions behind my words about using data science to promote diversity and inclusion and, hopefully, make the space more welcoming for all. So now we can get into the project I've been working on for the past few months, starting with why the focus is on mailing lists and their impact on open source communities as a whole.
Mailing lists have been a target for mining sentiment and emotions. They're a communication standard for distributed open source communities everywhere, and sentiment can provide insight into community health and how people are treating each other. It can really open up the question: is this a welcoming environment for all? By fostering positive community health, you're making a stronger community and making space for more people. Originally, my project looked a lot different than it does now. When I came in, we had three major focuses. The first was flagging negative conversations: seeing whether sentiment analysis could identify overly negative conversations and notify community managers. The second was identifying discriminatory language, which is where this project has landed now: is there a way to identify members or emails engaging in discriminatory language that might be unwelcoming and push people away from our community? The third was behavior around major releases: are there trends in sentiment around community events like releases? When I started the project, back in June, things socially in the world were starting to get very heated, and given what had been my passion so far, the second option really took my focus. I started doing more research on what the open source community looks like, and the statistics I'm about to share really made me think, this is where this is needed. They're from a survey GitHub did of open source users and developers: 95 percent identify as male, with only 3 percent identifying as female and 1 percent as non-binary. As well, only 16 percent identify as a minority within their respective country.
And only 7 percent identify as gay, lesbian, bisexual, or asexual. When minority groups are this small within communities, it makes you want to look at why, and at how we can change that. So from there, I had a completely updated focus for this project, which I'm still working on now: creating hate speech and offensive language detection for mailing lists. Right now we're at the modeling stage, looking to train multiple models to determine the best approach for detecting hate speech and offensive language. In the next stage, we're turning this into a service that will automatically clean emails, label them, and notify managers of problematic emails and users. And this isn't one of those things where you're flagged and that's it; it's going to be an open conversation about what is happening within the community that's causing this. The extension of this project is to use the model to look for trends within these threads, to figure out why this is happening and what is allowing our space to not be welcoming for all. The majority of the beginning of my summer was actually spent on data cleaning, not the bright and shiny parts of machine learning that everyone likes to talk about. When I first got the data from the developer and user mailing lists, it had been completely stripped of all of its HTML, which, for a little bit of context, is usually how people identify which text is actually relevant and worth using for sentiment analysis, and which is just random and not something you'd want to use. It all just got put into one place, and I got to do with it what I would. I actually want to show you first a little bit of the format. This up here is how the emails first came in.
I'm not going to lie: being an intern in a remote setting, I remember this so vividly. I was in the middle of nowhere, Texas, working on pretty much a folding table, and seeing all this for the first time, I was like, where am I? What am I doing, and how am I going to get anywhere with this? As you can see, there are a bunch of tags that really aren't what we're looking for. There are also layers of different emails on top of one another. If you look at this, the only text we want is about two or three sentences, whereas what we're getting is, I would guess, about two or three different emails from the same thread. So a lot of trial and error came in, and I ended up using four major techniques to get the data to the point where we could do sentiment analysis work. If I go back to this example, what I started to notice was that there were these different tags at the beginning. They're just a bunch of text, I think generated by the scraper, that we weren't going to want to use. If I could identify the last word of these major chunks of unuseful data, I could cut there and start getting into the meat of what I was working with. So I figured out what the common markers at the beginnings of the emails were and used them as cutting references from the beginning or the end. You had to play with it a little to figure out whether it was going to cut the email we actually wanted or an email that should be discarded completely. The next step was special character removal, which is pretty common in machine learning data cleaning. The next two steps build a little upon each other, and they're something I'm actually pretty proud of coming up with.
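The first two cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function names, the example markers (`List-Id: devel` and a line of underscores), and the allowed-character set are all hypothetical, since the talk doesn't say which markers the scraper produced. Which occurrence of a marker to cut at is exactly the trial-and-error part; this sketch simply uses the first one.

```python
import re

def cut_between_markers(raw, start_markers, end_markers):
    """Hypothetical helper: drop everything up to the end of the start markers
    (first occurrence of each) and everything from the first end marker after that."""
    start = 0
    for marker in start_markers:
        idx = raw.find(marker)
        if idx != -1:
            start = max(start, idx + len(marker))
    end = len(raw)
    for marker in end_markers:
        idx = raw.find(marker, start)  # only look for end markers after the start cut
        if idx != -1:
            end = min(end, idx)
    return raw[start:end]

def remove_special_characters(text):
    """Replace anything but ASCII letters, digits, whitespace, and basic punctuation
    with a space, then collapse runs of whitespace."""
    text = re.sub(r"[^A-Za-z0-9\s.,!?']", " ", text)
    return " ".join(text.split())

# Hypothetical scraped message; the header text and footer are assumptions.
raw = "X-Scraped-By: archiver\nList-Id: devel\nThanks, the fix works for me! #yay\n_____\ndevel mailing list"
body = remove_special_characters(cut_between_markers(raw, ["List-Id: devel"], ["_____"]))
```

One design caveat: an ASCII-only pattern like this also strips accented letters, which matters for a multilingual community such as Fedora, so a real pipeline might prefer a Unicode-aware character class.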
Once I got to the point in the cleaning where all I had left was the text of the emails, I only wanted one of the blocks, but there might be an email from earlier in the thread before or after it; there was no consistent format for how it came out. So I decided to put all of the emails in chronological order and make a dictionary of all the threads, identified by their subject lines, so you could see which emails came from the same thread and have them in chronological order. Then I used that information with a sequence matcher. Taking, let's say, the fifth email in a thread, I would compare it to the first, second, third, and fourth and see if it had large chunks of text similar to the ones prior. If it did, I would use those chunks to cut from the later email, so that only the text from the actual user at hand was left. This is a good example right here: this text has had all of the special characters and all of the nonsense removed, leaving just sentences, and the algorithm figured out which chunks had been seen before and cut them from the text we were using. From there, we had a clean dataset ready to go into the next stage, modeling, which is where we are right now. The first step was creating a training and testing set, and this was something I really wanted to do right. As much as I looked online, there wasn't any labeled set that was really similar to the data I had been using, so I knew I needed to make my own set of labeled data to start training from. But if I were the only person labeling, my own personal biases would be in it, and I'm really trying to navigate this world differently than people in my past have as I start making my own data. So I had a lot of, like, [unclear] for data science and machine learning.
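The thread dictionary and sequence-matching steps described above might look something like this sketch. It assumes Python's standard-library `difflib.SequenceMatcher`; the talk only says "a sequence matcher", so that choice, the helper names, the `Re:`/`Fwd:` subject normalization, and the 20-character minimum block size are all assumptions made for illustration.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

def thread_key(subject):
    """Normalize a subject so replies ("Re: ...", "Fwd: ...") group with the original post."""
    return re.sub(r"^((re|fwd?):\s*)+", "", subject.strip(), flags=re.IGNORECASE).lower()

def group_threads(emails):
    """emails: iterable of (subject, sent_at, body); sent_at must be sortable
    (e.g. a datetime or an ISO date string). Returns {thread_key: [(sent_at, body), ...]}
    with each thread in chronological order."""
    threads = defaultdict(list)
    for subject, sent_at, body in emails:
        threads[thread_key(subject)].append((sent_at, body))
    for messages in threads.values():
        messages.sort(key=lambda message: message[0])
    return dict(threads)

def strip_repeated_text(body, earlier_bodies, min_block=20):
    """Cut every block of at least min_block characters that also appears in an earlier email."""
    for earlier in earlier_bodies:
        # autojunk=False stops SequenceMatcher from treating frequent characters
        # (like spaces) as junk when the earlier email is 200+ characters long.
        matcher = SequenceMatcher(None, body, earlier, autojunk=False)
        blocks = [b for b in matcher.get_matching_blocks() if b.size >= min_block]
        # Blocks come back in increasing order; cut from the end so offsets stay valid.
        for block in reversed(blocks):
            body = body[:block.a] + body[block.a + block.size:]
    return " ".join(body.split())

def dedupe_thread(messages):
    """Compare each email with every earlier email in its thread and keep only the new text."""
    bodies = [body for _, body in messages]
    return [strip_repeated_text(body, bodies[:i]) for i, body in enumerate(bodies)]
```

For example, a reply whose body is "Yes, it crashes for me too. " followed by the full original question should come back as just "Yes, it crashes for me too." The minimum block size is the main tuning knob: too small and short common phrases get cut from genuine new text, too large and shorter quoted fragments survive.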
With that, I had about 10 to 20 different people help label the data, using definitions of hate speech and offensive language to guide them. By having a distributed group of people label the training data, you're not weighed down by one person's bias, because, let's face it, we all have them. Now I'm at the stage of figuring out the best model for the Fedora mailing lists. I decided to compare about six different machine learning techniques across four different datasets. The emails dataset is the one I created, and while it's directly applicable to the text I have, it's not as large as the three other major datasets being used for sentiment analysis all over the world. So I'm going to apply each of the techniques to each of the datasets and see which combinations get the best results. I'm personally really excited to use the dataset of emails with sexist language, to see whether we can detect sexist language within our mailing lists. And so yeah, that's pretty much where we're at with our work. Thank you very much.

Wow, that was awesome. Oops, hold on, I didn't mean to do that. That is a ton of work, and it's super impressive. The part where you were patting yourself on the back? I'm going to pat you on the back too. That was awesome. I'm really excited to see what we can get out of this for Fedora. We're not sure exactly where it's going to go, but Matthew and I worked with Cali a little to think about what we might want to get out of it, maybe things that aren't just about discriminatory language. Like, can we look at it and say what people are talking about the most, etc.
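Having many people label the same emails raises the question of how to combine their answers into one training label. The talk doesn't say how that was done; a common, simple approach is a strict majority vote, sketched below. The label names (`hate_speech`, `offensive`, `neither`) and the rule of setting aside emails with no majority are assumptions for illustration.

```python
from collections import Counter

def aggregate_labels(annotations):
    """annotations: {email_id: [label from each annotator, ...]}.
    Returns {email_id: (label, agreement)} for emails where a strict majority of
    annotators agree; emails without one are left out (e.g. for further discussion)."""
    aggregated = {}
    for email_id, labels in annotations.items():
        if not labels:
            continue  # no annotations yet for this email
        label, count = Counter(labels).most_common(1)[0]
        agreement = count / len(labels)
        if agreement > 0.5:  # strict majority, so ties are never accepted
            aggregated[email_id] = (label, agreement)
    return aggregated

# Hypothetical annotations from three, two, and three labelers.
annotations = {
    "e1": ["offensive", "offensive", "neither"],
    "e2": ["neither", "hate_speech"],
    "e3": ["neither", "neither", "neither"],
}
result = aggregate_labels(annotations)
```

Keeping the agreement score alongside the label makes it easy to see which emails the labelers found ambiguous, which is itself useful when the definitions of hate speech and offensive language are being refined.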
So there's a lot to be done with the kind of data she's pulling and working with. So very cool. Okay, we have a question: "We have 15-plus years of traffic on the Fedora devel mailing list. Will you be able to analyze its fightiness over time?" He also said "fightiness." So yeah, can we analyze a bunch of data like that, or is it a lot of work?

So one of the great things about the tools I was using to help clean is that every email has a tag for exactly when it was sent. I remember going through and starting to analyze the emails, and the number of emails from 2003 I have read from the Fedora mailing list, just trying to figure out how to finish and fix things, is more than anybody ever should. But that is definitely in scope, because, as I said, every single email has a tag. We're still training the algorithm, but once we get to the point where we can start cutting up the set, we can say, okay, let's look at 2003 through 2008 versus the past 15 years.

It would be really interesting to look at some trends. Fedora releases every six months, so do things maybe get more stressful before a release, or is there just more activity? We can start making graphs and look at some of that stuff, so that's really cool.
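Because every email carries its send date, slicing the archive into periods is mostly a grouping problem once a model has scored each email. The sketch below assumes a hypothetical per-email score (for instance, the model's estimated probability that an email is offensive) and uses the standard library's `email.utils.parsedate_to_datetime` to read RFC 2822 `Date` headers. The six-year default is only chosen so the first bucket matches the "2003 through 2008" example; the function name and parameters are illustrative.

```python
from collections import defaultdict
from email.utils import parsedate_to_datetime

def mean_score_by_period(emails, first_year=2003, period_years=6):
    """emails: iterable of (Date header string, hypothetical model score).
    Groups emails into fixed-length periods starting at first_year and averages
    the scores. Returns {(start_year, end_year): mean_score} in chronological order.
    Note: real archives can contain malformed Date headers, which this sketch does not handle."""
    buckets = defaultdict(list)
    for date_header, score in emails:
        year = parsedate_to_datetime(date_header).year
        start = first_year + (year - first_year) // period_years * period_years
        buckets[(start, start + period_years - 1)].append(score)
    return {period: sum(scores) / len(scores) for period, scores in sorted(buckets.items())}

# Hypothetical scored emails.
emails = [
    ("Mon, 06 Oct 2003 09:00:00 +0000", 0.2),
    ("Fri, 12 Dec 2008 18:30:00 +0000", 0.4),
    ("Tue, 02 Jun 2020 10:15:00 -0400", 0.1),
]
by_period = mean_score_by_period(emails)
```

To look at the six-month release cycle instead, the bucket key could be the year plus which half of the year the email was sent in, or the distance in days from the nearest release date.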
Oh, one thing I noticed that's interesting about tech culture in general: when you start looking at emails from back in 2003, sign-offs were very common, and the nature of a lot of the sign-offs from that time was shocking for me to see. The language used was concerning at times, if I'm being completely honest. So I'd be interested to see what it shows about the overall culture of tech. One thing I noticed is that everything assumed you were talking to men at all times; that was the verbiage used, and the example of the unknowing user was always "the wife." So it's an interesting dynamic, how the way we communicate changes.

Sure, sure. So one of my favorite parts of your story was the part about your mentor.

Mm-hmm.

I really think mentors are such an important part of our life journeys, especially when it comes to tech. My mentor was really the biggest reason, besides the other friends I made, that I stayed with Fedora and continued on here, and it was really great to have a woman as a mentor. I know your high school teacher was a man, and obviously men can be great mentors too, but my mentor was so fundamental in keeping me with Fedora. So I guess I want to ask: how did you foster that connection, and how can other people try to find those connections? Do you feel like it just happened by chance and you were totally lucky, or did you put effort into making it happen?

Yeah, so there are a couple of different layers to that. When I met Mr. Russell, I was the new person in town. I had just transferred high schools as a junior, which, zero out of ten, would not recommend to anybody. It was great in the long run, but the TV-show thing of eating lunch in the bathroom? Those things are real. He and I automatically connected because he was also part of the softball world: he had daughters and had coached before, and softball was my identity for years; that's what everyone knew me for when I came in. That's what started the conversation between him and me and how he got to know me. I give a lot of the credit to him because of his passion. He was super big in the NCWIT program, I don't know if you're familiar; it's pretty much a women-in-computer-science scholarship that's big among high school and college students. He knew what my schedule was outside of class, and he was like, "You need to focus on trying to become part of this community. This is a place for you." He allowed me to take a week or two off from my actual schoolwork to focus on applying, and I ended up getting the scholarship. That was my first step into the CS world. So in some ways I consider myself incredibly lucky, because he saw the potential in me, even though he knew I'd signed up for the class in the first place looking for the GPA boost. He saw what I could be and was just like, "You're going to do this." As I said, I'm so incredibly thankful for him, and that's why, at this point, mentoring women into the CS world and into open source in general is a huge part of my life: extending his legacy. Being on the mentoring side is also so rewarding. I'm still figuring out my way, but I get to help and work with high school and early college students who are women. I'm in a computer science club at BU, and I have a lot of younger women who I've kind of helped
and who are hoping to get more into this world. As I said, he changed my life. As for finding that kind of connection, I think part of it is natural and part of it you can seek out, so it's a little bit of both.

I think the piece I took from it was finding groups with similar interests, like your softball connection, beyond his involvement with tech. So for example, in Fedora, if you're really passionate about a niche part of Fedora or of tech, there are probably other people who are also really into that, and if you're sharing a passion with them, they might be more interested in mentoring you within the Fedora space. That's how I'd translate it to Fedora. If anyone else has any more questions, feel free to drop them in the chat.

There was another thing I was going to say: it kind of shows, for any men in this chat, that opening up, bringing a seat to the table, and engaging is how we start to extend diversity and open up our community from the gender standpoint. His focus was, "I want more women to get involved." I know so many people in a similar position to me who never made it here. So for people who want to be involved and want to make that impact, bringing more diverse people in is how we can help, and what people honestly need is a bit more of that, bringing more seats to the table, because there is room.

Right. I'm just going to share one little story here. When I went to my first Flock, oh gosh, it was in Prague, maybe six years ago, something like that, there were maybe 10 women there at a conference of about 200 people, and I had moments where I felt very out of place. I'll share more of my story there, but the point is, at the last Flock we had, in Budapest, there were like 30 women. So it might be happening slowly, but it's still happening within Fedora. And I think the point you're making about men being mentors is really important, because women come into this space and we're like, "Oh man, we want to forge the way for more women, let's mentor!" I'm actually mentoring an Outreachy internship right now. But we need men too, or it's going to happen at too slow a rate. Everybody needs to get involved.

One thing I was reflecting on when I was making this presentation: I know that, personally, I am loud. I'm a person who's going to hold space in a room and be completely fine with that, and that's something that's almost required of women in the field of tech. I've really reflected on that: okay, how do we change the tide so that women in general can portray themselves how they want to? It's great that I'm the way I am, but not everyone should be required to be this loud, huge presence. I feel like when you look at women in tech, if you aren't that big standard, people are going to overlook you and push you aside, which is a huge problem that needs to be addressed.

Yeah, what that made me think about is approaching people in different ways and offering different kinds of accessibility, so that people aren't all expected to use one method to do the thing, because it's not going to work for everybody. When you say building a whole community, that means a community that's open and accessible on its face; that's one of the things we have as part of our foundation. So it's very cool. Well, thank you so much for presenting. I don't see any more questions in the chat; if you want to say anything more to wrap it up, you're welcome to. Oh no, oh my gosh, I almost forgot! We're doing something very cool this year: we are trying to make a video after this event, and we've written up a little script. Would you like to participate?

Sure, yeah.

Okay, so this is being recorded right now, so there's nothing to do except hang out right where you are. I'm going to grab the text and put it in the chat. We made a really cool video for our conference two summers ago now, in Budapest, and we're going to make something similar for this one. So here it is. We're having people speak the first part in their native language, which for you would be English, and then the second part in English, so the whole thing will be English for you, but we're happy to have that as well. Whenever you're ready, you can go ahead and read it out.

Hi, my name is Cali Dolfi, I'm from the United States, and I speak English. We are from different countries, we speak different languages, we are of different cultures, but Fedora unites us with open source.

Awesome.

Does she have to do it twice?

Only if you want to do a second take. You didn't say the "I am a woman" part.

Oh shoot, okay, I want to redo it, because I want to say that. I just didn't.

Okay, go ahead.

Okay, so: Hi, my name is Cali Dolfi, and I am from the United States. I am a woman, and I speak English. We are from different countries, we speak different languages, we are of different cultures, but Fedora unites us with open source.

Awesome. So yeah, we're going to take this content and make a couple of cool videos afterwards. I'm glad you could participate. Let us know what you're doing with your work; we totally want to know. If you have updates, feel free to come to me, but I think the community might be interested in what you're working on, so if you ever write up a blog post or anything like that...

The first post is actually coming out soon.

Well, we have a community blog, and anyone with a FAS account can go in there and submit a post. I can send you the link.

I think... we're doing a three-part blog series on this project with Red Hat's Next blog.

Look at you, fancy!

The first one is in the review process and should be coming out in the next few weeks.

Well, maybe since you have that official Red Hat one, you could just link to it and say, "Hey, Fedora community, I wanted to let you know I've been working on this," and if you want to get people involved, you could say, "This is how you get involved," etc. It's just more casual, especially if you have this super official one over on Red Hat Open, or Next, or whatever, I don't even know. All right, I'm going to pop off. Thanks again, and for everybody else, this is the last session for today, so we'll see you tomorrow. Bye, everybody!