So welcome, everyone. Thanks so much for being here. My name is Rebecca Cummings and I'm the Digital Matters Librarian; I'll be introducing our speaker for today. Before we start the workshop, I'm going to begin with a land acknowledgement. We acknowledge that this land, which is named for the Ute Tribe, is the traditional and ancestral home of the Shoshone, Paiute, Goshute, and Ute Tribes. The University of Utah recognizes and respects the enduring relationship that exists between many Indigenous peoples and their traditional homelands. We respect the sovereign relationship between tribes, states, and the federal government, and we affirm the University of Utah's commitment to a partnership with Native nations and urban Indian communities through research, education, and community outreach activities.

Now to our speaker for today. Dr. El Said is the 2020–2021 ACLS Emerging Voices postdoctoral fellow in digital humanities. She holds a PhD and MA in communications, as well as bachelor's and master's degrees in computer science. Her work sits at the intersection of technology, culture, and power, interrogating how technologies can hinder as well as facilitate our interactions, activism, and public dialogue. Through her research, she seeks to make the voices of marginalized populations heard and to help create healthier environments for their expression. She is passionate about understanding online human behavior and the relationship between technology and society, with a focus on AI and machine learning, and her passion for equitable participation has recently led her to examine ways toward responsible innovation and AI fairness. Dr. El Said's talk today addresses how we can use machine learning in humanities and social science research, as well as the biases we can introduce through our data and algorithms, through a case study of her own design. Since we only have a short time, I ask that you reserve your questions for the end of Dr. El Said's talk, but please feel free to put any questions in the chat and we'll get to those at the end. And with that, I will turn the mic over to our speaker for today, Dr. Yamna El Said.

Thank you so much, Rebecca, for the lovely introduction. So, I can share my screen now, I'm co-host, right? Yes, you should be. Awesome. Okay. Thank you, everyone, for coming to my talk. It's a pleasure to be talking with you today about my research and how we can use machine learning in humanities and social science research. This is basically my own experimentation with the topic, so I'm just, I would say, a novice in how to use machine learning in social science. I'm going to start by giving you a brief outline of what I'll talk about today. First, a short introduction to machine learning, a computational method, and its different types. Then I'm going to turn your attention to a case study that I designed, which is basically a social science experiment in which I utilize machine learning for the sake of content analysis. I'm going to use that case study to introduce you to some of the issues or challenges facing machine learning, such as the problem of unintended bias and the problem of accuracy versus interpretability, which is an ongoing debate; really, I've seen some people question whether this should be a debate at all.
And then, finally, I'm going to introduce some of the resources that I found helpful, and hopefully, if we have enough time, I'll go over a quick demo of some of them.

AI and machine learning have recently become buzzwords; in practically every circle and every discipline, you will find people talking about machine learning, its unintended consequences, how it's changing society and culture. But it actually goes way back. I remember taking a class on AI and neural networks back in 2001 or 2002, but at that time we did not quite realize the potential of machine learning, because processing power was not keeping up with the neural networks of the day. By now, though, we have technology capable of amassing very large amounts of data, and computers with plentiful, inexpensive storage and great processing power, so we have started to realize the potential of machine learning, and that's why it has been picking up in the past few years.

Machine learning models are basically computer algorithms. They try to find patterns in large amounts of data, and by data we mean anything that can be digitally stored, such as images, words, reviews, or comments; those you can feed into your machine learning model. You will find them used in many of the services we use today, such as Netflix, which seems to know you very well, YouTube, search engines' autocomplete features, social media feeds, and voice assistants and voice recognition such as Siri and Alexa. So they're already part of many of the things we interact with in our daily lives.

I like IBM's definition of machine learning. It basically states that machine learning models are built through data. So instead of programming every possible case like we used to do, each and every possible scenario, using if-else statements, nested ifs, for loops, and while loops, you instead feed your machine learning model very large amounts of data that describe the problem, or your problem space, hopefully covering all possible situations, and then you let your computer program discover the patterns and features on its own from the data. This is what we call training: the model is trained on the data. So think of machine learning as a mathematical process that tries to find a statistical function that can fit, or explain, relationships within large amounts of data. Once we've trained that model, we can let it out into the wild and have it predict the output, or solution, for new data. We might train it, for example, on the features of houses and their prices, using very large amounts of data; it would discover the features that make a house worth a certain price, and then you could feed it those features without the prices and it would predict the prices of those homes. That's machine learning in a very quick nutshell.
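Just to make that train-then-predict idea concrete, here is a minimal sketch in Python using scikit-learn; the house features and prices below are made up purely for illustration, not real data:

    # A tiny supervised regression sketch of the house-price example above.
    # Feature rows are [square_feet, bedrooms, age_in_years]; all values are
    # hypothetical illustrations.
    from sklearn.linear_model import LinearRegression

    X_train = [[1400, 3, 20],
               [2000, 4, 5],
               [900, 2, 35],
               [1700, 3, 10]]
    y_train = [250_000, 420_000, 150_000, 330_000]  # known prices (the labels)

    model = LinearRegression()
    model.fit(X_train, y_train)  # "training": fit a function to the data

    # A new house with no price label; the trained model predicts one.
    print(model.predict([[1600, 3, 15]]))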
Machine learning also comes in different types: for example, supervised learning, unsupervised learning, and reinforcement learning, depending on how the model learns, or on the amount of information in the data it's trained on. Here's what distinguishes those three types. Supervised machine learning feeds on large amounts of labeled data; that is, you feed your model data that is already labeled with the right classification, or the correct solution so to speak, usually done by a human annotator, and then it will be able to predict the labels or classifications of new, unlabeled data. An example would be taking a database of many dog and cat images that are already pre-labeled as dogs and cats; then you feed the model an unlabeled image of a dog, and it will be able to tell you that it's a dog. That's supervised learning. In unsupervised learning, on the other hand, you feed the model very large amounts of data that do not come with labels, either because you're not capable of labeling them or because you don't know enough about the problem to be able to tell what each item is. So you let your machine learning model discover on its own the form, or structure, of the data, which usually happens when you don't quite have a precise research question, or you're doing research more exploratory than supervised approaches allow. In this case, you would let it train on large amounts of unlabeled pictures of cats and dogs, and on its own it will be able to tell that this is a dog or that is a cat through the different features of each, and then it will be able to predict as well.

The last one is reinforcement learning. In reinforcement learning, the model doesn't learn from data you feed it so much as it learns from its actions, through feedback that it gets from the environment around it. This suits tasks that require tacit knowledge, things you can't really describe or label, such as riding a bike, so you will find it used with robots, for example. You give the robot an objective or a task and ask it to fulfill that goal; if it takes a right step, you reinforce that with a reward, and if it takes a wrong step, you give it a punishment (at one point we used to call that "remorse," for some reason). It keeps adjusting itself until it learns that particular task. So those are the three most well-known types of machine learning.

Now I'm going to focus on supervised machine learning, the type I used in my model. Supervised machine learning, like any machine learning, feeds on a lot of data, in this case labeled data. If you imagine the data on a plane coordinate system, the model is trying to fit a mathematical function, like we said, to this data. By fitting, I mean inventing or finding a function, and that function will eventually be your model, or your algorithm, that you use to make predictions on new data. This diagram describes well how supervised machine learning works: you feed your model large amounts of pre-labeled data, so we already know that this is a square and this is a triangle, and it learns to identify the features that distinguish, say, a hexagon from a square. It will start to notice that squares have four right angles, for example, as opposed to the three angles of a triangle, and it will learn those features. At a later stage, you feed it new data without labels, and it will be able to tell right away, based on what it has learned, that this is a square and that is a triangle. This is an example of supervised machine learning.
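And a minimal sketch of the unsupervised case described above, where no labels are given and the model finds the groups on its own, might look like this; the two-number points are hypothetical stand-ins for image features:

    # Unsupervised learning sketch: k-means clustering with no labels at all.
    from sklearn.cluster import KMeans

    X = [[2.1, 3.0], [2.3, 3.2], [2.0, 2.9],   # one natural group
         [7.8, 8.1], [8.0, 7.9], [7.7, 8.3]]   # another natural group

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
    labels = kmeans.fit_predict(X)  # group assignments discovered from the data
    print(labels)                   # e.g. [0 0 0 1 1 1]: two clusters found,
                                    # with no labels ever provided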
So it sounds easy enough, right? You give the machine learning model the world, it produces it back to you, and you can just predict stuff. But it's not that easy, it's not that straightforward, and it has a lot of pitfalls baked into it. A model is only as good as the data you feed it, and it can be quite hard to find data that is representative and accurately labeled, so that you don't get inaccurate predictions. For example, if you're training your model to predict whether a hotel review is positive or negative, you need to train it on a large enough data set of hotel reviews, one that is descriptive of that problem space. You would not feed it news comments, for example; you train it on the exact problem you're trying to solve. And your hotel reviews need to be representative enough, and numerous enough, to cover all possible negative reviews and all possible positive reviews; the snarky ones and the hidden negativity and all that should be covered in your data, in order for you to have an accurate model.

And it's not only about the data. Bias can be introduced in many ways, and I'm not going to get into the question of bias just yet, but another way to end up with a poor model is to not optimize it well. You need good data, but you also need to build loss and cost optimization functions into your model that can produce good accuracy, and by accuracy we mean the percentage of correct predictions on the test data, which should be high enough, usually above something like 85 to 90 percent; it depends, of course, on the problem and the level of accuracy you want to have.

Basically, when you take a large data set, you divide it into a training data set and a test data set. The training set is usually around 70 percent of your data, and the test set might be 30 percent or even less. Because the test data set is labeled, we can actually measure the accuracy of our model: we already have the true labels to compare our predictions against, so we can calculate accuracy. You should try for as much accuracy as possible using optimization parameters, and there are many; one example is the learning rate, which you can make fast or slow, along with the learning algorithms you're using. One of the main problems you may run into is overfitting: your model may learn the training data so well that it just spits it back out, learning only the training data and becoming unable to recognize other data. You should try to avoid that through those optimization functions I'm talking about, by improving your model without making it overly complex.
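Here is what that split-and-measure step might look like in a minimal scikit-learn sketch, with a built-in toy dataset standing in for a real corpus:

    # Hold out labeled test data so accuracy can be measured honestly.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)

    # Roughly the 70/30 split described above.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=42)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)  # train only on the training split

    print(accuracy_score(y_train, model.predict(X_train)))  # training accuracy
    print(accuracy_score(y_test, model.predict(X_test)))    # test accuracy
    # A large gap between those two numbers is the classic sign of overfitting.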
I think this can be useful to the humanities and social sciences in particular. Digital humanities usually tries to draw insights from large corpora of texts, and we traditionally do that by first pre-processing and cleaning huge amounts of data and then writing very complicated rule-based textual analyses of those documents. In machine learning, on the other hand, you take a different approach: you feed your algorithm large amounts of data and let it figure out, on its own, the distinguishing features of that particular corpus of texts, for the sake of classification or prediction. You may want, for example, to identify or isolate stylistic or content features of authors and characters by their gender, race, and nationality in a collection of works; a machine learning model would be able to tell you that it identified this style and that different style, and then you might be able to correlate that with those variables. It can be used for classification by topic, genre, or format; I've seen it used with journalistic forms and genres, for example, and how they changed over time. I've also seen it used in sentiment analysis: if we want to tell how sentiment changed over time in a particular corpus of texts, this is something we can use it for as well.

So I want to turn your attention to my particular case study, in which I used machine learning to content-analyze the comments I received in my social science experiment, in order to answer a primarily social science question: does humor reduce online toxicity? My interest in humor started back when I was studying the Arab Spring, specifically when I was examining the potential of humor for overcoming political and social oppression and advancing agendas for social change, especially at times of polarization and state repression in authoritarian contexts. These are pictures of some of the people I interviewed and some of the memes and remixes they created, so it was primarily a qualitative study. But I also wanted to test the assumptions I was making about humor quantitatively, using an experiment, and specifically using machine learning to analyze this particular question. At the time I was doing that research, I didn't quite think it applicable to the US, because I was studying authoritarian contexts. But then came 2016, with its inflammatory presidential rhetoric; emotions were running very high and people were becoming increasingly polarized. So I wondered whether humor could be used in any way to advance conversation, because people had stopped talking to one another, and I investigated this.

I also want to point out, before I continue, that this is a collaboration with Professor Andrea Hollingshead from the Annenberg School for Communication and Journalism. She is trained in social psychology and group interaction, so her input was very helpful to me in designing this particular experiment. The particular setup of the experiment was motivated by the fact that, while scrolling the internet, I came across an article written by a Muslim writer in a humorous tone. I have this tendency to read the comments before I finish an article, so I scrolled down, expecting to find the heart-wrenching comments that dehumanize the writer, connecting him to terrorism and all that. But to my surprise, and luckily, I didn't find that; instead, the comments were quite civil. I started to wonder whether humor had a disarming effect on people that might have reduced their tendency to be uncivil. So I put that to the test and started to examine whether humor might have an effect on online toxicity.
We defined online toxicity as anything rude, disrespectful, or unreasonable that would make someone want to leave a conversation. It's a definition from Borkan, Dixon, Sorensen, and colleagues, and I find it very helpful because it captures the politeness side of incivility as well as the democratic quality we want any conversation to have: being inviting for everyone. I started with the research questions: Can humor reduce online toxicity? If it does, how does humor reduce online toxicity, and what might the roles of anger and liking be in that process? And would other people's toxicity reduce the effect of humor on toxicity? There have been studies showing that if a message is communicated in an uncivil manner, that reduces the perceived importance of the message, or how people rate its informational value. So I also asked: if the message is associated with incivility from other people, would that in any way reduce the effect of humor on toxicity?

The first hypothesis we started off with was that the use of humor would be positively associated with reduced comment toxicity. But I wasn't only interested in the direct effect of humor on toxicity; I was also interested in how that works, in the enablers of that relationship. We found clues in the relief theory of humor, which postulates that the use of humor creates a state of counter-arousal (that is, you're less likely to be angered) that can release tensions, increase liking, and enhance source attraction. That led to our second and third hypotheses: that the use of humor would be positively associated with reduced anger (in other words, when someone shares a joke with you, you're less likely to be angry at them), and that this reduction in anger would mediate the relationship between humor and comment toxicity (if you're less angry at someone who told you a joke, you're less likely to be toxic towards them). We then asked whether source liking plays a role in any of that: does liking the person who told the joke actually help reduce your anger towards them, and thus your toxicity towards them? And finally, we explored what happens when we throw incivility into the mix, because comments don't happen in a vacuum; usually you read other people's comments before you comment yourself. So, when you are exposed to other people's incivility, are you likely to be more uncivil yourself? Does the effect of humor go away or not?

I also want to point out that this was a two-part project: the experiment I'm going to show you now is a replication of an earlier experiment, and it replicated all the findings, so I'm going to relate just this one. We designed a 2 x 3 between-subjects experiment, with two levels of humor (humor and no humor) and three levels of social influence, that is, exposure to other people's incivility (civil; uncivil and supportive of the writer; uncivil and unsupportive of the writer). We ended up with six conditions, and participants could be randomly assigned to any one of them. For example, in one condition they would read a humorous article with a civil comment underneath it; in another, they would read the same article with the humor instances taken out, along with a toxic but supportive comment.
A comment could be toxic but supportive of the writer's argument, or toxic and unsupportive of it, and we wanted to tease those apart; there's a difference between the two. We recruited 208 participants from Amazon Mechanical Turk, who read an article with a comment underneath it that was either civil or toxic, wrote a public comment of their own on the article, and then responded to a survey about their feelings of anger and liking. This is what the gender breakdown looked like: it was slightly skewed towards males, which can be reflective of the Amazon Mechanical Turk population. Political ideology was somewhat balanced between conservative and liberal.

We invited participants to read and comment on two articles. The first was a distractor article about technology, non-humorous and non-controversial, just to distract participants from the kind of experiment we were conducting. The second, our stimulus article, is this one, by the Muslim playwright Wajahat Ali. He was writing about an unfortunate incident in which a Muslim student was taken off a flight for talking on the phone and saying the word inshallah, or God willing. He used the opportunity to humorously introduce the Arabic meaning of the word inshallah, so it was a very humorous, very funny article about a very serious incident. Once people read the article, they were asked to comment on it after reading a comment underneath it. They were told their comments might be seen by other people in the future, to mimic the public nature of commenting in general, and their comments were anonymous, because anonymity tends to be conducive to incivility. Here's an example of our humor manipulation: this is the humor condition, where the joke is; in the no-humor condition we just take the joke instances out, but the text keeps the same informational value, just without the humor.

So here comes the machine learning part. We conducted the experiment, people read the articles and gave us their comments, and we were left with the problem of how to code the toxicity of those comments, how to decide on that. In an earlier iteration, we had human raters decide the toxicity of the online comments: we developed very elaborate criteria for what toxicity is and isn't, and we went about coding the comments by hand ourselves. We found this to be a very subjective, very inconsistent process, because we couldn't quite agree on what toxicity is, especially with raters from different backgrounds judging toxicity directed at a minority group. Someone may consider a stereotype toxic, while someone else from the majority might consider it fine. We couldn't quite agree, so we decided to use machine learning as a more objective and systematic way of deciding the toxicity of the online comments.
We used a specific kind of machine learning called sentiment analysis. In sentiment analysis, we try to predict the polarity of a text, either positive or negative. You find it used a lot with online reviews, in the analysis of tweets (the toxicity of tweets specifically), and in the analysis of print media coverage, for example. It's also a machine learning model that needs to be fed large amounts of data before it can predict values for new data. It comes in two types. The first is a classification model: you give it an input and it decides which class or classes it belongs to. The second is a regression model: you give it data and it predicts a value for that data; give it the features of a house, for example, and it predicts the expected price of that house. For our specific case we utilized a regression sentiment analysis model, because we wanted to know the degree of toxicity in the comments, not only whether or not they were toxic.

We were lucky enough to find a great training data set, the Toxic Comments data set, provided by the Civil Comments platform. It has an interesting story: the platform went out of business at one point, and they decided to make their data set of over two million comments available to researchers of incivility so that the research could continue. We took that data set and downloaded it, and upon inspection we found it quite unbalanced: there were far fewer toxic comments than civil comments, which is the nature of the thing; toxic comments are few, but just one is enough. So we decided to undersample the civil comments so that we had an equal amount of toxic and civil comments, and we trained our model on this balanced data set, ending up with around 300,000 total comments. This is what it looked like: you can see the comments, some civil, some not, and the target here represents the degree, or rating, of toxicity. Apparently "a bunch of losers" is highly toxic. And because this is textual, you could put it in an Excel sheet. This is a word cloud of the Toxic Comments data set, where the size of a word represents how often it appears in the data: the bigger the word, the more it is represented.

We then built our toxicity prediction model using a feed-forward deep neural network, developed with Keras and Google's TensorFlow, which are quite high-level libraries available in Python. They have great tutorials you can refer to in order to learn how to use them, and they're very well documented. What you do first is create embeddings for the Toxic Comments data set. As you've seen, the data set is a lot of text, but because we're feeding it to a machine learning model, a computer algorithm, you need to transform that text into numerical representations it can deal with. Embeddings are numerical representations of the data that carry semantic relationships in their encodings; hence, if you envision your data on a plane, words with similar meanings would sit closer to one another on that plane than words with different meanings.
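To make that embedding step concrete, here is a minimal Keras sketch; the two toy comments are hypothetical, and in a real model the embedding vectors are learned during training rather than meaningful from the start:

    # Text has to become numbers before a neural network can use it.
    import tensorflow as tf
    from tensorflow.keras import layers

    comments = ["thanks for a thoughtful article", "what a bunch of losers"]

    vectorize = layers.TextVectorization(max_tokens=10_000,
                                         output_sequence_length=8)
    vectorize.adapt(comments)   # build the vocabulary from the data
    ids = vectorize(comments)   # each word becomes an integer id
    print(ids)

    embed = layers.Embedding(input_dim=10_000, output_dim=16)
    vectors = embed(ids)        # each id becomes a 16-number vector
    print(vectors.shape)        # (2, 8, 16); once trained, similar words
                                # end up with similar vectors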
And as I said, we created a regression model that would predict the degree of toxicity, and we ran it on a high-performance cluster with 48 nodes.

So, to quickly go over the findings: now that we had a column in our data sheet with the toxicity of each and every comment, we were able to actually test our statistical relationships. We found that the use of humor was positively associated with reduced comment toxicity, as we predicted. You can see here the means: the average toxicity of people in the humor group was lower than the average toxicity of people who read the non-humorous article. The average anger of people who read the humorous article was also lower than that of people in the non-humorous condition, so the use of humor was positively associated with reduced anger, and this reduction in anger fully mediated the relationship between humor and comment toxicity. We also found that source liking mediated the relationship between humor and anger. In a nutshell, our findings are that when you're exposed to humor, you're less likely to be angry at the person in front of you, you're more likely to like them, and hence you're less likely to be toxic towards them. As for the incivility findings, we actually did not find any significant effect of exposure to toxicity on comment toxicity, which is surprising: you would think that exposure to incivility would make you uncivil, but we didn't find that effect in our experiment. We do have some speculations as to why, but that's not the topic of this talk.

One implication of our findings is that we need to think about what we are trying to communicate and in what context we understand these findings. Remember, the article we used was about a Muslim man being subjected to a racist incident. Are we indicating that people should use humor in the face of racism? Our article was just a stimulus, a way to understand how humor works and how it might affect online toxicity, possibly giving guidance on how to deflect general, situational incivility. For questions of racism, and systemic racism especially, we need far more systemic solutions than this, so that needs to be clarified. The other thing it points us to is the role of emotions in battling toxic behavior online. A lot of social media websites are battling toxicity daily, and they're really doing their best, but the focus is so much on the logical channels, on battling misinformation with facts, that it doesn't leave much space to think about the role of emotions, about the emotional channels we might want to tap into if we are tackling a primarily emotional phenomenon. And finally, I want to point out that the specific kind of humor used in the article might have produced this effect: the writer was using self-deprecating humor. I'm pretty confident that if we used another type of humor we might get another result, but that is, of course, something to be tested in other studies.

This leads me to the second part of this presentation: the problems you find in machine learning models. Like any machine learning model, our model did have a problem with bias, called the problem of unintended bias.
Because certain identities co-occur with toxicity so often in the toxic comments, the model wrongfully learns to associate the mere mention of those identities with toxicity. This creates the problem of unintended bias: as soon as it sees a mention of a minority identity, for example Muslim or Asian, it considers the comment toxic, even if the identity was mentioned in a positive context. There are ways to battle this kind of unintended bias. One way is through the data: by showing the model enough examples of both abusive and non-abusive uses of an identity, that is, using supplemental data sets that are more representative. Another way is through the algorithm itself, by modifying the objective function. Generally, when people test their machine learning models, they get, say, 90 percent accuracy and conclude it's a great model. But the problem is that it may be performing that well only for the majority; what about specific minorities that are not well represented in the data? You also need to test the accuracy for specific minority groups within your data, to make sure you're also accurate in identifying or classifying comments specifically related to those groups.

But I want to zoom out a little, from the specific problems of that machine learning model to some of the things that trouble people about machine learning in general. One of those is the fact that machine learning is a black box. The model may identify distinguishing features that we as humans don't quite understand, that don't make sense to us, and in the end we can only judge the model's validity from its output. That's fine if we're talking about specific research findings that won't touch many people's lives, but it gets tricky when these models make decisions that affect people's lives, such as giving them a loan, accepting them into a job, or even releasing them from prison as opposed to keeping them incarcerated. This is where we need not only accurate predictions but also clear explanations of why those decisions were made: you need to explain why you didn't give me the loan, or why you kept me behind bars. That's why there is a movement towards making machine learning explainable and interpretable, and, more importantly, towards some degree of human autonomy or oversight that can utilize the model without completely relying on it.

One thing people have been doing recently is building explainable models. That approach basically utilizes another model that tries to mimic or replicate the behavior of your black box; this second model is usually linear, so we can interpret it. The drawback is that it may produce the same output as the black box while having utilized completely different features to get there, so it's still an approximation. Another way is interpretable models: less complex machine learning models that may incorporate domain knowledge into the algorithm, but which may come at the expense of accuracy, as they may not cover or identify all features or cases; perhaps that relates to bias as well. But this is a long debate, and people have very strong opinions about the accuracy-versus-interpretability question.
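To make that surrogate-model idea concrete, here is a minimal sketch on synthetic data; it illustrates the general approach, and is not anything from our model:

    # Fit an interpretable linear model to mimic a black-box model's outputs.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))    # three synthetic input features
    y = X[:, 0] ** 2 + 2 * X[:, 1] + rng.normal(scale=0.1, size=500)

    black_box = RandomForestRegressor(random_state=0).fit(X, y)

    # The surrogate is trained on the black box's predictions, not the truth.
    surrogate = LinearRegression().fit(X, black_box.predict(X))
    print(surrogate.coef_)           # readable weights approximating the box
    # Caveat from above: the surrogate may match the outputs while relying on
    # different features, so it remains an approximation.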
I won't get into the details of that debate, but it's enough to say that many of these issues surrounding biased data sets and the need to explain machine learning decisions have driven companies, as you can see, to incorporate ethical AI practices, so that they think ahead about how to curate representative data sets, label them accurately, and make sure their models have been tested for accuracy on all affected groups, so that the machine learning model is not biased towards one group over another. I think that's a very useful area where AI and the humanities can talk to one another, where the humanities can help point out possible areas of bias that may be lost on some developers of machine learning models.

So that's what I have for you today. I just wanted to point out some resources that were very helpful in doing this research. The CHPC, the Center for High Performance Computing, has been immensely helpful. As I said, I learned Python to do this project, and they offered a lot of workshops on Python. They also offer workshops on using the clusters; machine learning models consume a lot of processing power, so if you don't want to kill your own machine in the process of creating your model, you might as well use one of their clusters to train it, and I think we have time for me to show you how I did that. Also, just belonging to the Digital Matters Lab, the reading group itself, has been very conducive to thinking about these questions across disciplines; it has been immensely helpful to me as well.

So I just wanted to show you some of the tools I found very helpful, specifically CHPC. Okay, to access the cluster, I'm going to go to ondemand.chpc.utah.edu. Can everyone see my screen? All right. Usually you would just log in; I had to request that they create an account for me on the cluster, which they graciously did, or you could request one through your PI. Once I had that, I could log in. This is the dashboard: you come here and start an interactive session, so anything I create now, any session, will run on the cluster. I can go to Jupyter Notebook, which is a coding application you can use online, and it tells me it's going to create a session for me on the cluster. They have very cool names for the clusters: kingspeak, notchpeak, lonepeak. I'm going to choose this one, specify the number of cores (I just need one) and the number of hours I'll be using it, and just hit launch. It tells me I can connect to Jupyter right now, and I have one opened right here. This is what it looks like, your file system, and I can put my code here; I'm now on the cluster, so I can just paste my code in. This is my code: this is me reading the training data and the test data, this is me dividing it and doing some cleaning, this is where I split the data set into training and testing, and this is where I build my model. There's a lot of high-level library usage, because Keras and TensorFlow provide a lot of very helpful functions. Once I've built my model, which is composed of several layers, I just fit the model; this is me training it on the data. Then I'm going to use it to predict.
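A condensed sketch of the kind of code I'm describing on screen might look like the following; the toy comments, targets, and layer sizes are hypothetical stand-ins, since the real code reads the full data files on the cluster:

    # Train a small feed-forward regression model on toy comment data,
    # then predict toxicity scores for new comments.
    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers
    from sklearn.model_selection import train_test_split

    texts = ["thanks for a thoughtful article", "what a bunch of losers",
             "I learned a lot from this", "nobody cares what you think"] * 50
    targets = [0.0, 0.9, 0.1, 0.7] * 50      # hypothetical toxicity ratings

    X_train, X_test, y_train, y_test = train_test_split(
        texts, targets, test_size=0.3, random_state=0)

    vectorize = layers.TextVectorization(max_tokens=20_000,
                                         output_sequence_length=100)
    vectorize.adapt(X_train)

    model = keras.Sequential([
        vectorize,
        layers.Embedding(20_000, 16),
        layers.GlobalAveragePooling1D(),
        layers.Dense(16, activation="relu"),
        layers.Dense(1),                     # linear output: degree of toxicity
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(tf.constant(X_train), tf.constant(y_train),
              validation_data=(tf.constant(X_test), tf.constant(y_test)),
              epochs=3)

    # Score new, unlabeled comments with the trained model.
    print(model.predict(tf.constant(["I completely disagree with this."])))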
So see here, I'm giving it my data, which is my comments, and I'm telling it: based on what you've learned from the huge Toxic Comments data set, please predict the toxicity of my comments. That's basically my Python code. Then I would open shell access to the cluster, to kingspeak for example. I have my code, it's already stored; I just want to execute it. (This is the moment when I usually forget my password.) All I have to do is navigate to where I stored my Python file and run a shell script that executes that particular Python file. Once I've done that, my model is running in the background and I can still use my laptop; this is the job I just created, running on the cluster. I'm going to cancel it, because I don't want to use unnecessary resources. That's just to give you an idea of how easily this can be done using a CHPC cluster.

So that's about what I have for you today. These are some selected references; I think I went over time a little bit, and I'd be happy to share them with you. I'm happy to take any of your questions. Thank you so much.

Thank you so much, we really appreciate that. If you could all join me in thanking our speaker for today, whether with a clap emoji or actual clapping. Thank you so much, Yamna. We still have about ten minutes for questions, but while our speaker takes a quick breath, I do want to make an announcement: the deadline for Digital Matters fellowships is March 31, so you still have a couple of weeks to put those together. It's an easy application, a couple of pages, and we're eager to hear your ideas about digitally inflected projects in your field. If you have questions, please feel free to reach out to me at rebecca.comings at utah.edu. That's coming up March 31, and we have both graduate student and faculty fellowship opportunities.

So we have at least one question I saw in the chat box, which we'll get to first, and then Max, you'll be next. The question says: your comment about unintended bias raises another question for me. How would you, or others involved in similar text-based machine learning, account for irony, and I assume things like sarcasm as well, which are difficult even for humans to understand without body language and vocal intonation?

That's a difficult question. I think the main difficulty would be annotating the data set in the first place; even telling whether something is irony or not is hard, because people disagree, so you need very precise criteria, and that might be an impossible goal, really: what is irony? But with very precise annotation criteria, and annotators who are diverse enough in their understanding, if we had a data set that is as accurate and representative as possible, then we might be able to create a machine learning model that utilizes it. I'd love to take part in that project, whenever you decide to go for it.

And then, Max, I think you had a question as well. Yeah, that was great, thank you. The question was similarly about bias: you mentioned that you discovered that your model had a bias.
So maybe it would be helpful for you to explain how you were able to see that your model had a bias. Was that because you did some of your own supervision within the sentiment analysis? Could you explain how that actually appears in the data, for example whether there were negative words that were miscategorized?

Yeah. When I first downloaded the data set, it was actually advertised as having this problem of unintended bias in it, but I told myself that maybe that bias wouldn't show in my data, because we're always optimistic about our research. But I did find it, and it happens; and thank you for that question, Max, because it leads me to something very important, which is that we shouldn't blindly rely on machine learning results as accurate and perfect without looking at them. I had to qualitatively examine each and every comment to see for myself that, yes, the problem of unintended bias was showing in my data; it is an actual thing. One of the things they're now doing for the Toxic Comments data set is providing a supplemental data set in which minority identities are mentioned in a positive light, so that we can alleviate that kind of bias and help the machine learning model not wrongfully decide that if a comment is about, say, a minority population, then it's toxic by default.

Thank you. Do we have any other questions? I'd also love for you to talk a little more about the humor side of it. One question I had, which you answered, was that this was a particular kind of humor; you mentioned it was self-effacing humor, and I would say satirical in some ways too. But obviously we live in an era of infotainment, where humor is part and parcel of what seems to be accelerating a kind of incivility. So how have you thought about follow-ups, and how you might structure or think about sussing out the differences in how humor affects incivility?

That definitely came to mind, because the particular type of humor used here was self-deprecating humor, which historically has been used by religious minorities in particular. I've been wondering whether there is an interaction between your identity and the type of humor you use, how that affects how people receive it, and whether the people receiving it belong to a minority group as well or to the majority; maybe a majority will only accept self-deprecating humor from you. Right. And I was curious about something like satire, or sarcasm specifically, because it can sometimes be infuriating for some people. Right. There was research suggesting humor may even work differently across political ideologies; conservatives may be more or less receptive to humor than Democrats, and they differ in how they receive humor or can learn or take in information through it. So yeah, that's quite an interesting question and something I certainly want to follow up on in the future.

We have another question. Oh, sorry. Yeah, Rebecca, I'm not sure you saw the question in the chat, about the resources. Yes.
CHPC actually has great resources: they have those workshop lectures that are hands-on, where you can take the code and try it out even during the workshop. There's also a book, I have it next to me here, called Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, which is specifically about machine learning; from that, really, I learned enough to be able to develop that model. I actually don't have a comprehensive knowledge of Python; you learn just enough to get started on your project, and I think the CHPC workshops have been great for that. There are so many resources. I don't think people use books anymore, sorry: Medium articles.

I think we have time for one more question, if anyone has a question for Yamna. I just wanted to say thank you so much, that was really, really helpful, and I appreciated your breakdown of it. It's not really a question, but maybe a suggestion: once we're back to some kind of more fluid schedule, we could have a workshop. That would be really cool. I love that, thank you so much, Elizabeth. Yes, that's a great idea; I think other people would be interested in a workshop as well, and I would definitely sign up for that too. For anyone here who is a student, there is a wonderful class on campus called Programming for Everyone that David Johnson teaches in Python; Anna and I audited that class and it was really accessible and useful for this kind of work.

Okay, with only one minute to go, let's thank our presenter again for coming and speaking today; it was a wonderful presentation, and I appreciate everyone who came. We have research talks coming up on April 12 if you want to hear what our faculty and graduate student fellows have been up to this semester, and please do consider applying if you're thinking about the fall fellowships; the deadline is March 31. Okay, thank you so much. Thank you for inviting me and for listening to my talk. Thank you.