Good morning, good afternoon, good evening, and welcome to the very special miniseries we're doing here with the Call for Code for Racial Justice team. I'm Chris Short, host of most of Red Hat's live streaming, and I'm joined by some of my friends over at IBM to talk about the project Take Two. Demi, please introduce today's topic of discussion. Thank you so much. Thanks again for having us here, Chris; it's always a pleasure being on your show. Today I have the Take Two team here with me, and we'll get into what Take Two is pretty soon. Before we get started, I just want to give a quick intro for anyone who's not familiar with Call for Code for Racial Justice. If you haven't been tuning into this OpenShift TV series, here's an overview of what Call for Code for Racial Justice is, and what Call for Code is. Call for Code for Racial Justice is a component of Call for Code, and we'll get into that shortly. Our speakers today, Joanna, Naoki, Anna, and Nagma, will be talking pretty soon, just after I give this intro on Call for Code. And we have a few additional team members here as well. So many people have been participating in this project from the beginning, some of them are still working on it, and the list keeps growing, so if you see us in a few months it will be even longer. We have lots of people joining the community and participating, and as you'll see, there are a lot of different roles available for people who want to contribute to Take Two. So, getting back to Call for Code: Call for Code is essentially one of the world's largest tech-for-good initiatives of its kind. We run a global challenge every year where developers and non-developers can come together and design projects that address specific social issues. We've covered a lot of different topics: natural disasters, climate change, COVID-19. And in 2020 we had a spot challenge for racial justice, and out of that spot challenge emerged Call for Code for Racial Justice, which now has seven offerings related to it. Call for Code for Racial Justice emerged in 2020 when, as you might know, a lot was happening in the US specifically around race relations; there was a social and racial reckoning in regard to the killings of specific unarmed Black citizens in the United States. IBMers who were interested in doing something about this issue came together and decided to do a spot challenge around Call for Code; essentially they wanted to do something where they could feel they were giving back and contributing to eliminating systemic racism. Out of that emerged three pillars: police and judicial reform and accountability, diverse representation, and policy and legislation reform. During that time we were able to create solutions and projects that aim to address each of these issues. And funny enough, I had the pleasure of working with diverse representation at the time; I was supporting that pillar, so I worked with another product manager, Ian, who was leading Take Two at the time, along with a few other projects.
And we'll talk a little more about Take Two shortly, but those were the three main pillars of Call for Code for Racial Justice. As for the timeline, it's only been a little over a year since the launch of Call for Code for Racial Justice: it launched on Juneteenth, in June of 2020, with the spot challenge. We had a lot of different participants involved: 500 volunteers globally, including distinguished engineers and master inventors. As I mentioned, we created seven open source solutions, and we now have a backlog of about 20 solution building blocks that can be used for creating other open source projects. We then launched Call for Code for Racial Justice and externalized five of these solutions with the Linux Foundation, and we were able to externalize an additional two of them this year. We have a really wide variety of things within our tech stack, so whether you're interested in Python or you're a front-end or back-end developer, there are different ways for you to get involved with our solutions. And today, as I mentioned, we're talking about Take Two, so I'll let the team take it away and tell you a little more about their work. Thanks, Demi. I'll kick us off. Like Demi mentioned, when we were presented with the Call for Code for Racial Justice challenge, the problem statement our team wanted to focus on was that bias can be learned and perpetuated in different ways, whether through societal beliefs, misrepresentation, or ignorance. Ultimately this can lead to inequitable outcomes, and a lot of the time it's unconscious: people don't even realize what they're doing or perpetuating. We thought this would be an interesting problem to tackle, to look into more deeply and see what we could do to help. If you go to the next slide, Demi. In particular, we wanted to take a look at language. I love this statement by our very own Dale Davis Jones and Humphrey from IBM, that words really shape our worldview, how we regard others, and how we make others feel; language is just such a powerful part of society, culture, everything we do. And we saw an opportunity for our team to address the racial bias that can be perpetuated via language and help mitigate it. After the murder of George Floyd, there was a lot of scrutiny of what was being said in the media, how it was being said, and why there was a persistent racial bias in it, and there were lots of educational opportunities. People were really sharing a wealth of knowledge and background about where stereotypes were being perpetuated, pointing out social media posts and news articles. And if you think about it, we're all exposed to media, news, and fake news on practically an hourly basis. It's inherent in everything we read, and we want to be able to mitigate the perpetuation of these hurtful ideas and biases. Our team actually came together because we were often humbled by those who were sharing the background and usage of certain words and phrases, and we realized we needed to educate ourselves and bring awareness to questioning our own biases in any content we create or consume. And all the headlines you're seeing highlight that other companies and organizations were thinking about this too.
And so, some of you may be aware of these movements for the removal of racially biased or insensitive language from different products, services, and so on. We think it's just so powerful to understand the background of the words and phrases we choose to say, as well as continue to say. We really operated under the premise that awareness is key: we believe that bringing awareness to when language is racially biased, and why, can help mitigate content created by people who just don't realize what they're saying contains racial bias. That in turn can minimize the bias perpetuated via the content they create, and ultimately reduce the offensive and harmful content that's out there. Yeah, so did you notice any work that talks about the different strata or types of bias? For instance, what someone says on social media versus in a news publication, in terms of the effect it might have, or the context or construct around it. Is there anything you know about how bias might differ across different contexts? So, from what I noticed at least, on social media there were a lot of people emulating things they thought were funny, just out of humor; they thought it was lighthearted and not really offensive, but people were taking offense. News articles were different: I don't know if they recognized that the way they were describing different populations, for example how they described crime involving people in the Black population, was harmful. In social media it was coming more from a humorous standpoint, whereas in the news media the way they described things was not humorous, just different. I don't know if anyone else has anything to say on that. I'll comment briefly on that. I think it would be interesting to consider, for the machine learning part of the solution, whether it can take advantage of that sort of broad contextual information to boost the accuracy of bias detection. But I'll say more about that later. Okay, cool, looking forward to your section. All right, so moving on. We initially began with the idea to build a tool that would detect bias in content, but we quickly realized that, at the time at least, there was no publicly available data set for us to leverage. And that kind of made sense: language is incredibly subjective, and while it's more straightforward to identify some overt forms of racial bias, it's a lot harder to capture the more subtle forms, which speaks to what we just talked about with the different contexts, like social media versus news articles or gaming chat rooms. And this is really where our Chrome extension, our browser extension, came into play. We wanted to see if there could be a way to crowdsource this data via trusted contributors. Our goal was to provide a trusted community of members with a tool to flag content they saw as containing racial bias, and a mechanism for them to explain why it was racially biased.
And this would feed into our machine learning model, which Naoki will go into in further detail, and ultimately serve as the data to help the model classify when phrases or expressions, within certain contexts, may be racially biased. That would then feed into our API, which Joanna will describe in a bit more detail, which we want users to be able to leverage to check whether their content contains racial bias, so they can feel empowered to know and then decide whether they want to continue using that language or whether there's a different way to get their point across. And to be clear, we understand this is an extremely complicated task that's dependent on context; we don't really know what the ground truth is here. But we do think we have a great starting point for these concepts and ideas. So that's an overview of our solution. Very cool. You said you had a Chrome extension with trusted people using it; how did you make that happen? Are you going to get into that? If I'm asking too early, just tell me. No, I can definitely answer, and others please jump in. Something we came across was wanting to balance giving power to the community to flag content while also recognizing that there are bad actors, and that we need to make sure credible people are doing the marking, people with the background to understand when something is racially biased. That's the reason we call out trusted leaders. We're still in the process of identifying what that looks like or who that is; our small team from the original challenge isn't sure we're even the right people to make that determination. We want to talk to experts to help facilitate that and understand what it should look like, but that's a little background on it. Thank you. So I'll pass it over now to Joanna to talk a little about our solution architecture. Thanks. Hopefully everyone can hear me. My name is Jo, and I worked as a developer on this really fun project. The solution at the moment, and it's definitely not the final solution, is basically built up of three components, and our repos are split into three to match the components in this architecture. As was just mentioned, one of the issues we found at the beginning was not having a data set containing the more subtle forms of racial bias. It's very easy to find toxic language data sets or racial slur data sets; they do exist, and we can bring those into our solution and use them for training our models. What we found more difficult to find were things like what I mentioned about news articles, ones that might make it look like young Black males are more heavily engaged in crime than the rest of the population. That's more difficult to identify, and there aren't many data sets that contain that kind of information.
So we decided to create something that lets someone do this privately while browsing, because people were already doing it on Instagram and Twitter: they were already picking out a piece of content from someone's website or from the news and saying, hey, this is not okay, and here's why. We thought we could give them a mechanism to actually do that via a private tool, so it's not something anyone else can see; it's just privately highlighted. They go into their browser, they can be on any web page they want, and they highlight a word, a whole sentence, a paragraph, whatever it is. That data is then posted via the API into a backend database; in our GitHub getting-started example that database is a CouchDB instance. The long-term plan for the project is for the community to own the data set via some sort of impartial organization, somewhere that's neutral ground, because the data set would effectively be a definition of racial bias, and people might interpret that differently, so it needs to be owned by the community. The API also calls out to the machine learning model. At the moment we have just a very simple text editor, which we'd also like to turn into a Chrome extension. On the client side someone can type content, so if they're thinking of writing an article or posting something to social media, they can type what they want into the text editor. It makes a call out to our machine learning model, which then highlights any parts of the text or sentence that it finds might contain racial bias, and then, because it's a very clever machine learning model, it actually explains why it's suggesting you might want to reword. We've also talked about offering alternative words to the ones people are using, and this could apply to anything from social media posts all the way to code someone's creating online. So I'll just share my screen and show very quickly how it works; hopefully it hasn't logged me out. Just to recap, we've got three GitHub repos. This is the repo for the API, and in here you'll find the information about what we collect from the Chrome extension, the categories we define (I'll go into those when I show it), and a getting-started guide on how to get up and running. Today it's not running on OpenShift, but we have run it on OpenShift while we were building the solution; right now it's just running on my normal machine, but you can run it in a cloud-native environment. So this is the web API; we've also got the data science repo, and then a repo for the Chrome extension. The way it works is we've got this API running once you get the solution up, and there are a number of different endpoints used for various things; that's all explained in the docs. But basically, you can get the marks that are in the database, and you can post to the database; posting a mark is what happens when someone selects text in the Chrome extension. A minimal sketch of that flow is below.
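For readers following along, here is a minimal sketch of that mark-collection flow, assuming a small Flask front end over CouchDB's standard HTTP interface. The endpoint paths, field names, database name, and credentials are illustrative assumptions, not Take Two's actual API; the API repo's docs describe the real endpoints.

```python
# Hypothetical sketch of the marks API: paths, fields, and the DB name are
# assumptions for illustration; CouchDB is reached via its standard HTTP API.
import requests
from flask import Flask, jsonify, request

COUCH = "http://admin:password@localhost:5984"  # assumed local CouchDB
DB = "marks"                                    # hypothetical database name

app = Flask(__name__)

@app.post("/marks")
def create_mark():
    # Called when a user highlights text in the browser extension.
    mark = {
        "text": request.json["text"],          # the highlighted phrase
        "url": request.json["url"],            # page it came from, for context
        "category": request.json["category"],  # e.g. "stereotyping"
        "reason": request.json.get("reason"),  # why the marker flagged it
    }
    resp = requests.post(f"{COUCH}/{DB}", json=mark)  # store the document
    return jsonify(resp.json()), resp.status_code

@app.get("/marks")
def list_marks():
    # Return all stored marks; a real deployment would filter per user.
    resp = requests.get(f"{COUCH}/{DB}/_all_docs",
                        params={"include_docs": "true"})
    return jsonify([row["doc"] for row in resp.json()["rows"]])
```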
What I'll do now is just show the Chrome extension. In order to be impartial, at this moment I've just got an article about lettuce, because I don't want to say what I think is racially biased or not. Basically, once the user is logged in, they can toggle on the cursor, and that allows them to choose any words or phrases. So if I thought "addition" was a racially biased term... oh, I've done that wrong. It's a bit slow; let me just reload. I'm sure streaming video has nothing to do with it. Okay, let's just pick another one. Once the user has highlighted, they're presented with six categories. Our initial idea was to have categories of racial bias; we've also discussed just having a kind of Boolean, yes or no, it does or it doesn't. We're still figuring this one out. But essentially the user can select a category. They can also, again, it might just be really slow, change a category if they want, so if I had selected stereotyping I can change it. I can also delete a mark, and all of this calls the endpoints I showed earlier in the API. So that's basically the Chrome extension, and you can see it needs some UI design work because it's extremely basic. What was actually quite cool about this is that we used an already existing, general-purpose highlighter tool for browsers; it was just a tool that highlights things in yellow pen. We took that open source project and made it our own to suit this project, which was a really nice way of using something already out there in the open source world and modifying it to our own needs. Before you leave that page, can you say a little bit more about the Chrome extension? I guess it's kind of disappeared now, but it had something about copying. What happens when you do that? Copy all, and, yeah, I guess remove all. What is copy? Yeah, I think copy takes all the text you've highlighted so you can paste it somewhere else, like in a text editor, and remove all just removes all of the marks you've made. There's lots of work to do on this Chrome extension; if you look in its repo you'll find a whole host of issues that some of us have raised that need to be worked on. Nice. I'm curious here: for sentences a human highlights, does it tell you what the racial biases are, or just that it's highlighted? At the moment it just tells you what's highlighted. Because this is the side where people are capturing the data, it wouldn't know what the racial bias is yet. We want to add a space where they can not only highlight but also fill in some information about why they've highlighted it, and that information could then be used by the machine learning model as well. That's all stuff we haven't gotten around to yet. Okay, but it's a good idea. So, on the other side of the API is our text editor. For now I don't have it hooked up to a machine learning model; I just have it hooked up straight to the database I'm using. But the general idea is that I type a sentence in the text editor, it makes a call out to the machine learning model, and the model says, you know, you've got this word, and we think it's a racial slur. And here's the definition of what that is, and here's any additional information, which might be things like "you could use this word instead." And that is pretty much it.
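To make that round trip concrete, here is a hypothetical sketch of what the editor's call to the model service might look like. The /check path and the response shape (flags carrying a phrase, category, explanation, and alternatives) are purely illustrative assumptions based on the behavior described, not the project's actual contract.

```python
# Hypothetical client-side round trip for the text editor; the endpoint path
# and response fields are assumptions for illustration only.
import requests

draft = "Sentence a user is about to post."
resp = requests.post("http://localhost:8080/check", json={"text": draft})

for flag in resp.json().get("flags", []):  # imagined response shape
    print(f'"{flag["phrase"]}" -> {flag["category"]}: {flag["explanation"]}')
    if flag.get("alternatives"):
        print("  consider instead:", ", ".join(flag["alternatives"]))
```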
I've obviously got my database running in the background, but that part is not that interesting because, like I said, we really want this to be a community-owned data set, not something one person spins up on their own. There could be scenarios where companies want to use their own data for a specific reason. I know IBM has a set list of words that people should not use, and we did a project within IBM where, before they post content onto the IBM Developer website, they use our tool to scan the content of the new article and flag any words that aren't appropriate to use, then send a message back to the developer who wrote it saying you may need to reread the content and change some of these things. So we have deployed it within IBM on a trial project. And yeah, that's about it. I think there's one more slide I need to talk to before Naoki, which is about data verification. Naoki will go into verification from a machine learning perspective, but there are some things we've identified that we could do to mitigate malicious actors or unreliable data. One thing I forgot to say is that the user is obviously authenticated before they use the Chrome extension; they're set up using the IBM App ID service. Initially we had the idea that we'd ideally want anyone to be able to contribute, but the way it's moving now, I think that was probably the wrong choice. However, there could still be bad actors even if you've gone through a very rigorous process, and we also want to build up some consensus around the data we get. We haven't implemented this yet, but these are some mock-ups of the kind of thing we want to do, and it's very typical of something like Google Translate: asking the user, while they're highlighting text, with a pop-up that says, click on the words you feel are racial slurs. Maybe that data is already in the database, but we want to verify it and build up that credibility. The same goes for serving them a sentence and asking which of these categories they feel is most appropriate, just so we can build that verification into the process. The UI can help with a lot of that. And so I will now hand over to Naoki, who will talk more about machine learning. Awesome. Okay, thanks, Jo. Next slide, please. Yeah, so this is a one-slide depiction of our roadmap, if you will, our cautiously optimistic vision of where we want to go with the machine learning component. It's essentially a list of versions of the machine learning component with increasing sophistication. Starting from the left, version zero is a non-machine-learning version where you do lookups against a database or dictionary. Version one is a classical version where you train a classifier, from labeled data, that can detect whether phrases, or longer expressions, sentences, and paragraphs, are biased or not.
Version two is the next step in sophistication, where you leverage cutting-edge deep learning technology to do this with sequential context. Version one has been implemented in the repo, and there's ongoing work on version two, but the rest of the versions are not implemented yet; they're in the future tense. Version three, which we call the AIX or explainable AI version, is a variation of the previous two where the model not only can flag a whole paragraph, or whatever, as biased, but can also point to the specific words or expressions within it that are responsible for that biased-or-not judgment. Version four tries to address the marker-credibility question that Jo was talking about with another mechanism: we try to learn the credibility of the markers in a crowdsourcing context and use that information to build better and better machine learning models. Version five is an extension of that where you use the credibility information to actively choose and seek labeled data from the markers who are more credible, about the questions the models most want answered. So that's just the summary, and in the next couple of charts I'll go into a little more detail on two of these versions, version one and version four. Next slide, please. Just a question here before we move on: can you explain, for the general audience, what's meant by broad context versus sequential context? Yeah, so maybe I shouldn't say broad. In Jo's demo she happened to be picking individual words to label, but in general you could label a whole context, like a paragraph. Version one is going to classify whole paragraphs, or whatever, as biased versus non-biased, and sequential means the model pays detailed attention to the sequential context, the sentence structure, and that's something we need deep learning technology for. In my understanding, or lack thereof, of machine learning in general: how hard is it to train these models? This has to be a very challenging model to train, I feel like. Yeah, good question. Maybe I'll say more about what we have done concretely and I think the answer will come out; it's not done or complete by any means, and we have a lot more to do. So this is the first version that we've implemented, and it's using relatively classic machine learning classification technology, like support vector machines and a Naive Bayes classifier, combined with what we call the bag-of-words representation of text. The picture on the right illustrates how this works: sentences come in as data, labeled, in this case for simplicity, biased or unbiased. These sentences, or longer texts, are transformed into the bag-of-words representation, which is essentially a vector of frequencies of many, many words. So the sentences are mapped into a very high-dimensional numerical vector space, not necessarily zeros and ones, but often mostly zeros. Then you can run classic machine learning classifiers like support vector machines and Naive Bayes to come up with a classification boundary. That's what we do on a bunch of labeled training data. Once you have that classifier, the model, you can apply it to a new sentence or paragraph, like the example on the slide: it's also transformed to bag-of-words, the classifier is applied to it, and it comes up with a biased-or-unbiased judgment. So that's how it works. It's been implemented using the scikit-learn library, and you can find it in our repo. So that's our first version.
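As a concrete illustration, here is a minimal sketch of that version-one pipeline in scikit-learn: bag-of-words features feeding a linear SVM, with the Naive Bayes variant for comparison. The tiny inline data set is a toy stand-in for real labeled marks, not the project's training data.

```python
# Minimal sketch of "version one": bag-of-words + classic classifiers.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = [                                  # toy stand-ins for labeled marks
    "they are all criminals",
    "the weather was lovely today",
    "those people cannot be trusted",
    "the report was published on time",
]
labels = [1, 0, 1, 0]                      # 1 = biased, 0 = unbiased

# Bag-of-words vectorizer feeding a linear support vector machine.
svm = make_pipeline(CountVectorizer(), LinearSVC())
svm.fit(texts, labels)
print(svm.predict(["the weather cannot be trusted"]))

# The Naive Bayes variant mentioned in the talk, for comparison.
nb = make_pipeline(CountVectorizer(), MultinomialNB())
nb.fit(texts, labels)
```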
If there are no other questions, I'd like to move on; next slide, please. Just one question from the chat: could this solution be used for other languages, or is that on the roadmap? That's a good point. One of the advantages of machine-learning-based natural language methods is that you train them with data, so if you train with German data, the model will learn German. And one thing to note: our solution right now is primarily English, and we've also been quite US-centric, so that's definitely something we've been trying to consider moving forward, considering different cultures, different languages, and all of that. Yeah, there's a lot more that you should and can do there; it doesn't all come automatically, but it's basically extendable. So, version four, which we call ensemble consensus learning. This is our vision, or an attempt, to address the issue of there being unreliable or even adversarial markers in a crowdsourcing context. For the reasons Jo stated, we believe crowdsourcing is very important and is the way to go, but it's also challenging: if you're collecting data from thousands of anonymous people from all over, you may not necessarily have all reliable people. So how can you try to detect that, and learn from or leverage that information? Here's an idea. We leverage a general notion called an ensemble, where you have a number of, potentially many, machine learning models or methods, and you make judgments using the ensemble. As shown in the equation at the top, you basically do a weighted vote across all of the machine learners in your ensemble and make a sort of averaged judgment. The other thing about ensemble learning is that you can also learn the credibility of those models by updating their credibility weights, and you do that via the second equation, which cuts down the weight of a model that makes a mistake: every time it makes a mistake compared to the ground-truth label, it cuts the weight, for example, by half. You can actually show that this quickly learns and zooms in on the good set of models. So that's the general idea of ensemble learning; we apply it in our context, and there are a couple of things we need to do differently. The first is that we don't want one model per marker: if you have 1,000 markers and train a model for each, you don't have enough data from a single marker to create a good model. Instead we create many, many models, but train each one with data from a random subset of the markers; that's how it's depicted in the picture. You feed random subsets of all the markers into different models, and you can have hundreds of models. So that's one way this differs from general ensemble learning. The other difference is that we don't really know the ground truth, for the very reason that some of the markers are unreliable. So how do we know when one of the models has made a mistake? That's where we use the consensus: the weighted average prediction of the current ensemble is treated as if it were the ground truth, and you penalize the models that deviate from it. That's why we call it ensemble consensus learning. Now, with some assumptions about the reasonableness of the ensemble, you can hope to show that this also quickly converges and zooms in on the models built from credible subsets of markers. So this is an idea we'd like to take forward, but it's still somewhat abstract.
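To make the idea concrete, here is a small sketch of one consensus round under the update rule described above (halve the weight of any model that deviates from the consensus). The toy keyword models standing in for marker-subset models, and the 0.5 threshold on the weighted vote, are assumptions for illustration.

```python
# Sketch of ensemble consensus learning: the weighted vote serves as pseudo
# ground truth, then any model that disagrees has its credibility halved.
import numpy as np

class KeywordModel:
    """Toy stand-in for a model trained on one random subset of markers."""
    def __init__(self, flagged_words):
        self.flagged = set(flagged_words)
    def predict(self, texts):
        return [int(any(w in t.split() for w in self.flagged)) for t in texts]

def consensus_update(models, weights, text):
    votes = np.array([m.predict([text])[0] for m in models])  # 0/1 votes
    consensus = int(votes @ weights / weights.sum() >= 0.5)   # weighted vote
    weights = np.where(votes != consensus, weights / 2.0, weights)  # penalty
    return consensus, weights

models = [KeywordModel({"criminals"}),
          KeywordModel({"lovely"}),             # an unreliable model
          KeywordModel({"criminals", "trusted"})]
weights = np.ones(len(models))
label, weights = consensus_update(models, weights, "they are all criminals")
print(label, weights)  # the dissenting model's weight drops to 0.5
```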
So if you can go to the next slide, in the next couple of slides I'd like to show some concrete work we have done. Much of this was done by Pritika, who can't be presenting with us today. Rather than doing all this in the abstract, we wanted to do it concretely, so we used the Jigsaw data set from the Kaggle competition. It's a decent data set of biased expressions, with 1.8 million observations, and it's not just about race, not just about Black or Asian; as shown in the histogram on the left, there are many different kinds of bias present in the data. But by and large, building machine learning models to learn these biases is a similar kind of modeling exercise irrespective of the exact type of bias we're talking about, so it's a very useful data set. Another interesting thing: there's a column in the data that indicates severely toxic or not, and for now we're using that as the target to predict, so we're trying to judge whether something is toxic or not. Interestingly, if you look at the presence of obviously offensive terms in the sentences, like the F-word or the N-word, you see a monotonic increase in toxicity as a function of the frequency of those terms. For some other words, for example "pig" or "dog", it's not that simple, so there's a lot more to learn than just counting frequencies of these words. If you can go to the next slide. This slide summarizes the results we've obtained thus far with this data using the first version I talked about, the support vector machine version. There's a bit of an extension we used called TF-IDF, which is a variation of bag-of-words; I won't go into detail on that one. So far the accuracy figure is around 80%, which is not satisfactory, and this in part answers your question, Chris: it is a challenging problem, and we do need to leverage more sophisticated, cutting-edge methods to improve. But it's not a bad start. The slide also spells out the steps involved in this modeling, like cleaning up the text, removing stopwords, and dealing with the imbalanced data by downsampling the non-toxic examples and upsampling the toxic ones, and so forth. And we use SGD as the method for training the support vector machine model, with a specific choice of parameters. A sketch of this pipeline follows.
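Here is a compact sketch of the pipeline those steps describe, under stated assumptions: the CSV path, the column names (comment_text, severe_toxicity), the toxicity threshold, and the rebalancing ratio are illustrative guesses rather than the team's exact configuration.

```python
# Sketch of the described pipeline: clean text, remove stopwords, rebalance
# classes, then train a linear SVM via SGD on TF-IDF features.
import re
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("jigsaw_train.csv")   # hypothetical local copy of the data
df["comment_text"] = df["comment_text"].str.lower().map(
    lambda t: re.sub(r"[^a-z\s]", " ", t))          # crude text cleanup

# Rebalance: downsample the non-toxic majority to match the toxic minority.
toxic = df[df["severe_toxicity"] > 0.5]
clean = df[df["severe_toxicity"] <= 0.5].sample(len(toxic), random_state=0)
data = pd.concat([toxic, clean])

X_tr, X_te, y_tr, y_te = train_test_split(
    data["comment_text"], (data["severe_toxicity"] > 0.5).astype(int),
    test_size=0.2, random_state=0)

model = make_pipeline(
    TfidfVectorizer(stop_words="english"),  # TF-IDF + stopword removal
    SGDClassifier(loss="hinge"))            # hinge = linear-SVM objective;
                                            # the talk later notes that the
                                            # modified Huber loss worked best
model.fit(X_tr, y_tr)
print("accuracy:", model.score(X_te, y_te))
```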
If you can go to the next slide, there are some takeaways from this exercise. We've done both support vector machines and Naive Bayes, and as you can see, Naive Bayes gets about 75% accuracy. So even among these classical machine learning technologies, some are better than others, and you can hope to keep improving; deep learning, for example, I think will improve things significantly. One indication that the model as it stands is still not satisfactory, and this is a little bit technical, is the fact that the best-performing loss function for the model turned out to be the modified Huber loss. That suggests there are outliers present in the data: sentences for which the information present just isn't enough to make a reasonable classification. We expect that sequential context, or maybe some other kind of contextual information that's lacking in the data, is what you would need to make better judgments on those. So those are some takeaway lessons. Another comment is that this data comes with, as I mentioned, different kinds of discrimination, like against Asian people or against Black people. We're not necessarily trying to predict those categories, but they could be useful as features, because the type of offensive expression might depend on the category, and that could help in refining the models. Another thing is that this data comes with a marker ID column, so you do know which sentences or examples come from a particular marker, even though we don't know their identities. So we could actually try to do version four, the ensemble consensus learning, where you try to learn the credibility of markers. That's something we haven't done, but this data set is actually rich enough for us to try something like that. Eventually, though, we want to move to crowdsourcing, where you can choose to collect more information, maybe demographic and other kinds of contextual information about the markers, and also do selective sampling, that is, active learning; that's not really possible with the existing data set, but it becomes possible once we move to the crowdsourcing mode. Sorry, I ran a bit long, but this is it for the machine learning component. Are there any other questions? Yeah, I had some questions. Can you talk a little about the computation required for this, especially if you're moving to deep learning? How easy is it for someone to do this locally on their own computer? Have you thought about that? Yeah. For the classic machine learning models that we've done, it's easy to do on my laptop, and training takes maybe seconds, so that's not a problem. For deep learning, at this data size it may not be that bad; I'd have to check with Pritika, who probably did it on her own laptop. But as the data size increases and you want to train for longer periods of time and so forth, you may want to leverage GPUs and move to a higher, more capable setup, and you can also do that with cloud clusters. It doesn't have to be IBM Cloud, but IBM Cloud provides such capabilities, and you can seamlessly train these models there with no particular difficulty. Nice. And what's the volume of data you think you might need in terms of crowdsourcing? Just as a call to action to the community: how much more data do you think you need for your next phase?
I don't know; it's difficult to say how much. I mean, this data set has 1.8 million observations, which is a decent amount, but it isn't restricted to the type of racial bias we're interested in; that portion is smaller. With crowdsourcing, though, we have the ability to select the markers and select the data we want, which would reduce the amount of data we need. But generally, hundreds of thousands would be a reasonable target. Okay, good to know. I actually have a question. I know we've talked about context, and in our solution we've talked about including where the data comes from, what platform it's on. Did you see whether that was a field in the Kaggle data set? That's a good question. I doubt it; I haven't checked thoroughly, but I'm doubting it. And yes, that's an example of the type of additional contextual information we would like to collect. And sorry, maybe I can jump in there as well. We've obviously built in capturing where data comes from, but to Demi's point about how much data we need, and I don't know what Naoki's thoughts are, even once we've collected enough to train an initial machine learning model, what is racially biased changes over time. So it's something we'd want to keep updating and iteratively improving, also so we can understand how the definition of racial bias might change over time depending on political events and different types of situations that unfold. We obviously need an initial set of data, but we want to keep that process going and improving. Yeah, it's important that the models are dynamic and able to adapt to changes in society. Okay, maybe we should move on to the next segment. Yes. So here are some of the immediate applications of Take Two that we're working toward: text-based content like social media platforms and blogs. There are also future applications that we'd like Take Two to be part of, such as the gaming industry and, of course, other languages. And we're also working with different partners to realize and co-create a vision to grow Take Two even further. Here I'd just like to highlight where we're headed. We've kicked off our community meetings, and some of the things we're focusing on right now are finding a home for our data and updating the UX of the API and browser extension, as well as the training models. Next slide. And here's where we need your help. As you saw, we've set up the foundation for Take Two, but even with the current functionality, along with a great team of contributors that continues to grow, we're still looking for people who can help. This is a complex problem we're seeking to solve, and any contributions we can get would be really appreciated; we want to make it as impactful as it can be. So we're looking for varied skill sets, and there are many ways to contribute: data hosting, data scientists, designers, a full range. If you're interested, or if you know anyone else who may be, please check out the links provided in the chat and listed on the next slide; the links tell you how to get started, where to find all of our repositories, and all the information you need to contribute. Thanks. Demi, go ahead with the next slide.
Sorry, before that, could I just pause for a second? Yeah, I just wanted to say that I listed off a bunch of the different kinds of help we need, and I wanted to call out that while we've talked a lot about the technicalities of our solution, this is a socio-technical solution, so we really also need people who are experts in things like linguistics, political science, and media. So no matter who you are, what background you have, or what skills you have, you have a very valuable contribution to make to our project, whether it's technical, social, whatever it is. We're really looking for all kinds of people in a broad, diverse community to help out, and I think that's kind of beautiful, and also awesome, about Take Two. So I just wanted to call that out. That's great, very awesome. So yeah, in relation to that, we do have some opportunities for you to get involved, as I'll show in these next few slides. You can join our Slack community, and you can also join us for Hacktoberfest, the month-long celebration of open source contributions running October 1 through 31. All of the Call for Code for Racial Justice projects, including Take Two, will be participating in the celebration. We'll be having a kickoff event on October 6, Eastern time; you can register for the event at ibm.biz/HacktoberCC, which is case sensitive, or watch the replay at that same link if you're not able to catch it live. There you'll find out more about how you can contribute to our projects, exactly how to get involved in Hacktoberfest, and, whether you're technical or non-technical, ways you can contribute to all of our projects going forward. And this is a last call to action: we'd like you to join our community. There's definitely a lot of work to be done with Take Two, so we'd like for you to join us. If you'd like to be part of the community, you can do so through our registration page, ibm.biz/OSTV0928, which is today's date as we're filming this. With that, you'll be able to join us on the Call for Code Slack. Once you get there, there are going to be, like, a million Slack channels, so the important thing to know is that you want to join the racial justice Take Two channel as well as the racial justice general channel. And with that, thank you for joining us today; it's been really awesome being here and talking with you about Take Two. Yes, this is awesome. Thank you so much for the work you're doing, for bringing this project to the channel, and for sharing your work with us. It's very much needed work and I'm passionate about it, so thank you very much for doing that. Any other points you want to make, any thoughts before we get off the air here, other than all the stuff I dropped in chat, the links and such? I think just thank you again for having us and letting us share our story. Hopefully people can join us on the project and the journey, wherever we're headed. So, yeah, thank you. It's a growing community, and it's really nice to see this effort, so everyone is welcome to join.
Yeah, it's cool to see where we all started, as a challenge, and where it's gone and where we're headed. From last June to where we're at right now is really, really awesome. It's quite an achievement in just a little over a year; it's very impressive work. Thank you. Thank you. Awesome. So thank you all for joining, thank you audience for watching, stay safe out there, folks, and we'll see you soon. Bye.