So, I am Elizabeth, and I'm going to talk to you today about positionality-aware machine learning. We're going to start off with a question: tomato, fruit or vegetable? What do you think? Fruit? Vegetable? Right. It's a matter of perspective. It's also a matter of the context in which you want to use that answer. If you're a botanist, you say fruit; if you're a nutritionist, you say vegetable. Lawyers and judges in the US have agreed on vegetable. Computer vision researchers say it's "miscellaneous."

Classification is the process of assigning a name or a category to a particular idea, concept, or thing. It's a process we go through continually in our daily lives. Classification is also about creating taxonomies for understanding the world around us and reducing the enormous nuance and detail in the world to something usable. We use it when we're trying to understand how diseases spread around the world, what online harassment looks like, and differences in race or gender.

Gender is a particularly interesting case, especially in the Western societal context: at one point, gender was pretty much agreed upon as a binary variable, with two options, but that's no longer the case. So if we're embedding these classification processes into the machines we're building, we need to think about them critically, in the context we're currently in, and with an eye to what might change moving forward.

Here is a quote from one of the many user interviews we conducted with ML and AI engineers. This woman works at a major US news organization, and she talked about classification and when it might present problems in terms of the harms it could cause. She said there are some times when it just doesn't matter.
It's not an issue of harm. She started with an example: "Our auto-toning model for image recolorization? Well, that's not going to cause anyone harm." Then she paused, thought about it: "Actually, maybe it does. It's kind of a weird algorithm that may lead to whitewashing."

This is something we saw time and time again as we asked practitioners to think about their classification choices and when they might be problematic: once they started digging in, they found potential for problematic decision-making where, on the surface, there hadn't seemed to be an issue at all.

And this is where we come to our idea of positionality. Positionality is the specific position, or perspective, that an individual takes given their past experiences and their knowledge; their worldview is shaped by it. It's a unique but partial view of the world, and when we design machines, we embed positionality into them through all of the choices we make about what counts and what doesn't.

So here is a very, very simplified data pipeline: we go from data to the ML model we're trying to train. I'm going to use the context of online harassment. Imagine we have a whole bunch of tweets, and we want to determine whether or not each tweet exemplifies harassment. We would grab our data, apply labels to it, harassment or not harassment, and then train a model on it so it could predict.

This requires a really complex classification system. We need a system to decide what counts as harassment and what doesn't, and we have to train a whole bunch of annotators to literally go through the data piece by piece and assign those labels. Only then do we get to feed it into the model.

So, thinking about online harassment: in a project I worked on, we started with three categories. That was our classification system, our taxonomy.
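As an aside, that simplified pipeline, grab data, apply human labels, train a predictor, can be sketched in a few lines. Everything here is invented for illustration: the tweets, the labels, and the crude word-count "model" standing in for a real classifier.

```python
from collections import Counter

# Step 1: "grab our data" -- raw, unlabeled tweets (invented).
tweets = [
    "you are a genius, great thread",
    "delete your account, nobody wants you here",
    "go away and never post again",
    "thanks for sharing this",
]

# Step 2: annotators apply the classification system. This is where
# positionality enters: someone decided what counts as "harassment"
# and trained the annotators to see it that way.
labels = ["not_harassment", "harassment", "harassment", "not_harassment"]

# Step 3: train a model on the labeled data. Here, just per-label
# word counts, a deliberately crude stand-in for a real classifier.
word_counts = {"harassment": Counter(), "not_harassment": Counter()}
for text, label in zip(tweets, labels):
    word_counts[label].update(text.split())

def predict(text):
    # Score each label by how often its training words appear.
    scores = {
        label: sum(counts[w] for w in text.split())
        for label, counts in word_counts.items()
    }
    return max(scores, key=scores.get)
```

The point of the sketch is where the human judgment sits: step 2 is pure classification work, and whatever the annotators count as harassment is exactly what the model learns to reproduce.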
We had positive, neutral, and negative, and every tweet was going to fit into one of these categories. Our annotators could not agree. Three categories did not work, because there were a bunch of boundary cases between neutral and negative, and that caused tons of problems. We could not get good intercoder reliability, or inter-rater reliability, which is a common measure for assessing agreement. Then we added a fourth category, critical, and all of a sudden our annotators agreed the majority of the time. We had to redesign our classification system to respond to the actual data, the way that data presented itself, and the way humans interact with and understand it.

So what we're saying is: to interrogate these classification systems, we need to think about what counts, and in what context. We need to think about who the annotators are, why we've selected them, how they've been trained, and at what moment in time. And we need to examine the actual application of the classification system, questioning whether there is sufficient agreement and whether our approach has been reliable.

That was an example of a homegrown classification system for a very specific project, but positionality is embedded even in the very old, institutionalized classification systems used around the world. The International Classification of Diseases (ICD) is a tool used internationally to identify and classify health problems, and it actually underpins a lot of the US healthcare billing system. Here is an example of the different ICD codes you can use for being harmed by birds: there is a code for having been harmed by a chicken, or a goose, or a parrot. There is no code for ostrich, though. Think about how big an ostrich is, and then think about living in Australia. If you ask an Australian, which is the riskier, more harmful health situation: being kicked by an ostrich or being bitten by a goose?
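A quick technical aside before answering that: Cohen's kappa is one common way to quantify inter-rater reliability for two annotators, agreement corrected for chance. A minimal sketch, with invented labels showing the kind of neutral/negative boundary trouble just described:

```python
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    """Agreement between two annotators, corrected for chance agreement.

    1.0 is perfect agreement; 0.0 is no better than chance.
    """
    n = len(ann_a)
    # Observed agreement: fraction of items given the same label.
    p_o = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    # Expected chance agreement, from each annotator's label frequencies.
    freq_a, freq_b = Counter(ann_a), Counter(ann_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Two annotators who keep splitting on the neutral/negative boundary
# (invented data). They agree on 4 of 6 items, but much of that
# agreement is expected by chance, so kappa is only about 0.5.
a = ["positive", "neutral", "negative", "neutral", "negative", "neutral"]
b = ["positive", "negative", "negative", "negative", "negative", "neutral"]
kappa = cohens_kappa(a, b)  # roughly 0.5
```

Values near 1 suggest the categories are workable; persistently low kappa, as with our three-category design, is a signal that the taxonomy itself, not the annotators, may be the problem.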
Probably they're going to think the ostrich is the more important thing to count. But the ICD wasn't developed with that in mind. The ICD has its origins in the 1850s; it's now maintained by the WHO, and it was designed primarily by white men in Western Europe and North America. Their positionality is embedded in the ICD today, and it will continue to be unless we routinely question what that positionality looks like.

Ultimately, choices here are inevitable, and the idea of removing bias just doesn't jibe with the understanding that these choices are going to happen regardless. A lot of the conversation about debiasing algorithms is about adding rows: if you just add enough data, you'll get a representative view of the world. But if you limit yourself to only the columns for parrot, chicken, and goose, and you don't have a column for ostrich, you will never capture how many ostrich kicks there were.

So if the debiasing debate isn't helpful, what do we do instead? We argue that you can work toward being positionality-aware, and we suggest three basic steps that machine learning engineers, and others involved in the process, can take.

The first is to uncover positionality in your own workflows. Look not only at the classification systems but also at the data and the models you're making use of; think about where positionality enters, and keep track of it.

The second is to ensure context alignment: alignment between the context in which the classification system was developed and the actual application scenario for the machine learning tool you are creating. Here, let's return to that online harassment example. We developed that for Twitter.
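As a brief aside, the rows-versus-columns point from a moment ago is easy to show in code. The `tally` function and the report list are invented for illustration; the schema mirrors the bird codes from the ICD example.

```python
# The classification system fixes the columns before any data arrives.
SCHEMA = ("parrot", "chicken", "goose")  # no "ostrich" column

def tally(reports):
    """Count injury reports, but only for birds the schema recognizes."""
    counts = {bird: 0 for bird in SCHEMA}
    for bird in reports:
        if bird in counts:  # anything outside the schema is silently dropped
            counts[bird] += 1
    return counts

tally(["goose", "ostrich", "chicken", "ostrich", "ostrich"])
# -> {'parrot': 0, 'chicken': 1, 'goose': 1}
# The three ostrich reports vanish, and adding more rows of data can
# never bring them back: the column was excluded at design time.
```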
Maybe we want to use it on Reddit now. If you just take the model that was created for Twitter and apply it directly to Reddit, there are very few options for embedding a positionality-aware approach. If you think, "maybe I can solve the problem by feeding in a bunch of new data," so you trained it on Twitter data and now you retrain it on Reddit data, that gets you closer. But what you actually need to do is question the classification system itself. You need to go back and look at how you're assessing what counts as harassment and what doesn't, because the way people communicate on Twitter is different from the way they communicate on Reddit. On Twitter, you have a short character count, and you might use hashtags and @-replies. On Reddit, you're probably talking in very specific subreddits, and you're probably using particular language because you know there's a moderator watching, keeping track to make sure you stay within the bounds of what that community has deemed acceptable.
You also have far more space to write. So the ways we classify content for Twitter and for Reddit are probably going to be different, and the ways we train our annotators certainly have to be different, because those approaches do not work when the content and the context have completely changed.

The last step is to remember that you need to continually ensure that this alignment still exists. The models might change, the data might change, and the classification systems themselves might change: the ICD is revised by the WHO relatively routinely, so if you're making use of it, you need to update your approaches. It's also important to recognize that the context in which you're building something might change whether you like it or not, and that lack of control requires you to stay aware of what's shifting in order to build a reasonable and responsible tool.

With all of this in mind, we ran a workshop with ML engineers, and we have a number of other workshops already submitted, to EPIC and FAT*. We've created a white paper that's available on our website, and we plan to write a more detailed position paper that we can make widely available. I'll let you explore the website on your own, but before I do, I just want to leave you with this: right now, ML and AI systems are kind of like a one-size-fits-all t-shirt. They fit very few people, and a lot of us end up unhappy. But we can do better. We can harness this opportunity to be aware of the very specific contexts in which these tools are deployed, think about how those contexts can be tracked over time, and find ways to serve the specific needs of users and developers, so that we stay aware of the particular perspective from which we are designing, the perspective embedded in all of the tools we're creating. Thanks.