Hello again. Now we have with us Stephanie, who's going to talk about the myth of neutrality and how AI is widening social divides. Stephanie is passionate about ethical and just AI, and she organizes and hosts workshops for kids and teenagers on the topics of AI and algorithmic bias. She has also been a speaker at multiple meetups and barcamps. Where are you streaming from, Stephanie? I'm streaming from Hamburg, Germany. Oh, awesome. Cool. Over to you. Thank you.

Okay. So, hi everyone. Today I'm going to talk about the myth of neutrality, or how AI is widening social divides. First of all, a few words about myself. I'm a machine learning engineer, but this is actually a work in progress, because I'm currently finishing my master's thesis at the University of Hamburg, Germany. Apart from coding, I really enjoy dancing, drawing and laughing, and occasionally ranting on Twitter about algorithmic injustices.

Let's jump straight into it. Imagine that you're driving down the highway when you're suddenly pulled over by the police. They check your driving license and find that the picture on your license matches that of a person who is wanted for an armed robbery. You know that you didn't commit that crime, but the officers seem convinced that you did. So what is going on? Well, nowadays it might actually be something like this: the officers took the photo from your driver's license and fed it into an algorithmic system that has access to a database of pictures of people wanted for different crimes. The algorithm then spits out a match between your picture and a person in that database. And so you're going to jail. If you think that sounds drastic, then let me tell you this: it actually happened to at least three Black men in the US, to Robert Williams, Michael Oliver and Nijeer Parks, just in the last few years. In this talk, I want to shed light on how things like this can happen when we use algorithmic systems, and how those systems can cement injustices that already exist in our societies.
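As a minimal sketch, and not a description of any real vendor's software, a one-to-many face search of the kind described above might look like the following. Every name, vector and threshold here is made up for illustration; the point is that such a search always has a "closest" entry, so with a lax threshold it returns a confident-looking match even for someone who is not in the database at all.

```python
import numpy as np

def best_match(probe, gallery, threshold=0.7):
    """Return the gallery identity whose embedding is closest to the probe.

    A one-to-many search always has a nearest entry, so with a lax
    threshold it will 'match' a probe photo of someone who is not in
    the database at all.
    """
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    name, score = max(((n, cosine(probe, e)) for n, e in gallery.items()),
                      key=lambda item: item[1])
    return (name, score) if score >= threshold else (None, score)

# Toy data: random 128-d vectors standing in for real face embeddings.
rng = np.random.default_rng(0)
gallery = {f"wanted_person_{i}": rng.normal(size=128) for i in range(1000)}
probe = rng.normal(size=128)  # a license photo of someone NOT in the gallery

# With a strict threshold there is no match; with a lax one, an innocent
# person's photo suddenly "matches" a wanted suspect.
print(best_match(probe, gallery, threshold=0.7))   # likely (None, ~0.28)
print(best_match(probe, gallery, threshold=0.25))  # likely ('wanted_person_...', ~0.28)
```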
To start out, I want to take a bird's-eye view of the AI landscape, because this is the context that shapes all of AI development and research. Let's first look at the sectors where AI is mainly being developed. The first one is Big Tech: big corporations have a lot of power, they have monopolies, and I'm going to get into that in a little while. Second, there's the military, which has been investing in AI for several decades. We also have government and community, but they don't seem to have as much leverage, because they don't have the money to invest in these AI resources. And the problem with this is that a lot of AI development focuses on for-profit and surveillance tech. Secondly, I want to talk about demographics. As you can see in this picture, which shows who AI is being developed by right now, the situation is generally the same as in the rest of the tech industry. And this doesn't only go for gender identity, but also for other axes of identity, such as race or sexual orientation, where white males are generally in the majority and everyone else is a rare unicorn. And then third, I think it's also important to mention geography, because the US and China are at the moment leading what many people call an arms race over who gets the upper hand in developing AI, while Europe is just a tiny speck of dust in between those two.

So now let's dive a little deeper, because I want to look at the standard AI development process to show where different problems can come up. First I'll start with research and funding: what problems we have there, and who is doing the funding. Then we'll look at data collection and labeling: how problems can arise with how images are obtained and how they are labeled. Then training and testing, looking at the metrics we use to assess systems. And finally, deployment.

So first, research and funding. I think it's important to start here with the birth of AI. The whole field basically started in 1956 at the Dartmouth workshop, which was organized and attended by the amazingly diverse group of people you can see in the picture on the right. Among them were John McCarthy, Marvin Minsky, Nathaniel Rochester and Claude Shannon, who are big names in this research area, and they actually coined the term artificial intelligence back then. A lot of these men believed that fully intelligent machines would come about by the mid-70s. Well, so far I haven't seen a Terminator yet, but let's see what happens in the next few years. I think it's quite interesting to look at what they defined as intelligence, by looking at a quote from the workshop proposal: "every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it." To be honest, I personally don't believe that this is true. I don't believe that you can mathematically describe or formalize intelligence. But for those of you who are shaking your heads right now, I just want to ask you this question, which is kind of a joke, but still: why can't a machine do the washing up for me, then? And maybe clean my cat's litter box at the same time, and be disgusted by it? If we get to that point some day, then you might convince me that it's actually possible. But that was a side note; let's go back to the main topic.

So who's funding the research? First of all, we need to look at military and intelligence agencies. DARPA, for instance, which is short for Defense Advanced Research Projects Agency, has been a major source of funding since AI's early days, and it still is now. They are involved in extensive R&D efforts, many of them with a clear military and intelligence focus, though they also fund other things; DARPA funding contributed to the Moderna vaccine, for instance. As you can see, military interests have been part of the AI field from the start, and they still are, as this slide shows: it depicts US contract spending on AI by government agency. This is not the money the agencies spend on their internal AI R&D, but the money they spend on third-party contractors, private firms that work for profit, from which the agencies are basically buying solutions. The number one spender, with almost 1.4 billion US dollars in 2020, was the Department of Defense. And one of the companies that is receiving more and more money, and more and more attention in the media, is Clearview AI.
So Clearview AI is a US-based company that sells access to its biometric identification software to law enforcement agencies in the US, but also in Europe. The way their product works is basically: you give them a photo, and their software spits out all the pictures it has of the person in that photo, plus maybe some additional information, like their name. And to create their product, they went ahead and scraped over 3 billion photographs from Twitter, Facebook and Instagram. For me, this is quite worrying, because I might at some point have posted a selfie that I don't like so much, or a picture of me in a bikini; or maybe some of us took part in some crazy college party that we'd rather forget about, but there's still photographic evidence somewhere online. And if I imagine that Clearview AI has access to these pictures, I just shudder. The CEO of this company, however, sees it quite differently. The way he puts it: "all the information we collect is collected legally, and it is all publicly available information." So he frames it as information, basically saying it's a free-for-all buffet that is just out there for companies like Clearview AI to scrape.

Let's move on to Big Tech. The first problem I see with Big Tech is that they're buying everything. This is a timeline of the tech giants' billion-dollar acquisitions. Just to make sure everybody gets this: these are only the acquisitions that cost more than $1 billion; all the other companies they bought are not on this chart. What we can see is that between 2000 and last year, Facebook, Apple, Microsoft, Google and Amazon have been spending more and more money to acquire more and more other companies. And I think the problem with this is, first, that they are strengthening their monopolies in existing markets, and they're also getting into new markets that they haven't been in before. Basically, they're vacuuming everything up and putting it under their own umbrella. I think this is bad not only economically, but also because they can then steer the direction these companies' R&D efforts will take in the future.

Secondly, Big Tech is causing an AI brain drain at US universities. In this chart, you can see the number of AI faculty departures at North American universities between 2004 and 2019; these faculty members joined Big Tech corporations in the US. What you can see is that this number has been steadily rising. And the problem is not only that research at the universities these people left will suffer and that they will get fewer grants; the brain drain is real, but the researchers behind this publication, Gofman and Jin, also found that it has a negative impact on startups founded by the students left behind at those universities. They found fewer startups, which is, again, economically bad. However, you might be saying: OK, these people are joining Big Tech corporations, and Big Tech does a lot of research now, especially in machine learning, because they have so much money. But what I'm going to say next is that research within some of these companies does not seem to be entirely independent, and I think the following quote makes that quite clear.
This is a quote from a senior manager at Google, made while he was reviewing a paper on recommendation algorithms before publication, a paper written by Google scientists, among others. He says: "take great care to strike a positive tone." Google does not seem to want to be criticized, even by people actually working there. What this also tells us is that on top of the scientific peer review process, where you send a paper to a journal or a conference and reviewers you don't know assess it, a double-blind review, Google seems to have an additional internal review process that papers need to pass. And I think it is a problem when managers tell employees to strike a positive tone. One particular example of this is the case of Timnit Gebru and Margaret Mitchell. These two women used to be the co-leads of the Ethical AI team at Google, up until about six months ago, maybe a little longer. They wanted to publish a paper criticizing Google's large language models, and Google didn't like that very much. I'm not going to go into more detail, but I really encourage you to read the story I've linked on this slide and make up your own mind about it, because these two women no longer work at Google.

Next, let's get into data collection and labeling and the problems that can occur there. I want to dive into the ImageNet dataset here. ImageNet can be seen as *the* image classification dataset. Image classification is basically the task where you have a lot of images of objects or persons, each accompanied by a label, and a model learns, when it sees a new image of a certain thing, a cat for instance, to recognize what it is. ImageNet contains more than 14 million images in more than 20,000 categories, and one of the researchers who made ImageNet happen actually said the goal was to "map out the entire world of objects". Seems legit. So where did the images come from? They came from the Internet, because ImageNet was scraped from search engines and photo-sharing websites. I'm not sure it was the first dataset where researchers did that, but I think it was one of the first really large ones where people just went ahead and took whatever had a Creative Commons license on the Internet and used it in their dataset. Did they ask for consent? Well, nope. ImageNet arguably started this free-for-all data buffet, and it is still very much present in the deep learning sphere today. People just take images from the Internet without the people in them knowing, without their consent, and say: well, if people don't like it, they can just write us an email and we'll take it down. Well, if you don't even know that you're in a dataset, how are you going to... let's not get into that.

But next, let's look at the next layer of problems: how were the images labeled? To understand that, we have to go back in time a little and look at WordNet, a hierarchical database connecting words, created at Princeton University in 1985. Some of the nouns in this WordNet database were used as the basis for the labels in ImageNet.
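Before getting to who applied those labels, it's worth seeing what this pipeline means downstream. Here is a minimal sketch using torchvision's ResNet-18 pretrained on the widely used 1,000-class ImageNet subset, whose labels derive from WordNet nouns. This is illustrative glue code, not the ImageNet authors' own tooling; the point is that the model must sort any input whatsoever into one of its fixed boxes, with no way to say "none of the above".

```python
import numpy as np
import torch
from PIL import Image
from torchvision.models import resnet18, ResNet18_Weights

# A network pretrained on ImageNet: its entire vocabulary is the fixed
# list of 1,000 WordNet-derived categories it was trained with.
weights = ResNet18_Weights.DEFAULT
model = resnet18(weights=weights).eval()
preprocess = weights.transforms()

# Feed it anything at all -- here, pure random noise. It still has to
# answer with one of its 1,000 predefined labels.
img = Image.fromarray(np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8))
with torch.no_grad():
    probs = model(preprocess(img).unsqueeze(0)).softmax(dim=1)

top = probs.topk(3)
for p, idx in zip(top.values[0], top.indices[0]):
    print(f"{weights.meta['categories'][idx]}: {p:.1%}")
```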
What the authors then did was take those labels and the images they had scraped, and create a project on Amazon Mechanical Turk, a platform where gig workers, or ghost workers as we call them today, can make money by, for instance, labeling images. And in 2009, after they were done with this, they published the ImageNet dataset. Now let me ask you how you would choose a label, from among the following, for an image of a person. This is a word cloud of what can be described as probably the most offensive terms in the person subcategory of the ImageNet labels. We have great descriptions like flop, loser, convict, bad person, drug addict, and some that I'd rather not read out. First of all, I have a real problem with this: how do you tell whether someone is a "bad person" from an image? That just buys into all the stereotypes we have as people. And the second question: why does the system need to know what a "call girl", for instance, or a "bad person" looks like? What objective does this serve?

Also, weddings don't look like this everywhere. If you're a little confused now, then look at this image. Here you can see four pictures of people getting married. They are, however, not labeled by a person, but by a neural network trained on image classification. The three pictures on the left were labeled correctly by the network: they have labels like ceremony, wedding, bride, groom. But the picture on the right is just labeled as "person" or "people", even though it also shows a couple getting married. What this demonstrates is that there is a huge geographic skew in image datasets. On this slide you can see two pie charts showing where the images come from, on the left for Open Images and on the right for ImageNet. The huge yellow slice represents images from the US, and a lot of the other images come from North American or European countries. So this doesn't represent the world as it is. And this is a problem, because if a neural network is trained on a dataset like this, you potentially cannot use it somewhere else: it will misclassify things or get the context wrong.

Next, let's look at training and testing, and specifically at one of the metrics we use to assess how good a neural network is. I want to talk about what many call a landmark study: Gender Shades. In 2018, Joy Buolamwini and Timnit Gebru investigated biases in commercial binary gender classification systems. You can see a picture of Joy Buolamwini here on the slide. Her work was inspired by her own experience of not being recognized by open-source face detection software: as a Black woman, she had to wear a white mask for the system to finally detect her face. So she wondered whether skin color would have an impact on the classification accuracy of a binary gender classification system, and she went ahead and tested three commercial systems. On the slide you can see the mean accuracy scores of this gender classification: Microsoft with 93%, Face++ with 89.5% and IBM with 86.5%. At first glance, that doesn't look so bad, right? But then she compiled her own dataset of pictures of members of parliament from African and European states, and checked how the systems would perform on four different groups: darker male, darker female, lighter male and lighter female. She found that on the lighter-male group, the systems worked almost perfectly; they got almost all of the pictures right. But in the darker-female group, the story looked quite different. The largest gap she found was for darker females in the IBM system: darker females were classified at 65.3% accuracy, while lighter males were classified at 99.7%, a gap of 34.4%. And just to make that extra clear: this is a binary classification problem, so if I flip a coin, I will get roughly 50% right, and this system got 65% right on the darker-female group. This is not good. One of the things this teaches us is that a single success metric, such as one-dimensional accuracy, does not tell us the whole story.
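To make that lesson concrete, here is a minimal sketch of disaggregated evaluation, with made-up numbers rather than Buolamwini and Gebru's actual data or code: the same predictions are scored once overall and once per subgroup, and the overall number hides the gap.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Overall accuracy can hide subgroup failures; report both."""
    counts = defaultdict(lambda: [0, 0])  # group -> [correct, total]
    for t, p, g in zip(y_true, y_pred, groups):
        counts[g][0] += int(t == p)
        counts[g][1] += 1
    overall = sum(c for c, _ in counts.values()) / len(y_true)
    return overall, {g: c / n for g, (c, n) in counts.items()}

# Made-up data shaped like the Gender Shades finding: a decent-looking
# overall score masking a large gap between subgroups.
y_true = ["f"] * 100 + ["m"] * 100
y_pred = ["f"] * 65 + ["m"] * 35 + ["m"] * 100
groups = ["darker_female"] * 100 + ["lighter_male"] * 100

overall, per_group = accuracy_by_group(y_true, y_pred, groups)
print(f"overall: {overall:.1%}")  # 82.5% -- looks acceptable
for g, acc in sorted(per_group.items()):
    print(f"{g}: {acc:.1%}")      # 65.0% vs. 100.0% -- it is not
```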
Buolamwini and Gebru's research motivated many other researchers to assess biases and to try to build fairer systems. Sometimes people tried to fix this by creating more diverse datasets, like the next one: Diversity in Faces, a dataset created by IBM with the goal of advancing the study of accuracy and fairness in facial recognition. They took a preexisting dataset based on Flickr images, scraped without consent, again, and had them annotated with facial features such as facial height and width, distances between the mouth and nose, facial geometry and so on. On top of that, they also annotated age, or rather perceived age, race and binary gender. The reasoning behind annotating the facial measurements was that they would allow a better assessment of accuracy and fairness and a more fine-grained representation of facial diversity. But this begs the question whether diversity just means a variety of face shapes, or whether diversity means a binary gender assigned to the pictures by crowd workers. Wow. The thing is, AI creators decide on the classification system: they decide on the boxes that these images, taken of real people, are put into, and then the ghost workers put them into those boxes. I think Kate Crawford put it very well in this quote from Atlas of AI: "The practice of classification is centralizing power: the power to decide which differences make a difference."

To move on to the deployment part now, I want to look at how these systems can affect us when they're deployed in the real world. A very good example of this is hiring and firing, where more and more companies use algorithms. In traditional recruiting, what would usually happen is this: you see a job posting somewhere that you like, and you apply to the company with your CV and maybe a letter of motivation. If the company likes your paperwork, they invite you to a person-to-person interview, or maybe a Zoom-face-to-Zoom-face interview nowadays. And if they like what you say and they like the interview overall, they hire you; if not, they don't. Recruitment today, however, works slightly differently. First, many of us create profiles on online platforms such as LinkedIn. And whether you want it or not, LinkedIn will probably send you emails about new job postings that match your geographical location and your skill set, or whatever you put into your profile there.
And so, yeah, this is the first place you come into contact with AI without even realizing it: you might not get to see all the jobs you could be interested in, but only the ones the recommendation algorithm deems relevant for your search criteria, your skill level and your location. Next, if you actually find a job that you're interested in, you again prepare your CV, your references and so on, and send them to the company, for example via the online platform. And many companies now use automated approaches to sort through the massive amounts of paperwork they get from applicants. If the algorithm allows it and you're lucky, you then get to talk to a real human. But if you're not, you get to talk to an algorithm. More and more companies rely on automated interviewing solutions today, for example video interview software. You can imagine it like this: you're sitting at home in front of your computer, talking to an AI-generated voice that asks you questions posed by your potential future employer. They record a video of you and analyze your voice, your posture, your facial expressions, so your emotions, and give you a score. You might not actually find out what that score is; it might just be a number between one and five, but you might never see it. You will probably just find out whether you've been rejected or not. Many companies use these tools now because vendors promise that the AI is less biased than humans and will help promote diversity. But I see layers of problems with this. First of all, you cannot ask an algorithm any questions, right? If you are in an interview with a real human and you didn't get the question, you can say: sorry, can you rephrase that? That doesn't seem possible with these systems. Also, you're not getting feedback after a rejection. Like I said, if you're lucky, you'll get your point score, but what does that really tell you? You don't know how it's calculated, so how are you going to improve in the future? Second, you have no proof of possible discrimination, because these algorithms are trade secrets, and that makes it very hard to challenge the decisions they make. And third, what I find really unnerving is the scalability of these things. In traditional recruiting, you might apply to ten different companies, with ten different humans looking at your CV or talking to you. These humans all have biases, like we all do, but they will not all have exactly the same biases. If you scale these systems across different companies, though, you might never get out of the loop: no way to know why you've been rejected, no way to appeal, so seemingly no way to escape.

And how do algorithms actually determine whether someone is a good fit for a company? Amazon tried it a few years ago. They built a hiring tool that screened people's CVs, and the system was trained on the resumes of applicants from a ten-year period. Well, remember what I said in the beginning: most people working in the tech industry are male. And indeed, they realized that the system discriminated against women. They tried to fix it by making it blind to certain words that would indicate gender, but they soon realized that they couldn't fix it: the system kept finding ways to infer a person's gender from other, seemingly unrelated factors, so-called proxies.
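What "proxies" means is easy to demonstrate on synthetic data. The sketch below is purely hypothetical and has nothing to do with Amazon's actual system: the model never sees the gender column, yet a correlated feature lets it reproduce the bias baked into the historical labels anyway.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 5000

# Hypothetical applicants. 'gender' is hidden from the model, but some
# innocuous-looking CV feature happens to correlate strongly with it.
gender = rng.integers(0, 2, size=n)                # never shown to the model
proxy = (gender + (rng.random(n) < 0.1)) % 2       # ~90% correlated with gender
skill = rng.normal(size=n)                         # genuinely job-relevant

# Biased historical outcomes: past hiring favored gender == 1.
hired = (skill + 1.5 * gender + rng.normal(scale=0.5, size=n) > 1.0).astype(int)

# The "blind" fix: train without the gender column.
X = np.column_stack([proxy, skill])
pred = LogisticRegression().fit(X, hired).predict(X)

# The bias resurfaces through the proxy anyway.
print("predicted hire rate, gender 0:", round(pred[gender == 0].mean(), 2))
print("predicted hire rate, gender 1:", round(pred[gender == 1].mean(), 2))
```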
And I think they made a good decision by just saying: okay, let's trash this system, it's not going to work. Since then, however, they seem to think that automatically firing people based on algorithmic scores is a great idea, and they're actually doing this now to drivers in the US. If you want to find out more about this, you should definitely read the article I've linked there, by Spencer Soper at Bloomberg: "Fired by Bot at Amazon: It's You Against the Machine."

Often, though, people take a different approach than trashing the system they've been working on so hard, and instead try to make it fair, to build fairness as a feature into the system. Now, I'd like you to imagine you had to build a fair IT hiring algorithm. Your boss tells you: we're trying to get more women into the IT department, so let's build an algorithm for that. Would it be fair if the algorithm just randomly selected from the pool of applicants without knowing about gender? Well, if 90 men and 10 women apply, this is very unlikely to change your company's demographics, because there will still be far more men in the sample of applicants you accept. You might say: OK, what if the algorithm approves the same percentage of women and men? This principle is called demographic parity. The downside, as some of you might object, is that unqualified people might now get the job, because you're just going after numbers and not after skills. OK, then maybe we should approve the same percentage of women and men among those who are qualified. The problem with that is that men and boys statistically get in touch with coding and technology much earlier, and they are also more likely to have their own computer earlier in life; combined with gender stereotypes, this gives them what you could call an unfair advantage. So this notion of fairness is also not going to get you far in this case. All these notions of fairness can be described mathematically, and people do describe them mathematically. But we first need to have a discussion about what fairness actually means in each context, because people have vastly different ideas of fairness; fairness is a political decision. So let's not outsource these political decisions to the select few developing AI systems, and I include myself in that.
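As an illustration of how such notions get formalized, here is a hedged sketch with invented toy numbers, covering two of the definitions just mentioned: demographic parity compares raw selection rates per group, while an equal-opportunity-style view compares selection rates among qualified applicants only. The same decisions can look fair under one definition and unfair under the other.

```python
def selection_rates(decisions, groups, qualified=None):
    """Selection rate per group, optionally restricted to qualified applicants.

    Equal rates across groups is 'demographic parity'; equal rates among
    the qualified is roughly 'equality of opportunity'. Which definition
    counts as fair is a political choice, not a mathematical one.
    """
    rates = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups)
               if grp == g and (qualified is None or qualified[i])]
        rates[g] = sum(decisions[i] for i in idx) / len(idx)
    return rates

# Invented toy pool: 90 men, 10 women, as in the example above.
groups = ["man"] * 90 + ["woman"] * 10
qualified = [True] * 60 + [False] * 30 + [True] * 8 + [False] * 2
decisions = [1] * 30 + [0] * 60 + [1] * 3 + [0] * 7

print(selection_rates(decisions, groups))
# man: ~0.33, woman: 0.30  -> close to demographic parity
print(selection_rates(decisions, groups, qualified=qualified))
# man: 0.50, woman: ~0.38  -> yet unequal opportunity for the qualified
```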
Because the way AI is built at the moment, it's leading to a giant AI feedback loop. So let's recap: the military and Big Tech fund research and do research. Then we collect data without consent, data that is heavily skewed towards certain groups. In training, the metrics with which we measure success are very one-dimensional, so we don't really focus on minoritized groups. And during deployment, that leads to power being concentrated in the hands of a few, and nobody seems to care who is harmed by these solutions. Looking back at the example from the beginning: these systems can cause real harm to people's lives and well-being if you're jailed, even though you're innocent, because an algorithm tells the officers that a picture is of you. And the system that led to this arrest might produce more data points, for instance for predictive policing algorithms, which will be used to train more and other systems that will cement this injustice even further. It will skew the datasets more and more, harming communities that are, for instance, already overpoliced, and it will fund more and more harmful research. I think what we're entering there is what I'll call the seven circles of AI hell.

So how do we get out of this? How can we start to build bridges out of this hellhole? First of all, my advice for everyone is to stay informed. And by that I don't mean just reading all the articles where people claim AI has superhuman capabilities in X; be a bit critical about those as well. Also think about which uses of AI we as a society actually want, and which ones we don't want because they simply cause too much harm. Second, join and organize collectives. There are a number of great collectives and people organizing together to fight, for instance, biometric mass surveillance, and I've got a few links in my references if you want to check them out. Third, I think it's important to vote and donate: donate to these organizations, but also, very importantly, vote for people who are trying to limit Big Tech's power and to make ethical principles a part of AI development.

For the people in machine learning, I have some more advice. First, be critical. Whether it's your own work or somebody else's, always at least double-check, discuss, and look behind the curtain. And don't be fooled by one-dimensional metrics. Second, I think it should be morals first, math second: the discussions about technological harms and their consequences should always come before any formalization. And by that I don't mean we should only have these discussions within the machine learning space where the technology is developed, because that just reinforces a system of power that's already asymmetrical. We should, and that's my third point, involve other humans, especially people from affected communities, because sometimes when we deploy solutions we cannot really know what the possible harms are until we talk to people who have experience with them. We should also talk to and listen to social scientists, because they've been doing very important research over the last few decades, including qualitative research that we've been ignoring for far too long.

And with that, I want to say thank you for having me; it's been great. I really encourage you to look at some of the references I've linked here: there are books for the people who, like me, love to read, some podcasts on the topic, films and series, and organizations working on algorithmic injustice. And for those who really want to dive down the rabbit hole, I've also got some links to scientific papers and talks. Thank you very much.

Thank you for that great talk, Stephanie. There are a couple of questions that we have time to take right now, so let's get to it. Okay: individuals borrow images all the time without citations, and we call them memes. How do you convince people that this is something they need to worry about when it's not their content? Ooh, that's a tricky question, actually; I've never thought about that, to be honest. Memes are such a big part of our culture, but I definitely see the danger with that, especially when it's not something drawn, like a stick figure, but a person's face.
Yeah, I guess if you read about these things, check out some of my references on how these datasets are collected, for instance, and maybe try to convince people based on that, that privacy rights are something we should all care about. And this also goes for taking pictures of random people on the underground, or wherever you are, and posting them on social media; we shouldn't do that, but it has been so normalized. I think we should all try to convince each other, but I have nothing I can pinpoint and say: you should do exactly this. Yeah, it's difficult. Definitely; we do need more general guidelines around this. All right, on to the next question: is advancing accuracy a worthy goal to strive for if we disagree with the very existence or use of these systems? Wow, very, very good question. I think this depends on what use you're talking about; like I said, context is so important. I gave one example, the Diversity in Faces dataset, which has been taken down by IBM now, with people complaining about it. But if a system is being used for something like biometric mass surveillance, which will hit minoritized communities first, then what does accuracy buy you? You can get 99% accuracy, and then what? Then we're all being mass-surveilled. Is that something we want? So I think it really depends: if you want to, I don't know, classify whether you have to water your plants, then accuracy is definitely a worthy goal, because your plants might die otherwise. But if it's about human lives and endangering them, then yes, I agree we should definitely be more critical. Absolutely, that was a great answer. We have time to take one more question: do you think the problems you describe are rather technical, requiring better software, or social, requiring deep social change? The second, definitely. What I try to convince people of is that fairness is a great goal, and I think it's important that we're able to formalize these things, but only if we actually agree on what they mean, right? What is fair? People will have different ideas, and what is fair to them will change from context to context. I think we do require deep social change, and we also need to bring in social scientists who have been studying how technology and human society interact, feed each other and change each other. This is really important. Thank you for that wonderful talk and for taking the time to answer all the questions. I guess you can hang out with the audience in the breakout room. Thank you very much.