Okay, welcome everyone. My name is Mikołaj Morzy. I work at Poznań University of Technology, in the Institute of Computing Science. My field is machine learning, and that is the role my colleagues and I play in this project. When I was asked to present during this seminar, I thought about what would be most useful, because I knew I would be in the company of ethicists and would be the only one not understanding what they were talking about. So I decided to at least contribute to the discussion and maybe demystify machine learning a little, because people tend to use the phrases "machine learning" and "artificial intelligence" so vaguely. It is not that complex and it is not that complicated. The engineers can do it, so how hard can it be?

This will be my presentation: I will look at machine learning and at what can go wrong when doing research, when learning, and when applying models trained on data harvested mostly from the web, from open repositories, from sources such as Twitter. In particular, we will look at what could go wrong.

Now, why is there so much stress on artificial intelligence and machine learning? If you think about computer science as it is, or as it has been for the last 50 years, this is basically it: you take the data, you apply some kind of algorithm to the data, and you get the results. Whether you are typing something in your word processor, computing equations in your spreadsheet, or browsing the internet, this is what happens all the time. You have the data, either manually created by you or taken from somewhere; you apply an algorithm, and an algorithm is just a well-defined, finite set of steps that brings you to a desired goal; and you obtain the results. The algorithm had to be written from scratch, had to be programmed. Someone had to write down the code that transforms and processes the data to produce the result.

And this is where machine learning comes in, because machine learning really is a revolution in ICT. In machine learning, we do the opposite: we take the data, we show the expected results, and the algorithm is what comes out. The method, the steps required to get from data to results, is itself the result of the computation. That is why we no longer have to code things manually; we don't have to program them. They program themselves, so to speak. And that is why these methods are so data-greedy: the more data and the more expected results you present to a machine learning model, the better the model becomes and the more accurate its representation of reality.

I understand this is quite vague, so let's go into details and do some live programming. Consider a very simple example. If you were to teach a child to solve this kind of problem, given three numbers, produce the result, the child would have to understand the concept of addition and the concept of multiplication, and maybe also which operations should be performed first and which later. Basically, you would assume some kind of intelligence, some kind of understanding. And this is absolutely not what artificial intelligence does. Remember: instead of trying to encode the algorithm for solving the problem, we try to derive this algorithm by throwing lots and lots of data at the machine learning algorithm.
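[For reference, in classical programming the toy task used in this demo, add the first two numbers and multiply by the third, is one line of explicit, human-written code. A trivial sketch, just to fix the contrast:]

```python
# Classical programming: a human writes the rule down explicitly.
def compute(a, b, c):
    return (a + b) * c

print(compute(9, 3, 8))  # (9 + 3) * 8 = 96
```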
Machine learning solves this problem the other way around. We will create thousands of triplets, just three numbers and the result of the computation, show them to a machine, and hope that the machine learns how to add and multiply.

So now for the coding. This is the only thing I will do: I will randomly select integers, with a cap so as not to build too large numbers, and the result will be the first two numbers summed and multiplied by the third. This will be the input, and this will be the result. Let's execute this. We are now creating 10,000 examples; each example consists of three numbers, and I have capped the integers at the range from zero to ten, just for the sake of simplicity. So this is exactly it; this is just the head of the data frame, but as you can see: nine plus three is twelve, times eight is 96; eight plus four is twelve, times four is 48; and so on. I have randomly created 10,000 such examples.

Now I will keep presenting those examples to a simple neural network. Let's define it: it consists of three layers. We will run this network for 10 epochs; an epoch is one full scan through the dataset, so the dataset will be read by the network ten times, and the network will try to learn how to add and multiply, mind you, without anyone ever explaining what addition or multiplication is.

Here is the training; just give it a second. What you are seeing there is the loss function. The loss measures the progress of learning: the smaller the loss, the better. You can see that we started with random knowledge, or no knowledge at all, and as the network goes through these examples the loss goes down and down. We could continue the training and probably reach much better results, but this should be enough.

Let's see the results. This is the test set: examples the network has not seen before. We show it the three numbers, ask the network for the result, and compare to the expected result. And this is our network's response. Sometimes it goes awry; here is an error, we expected 24 but the network gives 21. Other than that, it doesn't look so bad, right? After a very short training time, it learned to add and multiply. Let me repeat this step; the only thing I will change is to give it a little more time to learn, say twice as many epochs. Now let's generate a new test set, and, well, there is still a little bit of error, but the network clearly has some knowledge.
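[The notebook itself is not reproduced in the transcript, but the experiment is easy to reconstruct. A minimal sketch, assuming NumPy and Keras; the layer sizes, optimizer, and loss are illustrative guesses, while the data recipe, integers capped at 0..10, label `(a + b) * c`, 10,000 examples, 10 epochs, comes from the talk:]

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)

# 10,000 examples: three random integers from 0 to 10,
# labelled with (a + b) * c. Addition and multiplication are
# never explained to the network, only demonstrated.
X = rng.integers(0, 11, size=(10_000, 3)).astype("float32")
y = (X[:, 0] + X[:, 1]) * X[:, 2]

# A simple three-layer network.
model = keras.Sequential([
    keras.Input(shape=(3,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# One epoch = one full pass over the dataset.
model.fit(X, y, epochs=10, verbose=1)

# Test on triplets the network has never seen before.
X_test = rng.integers(0, 11, size=(5, 3)).astype("float32")
expected = (X_test[:, 0] + X_test[:, 1]) * X_test[:, 2]
predicted = model.predict(X_test).ravel()
print(expected)
print(predicted)
```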
The subject that interests us most in the context of this seminar is bias. What happens if the training data gets corrupted in some way? That is exactly what we will do here. I will go back to the shorter training, ten epochs, but now I will change the expected results. The label array contains the expected values, the column that held the result in the training data. I will modify every hundredth example, so just 1% of the 10,000 examples, and instead of the real number I will put a zero. So 1% of the data will be slightly corrupted: instead of the true value, it will contain a zero.

You can already see that the training is much worse: the loss function does not go down, and this is, mind you, just 1% of a small error. And look at the network now. It starts making much larger errors, and in particular it starts veering into the negatives, which it shouldn't, because we are adding and multiplying only non-negative integers, so it should never produce a negative number. But it did.

What happens if we make a much more severe modification? What happens if, in that same 1% of examples, I put something very, very large, a clear measurement error? Let's see. Just by looking at the loss function, you can already guess what will happen when I apply this model to the data. These are the values we expect: roughly 60, 25, 28, 22, and so on. And here are the responses: the model goes completely nuts. And this is by modifying just 1% of the data; we could probably do this with a tenth of a percent and the network would still go crazy. For a very, very simple task, just learning how to add and multiply numbers. And this is nothing compared to the complexity of facial recognition, or modelling language, or modelling the physics of the world. So there you have it: bias in machine learning training, explained as simply as possible.
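[The corruption step, continuing the sketch above. Zeroing every hundredth label is from the talk; the value 1_000_000 is just a stand-in for the "something very, very large" used on stage:]

```python
# Mild corruption: every hundredth label (1% of the data) becomes zero.
y_zeroed = y.copy()
y_zeroed[::100] = 0

# Severe corruption: the same 1% becomes a huge, clearly wrong value.
y_outliers = y.copy()
y_outliers[::100] = 1_000_000  # stand-in for "something very, very large"

# Retraining the same network for the same 10 epochs on y_zeroed or
# y_outliers reproduces the effect from the talk: the loss stops
# improving and the predictions drift, even into negative numbers.
```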
So what is this bias? By the term bias, we mean any systematic distortion of the data. We use data in machine learning in three different ways: for training the models, for testing them, and for validation. During training we come up with a model. The model has several hyperparameters: the depth of a neural network, the architecture of its individual cells, the loss function being applied. All of these are called hyperparameters, and we use the test set to pick the best ones. We also need a separate dataset for validation: data that has never been used during training, on which we test the final model to get a glimpse of how it will work in real life, in the wild, in production, as we say.

This distortion of the data can come from many different sources. Algorithm bias is quite rare, because you would have to believe that the programmers themselves want to introduce a bias into the code. It is possible, of course; you can think of industrial espionage, or of someone simply being a jerk. Not impossible, but given the pipelines of software production, all the good habits and best practices, the code reviews and so on, it is not very likely.

Sample bias, on the other hand, is a very significant source of bias. The data may be skewed by the method of capturing it. You may rely on historical data, which has its own problems: it reflects the world as it was 20, 30, 40 years ago, and in many respects that was not the world you would want to train your model on. It may be just a poor way of selecting the data. It may be survivorship bias: you only see the things that survived, and those are the only things you can sample. For instance, if you were to build a model of how well a company will perform in five years' time, and you take market information from the last ten years, you will most probably not see the hundreds of companies that failed and ceased to exist over those ten years. You will only see those that survived, so they will be the best on the market, and you will be training on a skewed dataset, because it only shows you who survived and not the characteristics of all the companies that went down during, say, a sudden crisis.

Or it can be as simple as the case of an American city which came up with the idea of a smartphone application that used the phone's gyroscope to detect when the phone was moving in a car and to monitor for jolts. Whenever a sudden drop appeared, the application would assume there was a pothole, that this was why the car suddenly jerked, and it would report the geolocation of the potential pothole to the city services. The problem was that they developed this only for iPhones, and there is a clear correlation between your income, and by extension your ethnicity, and the phone you own. So the city was fixing potholes, but predominantly in white neighborhoods. You could also call that algorithm bias.

And there is measurement bias: some kind of mechanical error, a faulty sensor, or, if the data is collected by individuals, the assumptions and subjective judgments they bring into the way they record the data. All of these can be sources of bias.

Here is a famous example of a very biased system: the COMPAS model, which tries to predict the probability that a person seeking early release from prison will re-offend, and I am afraid it is still being used. There was a very famous study of it, by ProPublica, in 2016. The problem with this model was that when it was right, it was really right: its precision was high, and whenever it made a correct prediction, the prediction was equally precise for everyone. But when it made an error, a false positive, it made different false positives for different ethnicities. For black defendants it predicted a much higher risk of recidivism than the real data showed, and the mirror image happened for white defendants, who were predicted to pose a lower risk of recidivism than they really did, as the records showed. And it was hard to find, because the bias was present only in one part of the model's behavior. When the model was correct, it made equally precise predictions for white and black defendants; the difference was in the erroneous predictions. A hard problem to find and to diagnose.
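[To make that asymmetry concrete, here is a toy numeric illustration with invented confusion matrices, not the real COMPAS data: two groups with identical precision but sharply different false positive rates, exactly the kind of disparity that is invisible if you only check the correct predictions:]

```python
# Toy numbers, invented for illustration (not the real COMPAS data):
# identical precision for two groups, very different false positive rates.
def precision_and_fpr(tp, fp, tn, fn):
    precision = tp / (tp + fp)  # how often a "high risk" flag is right
    fpr = fp / (fp + tn)        # how often a harmless person gets flagged
    return precision, fpr

print(precision_and_fpr(tp=60, fp=40, tn=60, fn=40))   # (0.60, 0.40)
print(precision_and_fpr(tp=30, fp=20, tn=130, fn=20))  # (0.60, ~0.13)
```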
The next example is a beautiful one, and very relevant to Twitter. In 2016, Microsoft released a conversational bot on Twitter called Tay (Tay.ai). The bot had a language model; it could understand conversations and learn from them, and Microsoft simply handed it over to the whole Twitter community to talk and have meaningful conversations with. Its farewell tweet, less than a day later, read: "c u soon humans need sleep now so many conversations today thx". Microsoft had to pull the service down after 24 hours, because the bot had become not only racist, not only misogynistic, not only antisemitic; it became openly Nazi and Hitler-loving, all because people from Reddit and 4chan started conversing with it. There was, of course, an orchestrated effort to swamp the bot with the most offensive, rude, and terrible conversations one can find in the depths of the internet. But these are the tweets generated by the bot after just 24 hours of talking with humans. That speaks more to the nature and state of humankind than to the prowess of Microsoft's engineers. In any case, you cannot really depend on user-generated content, especially when the users have an agenda with respect to your AI.

But it doesn't have to be so malicious. Here you see the location of Google's office in Berlin, and, as you can see, a terrible traffic jam; this is the street during the jam. The gentleman walking here, the artist Simon Weckert, created that traffic jam. What did he do? You see this small trolley: it was loaded with 99 active mobile phones. He just walked the streets with those phones, and Google Maps, recording the location of all of them and seeing that they were moving very slowly, assumed there was terrible congestion on the street and presumably suggested that everyone else go somewhere else, directing traffic to nearby streets. So the guy had the street to himself. This, of course, is not malicious; it is benign. But you can see how a very sophisticated, very complex, very large service, involving hundreds of highly skilled engineers, can be fooled by a guy with a small trolley and a couple of bucks to spend on phones, or just friends willing to lend theirs for 15 minutes. So yes, this can happen as well.

The bias can be algorithmic, and it can be created by humans. Here is another infamous example: Project Green Light in Detroit. Here you see the locations of CCTV cameras across the city, and here the distribution of ethnicities in Detroit, and it is really hard not to see a very distinct pattern in the placement of those cameras. And of course the placement of cameras, that is, the placement of sensors, directly influences the selection of data: you will get the data that you get. If, say, those cameras could measure the speed of cars, a model trained on their feed would learn that only people of a specific ethnicity break the speed limit in this city, for no other reason than that the model would never see any other faces.
And you can imagine that the moment of data collection, or even worse, the moment of model creation, is removed by months or years from the date when the camera locations were chosen. People then take the location of cameras for granted; they don't question it. They say: okay, we have the feed from the cameras, great, let's pull the pictures and train our artificial intelligence models to do this and that. But you would have to go back many years to see where the cameras were placed and why, and think about what damage such a selection of places may cause.

Similarity bias, again, is very much present in contemporary machine learning models; it is what leads to information bubbles. If you search Google News with some keywords, it will find articles, and then other articles with similar headlines; and given a specific selection of keywords, those headlines will mostly corroborate one point of view, because the same facts can be reported completely differently, with different keywords and different talking points, depending on the political affiliation of the news source. So if a recommender engine searches by similarity, saying "this person wants to read this, so let's recommend more similar news", similar in what sense? Similar in subject, or similar in form? If the latter, it simply reinforces the information bubble, because it keeps showing people the same point of view. (A toy sketch of this effect follows below.)

YouTube chose a terrible objective function: it optimized for the total time a person spends in the service. Not the number of videos shown, not the number of ads shown, not the quality of the videos, not even the similarity of the videos' descriptions. The recommender was optimized just to keep you in the feed as long as possible, and as a side effect, nobody programmed this, it promoted a huge amount of extreme content and all kinds of conspiracy theories. There are also unforeseen consequences of aligning with stereotypes: if job adverts are presented to people, say a medical technician position versus a nurse position, a woman might select the nurse advert simply by self-aligning with the stereotype that the medical job done by a woman is nursing. That is quite a problem.
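[Here is that toy sketch of the "similar in form" problem, assuming scikit-learn; the headlines are invented for illustration. Recommending by plain textual similarity ranks the like-minded rewording of a story far above a report of the same story from the opposite viewpoint:]

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Three invented headlines about the same fact: the first two share a
# viewpoint (and wording), the third reports it from the other side.
headlines = [
    "Tax plan slammed as a reckless giveaway to the rich",  # what the user read
    "Reckless tax plan is a giveaway to the rich, critics say",
    "Tax plan praised as a long-overdue boost for business",
]
tfidf = TfidfVectorizer().fit_transform(headlines)
scores = cosine_similarity(tfidf[0], tfidf).ravel()
print(scores)  # the like-minded headline scores much higher than the opposing one
```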
Underrepresentation. These are two extreme examples. I was wondering whether I should show you these pictures at all, because they are very offensive, but yes, this happens in products made by world-leading manufacturers. This is a Nikon camera. It is supposed to be a helpful tool that suggests you retake the photo if someone has blinked. But they did not train the blink-detection algorithm on Asian faces, so the camera assumes this person is blinking. Come on, it is really hard to be more offensive than that. Not to mention this one, which was so famous that it is borderline criminal. Care to guess what it is? These are the results of a Bing image search: if you just type "CEO" into Bing image search, this is what you will see.

How many women do you see there? One. And what percentage of CEO positions in large American companies are currently held by women? 28%. So this result from Bing Images is really something to behold.

But it's not only images; let's play a little with text. We will go from Polish to French to Turkish and back to Polish. I will write in Polish the phrase "she is a famous actress". In French, which also has grammatical gender, this is "une actrice célèbre"; feminine, right? Now let's go to a language that doesn't have grammatical gender; Turkish is an example of such a language. So we translate into Turkish, and then I switch the translation from Turkish back to Polish. And now it says "is a famous actor", with a masculine ending: it took the masculine grammatical gender. And if we translate that Polish back to French, we get "un acteur célèbre". Before, we had "une actrice célèbre"; now we have "un acteur célèbre". The woman gets lost along the way, because a famous actor must be a man. The problem, of course, lies in the translation to Turkish, where the grammatical gender is dropped. Going back, the system has to either reconstruct it or produce at least two different versions, not just one.

And to close my presentation, after all this, let's play a little game: are you the source of bias? Look at these images and imagine you are a human annotator responsible for providing labels for a machine learning task, to teach a machine to automatically label images. One of the three descriptions is seriously wrong. Can you spot which one? I'll give you a second to think about it. Or maybe someone wants to propose the label which is clearly wrong? I don't see anyone, so I will tell you.

It is, of course, "a black woman playing with her daughter". The problem is that it is not "a black woman"; it is a woman. The adjective "black" has nothing to do with this image. The only reason to inform a machine learning model that she is black would be to contrast her with someone else in the photo who would be, for instance, white. "A black woman talks to her white colleague" would make sense, because then the adjective would help the model distinguish between her, given her skin complexion, and a colleague with a lighter complexion. You don't see the label "a white man playing with a dog" here, right? It just says "a man", because you assume he is a man. The color of her skin has absolutely nothing to do with the action she performs with the child in this image. And this is very, very hard to spot, especially if you are not a person of color: to spot that this adjective is not merely superfluous but wrong, because it teaches the model something it should not learn, as if the adjective served some purpose, when it serves no purpose in this label. So it is much, much harder than one would think.

Okay, thank you very much.