But hopefully the aim is already fulfilled, so let's just reinforce it. And just as a side note, I know it's almost the end of the fourth day, so I have some gratuitous references to British pop culture in the presentation, and if you spot anything, shout and you'll get a sweet. I'll try to throw it at you if you're close enough. Okay, so before we dive in, I just want to credit some people who have really inspired me to think about this, and who I've learned a lot from. The first one is Zeynep Tufekci. She's a researcher, really interested in technology, and she thinks really deeply about these things; I can really recommend her. She's got lots of articles, not only about this topic but also many related things, she's got a book, and there are online talks of hers that you can watch. Cathy O'Neil wrote a book on the topic called Weapons of Math Destruction. You've probably heard about it already; it's kind of the Bible of this area. And finally, Catherine, who gave an amazing keynote yesterday morning, a member of our own community; she also gave a talk on pretty much the same topic I'm going to talk about today, at PyData Amsterdam a couple of months ago. So I've tried to talk about things from a slightly different perspective, but the areas we cover are quite similar. I'll do my best, but I can't hope to cover everything as well as they did. And just to note, I think it's also interesting that these are all women with different backgrounds, and I think it goes to show how important diversity is. I know that our community cares about it a lot, but there is still work that we need to do. Okay, so does anyone know what that is? There you go, you get a sweet. So this is Deep Thought. It's a computer from The Hitchhiker's Guide to the Galaxy, and it was built by super-intelligent, multidimensional beings who got fed up with existential angst at some point and decided to build a computer to give them the answer to life, the universe, and everything. They built this computer and then asked it for the answer, and the computer told them to come back in seven million years. But what is a bit of waiting for such an important answer? So they did. They waited seven million years, they came back, and then the computer told them that the answer is 42, but it doesn't know what the question is. Now, this is just a funny science fiction novel, so what does it have to do with reality? I think there is actually a connection, and one of the reasons The Hitchhiker's Guide to the Galaxy is so funny is that it has some very pointed criticism of our society and the way we think. And I think it brings up some really good points. First of all, if your question is rubbish, you will not get a useful answer. That might seem pretty obvious, but keep it in mind, because a little bit later we'll come back to it; it's not always obvious to everyone. But also, secondly, let's think for a second about why the super-intelligent, multidimensional beings decided to build a computer to answer this question. Why didn't they instead decide to invest in the humanities and let a well-funded philosophy department sort it out, right? I think one of the reasons is that we think computers are objective. A philosopher has their own personal biases, their own points of view, and basically they cannot be trusted, not really, right? They are definitely biased in some human way.
Whereas computers are governed by ones and zeros and algorithms and logic gates and have no morals, so whatever answers they give us are unbiased and objective and therefore true, right? Does everyone agree with that? That the answers computers give us are always true? Awesome. So either I don't have lots of work to do to convince you, or I'm very good at posing questions in a very suggestive way. OK, so let's say I was exaggerating a little bit. Computers aren't always giving us the right answers, but when they give us the wrong answer, it's pretty easy to tell. You might have a convolutional neural network that is classifying images, and it might misclassify an image of a cat as a penguin. You look at that and you think it's an obvious mistake. It's kind of absurd, it's very easy to spot, so you're not really worried about it, right? It will not trip you up in any dangerous way. Do you agree that when a computer gives you an answer, if it's wrong, you can tell that it's wrong? OK, at least one of you agrees with that, but probably not most. OK, so no one agrees. Awesome. OK, so even if you can't always spot mistakes, and I guess I already know what you're going to answer, do you agree that at least things like racism and sexism are not computer things? They are human prejudices, and computers don't have them? Right, no one agrees with that either. That's great. Sure, but when a computer gives you an answer, the question is whether it can contain biases like this or not. So I guess no one falls into this trap already, which is awesome. So do you know who these people are? There was an article published last year by ProPublica, an investigative news organization, and basically they explained how predictive policing algorithms work and how they get things wrong. First of all, when you say predictive policing, a few years ago I thought it would be a joke. A Minority Report-style thing was a joke phrase to me for a long time, and when I read about this I was really kind of shocked, but it turns out it's not a joke, it's an actual thing. Both of these people were arrested in Florida in 2014. The person on the left, her name is Brisha, and she was arrested for stealing a kid's bike, sort of cycling on it for a few meters, throwing it away and then running. But she was caught and arrested. The guy on the right, his name is Vernon, and he was arrested for shoplifting some stuff that was of similar value to the bike. So they were both arrested more or less at the same time, independently. And when they were arrested, their risk of recidivism, so risk of reoffending, was assessed by a system called COMPAS. It's a thing that the police in Florida use: they put data about the person into the system and the system says this person has a high risk of reoffending, this person is low risk, and so on. In this case, Brisha was given a high risk of reoffending and Vernon was given a low risk assessment. Now, the article was published in 2016, so two years after all this happened, and by that point we already knew that the system got it wrong. After two years, Brisha didn't reoffend, while Vernon broke into a warehouse, stole much more valuable things and is actually in prison. So, you know, the system got it wrong, but shit happens. We know that computers aren't perfect, so sometimes they get things wrong.
And let's put aside for now the question of whether any kind of algorithmic inaccuracy is acceptable for such an important system, because there are even bigger problems that the journalists found. Namely, there is a racial bias in the system. People of different races tend to be misclassified at more or less the same rates; however, white people, when they are misclassified, are given too-low risk assessments, while black people, when they are misclassified, are given too-high risk assessments. So there is racial bias in there, and it's actually impacting people. And there is much more to it. The article itself is really interesting, there is a separate blog post detailing the analysis they did, and there are also some rebuttals to it. I really encourage you to look into it, it's a very interesting topic. So, let's see how it might happen that a system like this can be biased. Do you know what these are? There are no sweets for that. These are word vectors. I'm just going to quickly explain them for those of you who don't know. In 2013, there was a paper published by Google on natural language processing, and they introduced a system called Word2Vec. The idea is that you take some corpus of text, you put it into this model, and it spits out an embedding. What that means is that it spits out a representation where each word is described by a set of coordinates. In this case it's something like 300 coordinates, 300 dimensions, but you can think of it as like a three-dimensional space where each of your words has a position. And if you imagine the word man and the word king, there is a vector that takes you from one to the other. The cool thing about Word2Vec is that these vectors, the relationships between words, are meaningful, in the sense that you can take this vector and, instead of applying it to the word man, you can apply it to the word woman, and instead of getting to king, you'll get to queen. And this is awesome; it allows you to do this kind of vector arithmetic. There's a question. The axes are arbitrary: these are vectors in a high-dimensional space, so this is just a conceptual representation in two dimensions of this 300-dimensional thing, just to get the idea across that there are relationships between words. I don't know. So, Word2Vec takes a word and puts it in some kind of space whose axes don't necessarily have an interpretable meaning; it's just modeling relationships. Things close to each other, for example, might be related; things far from each other might not be related. I don't want to make this about Word2Vec, we can talk about it afterwards, but it's not the topic of the talk. The point is that this is an awesome technique. It's very widely used, it's extremely useful, and it's been used in many, many papers, including at NIPS. NIPS is this premier conference for AI, basically; any paper with neural in the title is very welcome there. And last year, there was a paper that described how Word2Vec is also biased. It turns out that Word2Vec is trained on Google News, so it's a huge corpus of data, and most people writing news, I assume, are not biased intentionally, but just because of the way our society works right now, there are some biases in there. One of the things they discovered is that if you take the word man again, there is a vector that gets you to computer programmer. Simplifying a bit, you can think of it as a vector that takes you from man to a profession. If you take the same vector and apply it to the word woman, it takes you to homemaker.
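To make that vector arithmetic concrete, here is a minimal sketch using the gensim library. It assumes the pretrained Google News vectors have been downloaded locally (the file name below is the conventional one, not something from the talk), and the exact neighbours returned are only what such a query typically gives.

```python
# Minimal sketch of the word2vec analogy arithmetic described above.
# Assumes the pretrained Google News vectors have been downloaded locally;
# the file name is the conventional one for that release.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# The classic analogy: man -> king, therefore woman -> ? (expected: queen)
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# The same arithmetic surfaces bias picked up from the news corpus:
# man -> computer_programmer, therefore woman -> ?
# (the bias paper reports "homemaker" as a top answer)
print(vectors.most_similar(
    positive=["computer_programmer", "woman"], negative=["man"], topn=3
))
```

The second query is exactly the same arithmetic as the first; the only thing that changes is which relationships the training corpus happened to encode.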
So there's a clear bias there about which kinds of professions people are expected to have, even though that's not really fair. Absolutely, so it's the problem of the training data. Sure? Right, so the point is that even though your technique might be completely fine, if you're using biased training data, you will get a biased result. And it's not surprising, but it took three years after the original Word2Vec paper was published for this paper detecting the bias to appear. The paper also shows ways of addressing it, and they basically amount to sort of preprocessing the training data, morphing the learned space, to remove these kinds of biases. If you've heard of metric distance learning, it's a similar idea: warp the space in a way that satisfies some constraints, so some relationships are kept and some other relationships are thrown away. And there are also other things you can do. Some of them... Yes! Right, so if you have some training data that you have gathered yourself and you know that there are some features in it that are actually biased, maybe biased due to the way you captured them, you might explicitly, after thinking it through, decide not to use those features for classification. So you might just get rid of those columns, and you might be fine, but that's not always the case. It's possible that some other features in your training set encode the features that you just got rid of. So even though you got rid of them, if you think of an example of data about people, with names and where they live and gender and race and so on, let's say you want to get rid of gender and race: if you look at a row giving someone's name and, say, their interests or postcode, you have a good chance of figuring those out anyway. And our models are also capable of doing that. In fact, we spend lots of time making sure that our models are capable of building up these hierarchical representations and running with them. So they might still use that to do the classification, even though you might not want them to. And to understand why this is problematic, I just want to share a little story that Zeynep Tufekci also mentioned. She went to a conference for HR professionals at some point, and apparently everyone there was really excited about a system for taking a huge pile of CVs and matching the best people to the job openings, so you don't have to trawl through the CVs manually; it just does this for you. And that sounds great: if you've ever done any hiring, it's a huge pain to read all this stuff and then try to figure out who's a good fit for what. So if we had a system that did it automatically, that would be awesome. Let's put a pin in that for a second. At the same time, I'm aware of at least two papers showing that it's possible to take data about someone from social media and predict, sort of classify, the risk of this person developing depression at some point in the future. One paper used posts from Instagram, the other used posts from Twitter, and they were actually able to fairly accurately detect who is likely to develop depression some time before the initial diagnosis. So it's kind of cool that this is possible, and I'm sure it can be used for good, but it's also a little bit scary, and I'm sure you can now put two and two together.
Imagine that the system you built for matching people to jobs is based on training data. You have people who work for you already, you know who are high performers and who are low performers, and you can extract their features and try to train your system on that. There's no reason why you shouldn't use publicly available social media data. The scary thing there is that the system might learn to discriminate on things that you really don't want to discriminate on. Things like likelihood of depression, or likelihood of being pregnant, things that you are explicitly trying not to discriminate against, might still be encoded in the data, and you might not even know about it. And this is kind of a tricky problem to defend against. The best tool I know of, and I think Catherine mentioned it in her PyData Amsterdam talk as well, is a legal term used in the US called disparate impact. There's a precise definition and a formula, and it basically treats your model as a black box: you can figure out whether your black box is biased or not by tweaking the inputs and seeing what comes out; in a moment I'll show a rough sketch of what such a check could look like. The thing is that it takes a bunch of effort. You need to be conscious of the fact that this is possible and actively try to investigate it, but I hope I can convince you that this is effort well spent. The other idea is that some errors can be very unintuitive. Like I mentioned, a cat being misclassified as a penguin is something we can kind of understand. Again at NIPS, this year in December, there will be a competition run for the first time about adversarial examples. The aim is this: you have a classifier, and your aim is to construct a data sample that looks fine to humans but will get deliberately misclassified by the model. So you can add some specially constructed noise to your cat image to get it classified in a specific way. At the same time, other teams will try to build models that are robust to this kind of thing, and I think it will be a very interesting thing to watch. But there's another, perhaps even more scary, problem: there are things that no one really intends to happen, but they happen anyway, because some models, when they're not interpretable, and a lot of them are not, make mistakes in ways that we as humans can't really comprehend very well. Now, there was a really awesome talk earlier today about interpretable models, and there is lots of research going into that area, so I think things will probably get better from that perspective, but it's still not perfect. And I just wanted to point this out because it's something to keep in mind: the errors that AI tends to make are often very different from the kinds of errors that humans make. Multiply them, that's right.
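Here, then, is that rough sketch of a disparate impact check, treating the model as a black box and assuming nothing more than access to its decisions and a group label for each person. The function name, the toy data, and the 0.8 threshold (the commonly cited "four-fifths rule") are illustrative assumptions, not a reference implementation from the talk.

```python
# Rough sketch of a disparate impact ("four-fifths rule") check.
# The model is treated as a black box: only its decisions are inspected.
# All names and the toy data below are illustrative assumptions.
import numpy as np

def disparate_impact(decisions, groups, privileged, unprivileged):
    """Ratio of favourable-outcome rates between two groups.

    decisions: 0/1 array of model outputs, where 1 is the favourable outcome
               (e.g. "low risk", "invite to interview")
    groups:    group label for each row, same length as decisions
    """
    decisions = np.asarray(decisions)
    groups = np.asarray(groups)
    rate_unprivileged = decisions[groups == unprivileged].mean()
    rate_privileged = decisions[groups == privileged].mean()
    return rate_unprivileged / rate_privileged

# Toy example: pretend these are the model's decisions for ten people.
decisions = np.array([1, 1, 1, 0, 1, 1, 0, 0, 1, 0])
groups = np.array(["a", "a", "a", "a", "a", "b", "b", "b", "b", "b"])

ratio = disparate_impact(decisions, groups, privileged="a", unprivileged="b")
print(f"disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:  # the four-fifths rule of thumb from US employment law
    print("potential disparate impact -- worth investigating further")
```

In a real pipeline you would run a check like this on held-out data every time the model is retrained, alongside the usual accuracy checks.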
So, the idea of stupid questions, I promised to come back to that. Did you ever see the paper this image comes from? Okay, so just as a disclaimer, this paper was not peer reviewed, it was just published on arXiv, so don't read too much into it, but it made lots of waves, it was even covered in mainstream media, and I think it's important to talk about it. In this paper, they had a dataset of facial images, and for each one they had a label saying this person is a criminal, this person is not. Guess what they tried to do? Take a face and predict whether someone is a criminal or not. And you know, technically it's not really interesting: they just took AlexNet, if you know about CNNs, retrained it a little bit and got very good accuracy. After the paper was published, lots of people were outraged, and rightly so. There are many obvious problems with it, but let's just think about some. The row at the bottom is labeled as non-criminals, the row at the top as criminals. Like, do non-criminals smile? Are people in white shirts never criminals? I think that's a little questionable. So, again, a silly example in some ways, but that's also something important to keep in mind: make sure that the question you're asking actually makes sense. Finally, this one is a little bit more obscure. That's right, it's from the show Little Britain, and this is Carol. She's a receptionist, and she often "helps" people who are trying to get admitted to hospital or, you know, get a bank loan, and her trick is to say "computer says no" to everything. But I think it also illustrates the point. We started off developing these helpers, machine learning included, to make our jobs easier, to make us more efficient at making decisions and so on. But now, it turns out, a lot of the time we just defer our decisions to the computer, and when the computer says no, we are simply not allowed to do the thing, or unable to do it. And this is again something to think about and keep in mind: whenever you are developing something that might be able to help, just consider the implications and what happens when the system gets it wrong. I also really liked a quote, again from Catherine's keynote yesterday; there was something about a guy who helped develop a system for a bank to write checks. He was wondering if it was an ethical thing to do, because if they didn't do it, then the bank would have to innovate in a sort of organizational way, but they had this new technical thing and they could just preserve the status quo. And I think lots of that is happening right now as well with AI. There are millions of decisions that need to be taken every second; we're incapable of doing that as humans, so we just give it to the machines, but maybe that's not always the right thing to do. So basically, what I would like you to do is, first of all, read up on this topic. You already seem well aware that this is a problem. Start thinking about it and talking about it with your coworkers and friends. There are lots of meetups and conferences, and I encourage you to really take this seriously. I think it can be really awesome if we do this right, but it can be kind of catastrophic if we don't. So the answer to why computers can be biased is: because we make them so. So please don't. Thank you. Question. Do we have time for questions? Okay, so thank you very much for your talk, it was kind of enlightening. What I want to ask is: maybe you should turn the question around. You're saying the data that we're putting into our models is biased and we need to protect our models from this biased data.
Maybe we should turn it around, if we're thinking about ethical implications, and say: well, we found out that our models are biased, and we're actually able to quantify this, and we can publish it and tell people there is this kind of bias in this data, and make people aware of these issues. Because we do engineer models and do classification, and at first we have a lot of false positives and it's the fault of the model, and then afterwards we find false positives and we can go to the engineers and tell them: well, there is a false positive here, but maybe your data is wrong, and we can improve quality there. So maybe that's an approach we could take in the future. Yeah, I think I totally agree. That's kind of a more optimistic way of posing the problem, and maybe it's more constructive. And the talk that I mentioned from Mikhail just a couple of talks ago, I don't know if you've seen it, but he presented a Python package called eli5, "explain like I'm five". It takes a model and tells you how the model makes decisions, so it doesn't necessarily answer the question of whether the model is biased or not, but it might help you interpret it and then be able to answer that. So I think there's much more work to be done in this area, but going in this direction is, I think, the right thing to do. So what are your concrete recommendations in this respect? I mean, you just said be aware of this stuff. I'd like to pick some of the most problematic examples: predictive policing, using that to figure out where to deploy police, and predicted rates of recidivism, and something that has started to happen recently, at least in the US, is using that in sentencing judgments. So making people's prison sentences longer if the model says they're more likely to be recidivists. Is this something that absolutely shouldn't happen, or just... what is your specific recommendation, be aware of bias? I mean, as soon as you get into... If I could just point out, you cited the criminal justice system of the state of Florida, which is like an anti-pattern for how to do stuff. It's one of the worst examples in the world of how to do these things, and so any model working in that system is going to give you garbage results. Okay, sure, but it's still something that I believe should be pointed out. So, I mean, as soon as you get into ethical questions and moral questions, there are no silver bullets. I can't give you a recommendation saying do this, because even if I tell you to do it and it's right most of the time, there will be cases where it's wrong. So I will hesitate to give you a straightforward recommendation. That's why the best thing I can come up with is: think about it. If the people building COMPAS had explicitly thought about racism and tested for it, this could have been avoided. Now, of course, COMPAS is embedded in the larger context of the whole justice system of the state, and also of the US, so it wouldn't prevent everything, but we have the power to change things a little bit at a time, and I think we should try to do that. Okay, there was an idea that we can find out that some model is biased and try to fix the bias with some fixes, but I don't really like it, because if you fix it by hand, you can introduce your own personal bias into this data set. And the second point is that you are getting a fake reality instead of the real one.
Sure, so it's a good point, and I kind of thought that too for a while. But the way I look at it now, and I'm not sure if it's right, but it sounds more right to me, is that it's not about taking pristine data and messing with it. It's about taking messed-up data and messing with it in a different way. Neither of them is absolutely right, but you're messing with it manually because you have the context, you understand what it represents, and you have opinions about it: okay, I think this shouldn't happen, so I'll try to make sure it doesn't happen. There are very few examples where you can just take data and say it's a perfect reflection of the world. So to me, that's kind of the answer. You just try to make things a little bit better, but neither of them is the true state; the data you have after you gather it is not true in an absolute sense. Thank you for the talk, I think it's very important. Maybe this is more of a remark, but I have a question attached to it. Isn't the problem maybe the same one we've had in statistics all along, that we use correlation because we are just unable to really get at causation? And therefore, as an ethical rule, should we only apply all these methods to things which don't have really negative consequences for an individual, because every classifier will get wrong answers for some individuals, independently of how we do this? Okay, so I'm not quite sure what the question is, but sure: thinking about it from a correlation perspective is true sometimes, but there are other things that are much more difficult to put in that context, like reinforcement learning. Once you start getting into that, into intelligent agents, the relationships between the things they do and the reward they get might be very non-straightforward. The reward might be maximized in a way that is totally not what we actually intended. I think that's part of the problem, but many different problems can be discovered when you think about what you do from that perspective. So I like the idea somebody mentioned over there: okay, so you've taken this model, run it against this corpus, and found out afterwards, oh wait, there's a lot of bias here because of the corpus. Turning that back around and saying: okay, so what do we need to do to change things in the world, so that, say, the news stories show less bias, so that we can actually turn it back and fix the world? Yeah, absolutely. So again, no easy answers, but the reason I think this is important is that when you build a solution that's biased in some way, it doesn't only reflect the world, it actually reinforces the bias, because of this perceived objectiveness. People tend to trust things like that. So you are actually making things worse if you do that. So by not releasing biased models, or whatever else you can do, that's one way to prevent injustices in the world, as grand as that sounds. But obviously we should go out of our bubble, and there are also lots of other things we could do: get involved with your, I don't know, local newspaper, help them out, or join a political party, or whatever you think is the right thing to do; it doesn't necessarily have to do with programming. We have time for three more questions. Hello.
One of the things that seems to happen is that people publish a story, sorry, publish an article saying this news outlet, or the news on Google News, is biased in this particular way. But would you suggest that companies take these machine learning algorithms and run them internally, analyzing purely their own data, to see if they can detect bias in-house? So they don't need to be named and shamed; they can do it internally to maintain their standards and then try to... Sure. Yeah, so I think of that a little bit like any kind of QA, like finding bugs: you can try to find them in-house, or you can wait for your customers to tell you. It's the same thing here. You can have some process for trying to discover these things before you release, or you can wait for ProPublica to come to you and say, hey, you know, you're biased. So, absolutely. I don't think it's very common right now, at least I'm not aware of it, but this kind of testing of models, whenever you retrain your model you have to test whether it's still performing well or not, and we should definitely think about incorporating these kinds of checks into those processes too. You mentioned there was a study done on Instagram and Twitter users to determine future likelihood of depression. Firstly, how did they possibly find out if people were depressed in the future, unless it was a controlled study? And number two, what are your thoughts on... if you're doing a medical study on people, you have to run it by an ethics officer or an ethics board to make sure that it's ethical. Do you think something like that should be mandatory for models which will affect people's lives in this way? Okay, so to answer the first question, I'm not really sure; I would have to reread the paper, or there are references here if you want to do that. It's a good question. The second question was, do we need an ethics committee to approve our models when we use them in production, basically? So, should the software industry move to the model that the medical industry uses and have an extensive ethical review? I don't know, there are definitely different requirements. Some software is mission critical, when it governs medical equipment for example, and it definitely does go through this process already. If you're Instagram, it might seem like, oh, we're just posting filtered photos, you know, what can go wrong? So you might think that's not necessary. But then again, things like this happen and all of a sudden you're involved. So I don't know whether some kind of mandatory official requirement for doing these things would be a good idea, because it has a set of drawbacks as well: you have to iterate much more slowly and so on. But I think where we are now is at the other end of the spectrum; we don't think about it enough. I think we should pull towards that side, even if we don't want to go all the way there. I have a last question. What can we actually do to make the general public more aware that computers actually can be biased? Great question. If you're a writer, write stories. If you're a filmmaker, you know, make animations. I think if we are aware of this as a community, and it seems like we are getting aware of it, things will seep through. There are already lots of bits of knowledge, or myths if you want, about AI in the general public; everyone knows Siri, or Google Photos, and stuff like this.
These things slowly seep through, and my most boring but straightforward answer is: just be aware of it, and everything else will follow. But if you want to specifically focus on public outreach, I think that's awesome; I'm just not an expert in that area. So, the question about how they found out whether a person was depressed: it was people who reported being diagnosed as depressed. They did a study on a lot of people and used that to predict. So the person had posted on Twitter, for example, that, hey, I just got diagnosed with depression, and that was where they got the data saying, oh, they're depressed. Thank you. Thank you. And let's thank Maciej Gricka again for his talk. Thank you.