So hello everybody, and welcome to this talk about the ethical issues that may be lurking, without anyone noticing, in your most innocent-looking data science application. A word about me: my name is Sarah, I've been working in machine learning since 2012, and I'm currently at a very cool company called PeopleDoc. As Nicole said yesterday, we work in human resources, and we have some positions open, so do check them out if that interests you. I also want to point out that I have a strong interest in ethics, but I'm not an expert or anything, so this is more of a general overview of the problems; I don't have any big solution to bring to the table.

So, ethics. Ethics is this part of philosophy which may sometimes seem a bit boring, but one thing all moral philosophers love to do is thought experiments. So I propose we do that today: a thought experiment. I promise there won't be any trolleys involved.

Let's build an application together, using machine learning. What do we want this application to do? We want to help high schoolers find the right university for them, and choose the right major. Why would we want to do that? Well, we all know that teenagers sometimes do things which are not really the best for them. They also lack experience - it's not their fault - and they may rely on clichés to choose their career rather than on objective information. Furthermore, we have in tech, and in many other domains, a problem with diversity. For example, there are very few women in tech, and one possible cause is that girls may self-censor when it comes to choosing so-called "unfeminine" majors such as computer science or math, when these fields might actually be a good choice for them, fields in which they would really thrive and bring something. That's something we want to change.

First, to build a data science application, we're going to need to collect some data, right? You might say that's pretty easy: there's really no big deal about collecting some data, it's all a fairly neutral process. So, we want to measure academic performance. Okay, we're going to use grades; that seems pretty trivial. Well, I don't know how it is in your country, but in France it is well known that some high schools are stricter about giving good grades than others. So maybe that's not such a fair metric: if you come from a strict high school, you get lower grades, and that doesn't mean you're a worse student. So maybe we should rather use grades with a weighting scheme based on the high school's level. But then, if you come from a so-called low-level high school, even if you got the best grades possible, you will be penalized, and that's not fair to you either. And maybe the real question I want to ask is: do we even want to use grades at all? Maybe we should also use teacher appraisals of the student's work. So it's not that simple.
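Just to make that trade-off concrete, here is a minimal sketch of what such a weighting scheme could look like, standardizing grades within each school. The column names and the tiny DataFrame are made up for illustration:

```python
import pandas as pd

# Hypothetical data: raw grades plus the school that produced them.
df = pd.DataFrame({
    "high_school": ["A", "A", "A", "B", "B", "B"],
    "grade":       [11.0, 12.0, 13.0, 15.0, 16.0, 17.0],
})

# Standardize each grade within its own school, so a strict school's
# grades are not directly compared with a lenient school's grades.
grouped = df.groupby("high_school")["grade"]
df["grade_z"] = (df["grade"] - grouped.transform("mean")) / grouped.transform("std")

print(df)
```

Notice that the strict school's 13/20 and the lenient school's 17/20 both come out with the same z-score of 1.0, which is exactly the behavior that penalizes a top student from a "low-level" school: the scheme makes it impossible for them to stand out.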
The thing is, whenever we are collecting data, we are collecting data that comes from our real world, and our real world is full of bias and prejudice, so we may encode some of those biases into our data sets. For example, as we said, there are fewer girls in technical studies than boys, so maybe the algorithm will learn that gender is a strong predictor of interest in computer science, and that's not something we want.

You might say: okay, easy, you just remove gender and the other protected attributes from the data set. Yeah, but these attributes are most often redundantly encoded. For example, whether you go to university or not is strongly dependent on your social class. And your social class is also correlated with which high school you went to, which options you chose in that high school, which languages you learned... So, not that easy.

Another thing I want to point out is something statisticians call sampling bias, that is to say, an indicator may not be uniformly measured across the population. For example, let's say that in addition to grades, we want to consider whether the pupil has taken part in an extracurricular math competition. That sounds like a great indicator that this person is really interested in science. The problem is that teachers are more likely to encourage boys than girls to take part in those math competitions, so it's a biased indicator of interest in science.

Great, so now we've got some data, and some of it is categorical: we've asked students what their dream job is, so now we need to encode some text, and that's not so easy. There are two main ways of encoding text. One of them is bag of words. Bag of words relies on sparse encoding; basically it's a one-hot encoding: you choose a vocabulary, and you put a one at a word's position in the vocabulary if that word appears in the text. Another, maybe better, way to encode text is using word embeddings, such as word2vec. This learns dense embeddings in a high-dimensional vector space, and the great thing about word2vec, and word embeddings in general, is that they retain some similarity information. With bag of words, every word is at the same distance from every other word, whereas with word2vec, "nurse" and "physician" are closer together than "nurse" and "math teacher", for example. That seems like a great thing.

Furthermore, word embeddings can learn analogies. That is to say, you take the vector for "king", you subtract the vector for "man", you add the vector for "woman", and you get the vector for the word "queen". That's great, so let's do that with job names. You take "developer", you subtract "man", you add "woman": what do you get? Any idea? "Homemaker". Yeah, that's not so great, right?
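If you want to try this yourself, here is a sketch using gensim's pre-trained vectors. It assumes the gensim downloader and the GoogleNews model are available, and it uses "computer_programmer", the token from the published example (Bolukbasi et al., 2016), rather than "developer":

```python
import gensim.downloader as api

# Pre-trained vectors; the GoogleNews word2vec model is the classic choice
# (the download is large, so this takes a while the first time).
model = api.load("word2vec-google-news-300")

# king - man + woman ~= queen
print(model.most_similar(positive=["king", "woman"], negative=["man"], topn=1))

# The same arithmetic on occupations surfaces the learned stereotype:
# computer_programmer - man + woman ~= homemaker
print(model.most_similar(positive=["computer_programmer", "woman"],
                         negative=["man"], topn=1))
```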
It's the same problem as I said before: we are learning from data that our society produced, so we encoded the biases and prejudices that exist in our society. Caliskan and her co-authors have shown that all the biases you can find in the human population with tests such as the Implicit Association Test can be found again in word embeddings.

The good news is that if you know what you're looking for, and you're willing to take some time, it's possible to debias these word embeddings. For example, take gender: we've got "brother" and "sister", which are gendered words, and we want to keep that gender information. On the other hand, job names, occupations, should be gender-neutral, so we can learn a linear transformation to put "physician" and "nurse", "computer programmer" and "homemaker", all on this neutral line.
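The core of that transformation is just a projection. Here is a minimal sketch of the "neutralize" step on toy vectors; in a real setting you would pull the vectors from a trained model and estimate the gender direction from several word pairs (he/she, man/woman, ...), typically with PCA, as Bolukbasi et al. do:

```python
import numpy as np

def neutralize(v, g):
    """Project out the component of v along the gender direction g."""
    g = g / np.linalg.norm(g)
    return v - np.dot(v, g) * g

# Toy 4-d vectors standing in for real embeddings.
man   = np.array([ 1.0, 0.2, 0.0, 0.3])
woman = np.array([-1.0, 0.2, 0.1, 0.3])
nurse = np.array([-0.8, 0.5, 0.9, 0.1])

gender_direction = man - woman
nurse_neutral = neutralize(nurse, gender_direction)

# The debiased vector carries (almost) no gender component any more.
unit_g = gender_direction / np.linalg.norm(gender_direction)
print(np.dot(nurse_neutral, unit_g))  # ~0.0
```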
So now that we've got some data, it's time for a small message from the security team. The thing is, as soon as you're handling data, and especially sensitive and private data, you become co-responsible for the security of that data. The main message is that the only data that cannot get stolen is data you don't have. A corollary is that if you no longer need some data, you should delete it. And I know it's the hardest advice to follow ever, because we data scientists are a bit like dragons sitting on a pile of gold: data is our treasure and we don't want to part with it. But really, you should consider it. Protect the data of the users you're handling, and if possible, work on anonymized data. Anonymization - there was a tutorial here two days ago about that - is a whole topic in itself, and it's not just about removing personal identifiers. For example, all the students of a class are probably going to have the same age. So if you've got a student who is two years ahead or two years behind on the curriculum, and you keep just the date of birth and the high school name in your data set, you are probably going to be able to uniquely identify that student. So it's not anonymized.

Now that we've got some nice data - and of course we've encrypted it and we treat it with care - it's time to get to the more machine learning part. We're going to learn a score to predict academic success because, as we've seen, grades are not great, so we want to combine several features to do something better. The first question is: we want this score to be fair, but what does it mean for an algorithm to be fair?

A first answer would be: well, it should be calibrated. That is to say, given a certain score, the probability of graduating should be the same for everybody, regardless of which subgroup you belong to. By subgroup I mean male versus female, ethnic subgroups, social class subgroups, or pretty much anything you can split on. So calibration is important.

Then, the universities are going to use our score to choose which students they should accept. So they want to know: if they set a certain threshold on the score and take everybody above that threshold, what is the probability of those students graduating? We want this probability to be the same across subgroups as well; once again, we don't want a difference between, for example, white and non-white students.

If you're a student, what interests you is more the false negative rate: the probability that, even though you would have graduated, you get too low a score to be admitted to the university. That too, we want to be the same across all subgroups. Another way to express that is that the average score of all students who graduate should be the same when you split them into different subgroups. You can do the same for the false positive rate, that is, students we predicted would graduate but who fail: we also want no difference between subgroups there, and the same average.

And last, we might want equal acceptance rates, that is to say, the students in the university reflect the diversity of the general population: you have the same probability of being accepted regardless of which group you belong to.

At least to me, all of these criteria seem quite reasonable. The problem is, you can't have all of them, and that's not wishful thinking: it's a mathematical theorem, and there's quite a lot of literature on the subject. Even if you pick only three, it's difficult. And if you pick two and add an individual fairness requirement - that is, you can't choose the threshold based on which subgroup the student belongs to - you might not even be able to have those two. So we need to choose what we are talking about when we say "my algorithm is fair", and we need to advertise that it's fair with respect to that particular metric, so that everybody knows and stakeholders can make informed decisions.
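Before even choosing between these criteria, the first step is simply to measure them per subgroup. Here is a minimal sketch with made-up labels, where 1 means "graduated" in the true labels and "admitted" in the predictions:

```python
import numpy as np

def rates_by_group(y_true, y_pred, groups):
    """False-negative rate and acceptance rate, computed per subgroup."""
    for g in np.unique(groups):
        m = groups == g
        fn = np.sum((y_true == 1) & (y_pred == 0) & m)
        pos = np.sum((y_true == 1) & m)
        fnr = fn / pos if pos else float("nan")
        acceptance = np.mean(y_pred[m])
        print(f"group={g}: FNR={fnr:.2f}, acceptance rate={acceptance:.2f}")

# Hypothetical toy labels and subgroup membership.
y_true = np.array([1, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 0, 1])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
rates_by_group(y_true, y_pred, groups)
```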
Okay, now onto the fun part: we can build some models. Should we just go with deep learning? It's really hyped and everything, right? Let me tell you a story first. Once upon a time, in a hospital far, far away, there were some experts who wanted to learn a predictor of whether a patient who had undergone an operation could safely go home, or had a high risk of complications and should stay at the hospital.

They learned decision trees, and it worked really well, and it was great. Then the doctors asked: okay, what are the criteria used to determine whether a patient can go home or not? And they realized that asthma was a predictor of low risk. The medical doctors said: oh, that's strange, because asthmatic patients actually have such a high risk that we always keep them in intensive care units. And because these patients were kept in intensive care units, they had a very low rate of complications, and that's what the algorithm picked up. But we definitely don't want that decision tree to go into production. So it was a good thing that it was interpretable, right? With a neural network, we wouldn't have been able to spot that.

So, as you've seen, interpretability can be used to debug and to get feedback from domain experts. In the area of ethics, it also enables you to make sure that decisions were made on fair grounds and don't use any discriminatory feature. And, for example, in the GDPR there is this right to an explanation: you have to be able to explain how your algorithm is working.

As I said, it's a huge topic, so just a few quick pointers. Before building a model, you can get some interpretability using exploratory analysis, such as clustering and visualization techniques: principal component analysis, t-SNE (t-distributed stochastic neighbor embedding), and so on. While you are building your model, you can choose to enforce interpretability: for example, using decision trees, using everything that is rule-based, using everything that is prototype-based, which was presented in an earlier talk, and enforcing sparsity, so that not too many of your features come into play.

I wanted to show this quickly, because this is a Python conference: this is a slide from the Python library called eli5, from "explain like I'm five". It's a pipeline combining bag of words and logistic regression, and the words highlighted in green are the words which were the most predictive for classifying this text as medical literature. That's a great way to check that your text model is actually learning something useful and not relying too much on noise.
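To give an idea of what that slide looks like in code, here is a toy reconstruction; the little corpus and labels are invented, and in a notebook you would use eli5.show_weights / eli5.show_prediction to get the colored highlighting instead of the text formatter:

```python
import eli5
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical corpus: label 1 = medical literature, 0 = everything else.
texts = ["patient shows acute asthma symptoms",
         "the patient was discharged after treatment",
         "stock markets rallied on friday",
         "the team shipped a new python release"]
labels = [1, 1, 0, 0]

vec = CountVectorizer()              # bag of words
X = vec.fit_transform(texts)
clf = LogisticRegression().fit(X, labels)

# Which words push a document towards "medical literature"?
print(eli5.format_as_text(eli5.explain_weights(clf, vec=vec, top=5)))
print(eli5.format_as_text(
    eli5.explain_prediction(clf, "asthma treatment for the patient", vec=vec)))
```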
If you like deep learning, don't worry, it's still really useful: you can also get some interpretability after you've built your model. I'm thinking of building a surrogate model, that is, an interpretable model that you learn on the inputs and outputs of your black-box model, or doing sensitivity analysis, such as what is performed by LIME, which stands for Local Interpretable Model-agnostic Explanations. LIME fits a small linear model, which is a bit like a first-order Taylor expansion, so that you get an idea of why this particular point was classified the way it was.

Another huge subject: if you're a bit into machine learning, you might know about minority classes and minority sub-concepts. The problem is that the less data you have, the less accurate you are: if you've got really few points, it gets really messy. And the problem is that we all tend to use accuracy, because it's easy: it's a single measure. But you can have a very high accuracy on your majority class and still perform very badly on your minority classes, and that's not acceptable in many sensitive fields. For example, imagine an admission process where, for white students, there is a real panel of people to study the application and decide, and for all the other students, they just throw all the applications in the air and pick whichever forms fall closest, like a random coin toss. You wouldn't say, oh, that's a fine process. So don't put that in your algorithm and say it's okay.

It can get even worse. For example, suppose that if you are in a big city, in a big high school, you have a lot of options to choose from, you can learn fancy languages, so you have a higher chance of academic success. If you're in the countryside, high schools are not that fancy and you don't have those fancy options anyway, but if you're in a small high school, you get a lot of support from your teachers, and that gives you better chances of success. But then, if you put everything together, the minority sub-concept - what's happening in the countryside - just becomes noise for the majority class; it's simply drowned out. So the experience of a big part of the population - a minority, but that's still a lot of people - is drowned out by the majority concept.

Okay, now that we've got an algorithm, we want to evaluate it, and of course we all do that very cleanly. We're going to evaluate on a validation data set, because we know that's the right thing to do. Yes... except that our pre-processing step, or maybe our feature selection step - there was a talk about that this morning - we did that on the whole data set, didn't we? Yeah, that's not so good, right? And we had three algorithms and we chose the best of the three on our test data set, and we also presented the results on that same test data set. So that's a bit like indirect overfitting. And we used accuracy even though we had minority classes, and that's not really honest. So it's really important to be able to give honest feedback and an honest evaluation of how well your algorithm is performing, so that stakeholders can decide, with all the information, whether it's okay to put it into production or not.
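Two of those pitfalls are cheap to avoid in scikit-learn: keep every fitted pre-processing step inside a Pipeline, so it only ever sees training folds, and report per-class metrics instead of a single accuracy number. A sketch on synthetic, deliberately imbalanced data:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Imbalanced toy problem standing in for our admissions data.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Scaling and feature selection live *inside* the pipeline, so they are
# re-fit on each training fold only - no peeking at validation or test data.
pipe = make_pipeline(StandardScaler(), SelectKBest(k=10), LogisticRegression())
print(cross_val_score(pipe, X_train, y_train, cv=5).mean())

# Report per-class precision/recall, not just one accuracy number.
pipe.fit(X_train, y_train)
print(classification_report(y_test, pipe.predict(X_test)))
```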
The fact is that biases are not only in the data. We ourselves, as data scientists, are human, and doing good science is really hard: we are subject to cognitive biases like everybody else. Take apophenia: apophenia is the tendency to see patterns in noise, and for data scientists that is not at all an occupational hazard, right? Then there is illusory causation, that is, confusing correlation and causality. There is this graph of the number of degrees awarded in civil engineering in the US against the consumption of mozzarella cheese: maybe the causality link is a bit weak there. And once we think we've found some causality, we tend to only notice the facts that confirm what we already believe.

Now it's a good time to put our application into production. Here are some quick numbers: in 2017, the overall dropout rate was 10%, with 9% for boys and 13% for girls, and the gender ratio was three quarters to one quarter. We want to minimize the dropout rate. So we train our algorithm, and it says: okay, there are more girls dropping out than boys, so probably we're sending girls who are not really motivated. Let's try fewer girls, more boys. Except that, as there are even fewer girls, the atmosphere grows more toxic - more sexist jokes and that kind of thing - so even more of the girls drop out. But since they are a minority, the overall dropout rate is still decreasing, or not by much, but at least it stays stable. And that's exactly the kind of thing that can happen with feedback loops: it's very easy to spin out of control, and, if you're not checking the right metrics, to not even realize what is happening, and to end up with self-reinforcing loops in which you are doing something you really do not want to do, and doing it more and more.

Okay, I've got some great news: I just got a phone call, and now we can deploy our algorithm to the whole of France. How cool is that, right? Yeah, except that even if we managed to get to 99% accuracy, out of the 700,000 students taking their baccalauréat this year in France, that still means we're going to give 7,000 students a wrong assignment, a wrong choice of university, and that's quite a lot. And given everything we said before about minority classes and the bias towards the majority in machine learning, it's probably going to be students from minorities and underprivileged backgrounds. So that's really, really bad. That's what Cathy O'Neil calls a "weapon of math destruction": a black-box algorithm doing self-reinforcing damage at large scale.

To wrap it up, a few takeaways. The first is that data is not neutral: data collection involves a lot of conscious and unconscious choices, and these choices can be challenged, and we should challenge them. The second is that algorithms may seem objective, because they're only mathematical functions, but actually they're learning from biased data and they are tuned by biased humans, so they're not fairer or more objective than humans would be. And the last is that, well, data science is deeply human, and we should keep that in mind, not assume we're above bias, and try to improve and reduce bias in our algorithms.

So that's all for me. I don't know if there's time for any questions.