Okay, welcome everyone. Our next speaker is Vincent, who's going to talk to us about putting artificial intelligence back into people's hands. So please welcome Vincent.

Hello, my name is Vincent Lequertier, I'm a PhD student at the Lyon University Hospital in France and I'm also an FSFE volunteer. Today I'd like to talk about how to make artificial intelligence more accessible, transparent and fair, and my talk is therefore divided into those three parts.

First I will talk about accessibility in artificial intelligence, and for that I will start with some basic concepts about neural networks. What I say applies to neural networks but also to other algorithms. The way it works is that you have the input on the left, the output on the right, and in between small neurons organized into layers, so the input flows through the network to the output on the right.

To make a neural network accessible, meaning that everyone can use it and everyone can train it, what you can do is take an already trained model and leverage it to train the one you want to create yourself. You freeze the part on the left, that is, you freeze those parameters, and you only train the part on the right. That means you only have a small amount of computation left to do, you don't have to train the whole model yourself, and you can train it on your own computer.

Here I'd like to show that the bigger models are not necessarily the more accurate ones. How to read this graph: on the x-axis you have the number of operations required to train the model, on the y-axis you have the accuracy, and the size of each blob is the number of parameters of the model. This is for image classification, but it applies to other tasks as well. You can see that the biggest models are not the most accurate ones, so you can have a tiny model and still get good results. You don't need a large model trained by a large company; a small model can give good results, and that gives you accessible AI that you can use yourself.

To make AI more accessible, you should also consider releasing the model in a format that makes it easy for people to reuse, for example ONNX, which works across the frameworks used to develop artificial intelligence programs. You should consider releasing all the parameters of the model, releasing the training code under a free license so that other people can improve it and replicate your model, and releasing the dataset that goes with it if that is possible. And when designing a model, you should keep in mind not only its accuracy but also the number of parameters and the number of operations it requires.
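To make the freezing and release ideas from this section concrete, here is a minimal sketch in PyTorch. The ResNet-18 model, the ten-class output layer and the file name are assumptions made for the example, not something from the talk, and the exact argument for loading pre-trained weights can vary between torchvision versions.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a model that someone else already trained (the "left part" of the network).
model = models.resnet18(pretrained=True)

# Freeze all of its parameters so they are not updated during training.
for param in model.parameters():
    param.requires_grad = False

# Replace only the last layer (the "right part") with one sized for our own task,
# here a hypothetical 10-class problem; this is the only part we will train.
model.fc = nn.Linear(model.fc.in_features, 10)

# Only the new layer's parameters go to the optimizer, so the remaining
# computation is small enough to run on an ordinary computer.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

# Once trained, the model can be released in ONNX format so that others
# can reuse it across frameworks.
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "model.onnx")
```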
The second part of my talk is about transparency in artificial intelligence. We have to realize that artificial intelligence is not only used for classifying dogs and cats but also for important things like loan approval, justice, healthcare and self-driving cars, so very important tasks. Transparency is necessary for these tasks because it allows you to interpret the result, that is, to make sure the model is picking up the right things in the input to make its classification or prediction. That lets you trust the model and also debug it more easily, so you can make sure it is accurate for the right reasons.

I will give you a couple of seconds to read this short comic strip. The thing is, the parameters of a neural network are like a giant pile of linear algebra, and it's really complicated to make them transparent, because there are a lot of parameters and it's hard to understand what's going on inside the network. But we can create AI that is designed from the ground up with transparency in mind.

Here I introduce a solution called LIME, for Local Interpretable Model-agnostic Explanations, which gives you insight into the model so that you can understand what's going on inside. In the example on the left, two different algorithms try to make the same prediction, and LIME highlights the words in the document that were used for it. The task is predicting whether a text is about religion or atheism, and you can see that the model on the left picks up words that are relevant for the task. On the right side you have an algorithm that makes the same prediction but from meaningless words, things like the NNTP-Posting-Host header and other useless parts of an email. The two models yield the same result, but the one on the left is better because it relies on the right words.

This framework also works for images. Here the task is dog classification, and it shows that the head of the dog was used for the classification. Basically, for images it works by cutting the image into manageable pieces, running those parts of the image through the network and getting probabilities out each time. It then fits a logistic regression to approximate the network with a simpler model, and from the weights of that simplified version it can tell which parts of the image were used for the classification.

It also works for tabular data. Here this is tabular data for predicting whether a salary is below or above 50K, and you can see which columns, which features, were used for the classification. This allows you to ensure that your model is well trained and transparent, so that you can give an explanation for your predictions.
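The mechanism described here, perturb the input, query the network, fit a simpler model and read off its weights, can be sketched for tabular data roughly as follows. This is a simplified illustration of the idea rather than the actual LIME library; the black-box `predict_proba` function, the Gaussian perturbation and the proximity weighting are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import Ridge

def explain_locally(predict_proba, x, feature_names, n_samples=1000, noise=0.5):
    """Rank features by how much they drove the black-box prediction for x."""
    rng = np.random.default_rng(0)
    # Perturb the instance: many slightly modified copies of the same row.
    X_pert = x + rng.normal(0.0, noise, size=(n_samples, x.shape[0]))
    # Ask the black-box model for its probability of the positive class.
    y_pert = predict_proba(X_pert)[:, 1]
    # Give more weight to perturbed points that stay close to the original.
    weights = np.exp(-np.linalg.norm(X_pert - x, axis=1) ** 2)
    # Fit a simple linear surrogate that mimics the network around x.
    surrogate = Ridge(alpha=1.0).fit(X_pert, y_pert, sample_weight=weights)
    # Its coefficients say which features mattered for this particular prediction.
    return sorted(zip(feature_names, surrogate.coef_), key=lambda t: -abs(t[1]))
```

For text or images the same recipe applies, except that the perturbation removes words or image patches instead of adding numerical noise.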
The next part of my talk is about fairness in artificial intelligence. Fairness means having a model that is not only accurate but also fair in the predictions it makes.

To make a fair algorithm, one thing you can do is simply remove some features: if you have a sensitive attribute and you want a fair model, you just remove the sensitive features. This is easy when predicting the speed of a car where, say, the colour is the sensitive attribute; you can just remove it, because it does not contribute to the prediction, cars of every colour have the same speed. But for a typical dataset with personal data it's more complicated. If you try to predict someone's salary, for example, you cannot just remove the person's gender, because there are correlations in the data: in the hobby feature you may have someone who is part of a volleyball team, and it's the women's volleyball team. So to have a model that really does not take gender into account, you would have to remove both the gender and the hobby, and you could end up removing half of your dataset to get a fair algorithm. There is therefore a balance between the accuracy of the model and its fairness, because the more features you remove, the less accurate your model can be, and you have to think about correlations when removing an attribute.

Let me introduce some vocabulary for the next slide. In binary classification, true positives and true negatives are the cases where the prediction was correct: the ground truth was true and you predicted true, and vice versa. False positives and false negatives are the cases where you predicted true but the ground truth was false, and the contrary.

Here is an algorithm that was used in the justice system in America, in Florida. Its goal was to give criminal defendants a recidivism score, that is, how likely they are to reoffend. In that respect the algorithm was racist: if you measure the false positive rate and the false negative rate, they are completely different for black people and white people. For black defendants the false positive rate was almost 45%, meaning the algorithm predicted they were going to reoffend when in fact they did not (in the data, recidivism means reoffending within two years of the crime). For white defendants it is the contrary: the false negative rate is much higher. My point is that you can have a very accurate, high-performing model and still not know how fair it is unless you actually try to find out. You cannot say, okay, my algorithm is accurate, so I'm done, I'll ship this model to production; you also have to make sure that the model is fair.

Another example, this time in healthcare, again in the US. On the x-axis you have the risk score given by an algorithm, a risk score for your health, and on the y-axis you have the number of diseases the person has, their chronic conditions. The algorithm gave the same risk score to sicker black patients and to healthier white patients. The issue is that it was again racist, because you would have expected the risk score to be the same regardless of skin colour for the same level of illness. This is a real issue of bias in healthcare.

To sum up, an algorithm can be unfair because the input data itself is biased, for example when you have a lot of examples for black people and only a few for white people. You can also train the model with the wrong metric, as we saw on the previous slide where the health of the individual was not taken into account for the prediction, and that gives you a bad prediction model that is also unfair. And bias can be hard to notice, because when you have good accuracy you don't necessarily go looking for it. So as data scientists we really have to make sure that we create algorithms that are both fair and accurate.

I thought about this problem of how to make an algorithm fair, and I had the idea of building fairness into the AI during training. Please don't be scared by the maths formula. The idea is that you consider the possible values of a protected attribute, say gender, where in this example you have two possible values. You create a function that says how fair your algorithm is for one value of the protected attribute, you sum that measure over all possible values of the protected attribute, and you divide by the minimum. Then you add this to your loss function, which tells how well your algorithm is performing, and the combined value is used to update the parameters of the neural network. This allows you to build fairness into the training of the artificial intelligence program, and it gives you a way to decide what is more important to you between accuracy and fairness. You get a fair algorithm without needing to remove any data: you can keep any protected attribute in your dataset and still have a fair algorithm with this kind of mathematical function.
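To make that idea concrete, below is a rough sketch of such a training loss in PyTorch. The particular per-group measure used here, the gap between a group's average positive prediction and everyone else's, and the weighting factor `lam` are illustrative assumptions; the talk only describes the general shape of the formula, so this is one possible instantiation rather than the exact function shown on the slide.

```python
import torch
import torch.nn.functional as F

def fair_loss(logits, targets, group, lam=1.0):
    """Combine the usual task loss with a penalty on group-wise disparities."""
    # Standard loss: how well the algorithm is performing.
    task_loss = F.cross_entropy(logits, targets)

    # Probability of predicting the positive class for each sample.
    p_pos = torch.softmax(logits, dim=1)[:, 1]

    # One fairness measure per value of the protected attribute (e.g. gender):
    # how far this group's average prediction is from everyone else's.
    # (Assumes each batch contains samples from more than one group.)
    penalties = []
    for g in torch.unique(group):
        in_group = group == g
        gap = p_pos[in_group].mean() - p_pos[~in_group].mean()
        penalties.append(gap.abs())
    fairness_penalty = torch.stack(penalties).sum()

    # lam decides what is more important to you: accuracy or fairness.
    return task_loss + lam * fairness_penalty
```

In a training loop this value would replace the plain cross-entropy, so the gradient step that updates the network's parameters takes both accuracy and fairness into account, and no column has to be removed from the dataset.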
So to sum up: don't only focus on accuracy, but also on accessibility, transparency and fairness. Thank you very much.

We have time for some questions.

Hi, I'm going to be greedy and ask two questions. I own a Mazda Miata sports car, and everybody knows that red cars are 10 kilometres per hour faster than other cars. It's just known.

Sorry, I did not hear your question.

My sports car is red, so it's faster. I work for a company doing machine learning applied to breast cancer detection; we're doing the same work as Google DeepMind, in competition with them. Our current models are trained on white Western women, on UK and American data samples, but we want to work with healthcare providers in Japan and China, and it's a physiological fact that oriental women have denser breasts, so we know our models won't work as well for those women and we have to train them on those populations. I first thought you were talking about fairness in terms of social fairness, but I think you must recognise that in the healthcare field there are differences between populations, and your model has to be trained on those populations. That's our next step, we're going to do that, and hopefully that will make it just as valid.

Right, so what you could do is try to make the distribution of the populations even, that is, have the same number of examples and samples for the different kinds of people in the population. You can also, as I said, try to have a mathematical representation of fairness and use it in your scoring function to get a fair algorithm. If that doesn't work, you could also try to train one model per population.

That works as well. My question was more related to the design of the loss function: how is it different from class weighting, which is essentially used across all the datasets we use today?

Which loss function?

Class weighting, weighting the classes. Let's say you have the data and you don't really have to balance it; the bias can be balanced out.

So the question was how this loss function is different from having class weights in the loss function. It is a bit different, because with class weights you only use the distribution of the input, whereas this is agnostic to the input: you can use whatever function you want, so it is much more flexible than class weights.

I just got confused about whether we are introducing bias by doing this or rather removing it.

Again, that's a very personal choice. But yes, this is another solution.

We still have time for another question or two. Any other questions? Okay, let's thank our speaker again. Thank you.