So I'm a first-year PhD student, just getting started, and in my spare time I'm an FSAV volunteer. Today I'm going to talk about giving artificial intelligence back into the hands of the people, so that we have artificial intelligence we can control. The talk is split into three parts. First, how to create accessible AI, artificial intelligence that you can use yourself, so you do not depend on the big players in the field. Then I will talk about AI transparency, and finally about fairness and how to design fair, ethical AI.

First, accessibility. The issue is that you usually do not have the resources at home to train an artificial intelligence yourself: you lack the compute, and often the knowledge. That is a huge problem, because it means you cannot use AI on your own terms.

One possible solution to make AI accessible is a method called fine-tuning. Here I'm talking about deep learning, because that's the field I know best, but the idea certainly applies to other parts of AI. You take a model that has already been trained by a big company, and you retrain only the part that matters for you. You leverage the previous knowledge of the existing model and train just the last layers, because the first part of the model is long and very resource-intensive to train, while the last layers are much faster. With this kind of method you can use artificial intelligence on just a laptop, training only the last layers (a minimal sketch of the idea follows at the end of this part).

There is also a 2016 study about the size of deep learning models, which is quite huge: these models have millions of parameters. Here is how to read the graph: the x-axis is the number of floating-point operations, that is, the computation required; the y-axis is the accuracy; and the size of each circle is the number of parameters of the model. You can see that the most accurate models are not necessarily the ones with the most parameters or the heaviest computations. That means a tiny model can work just as well as a big one, and so the myth that you just have to throw more computing power at a problem is simply not true.

So, how do we make AI accessible? You can leverage other models and reuse their knowledge for your particular problem. You can also release your code and dataset under a free license, so other people can train your model and use it for their own purposes, starting from your parameters to build their own models and classifiers. And when you release a model to the public, report metrics that say how it performs and how complex it is, so people know what computation is needed to use it. In your design, it is really important to consider things like the number of parameters or the time required to train the model.
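To make the fine-tuning idea concrete, here is a minimal sketch, assuming PyTorch and a recent torchvision with an ImageNet pre-trained model; the choice of ResNet-18 and the 10-class task are placeholders for illustration, not something from the talk.

```python
# Minimal fine-tuning sketch (assumes torch and torchvision are installed).
# We reuse a model pre-trained by a big player and train only its last layer.
import torch
import torch.nn as nn
from torchvision import models

# Load a model pre-trained on ImageNet and freeze all its parameters:
# this expensive-to-train part is reused as-is.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with one sized for our own problem
# (a hypothetical 10-class task); only this layer will be trained.
model.fc = nn.Linear(model.fc.in_features, 10)

# The optimizer only sees the new layer's parameters, so training
# is light enough to run on a laptop.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One standard training step on a batch from your own dataset."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

The design choice is exactly the one described above: the frozen backbone carries the knowledge learned by someone else, and only the small task-specific head is trained on your data.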
On the topic of transparency, we have to understand that AI is used for really critical matters: loan approval, justice, healthcare, self-driving cars. Because of that, we must make sure that AI is transparent, because then we can trust the model. These are critical fields, and we have to be able to understand how the system works and how decisions are taken by algorithms.

We want AI transparency because it lets you interpret the result: you know not only the answer, but also the process that led the AI to it. It's just like grading a student's exam: you ask for the result, but also for how they found the solution, so you can trust it. The same goes for an algorithm: you need the result and the process; that is really important for trusting the solution. Transparency is also a tool for debugging, because if you know how the AI found the solution, you can check whether it makes sense to you. And we already require people to justify themselves: if there is a court decision, there must be a way for people to explain the reasoning behind it. We must ask the same of algorithms: that they be transparent and explain their decisions.

The issue is that current deep learning models have millions of parameters. They are not designed to be transparent: there are so many parameters that you cannot look at them yourself and interpret them as a human. AI wasn't designed with transparency in mind, but maybe we can create algorithms that are transparent right from the start, designed to be transparent, instead of trying to bolt transparency on afterwards.

So here I present a framework for AI transparency called LIME, for Local Interpretable Model-Agnostic Explanations. This works today: it is implemented in standard machine learning libraries, and it tells you what led a model to a prediction. Here the example is a dataset of newsgroup discussions, where the AI tries to predict whether a post is about atheism or not. On the left side, the algorithm considers the post to be about atheism because it mentions words like "God", words that are actually relevant to the subject, so you know what drove the decision. On the other side, the second algorithm picked up words that are not interesting at all, like "NNTP-Posting-Host", an email header. So you can tell that this model is not really performing well: it is right, but for the wrong reasons. LIME therefore lets you debug the model and build trust in its predictions. That is the kind of thing that makes AI transparent.

The same goes for images. Here we have the classification of a dog, and the AI is able to tell which part of the image triggered the decision. How it works is actually quite simple: for an image, we cut it into pieces, then we build a local approximation of the model by analyzing how the model reacts when it is given only parts of the image as input, and we measure which parts contribute most to the prediction. With this we can say: this is the part of the image that contributes most to the decision that this is a tree frog. That way we get transparent decisions out of AI.

This also works for tabular data. Here the dataset is about predicting whether someone's income is below or above 50K, and you can see which variable contributed to the prediction. So it works for images, for text, and for tabular data: it is truly model-agnostic, and you can use it regardless of your model and your problem.
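As a concrete illustration, here is a short sketch of LIME on a text classifier, assuming the `lime` and scikit-learn packages; the atheism/Christianity newsgroup setup mirrors the example in the talk, while the Naive Bayes classifier is just a convenient stand-in for any black-box model.

```python
# Sketch: explaining a black-box text classifier with LIME.
# fetch_20newsgroups downloads the dataset on first use.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

categories = ['alt.atheism', 'soc.religion.christian']
train = fetch_20newsgroups(subset='train', categories=categories)

# Any classifier works: LIME only needs a predict_proba function.
pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())
pipeline.fit(train.data, train.target)

# LIME perturbs the text (removing words), watches how the prediction
# moves, and fits a simple local model around that one example.
explainer = LimeTextExplainer(class_names=['atheism', 'christian'])
explanation = explainer.explain_instance(
    train.data[0], pipeline.predict_proba, num_features=6)

# Each (word, weight) pair says how much that word pushed the prediction,
# which is how header artifacts like "NNTP-Posting-Host" get exposed.
print(explanation.as_list())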
I also want to discuss fairness, because there are a lot of misconceptions about fairness in AI. Say you have a protected attribute: you don't want your AI to be unfair with respect to gender, religion, skin color, or some other sensitive attribute of a person. Some people say: fine, I'll just remove the variable, leave gender out of the input data, call it a day, and I'm done with fairness. Unfortunately, it doesn't work that way.

Here is a toy example: you try to predict the speed of a car. You can remove the car's color and that won't hurt your prediction, because, well, everyone knows red cars are faster. But now take a real-world dataset where you want to predict someone's salary, and you remove the gender column. The issue is that real-world data is full of correlations: you can often guess someone's gender from their job, because, well, you know that someone doing artistic swimming is most likely a woman. The AI will pick up those correlations and use them as a proxy for gender. And if you wanted to remove every trace of gender, you would have to remove basically half of your dataset, and then you could no longer make accurate predictions. So there is a trade-off between accuracy, the value in your dataset, and fairness.

An AI can be unfair for many different reasons, and this is really the concerning part. The bias can be in the data itself, as we just saw; the AI can be trained on the wrong metric, as we will see shortly; or the model can simply be bad, make bad predictions, and be unfair because of that. The bias is also really hard to notice: if you are only looking at the accuracy of your model, you will detect bias only if you go looking for it, and it is not obvious. That is why, as data scientists, it is our responsibility to check for these issues of fairness, transparency and accessibility when we build algorithms, and not to consider only accuracy.

Now some vocabulary. Consider a binary classification, yes or no. A true positive is when your AI guessed right: it predicted yes and the true value was yes. Then you have the true negative, and the false positive, which is when the AI predicted yes but the right answer was no, and the reverse, the false negative. From these you also get the positive predictive value, the proportion of the predicted positives that are truly positive, and likewise the negative predictive value. We can use these quantities to measure fairness, as the small example below shows.
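Here is an illustrative sketch of how that vocabulary turns into a fairness check: compute a rate such as the false positive rate separately for each value of a protected attribute. All names and arrays below are hypothetical placeholders, not data from the talk.

```python
# Per-group false positive rate: the basic check behind the examples
# that follow. Arrays here are toy placeholders.
import numpy as np

def false_positive_rate(y_true, y_pred):
    # FP / (FP + TN): how often the model says "yes" when truth is "no".
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return fp / (fp + tn)

y_true = np.array([0, 1, 0, 0, 1, 0, 1, 0])                  # ground truth
y_pred = np.array([1, 1, 0, 1, 0, 0, 1, 0])                  # model output
group  = np.array(['a', 'a', 'a', 'b', 'b', 'b', 'b', 'a'])  # protected attribute

# A large gap between groups signals exactly the kind of disparity
# discussed in the next example.
for g in np.unique(group):
    mask = group == g
    print(g, false_positive_rate(y_true[mask], y_pred[mask]))
```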
So here I want to present an example, and this is a real example, not science fiction: COMPAS, an algorithm used by the justice system in the US. The algorithm predicts how likely you are to reoffend after committing a crime, and some journalists found out that the algorithm was racist: the false positive rate was much higher for Black people than for white people, while the false negative rate was much lower for Black people. So this is a real issue. I'll accelerate a bit.

The next example is again about Black and white people in the US. This one was used in a Boston hospital, and here the AI simply used the wrong metric for its predictions: it started from the assumption that the amount you have paid for your healthcare is a proxy for how much future care you need. People who had paid more for healthcare were flagged as more in need of care, and since in the US white people tend to spend more on healthcare, white people ended up with better care than Black people.

Finally, please don't be afraid of the math. This is a loss function: we can put fairness in AI into a mathematical formula. We take K to be the number of possible values of an attribute, like gender or religion; for each value we have a function that says how fair the decision is. Then we take the term that says how well the algorithm predicted, how good the prediction was, and we add the terms that say how fair the decision was, and we optimize for accuracy and for fairness at the same time. That way the algorithm is trained for both, and we get models that are both accurate and fair during training (one possible written form of this objective is sketched below).

So yeah, that's it. Please ask questions.
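For reference, here is one plausible way to write the combined objective just described. This is a sketch only: the exact formula from the slide is not in the transcript, so the symbols L_acc, F_k and lambda are assumed names.

```latex
% A sketch of the combined objective described above; all symbols are
% assumptions, not the slide's exact notation:
%   \theta  -- model parameters
%   L_acc   -- how well the algorithm predicts (e.g. cross-entropy)
%   F_k     -- how unfair the decision is for value k of the attribute
%   \lambda -- trade-off weight between accuracy and fairness
\[
  L(\theta) = L_{\mathrm{acc}}(\theta)
            + \lambda \sum_{k=1}^{K} F_k(\theta)
\]
```

Minimizing this trains the model on both terms at once: the first term pushes for accurate predictions, the sum pushes the decisions to be fair for every value of the protected attribute, and lambda sets the trade-off discussed earlier.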