So we have the next talk, called Towards a More Transparent AI: Describing ML Models Using LIME. So over to you, Lysha. Thank you, Kalyan. Hello everyone and welcome to today's session. Firstly, I would like to take the opportunity to thank Icon India for having me here today. So let's begin with today's talk. Before we delve in, let's quickly discuss what we'll be seeing today and what this buzz around explainable AI and transparent AI is all about. We'll be digging deep into why there is even a need to understand what your model is doing. Isn't a black box just enough? Aren't we doing fine with it? Then we will be discussing a magical library called LIME which solves the problems we'll be dealing with. We'll also be looking at some interesting examples on data sets ranging from images to text, and we'll be closing the session with some final thoughts and some open source contribution opportunities as well. So let's begin. A little bit of introduction about me: I work as a data engineer at coutu.ai, I love participating in hackathons and I've won a couple of them at Microsoft, Sabre and Mercedes. I also occasionally host podcasts with Co-Learning Lounge, and I'm a technical writer with them as well. So let's get started. You would all agree that we have all been here: we've trained deep neural networks and reached a point where we have no clue what's happening with our model, and sometimes it even fails at recognizing a dog and identifies it as a cat. So the whole idea of this session is to understand how we end up in these situations and how we build more trust in our models, so that we can relate to our users better when we are building an AI-backed product. Right now, with the wide number of libraries coming in each day, ML has literally become a black box. We are moving towards more and more accurate models, but we are at a place where the interpretability index is going really low. Deep neural networks do perform really well, but we do not know what's actually happening inside all those stacks and stacks of neurons and hidden layers. So there's a significant trade-off we need to weigh when we are dealing with any ML problem or use case at hand. So let's quickly discuss the models we generally work with and which category they fall in. The higher the interpretability of your model, the easier it is for someone to comprehend what's happening inside it. Let's say you are building a model to recognize whether a person will default on a loan or not. What good is a high-performance model if you cannot draw insights out of it, such as why a person is defaulting in a particular scenario or why your model is giving a lot of false positives? In today's business-centric world it is vital to get quality insights from your data, not just performance. Any black box model is going to give you very good performance, but the whole beauty of ML lies in how well you're able to build trust in your users and your customers, or how well you're able to explain what you are doing and how you are doing it. So with that same ideology we have understood why this is actually important. And this is exactly where LIME comes into the picture, between the black box and the humans. Let's say you're pitching your product to someone and you want to explain how it is done.
Now, if someone knows how your model performs, they are definitely going to trust what you're doing, especially when you're dealing with use cases like healthcare data or chatbots on really sensitive topics. In those cases interpretability becomes a great deal, and the whole idea is to ease your process of interpreting your model. So in the further slides we'll be comparing why you should leave your traditional methods of EDA behind and move to this new LIME library that I'm talking about. So we understood why we need interpretability. Now let's talk about LIME, which is what the entire talk is about. LIME is basically a novel algorithm created by Marco Tulio Ribeiro, Sameer Singh and Carlos Guestrin, and it also comes as a Python package, so that's amazing. Now, what does LIME stand for? It sounds very fancy. LIME stands for Local Interpretable Model-agnostic Explanations. I know that's a lot of words, but let me explain it to you one by one. "Local" means it gives you a great deal of insight into what your model is learning locally, a very deep insight into the decision boundaries, which is something your traditional methods do not give you. Second is the kind of interpretations you get out of this library: those explanations are easy to understand, and even a layman can follow them, because they are either graphical or simple visualizations. And the best thing about LIME is that it's model agnostic. You give it a text-based model, you give it a deep learning image classifier, or you even give it a regression problem; it can explain possibly any model at hand. Now you understand where LIME fits in, but it's time to understand how it is able to do all of this and explore the secret sauce behind this amazing library. Right now we do not put much emphasis on what the model is doing. Most of an ML practitioner's time is devoted to dealing with the data, making it better, the feature engineering part of it. But looking into what your model does is really important. I'll be giving you some interesting examples which will definitely change your viewpoint on using black box models just off the shelf. So look at this example. Let's say you're trying to differentiate between dogs and cats and you try to explain why this particular image was recognized as a dog. The area which is not grayed out contains the pixels which the model thought were important in recognizing it as a dog, and that's exactly what you wanted. Now, if you have a lot of false positives in your data, you can go and look at them instead of just sitting back and wondering what went wrong with your training data or whether there was something wrong with the hyperparameters. That's why LIME comes as a very easy-to-use solution. And now let's explore how LIME actually does that. There are very simple steps to how LIME functions. The first thing is that LIME learns a local model on top of the model that you've already trained. And what does it do with this model? Let's say you're trying to explain one sample from your test set. LIME will create permutations of your input data around that observation and obtain the original model's predictions on that permuted data set. Once it has done that, it will fit a new model on this data, which will be a very simple model because it is nothing but a local fit.
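To make those steps a bit more concrete before we get to the math, here is a minimal, hand-rolled sketch of that perturb-predict-fit loop for a single tabular observation. Everything in it, the function name, the Gaussian noise, and the choice of ridge regression as the local surrogate, is my own illustrative assumption; the actual LIME library wraps all of this (and much more) for you.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

def explain_locally(model_predict, x, num_samples=5000, kernel_width=0.75):
    """Rough sketch of the LIME idea for one tabular observation x.

    model_predict: the black-box prediction function (e.g. model.predict_proba)
    x:             the single observation (1-D numpy array) we want to explain
    """
    # 1. Perturb the observation by sampling points around it
    perturbed = x + np.random.normal(0.0, 1.0, size=(num_samples, x.shape[0]))

    # 2. Ask the black-box model for predictions on the perturbed points
    preds = model_predict(perturbed)[:, 1]  # probability of the positive class

    # 3. Weight each perturbed point by its proximity to x (Gaussian kernel)
    distances = np.linalg.norm(perturbed - x, axis=1)
    weights = np.exp(-(distances ** 2) / (kernel_width ** 2))

    # 4. Fit a simple, interpretable model (ridge regression here) on the
    #    perturbed data, weighted so that nearby points matter most
    local_model = Ridge(alpha=1.0)
    local_model.fit(perturbed, preds, sample_weight=weights)

    # The coefficients of this local model are the explanation: how much each
    # feature pushed the prediction up or down in the neighbourhood of x
    return local_model.coef_

# Illustrative usage with a toy black-box classifier
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X, y)
print(explain_locally(rf.predict_proba, X[0]))
```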
One more interesting thing about this: when you are generating new permutations of your data, more weight is put on the data that is similar to your original observation. That's how you get the locality, or local fidelity, in your model. And all of that is backed by a very beautiful math equation. It's really quite simple, but before we go into it, let me just give you a graphical explanation. Let's say Xi, which is highlighted in red, is your sample. What you'll do is create a permuted data set, which is highlighted in blue. Now, when you are training a new, local model on top of it, what you're essentially doing is taking points which are in the close neighborhood of that particular Xi, so that the model trained on top of them is locally aware and is able to explain what exactly went into giving that particular prediction. To break it down into simple terms, let's say you're trying to identify whether the object in front of you is a chair or a table. If an image is recognized as a chair, there will be certain pixels that contribute to that. That's essentially what this whole mathematical equation is doing. To put it into mathematical words, it comprises two terms. The first is nothing but a weighted square loss. The second term is the complexity of your model: for example, if you're training a decision tree, it can be the number of levels in your decision tree, and if you're training a regression model, it could be the number of non-zero weights that you have. Now comes the important part, which is the loss. If you're training a model and you want it to be as accurate as the one that was originally trained, you want to minimize this loss, and it is called the locality-aware loss. Here f is the original model whose prediction you are trying to explain and g is your local model, and the loss is nothing but the unfaithfulness of the explanation. You want explanations that are as faithful as possible, so you want to decrease this unfaithfulness. You try to approximate f using g, and pi x is nothing but the proximity of the observations: you had your original observation and you created a permuted data set, and between these two, pi x is a proximity factor which you get using a Gaussian function; it's basically a Gaussian kernel. When I show you the code of how it is done, it will be much clearer and you'll have much more insight into what I'm talking about. So you've heard what it does and how it does it, but some of you may still not be convinced: why would I leave the comfortable things I'm already doing and am good with? Well, there are some really good things that LIME has to offer, and it also makes up for the disadvantages of the traditional approach. The most important one is that traditional approaches cannot show you how the decision boundary behaves when you change your inputs. Also, you have little to no insight into the feature importance for your new data points, whereas LIME explains the decision boundaries in an easily understandable form and is model agnostic. And the most important thing is that it gives you a local approximation of the global model. I'm not saying it is able to explain the entire model, but it is at least locally faithful: it can approximate that locality based on the global information it got from your original model.
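Before we move on to the examples, here is the objective described a moment ago, written out in the notation of the LIME paper (a restatement for reference: f is the original black-box model, g the simple local model, Omega(g) its complexity, and sigma the kernel width we will tune later):

```latex
% Explanation chosen by LIME for the observation x:
\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)

% Locality-aware (weighted square) loss over the permuted samples z:
\mathcal{L}(f, g, \pi_x) = \sum_{z, z'} \pi_x(z) \left( f(z) - g(z') \right)^2

% Proximity measured with a Gaussian (exponential) kernel of width sigma:
\pi_x(z) = \exp\!\left( -\frac{D(x, z)^2}{\sigma^2} \right)
```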
Now, the left side here should look familiar. It is nothing but a t-SNE plot for the MNIST dataset. If you're trying to understand how a model performed, we would usually do something like this, but it is not very interpretable. Let's move to the right side, where we have LIME. We're trying to understand: if the input was an eight and the output probability for three was high, why did my model give me this false positive? Because the pixels highlighted in red are the regions which look like, or resemble, a three, and they could have contributed to the eight being recognized as a three. Now, if your model has a lot of false positives, you would ideally want to know what went wrong rather than just computing all the metrics, F1 scores and accuracies and guessing. This is something which is very visually interpretable. Moving on to the model agnostic part, it pretty much does everything for you. It comes with a LIME text explainer, an image explainer and a tabular explainer. In the case of the image explainer, as I said before, it highlights the important pixels. In the case of text, it highlights the important words, or you could say the tokens, which contribute to the prediction. And in the case of tabular data, it gives you the weight a particular feature is getting. We'll quickly be getting a little insight into digit classification on the MNIST data set, but before we go there, let's just quickly understand what we are doing here. We are training a simple decision tree classifier on the MNIST data set. And what we essentially do when we are trying to explain possibly any model is create an explainer object, which here is nothing but a LIME image explainer. And if you're working with images and want to highlight the parts of your image which are important, you would ideally want a segmentation algorithm. For the segmentation algorithm I've chosen quickshift because it's faster. The kernel size here is very important because it decides the proximity factor between your original data points and the permuted data points that the explainer is going to create. The larger this value is, the more faithful the explanations you are going to get. Once you've created your object and your segmenter, you want to create an explanation. For that there is a standard syntax: you go and call explain_instance. What it wants from you is which instance you want to explain, what model you want to explain, and how many samples you want to take in your neighborhood. That's really important because it governs how well your local explainer can work. Now this is essentially how we do it. Let me just show you a very quick demo of how this works on a different set of points. So let me go to digit classification. Let's say you took a data point and its actual label was, say, eight, and it was predicted as one. And let's say there are many such examples where eights and nines were predicted as something else. What you would want to do is inspect what went wrong. Do I need to add more samples? Do I need to augment my data set? That's how LIME is going to help you.
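For reference, the digit-classification setup just described looks roughly like this in code. The LIME calls (LimeImageExplainer, SegmentationAlgorithm('quickshift'), explain_instance, get_image_and_mask) are the library's actual API; the surrounding choices, using scikit-learn's small digits set as a stand-in for MNIST, the normalization and the parameter values, are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.tree import DecisionTreeClassifier
from skimage.color import gray2rgb, rgb2gray
from lime import lime_image
from lime.wrappers.scikit_image import SegmentationAlgorithm

# Train a plain decision tree on the (flattened) digit images
digits = load_digits()
X, y = digits.images / 16.0, digits.target      # X: (n, 8, 8) grayscale images in [0, 1]
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X.reshape(len(X), -1), y)

def predict_fn(images):
    """LIME passes batches of RGB images; convert back to flat grayscale for the tree."""
    gray = np.array([rgb2gray(img) for img in images])
    return clf.predict_proba(gray.reshape(len(gray), -1))

# Segmentation algorithm deciding which pixel regions get perturbed together;
# the kernel_size / max_dist / ratio values here are illustrative
segmenter = SegmentationAlgorithm('quickshift', kernel_size=1,
                                  max_dist=200, ratio=0.2)

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    gray2rgb(X[0]),          # the instance we want to explain
    predict_fn,              # the model, as a prediction function
    top_labels=10,           # explain the 10 highest-probability classes
    num_samples=1000,        # size of the permuted neighbourhood
    segmentation_fn=segmenter)

# Highlight the regions that pushed the prediction towards the true label
image, mask = explanation.get_image_and_mask(y[0], positive_only=True,
                                             num_features=5, hide_rest=False)
```

The returned mask marks the superpixels that supported the chosen label, which is what gets overlaid in red or green in the demo.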
Coming back to the demo, what we're trying to understand is: if an eight was positive for four, or positive for five, even with probability greater than 0.5, why did that happen? The areas highlighted in red are telling you that these are the regions which contributed to your eight being misrecognized as something else. And if you look at it visually, a standing line can be part of a four or part of a five as well. This is really helpful when you're doing things like image classification and object detection. Now we saw that it works well with ML-based applications, but let's see whether it works with deep learning-based applications or not. There's a very famous example of classifying wolves and dogs. Wolves are generally outdoors but dogs are generally indoors. The authors of the paper trained the model and found out that it was recognizing wolves and dogs with great accuracy, but all it was learning was the background. The pixels you see in gray are the ones which were not contributing to recognizing a certain image. That's why explainability in AI is really important: your use case might be working really well, but internally the model might be learning something completely different from what you intended. And that's why we'll look at a dogs and cats classification example, but before that let me give you some context on what we are doing. We are using an Inception v3 network to see how well it performs on dogs and cats classification. A simple predict function is used, and when it comes to the image explainer it's similar to what we did in digit classification: we are going to use a segmentation algorithm, and top_labels, which I forgot to mention before, is the number of top classes you want explanations for. Let me quickly show you how it works. Let's say we have this particular image of a dog with a ball, and we are trying to analyze: if label 208 was for dog, did the pixels from the dog itself really contribute to it? That's essentially what our model is doing. And if there is a case where the ball was also recognized, we would want that when the ball is predicted with a higher probability, the pixels around the ball should be contributing to it being called a ball. I've shared these notebooks on GitHub and a link is also there in the slide deck, so you can definitely go and run it yourself and check how differently it works. Let me show you what happens when there are more than two objects in the image. These are the kind of predictions we got, and this is indeed an English foxhound, hence the prediction confidence is also high for that. But you also want to understand, when there is a very low prediction confidence, what went on, or what kind of pixels were being picked. So, when we are talking about the dog, 168 is the English foxhound. You see that the area in the face contributes towards it being called a dog, along with some areas in the cat's face as well, which is something we don't want. So we see a clear difference between what we wanted to happen and what actually happened. And that is also going to help when you are doing transfer learning, where you're using an existing model and fine-tuning it on your data set: you can check whether what you intended to do is actually working as expected. And you can similarly inspect the images for low confidence results as well.
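Here is a sketch of what that Inception v3 setup can look like, assuming Keras with pretrained ImageNet weights and a local image file; the file name, the preprocessing helper and the parameter values are illustrative, while the LIME image-explainer calls follow the library's standard usage.

```python
import numpy as np
from tensorflow.keras.applications.inception_v3 import (
    InceptionV3, preprocess_input, decode_predictions)
from tensorflow.keras.preprocessing import image as keras_image
from lime import lime_image
from skimage.segmentation import mark_boundaries

model = InceptionV3(weights='imagenet')        # pretrained ImageNet classifier

def load_image(path):
    """Load and preprocess one image into the (1, 299, 299, 3) range Inception expects."""
    img = keras_image.load_img(path, target_size=(299, 299))
    return preprocess_input(np.expand_dims(keras_image.img_to_array(img), axis=0))

def predict_fn(images):
    # LIME hands us a batch of perturbed images already in the model's input range
    return model.predict(images)

img = load_image('dog_with_ball.jpg')          # hypothetical file name
print(decode_predictions(model.predict(img), top=3))   # what the model thinks it sees

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    img[0].astype('double'),   # the single image to explain
    predict_fn,
    top_labels=5,              # explain the 5 most probable ImageNet classes
    hide_color=0,
    num_samples=1000)

# Show which superpixels supported the top predicted class
temp, mask = explanation.get_image_and_mask(explanation.top_labels[0],
                                            positive_only=True,
                                            num_features=5, hide_rest=True)
highlighted = mark_boundaries(temp / 2 + 0.5, mask)   # undo preprocessing for display
```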
To give a concrete case: if the dog's face was not contributing at all, your model was not really performing well; it was learning something else. And that's exactly what ML calls for: you understand what you do and you have trust in what you are building. So now we know it works for classification of images, it works with ML, it works with DL, but does it work with NLP, which is one of the biggest areas where ML is used today? Yes, it definitely does. We'll quickly explore a text classification problem on a Stack Overflow posts and tags data set. We'll do some pre-processing and train a logistic regression model, which will give us the required predictions. And let me quickly explain how a LIME text explainer works. Here again we have something called kernel width. The higher the kernel width you give, the more faithful the explanations you get, but the more time it takes for your text explainer to solve that local model. Feature selection is currently auto, but if you want you can also specify your own feature selection: if there are certain features you want to build on in your data set, you can give those. And once you have created your explainer, when calling explain_instance all it needs from your end is the number of features and the labels. The labels are, for example, if you're giving an input which you know was a question about HTML but you want to see whether it was also identified as a question about CSS and JavaScript, and why, and what went on behind that. And to show you how beautiful the output looks: it's really easy when you are dealing with graphical interpretations in ML, you do not need to scrunch your mind and go through all the logs and all the numbers. So if you have a prediction of SQL, you know what keywords, what tokens, are contributing to it being called SQL. You can also highlight those words in your paragraph. Another great functionality is that all the graphics you create using LIME can be exported to HTML using the HTML class that comes along with LIME. Just to remind you, there is also the GitHub repository I was talking about; you can go and check out all the notebooks there, and the slide deck is also there. Let me also show you some more examples for text classification. Let me just go down. Let's say we were trying to understand a case where the HTML probability was 0.99 and .NET was 0.11. What were the words contributing to it not being called HTML? The confidence is very low, and you would see that yes, your model is able to clearly demarcate between HTML and .NET. Now, this kind of analysis can be really helpful when you're dealing with cases where your classes are very closely related, or when you have a class imbalance and you want to see whether your model is able to handle the biases even after you have done some oversampling or stratified sampling. Let's look at another example where we are talking about CSS and the words that contribute to it being called CSS. It gives you more insight into, say, what vector length you should choose when you are working on an NLP-based problem. And it also tells you whether there is some further level of text cleaning you need to do in order to serve your use case better.
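In code, that text pipeline and explainer look roughly like this. The tiny training set, the TF-IDF plus logistic regression pipeline and the class names are illustrative stand-ins for the Stack Overflow data; the LimeTextExplainer parameters (kernel_width, feature_selection), the explain_instance arguments and the save_to_file HTML export are the library's actual API.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from lime.lime_text import LimeTextExplainer

# Illustrative class names and training posts standing in for the Stack Overflow tags
class_names = ['html', 'css', 'javascript', 'sql', '.net']
texts = [
    "how do I make a div element full width",        # html
    "center a button with flexbox styling",          # css
    "addEventListener not firing on click",          # javascript
    "select rows from a table where id is null",     # sql
    "dependency injection in asp net core",          # .net
]
labels = [0, 1, 2, 3, 4]

pipeline = make_pipeline(TfidfVectorizer(lowercase=True),
                         LogisticRegression(max_iter=1000))
pipeline.fit(texts, labels)

explainer = LimeTextExplainer(
    class_names=class_names,
    kernel_width=25,           # wider kernel: more faithful but slower local fit
    feature_selection='auto')  # or pick the feature-selection strategy yourself

post = "How do I center a div inside another div?"   # hypothetical test post
exp = explainer.explain_instance(
    post,
    pipeline.predict_proba,    # the trained model as a probability function
    num_features=6,            # how many tokens to show per class
    labels=[0, 1])             # explain both the html and the css class

print(exp.as_list(label=1))                  # (token, weight) pairs for the css class
exp.save_to_file('explanation.html')         # shareable HTML with highlighted words
```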
Moving forward, there are a couple of things apart from this which you can also explore. There's a great library called SHAP which also gives you explainable AI techniques. And then there's an amazing book by Christoph Molnar on making black box models explainable, Interpretable Machine Learning. So if you're interested in this field you should definitely check these out. We have seen different examples of how LIME works, and we've also seen why LIME is better than the traditional approaches we have been following. First, it helps you build trust, which is really, really important in this business-centric world. Second is transparency: if users know what's happening when they use a certain app or web application, they will definitely want to use it more, because they know how it's done. Third is public perception: if you're trying to explain your approach to someone in layman's terms, LIME gives you the utmost functionality to do that. And the fourth is improvement through feedback, which we have stressed a lot during the presentation. LIME helps you understand things at a very granular level without having to dig deep into the mathematics, and based on the feedback you get, you can go and improve either at your data level or at your model level, and then improve again. And since you have informed feedback, you will definitely make much more informed decisions than just working your way around hyperparameters or fixing random things in your data. And the best thing about it is that the explanations are really short, they're selective, and they're possibly also contrastive. But with every positive there are some drawbacks. The first is that the complexity of your explanation model has to be defined in advance, that is, what neighborhood you're trying to consider. Another is that the sampling is from a Gaussian distribution, so you will sometimes have to tune your kernel width; if you're looking for more faithful explanations you will have to increase it, but at the cost of increased time. So these are the kind of drawbacks we have to deal with, but the positives definitely outweigh them. Now, let's talk about what we have learned today, which is really important. The first and foremost thing is: don't just trust models without explanations. Let's say we are creating a model which is trying to predict the churn rate in a company. What good is a high-performance model if it is not able to give us business insights into why people are leaving the company? That's why you should not trust your model without getting the kind of insights you're trying to draw from it. And I couldn't agree more with the fact that interpreting machine intelligence is tough, at least for humans right now. And for business-centric use cases, especially in a world where ML is in possibly every vertical we talk about, we must ensure that the model is actually learning the scene well; we just saw what was happening in the case of wolves and dogs. Lysha, we have five minutes. Sure, thank you, I'll quickly wrap up. So that's something which is really essential. Talking about the future of interpretability: there will be a renewed focus on model agnostic interpretability tools like SHAP, and there are also libraries like Skater which are quickly coming up. Right now we are analyzing data; we'll slowly start analyzing our models too, in the sense that we'll start analyzing the decision boundaries.
And we are slowly moving towards a time where data scientists will automate themselves, because problems like data drift and concept drift are also contributing to the need for explainability. So definitely there's a large scope for interpretability, and if some of you are interested in exploring it further, do check out the original paper and the blog by the authors, and you can also contribute to the library; it's open sourced and the link to the paper is also attached. I'll be happy to take any questions that you have. So here is the first question, Lysha: as you mentioned, it explains a local part of a model, so I want to know how local that is, like how much of the neighborhood around a point is considered in the calculation. So essentially, right now LIME uses either a linear regression model or a decision tree based model when it's trying to learn a local model on top of whatever model you give it. But when you're talking about how local it is, again that comes down to the width you choose for your kernel. So if I summarize what I was talking about: you pick the point you want to explain and you choose a neighborhood around it. The proximity is decided by the width of your kernel; the larger the width of your kernel, the more points in the proximity will be taken for the local model to train on. So it's up to you whether you want more and more faithful explanations at the trade-off of time, or you want decent explanations of your model but very quick computations. That's the trade-off you have at hand. So the next question is: after visual interpretation, how can we tune the model based on it? For instance, how do we use the significant pixel information to modify the model? Right, so I'll again take the example of digit classification. There we were trying to understand that if there is a certain level of false positives and false negatives, what do we do about it? One thing we can definitely do is augment our data set. If we know that a lot of eights and nines are getting misclassified, we can check the distribution of where the misclassification is happening. And since we know the regions, we can apply augmentations: we can invert the images, or we can crop the images, or cut them in half so that we train on the half containing the pixels which are important, or crop the image to a certain size, those kinds of things, because now we know what our model has already learned. We are adding additional intelligence based on the feedback LIME has given us. So that is definitely something you can do in the case of images. When it comes to text, one thing is that if you know which tokens are important, and let's say you have a long text and the tokens in the second or lower part of the text are not important at all, that also gives you an insight into what vector length you want to choose when you're building NLP models or training chatbots. So yeah. Yeah. So the last question is: Lysha, could you share the links to the GitHub or Colab notebooks which you have explained in this presentation? Yeah, so in this particular repository all the notebooks are linked along with the slide deck, so you can go and check them out there. Thank you so much, Lysha, it was a great talk. Have a good day. Thank you.