Closer, closer, speak closer to the microphone. Oh, now? Yes, okay. So hello everyone. I'm Akansha, and I'm from Rakuten Institute of Technology, where I work in data science. And today I'm just giving an overview of interpretable machine learning and how we're trying to use it in our daily analysis. So quickly, the contents: I'll first give a brief introduction about myself and the company I work for. Then a bit of a story on how, and why, we actually started looking into model interpretability. Then a brief overview of the landscape, as of now and as of what I know. And we'll go through one of the techniques we're using, called SHAP values. I'll then close out the story with some examples of how we used it. And finally I'll go through some resources, some of them really good ones, which could be useful for anyone interested in taking this up further. So, real quick about me. In my previous life I worked as a software engineer at Adobe, on their print analytics. After that I worked as a researcher at NUS; I actually did my master's by research at NUS, working pretty much on social media stuff, summarizing social media documents and so on. And for the past few years I've been at Rakuten, specifically in their data science unit, which is called RIT. My focus is largely user behavior, so I work a lot with marketing as well as product. I identify drivers for customer churn and acquisition, and build prediction models for the same. I do a bit of marketing budgeting and attribution as well. And I also do a lot of growth experiments, something we recently started, where I work with design and product to run these experiments and analyze them. A quick intro about Rakuten.
So I'm not sure how many of you know about Rakuten, but it's a very big e-commerce player in Japan. Though it's Japanese, it's actually pretty much a global company: the Rakuten Group has a lot of companies in America, and a presence in Europe and even in Asia. And Rakuten Institute of Technology is in a very interesting position because it's a horizontal; it sits across all these companies. So when these companies have needs that require data science or some data inputs, members of RIT work and collaborate with those individual companies. So we have really diverse projects and really diverse companies. For instance, Rakuten is into e-commerce, into payments, even into golf booking, into travel, into video streaming. All sorts of really nice data. And we have really nice people in RIT working on areas from machine translation (we have a senior scientist on machine translation here) to voice recognition, a lot of stuff around images, AR, VR, tons of stuff that you guys would be interested in. Anyway, within this whole Rakuten ecosystem, I've mostly been involved with Rakuten Viki, which is essentially a video streaming platform that streams mostly Asian content, Asian meaning Korean and Chinese. And about their markets: if you're wondering why in Singapore you won't see a lot of shows on Viki.com, it's because most of the good content is licensed for the Americas or Europe; that's their main audience. But they have a very strong R&D and engineering presence here in the Singapore office. And their business model is pretty much like Hulu in the US, if you're aware of that. They have a free platform where users watch stuff for free but it's monetized through ads, and there's also a subscription-based system where users pay a monthly recurring fee and get rid of the ads.
And so let me start off with the story. Like I said, they have a subscription-based plan, and here my focus is on the subscription-based part of the business. You have customers coming in, paying monthly to watch content; that's great for the company, dollars coming in. But then you also have customers leaving the subscription, and that's not good, because the dollars go out. So one fine day management decides to go to marketing and they're like, hey, why don't we start using AI to identify which customers are going to churn next. And marketing is really excited: oh yeah, sure, let's do that. So then they come to me and they're like, can you help us identify who will churn next month? And I'm like, sure, you know, I can use past data and try to build a model that will predict who will churn next. Fair enough. So I go to the data, do the usual data cleaning and feature engineering, building different features like video viewing, clickstream, subscription behavior. I train this on the past, and I come up with a probability score for the users who will churn next month. I work on this, try out different models, and eventually get a model based on boosted decision trees that performs better than the baseline they have. So I'm really excited, I hand this score over to marketing, and let's see what happens. Time goes by, almost a month or so, and I start looking at some other projects because I thought they'd take the scores, test them out, and experiment with them, but nothing's really happening. And I'm like, okay, what's happening? What's wrong? And just to mention, for the quite a few companies that are probably just starting out with data science, this is the point where a POC comes out of the data team and either actually gets taken up or sort of dies.
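The modeling step in that story can be sketched roughly like this. Everything here is illustrative: the features, the synthetic data, and the plain NumPy logistic regression are stand-ins (the real model in the talk was tree-based, and the real data is internal).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical engineered features per subscriber:
# minutes watched last month, days since last login, past cancellations.
n = 1000
X = np.column_stack([
    rng.exponential(300, n),   # minutes_watched
    rng.integers(0, 30, n),    # days_since_login
    rng.integers(0, 3, n),     # past_cancellations
])

# Synthetic ground truth: churn is more likely with low viewing,
# long absence, and a history of cancelling.
logits = -1.0 - 0.004 * X[:, 0] + 0.08 * X[:, 1] + 0.9 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(float)

# Train a logistic regression by gradient descent, as a simple
# stand-in for the boosted-tree model used in the talk.
Xs = (X - X.mean(0)) / X.std(0)
Xb = np.column_stack([np.ones(n), Xs])
w = np.zeros(Xb.shape[1])
for _ in range(2000):
    p = 1 / (1 + np.exp(-Xb @ w))
    w -= 0.1 * Xb.T @ (p - y) / n

# Churn probability scores for next month, one per user.
scores = 1 / (1 + np.exp(-Xb @ w))
print(scores[:5])
```

The output of this step is exactly what got handed to marketing in the story: one probability per user, with no reason attached.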
So what we did was, let's see whether there's another side to the story. We went to marketing, and the problem was really simple. They're like, okay, you're giving me this score; what do I do with it? And why should I do whatever I'm supposed to do with this score? So we realized something we probably should have realized earlier: if marketing is unable to act on these predictions, then the model won't be used. What we were giving marketing was a model where, for customers X, Y, and Z, we assign a number to each; say Y and Z have a higher score to churn compared to X. But what we could also give is an explanation along with each prediction, saying, for instance, that customer Y is a fairly recent subscriber but has a history of subscribing and cancelling, while customer Z is likely to cancel because he's someone who's lost interest now. So that's when we actually started looking at model interpretation. And this is a fairly new field, in the sense that sometimes even the definition of the term model interpretation is debated. But roughly, it is how humans can understand the choices made by the model in its decision-making process. So next I'll give a very high-level overview of what's out there these days in terms of model interpretation. Typically, when we talk of the landscape of interpretability, first we have models that are inherently interpretable, like regressions where I can interpret the weights, or simple decision trees. Then we have something called surrogate models, which I'll go into in a little more detail: essentially, when we already have a very complex model trained, we train another simple, interpretable model on the predictions of that complex model.
And then there's also a lot of research going on around how you visualize these explanations so that they're easy for the end user to understand. Some of these work at a global level, like partial dependence plots, which I guess some of you might have already used, so I won't go into too much detail on those. I'll focus more on the surrogate models. So typically, model interpretation techniques can be classified as either local or global. Global means we try to explain the model over the entire data set, whereas local means we try to explain one particular prediction. You can also classify them as model-specific or model-agnostic. Model-specific techniques are limited to specific model classes, like interpreting the regression weights in a linear model. Model-agnostic techniques can be used on any machine learning model and are usually applied post hoc, after you've already trained the model. So this is, again, a very high-level view of the different techniques out there. I won't touch on the model-specific ones, and among the model-agnostic ones I'll focus on the surrogate models. I have a lot of resources at the end; anyone who's interested can always look them up. So, on to surrogate models. The basic idea behind these is: say I have a really complex machine learning model, trained and making decisions. What we then do is train another simple model on the original inputs and the predicted target values of my complex model, and that's how I interpret its decisions. One of them has been out there for a while and is really popular: it's called LIME, which stands for Local Interpretable Model-agnostic Explanations. Local means it works on each individual instance; model-agnostic means you can plug it into any underlying complex model that's making decisions for you.
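The surrogate-model idea itself fits in a few lines. This is a minimal *global* surrogate sketch, with a made-up black-box function standing in for the complex model; the interpretable model here is a plain linear fit:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "complex" black-box model: we only get to call it.
def black_box(X):
    return np.tanh(2 * X[:, 0]) + 0.5 * X[:, 1] ** 2

# Step 1: get the complex model's predictions on the training inputs.
X = rng.uniform(-1, 1, size=(500, 2))
y_hat = black_box(X)

# Step 2: fit a simple interpretable model (linear least squares)
# on the inputs and the *predicted* targets, not the true labels.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y_hat, rcond=None)

# Step 3: read the surrogate's weights as an approximation of
# how the black box uses each feature globally.
print("intercept, w_feature0, w_feature1:", coef)

# Check how faithful the surrogate is (R^2 against the black box).
resid = y_hat - A @ coef
r2 = 1 - resid.var() / y_hat.var()
print("surrogate R^2:", round(r2, 3))
```

The R² check matters in practice: a surrogate is only as trustworthy as how well it actually mimics the complex model it explains.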
And again, at a very high level (it's definitely more sophisticated than what I have here, but in the interest of time): what you do is select your instance of interest, perturb your data set, and get predictions for the new points. So in this case, if my point of interest is the bright red one, I essentially generate new data points around it. Then I weight these new samples by their distance to my point of interest, train a weighted interpretable model on this data set of variations, and explain that model instead. There are actually quite a few problems with this method; I have references at the back which you can refer to. One of the main problems is how you define the neighborhood around the instance. And believe it or not, currently that number is hard-coded in the LIME source; if you actually go to the LIME repo, you can see it's hard-coded. I tried playing around with it, and for different use cases it seemed to require some tuning. So if anyone wants to play with it, there's an R library as well as a Python library; I would suggest you keep this hard-coding in mind when you're using the method, and play around with it. Another thing that's a problem across interpretation techniques is correlation of features. This is an issue even for simple interpretable models like regression: if two features are correlated, how do you attribute between them? And recently another work came out, a paper that's on arXiv right now, which actually showed the instability of this method. What I mean by instability is that the authors conducted experiments where two points that are really close to each other got explanations that are very different. So this is something to watch out for.
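The LIME recipe just described (perturb, weight by proximity, fit a weighted linear model) can be written out by hand. This is not the actual `lime` library, just a sketch with an assumed toy classifier; note the kernel width in step 2, which is exactly the hard-coded neighborhood parameter mentioned above:

```python
import numpy as np

rng = np.random.default_rng(2)

# Black-box classifier returning P(churn); only predictions are visible.
def predict_proba(X):
    return 1 / (1 + np.exp(-(3 * X[:, 0] - 2 * X[:, 1])))

x0 = np.array([0.2, -0.1])          # instance of interest

# 1. Perturb the instance to get new points around it.
Z = x0 + rng.normal(scale=0.5, size=(2000, 2))
y = predict_proba(Z)

# 2. Weight each sample by its proximity to x0 (RBF kernel).
#    The kernel width here controls the "neighborhood" size.
d2 = ((Z - x0) ** 2).sum(axis=1)
w = np.exp(-d2 / 0.25)

# 3. Fit a weighted linear model on the perturbed data.
A = np.column_stack([np.ones(len(Z)), Z - x0])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)

# 4. The local weights are the explanation for this one prediction.
print("local effect of feature 0:", coef[1])   # positive: pushes churn up
print("local effect of feature 1:", coef[2])   # negative: pushes churn down
```

Shrinking or widening the kernel width changes which points count as "neighbors", and with it the explanation itself, which is why that hard-coded value deserves tuning.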
I'll move on to another technique called SHAP, which is the most recent work on model interpretation, and what it does is unify several previous methods. What I mean by that is, it says that there are six currently existing methods, such as LIME and DeepLIFT, and it proves that they all belong to the same class. And if a method in this class is to satisfy certain properties (again, I won't go into the details, but these are properties that are important for interpretability; you can go through them, or we can go through them later if there's time), then there's only one unique solution possible, and this unique solution, they say, is given by the Shapley values. So this really big-looking formula is how Shapley values are derived; for the details you can go to the paper, but I'll give some intuition on what these mean. These values actually come from game theory, and what the SHAP authors try to do is connect game theory with model interpretation. At a high level, the question game theory tries to answer is: say I have a team that's making $400 in profit, with several people working in it; how do I attribute each individual's contribution to the team? And again at a very high level, the way it does this is by taking all possible team combinations and averaging over them. Once we know what each individual makes working alone, I can find the incremental gain of a member when he joins the team. So for instance, for A in a team that's making $400, I'll find A's incremental contribution as $400 minus $100, where $100 is B's contribution when B is just working alone.
Similarly for B, the incremental contribution would be $400 minus $200, where $200 is A's contribution when A is working alone. In this way we can find each team member's incremental contribution across teams of different sizes, average over them, and eventually get how much to attribute to each individual member. So how does this relate to a model's prediction? Well, essentially you can envision the individual team members as the features of a prediction, and the amount the team earns as the part of my prediction that's above the base value. Say my base value is 0.5; that's my average prediction without using any features. If my prediction is 0.7, then my gain is 0.7 minus 0.5. So you can think of the overall team profit as the prediction score above the base value, and that's how, at a very high level, you draw the analogy between game theory and interpreting my model's predictions. So this is what we eventually used, and these are the results on our model. Previously, what we gave marketing for a particular prediction was just the score: for this user, you have a score of 0.71. But using model interpretation, using SHAP values, what we were actually able to give was a reason as to why this score is higher than the base value: he's been subscribed for less than a month, he's not viewed much content, and he's someone who seems to have churned in the past. Versus a user with a prediction lower than the base value: someone who's been around for a while and has viewed a lot of content; in fact, he's been very active. And another thing you can do is, if I take all of these, let's say horizontal, explanations, flip them vertically, and stack them together, I can actually see my users clustered together in one view.
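The team example can be computed exactly. This is a direct implementation of the Shapley idea, averaging each player's incremental gain over all join orders, using the talk's numbers (A alone makes $200, B alone $100, together $400):

```python
from itertools import permutations

# Worth of each coalition, from the talk's example.
v = {frozenset(): 0,
     frozenset("A"): 200,
     frozenset("B"): 100,
     frozenset("AB"): 400}

def shapley(player, players, v):
    """Average the player's incremental gain over all join orders."""
    total = 0
    orders = list(permutations(players))
    for order in orders:
        before = frozenset(order[:order.index(player)])
        total += v[before | {player}] - v[before]
    return total / len(orders)

phi_a = shapley("A", "AB", v)
phi_b = shapley("B", "AB", v)
print(phi_a, phi_b)    # 250.0 150.0

# Efficiency: the attributions add up to the full team profit,
# just like SHAP values add up to prediction minus base value.
print(phi_a + phi_b)   # 400.0
```

For A, the two join orders give gains of $200 (joining first) and $300 (joining after B), so A gets $250; the attributions always sum back to the total, which is the efficiency property that makes the base-value-plus-contributions picture work for predictions.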
So for instance, I have this set of new users who are not watching any content, and my set of disengaged users over here. And one thing I didn't expect it to do, but it also helped me qualitatively evaluate my model. There was this particular area that didn't make sense to me: these users were probably not likely to churn, but I saw very weird features being attributed to them, saying they had a big drop in their minutes viewed, that they'd watched a lot of content in the past. It actually made me go back and revisit my model and data, and I realized the data was dirty; my thresholds were off for some of these. So I managed to correct some of this, and instead of just cranking up my accuracy, I qualitatively got to see what was happening. So SHAP is great, but again, correlated features are still an issue, and computing exact values gets expensive when there are many features. The paper that I referred to earlier also argued that some of the explanations, even for SHAP values, were not stable, though they were more stable than LIME's. Another trade-off between SHAP and LIME is that LIME actually builds a local prediction model, so you can ask things like: if I change this feature by this much, how much will the output change? But a SHAP value is just an attribution, so you can't do that sort of modeling with it. So if, say, you're denying someone a loan, with a local model you could potentially say: your income is less than 10,000, but maybe if you increase it to this much, your loan would be approved. Things like that you currently can't do with SHAP. Again, this has very good implementations available; there's an R one too, though I haven't tried it.
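The loan what-if mentioned here can be illustrated with a tiny search. The model, threshold, and numbers below are entirely hypothetical, just to show the kind of counterfactual question an attribution alone doesn't answer:

```python
# Toy loan model: approve if a simple linear score crosses a threshold.
# (Hypothetical model and coefficients, purely for illustration.)
def approved(income, debt):
    return 0.0004 * income - 0.002 * debt > 3.0

income, debt = 8000, 500   # applicant currently denied
assert not approved(income, debt)

# Search for the smallest income increase that flips the decision,
# holding everything else fixed.
step = 100
needed = income
while not approved(needed, debt) and needed < 50000:
    needed += step

print(f"Increase income from {income} to {needed} to get approved.")
```

A SHAP value would tell this applicant that income pushed the score down; only a model you can re-query (the real one, or a faithful local surrogate) can tell them *how much* income would change the outcome.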
The Python one is really good, and the good thing is that the author is very active; the repo is actually open to contributions as well as to answering questions. So, in summary, what I would like you guys to take away is this: if you're building stuff, especially for marketing or other business stakeholders, they might not be familiar with how predictive modeling works, so they might not take action on your predictions unless they have some reason to trust your model; and in such situations, model interpretation can help you. Currently, the approach of using local surrogate models such as LIME or SHAP seems to be the promising way, but as I mentioned, it's still in a development phase, so there are a lot of problems that still need to be solved. I mean, you can use it to explain to a layperson what's happening, but for a heavily regulated system, where you're answerable for why you're denying someone a loan, I probably wouldn't use them right away. Yeah, there's still a lot of work to do, but like I said, that means there's scope for us to contribute. If any of you are actually interested in this, the authors are all very responsive; one of them actually answered a query of mine over email, so they will respond. Now I'll touch on some resources, and they're really good. There's this book by a PhD student, Christoph Molnar; it's really good. In fact, some of the content, like the structure of how I wanted to present this, came from this book. He also starts the book with very interesting stories, Black Mirror kinds of stories, about what would happen if we don't interpret our models. It's a nice read, very concise and to the point, and for cases where you need to go deeper, he refers you to papers and everything. Really good. Kaggle also came out with a machine learning explainability course, if anyone's interested,
but that's kind of a short one. There are also very nice tips on bringing interpretability practices into your machine learning process; this is by H2O.ai, very nice stuff about how you should constantly test models for sensitivity, and a couple of other tips. Then there's the original LIME paper, and they've also got a YouTube video explaining it in short. Again, for SHAP values, there's the author's GitHub code base, really good, really useful, have a look. There's his NeurIPS paper, and he's also come up with another paper specifically for ensemble trees, because usually when you're doing all these feature permutations there are so many feature combinations; so he's come up with a faster method of doing this specifically for trees. You can have a look. The robustness paper that I was mentioning is there too; have a look at that as well. And if you're specifically looking for experts in Singapore: I don't claim any expertise in this area, I'm just exploring it and trying to incorporate it into my machine learning process and with business stakeholders. But there is this lab called the Ubiquitous Computing Lab at NUS; you can check out their link. They're doing some really nice stuff, right from visualization to actually working on these models. And the grad student who actually introduced this to me was a fellow master's-by-research student; that's his website, if you're interested. And a funny thing: I actually asked him last week whether I could put this on the slide, and he mentioned he's actually giving a talk on this. So if any of you are interested, it's happening next week, on April 4 I think. He has a really small room booked because he doesn't expect a lot of people; it's first come, first served. But I guess if you guys are really interested and get on the waitlist, he's probably willing to get a bigger venue, or maybe he can give a talk here next. I think he's one of the really good experts in this; he's the one who got me to start using it. And
definitely, if you guys have questions, feedback, or any sort of collaboration you'd want, my links are already there; the link is in the meetup as well, but this is my personal email as well as my company email if you want to get in touch. I'll be more than happy to work on some problems together; it's a very interesting area. And finally, a quick plug, since my manager got me here: if you guys are interested, we're hiring across the board, full-stack engineering, data scientists, senior scientists, across the board. It's a diverse company and there are loads of problems, so if you're interested, please go and check it out. So, thank you. If you guys have any questions, I'll answer them to the best of my knowledge. I hope I didn't bore you. You can also reach out to me later; I'll probably hand my contact details over to the organizers so you can have them. Some of the examples were internal, but just reach out to me, or to these folks at NUS who are doing some work on interpretation, and have fun with it. It's a very interesting area. Thank you.