Good evening everyone, and thanks for joining. I am Rahul, Rahul Agrawal, part of American Express Credit and Fraud Risk. I am sure all of you have been through a very heavy dose of machine learning, data science, and a lot of code and algorithms, so I promise this is going to be a very light session. You are not going to see any piece of code or anything heavy on your head; I am going to keep it such that there are a lot of stories around what we are doing. We started this journey of data quality a long time back, but we faced a lot of challenges along the way. This story is all about how we solved those challenges, what worked for us, and what did not.

I will start with an interesting story. Three British researchers were studying the ozone layer across the world, and in 1985 they realized there was a big depletion in the ozone over Antarctica, almost 90% depletion. They were surprised, because there was actually a satellite, Nimbus 7, monitoring the ozone over Antarctica. Why had Nimbus 7 not captured it? When they looked through the data, they realized Nimbus 7 had actually captured the problem seven years earlier, but the software had treated the data as an anomaly because the values were too low. So it was completely ignored, and the problem remained undetected for seven years. That is a good starting point for how anomalies can make a difference to your results.

Here are some quotes I took from the internet: the average financial impact of poor data quality on an organization is almost $15 million per year, according to Gartner, and bad data costs the US $3.1 trillion per year. These are big numbers; every company, every organization feels some impact of poor data quality.

Now, coming to American Express and why this is so important to us. American Express is a truly global company, with a presence in 134 countries and 59,000 employees. Our billed business is around 1.2 trillion dollars. To give some context on the size of that business: the Indian economy is around 2.6 trillion dollars in terms of GDP, so our billed business is close to 45% of the Indian economy. That is the scale of the numbers we deal in.

Why is this so important to us? On a daily basis we are making decisions on billions of dollars of billed business, we are touching millions of customers every day, and we are processing thousands of new account applications from people applying for our cards and products.

Let me give you some interesting scenarios from the past. When you visit a merchant and swipe your card, what is your expectation? That within a second or two you will get a decision, most likely an approval, and you are done. What you do not observe is that behind every such decision there are close to 10,000 variables and 4,000-plus rules playing a role. A couple of years back we had an interesting scenario: a new variable was introduced into one of our models, and there was an initialization problem with that variable. Now see what happened.
All of a sudden, our decline rates went up from 0.5% to 12%, and that trend remained for four or five hours. It is not only a financial loss to us; more than that, it is a branding impact, which was very large.

And this is not an issue only with internal data, so let me take another scenario with external data. We deal with a lot of data coming from third-party bureaus for our new account acquisition and approval processes. Some time back, one of the bureaus changed the version of its bureau score, and we did not update it in our system. We were expecting the data to behave in a certain fashion, but the models were not accustomed to the new scoring, and we kept consuming it the same way we had consumed the old score. We saw an immediate change in our results and outcomes from the acquisition standpoint. These are some examples of how changes in data behavior impact us. Overall, the margin for error is very small.

Now, why is this such a complex problem to solve? What is anomaly detection? It is nothing but trying to define boundaries around your data: anything outside the boundary you call an anomaly, and anything inside is pure data. The challenge is where to define the boundary, because there is no right or wrong answer that says this is the correct way to define it and that is the wrong way. If you define it too tight, you may end up with a lot of noise, a lot of alerts. If you make it too relaxed, you may miss the real problems and real issues.

I will illustrate this with a very interesting example. When I was young I loved playing video games, and one of my favorites was Mario, from Nintendo. In the 90s another interesting series in a similar spirit became popular, called Where's Waldo. Waldo is an interesting character with a red and white sweater and a red and white hat. He wants to be an astronaut and go to the moon, and by finding him in the images you earn points to take him there. So we are helping him out.

Let's look at this picture. How many of you can see Waldo here? I can see some hands: five, six, seven, still quite a few. Let me change the complexity of the problem. Now, how many of you can find Waldo over here? It's difficult. Now look at what has changed between these two images. First, the number of characters in this image is far higher than earlier. That is exactly what is happening with data: 10 or 15 years back we were dealing with a small amount of data, and now we are dealing with a huge amount. Data has grown multi-fold. Second, there are a lot of similar-looking patterns in red and white. That is what we call noise in the data. There is so much noise that it is very difficult to tell the real Waldo from something that is merely red and white. That is the second problem we are seeing.
The third problem is time to market. You can definitely detect Waldo in this picture, but you will take a huge amount of time figuring out where he is. That is how time to market has gone up, for all these reasons. To summarize the challenges we were facing: a lot of false positives in the data, lack of scalability, very high time to market, low adaptability, and a lot of inconclusive alerts. That is the problem statement.

Let me talk about our approach and how we solved it. Our original approach, a couple of years back, used multiple statistical functions, generally industry standard, which gave us an upper control limit (UCL) and a lower control limit (LCL), and on that basis we tried to detect anomalies. I won't say this was not working, but we were getting a lot of noise, a very high number of alerts. We had to tune every time series differently, because every time series has different behavior, and just tuning took a lot of time. There were a lot of inputs coming from our user base: can you change my time series to plus or minus 3 sigma, and so on. It was a very big exercise at that point in time, and on top of that, not every time series was normally distributed, which was another problem.

So that is where we started. Then we thought: as a company, American Express is a pioneer in machine learning. We started that journey in 2013 itself by setting up an AI lab in Bangalore, and from 2014 we were using machine learning models in our fraud detection, servicing, and customer acquisition. Today it is the backbone of almost every decision happening in American Express. So we thought, why not leverage it for data anomaly detection and see how it can help us on this journey?

This was our first novel approach: we took different types of models and tried an ensemble. What we realized is that each type of model solves for one given type of problem, so we look at where the majority vote lands. If the majority says a point is an anomaly, we consider it an anomaly; if the majority says it is not, we do not. With this approach we were able to take care of seasonality, trends, and a lot of the behaviors that had tripped us up earlier, so the noise went down quite dramatically.

Looking at the results we were able to generate: our accuracy went up by almost 50 percent, which was a huge jump; in other words, we were able to cut our false positives by 50 percent. We also saw almost a 3x gain in time to market. Why? We no longer needed to take inputs from users or tune limits on their behalf, although there was still some tuning we had to do for each of the different models. The execution speed was very good, because we got a new, very powerful big data cluster and were able to run all these models on it.
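To make the two ideas above concrete, here is a minimal sketch in Python of a control-limit detector (UCL/LCL) combined with a majority-vote ensemble over a few simple detectors. The detectors and thresholds are illustrative textbook choices, my own assumptions rather than the undisclosed models the team actually uses:

```python
import numpy as np

def sigma_band(series, k=3.0):
    """Control-limit detector: flag points outside mean +/- k*sigma (UCL/LCL)."""
    mu, sd = series.mean(), series.std()
    return np.abs(series - mu) > k * sd

def iqr_detector(series, k=3.0):
    """Tukey-fence detector: flag points outside the 'far out' quartile fences."""
    q1, q3 = np.percentile(series, [25, 75])
    fence = k * (q3 - q1)
    return (series < q1 - fence) | (series > q3 + fence)

def rolling_median_detector(series, window=7, k=5.0):
    """Flag points far from a trailing median, robust to slow trends."""
    flags = np.zeros(len(series), dtype=bool)
    for i in range(window, len(series)):
        ref = np.median(series[i - window:i])
        scale = np.median(np.abs(series[i - window:i] - ref)) + 1e-9
        flags[i] = abs(series[i] - ref) > k * scale
    return flags

def majority_vote(series, detectors):
    """Call a point an anomaly only when more than half the detectors agree."""
    votes = np.vstack([d(series) for d in detectors])
    return votes.sum(axis=0) > len(detectors) / 2

# e.g. a decline-rate series that sits near 0.5% and suddenly jumps to 12%
rng = np.random.default_rng(0)
series = np.r_[rng.normal(0.5, 0.05, 200), 12.0]
flags = majority_vote(series, [sigma_band, iqr_detector, rolling_median_detector])
print(np.where(flags)[0])  # should flag the jump at index 200
```

Each detector has a different failure mode (the sigma band assumes normality, the trailing median tolerates trend), which is exactly why a majority vote filters out much of the noise that any single detector generates.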
So time to execution was very fast, and the detection time went down simply because users now had far fewer alerts to look at than before. But then we asked what we could do next, and that is where we moved to the revised ensemble approach, a weighted ensemble. Here we run all the different types of models over the historical time series, determine which model is better suited to that particular series and which is not, and define a weight for each model accordingly.

Another interesting thing we did was add an algorithm to quantify each anomaly as a score. If there is an anomaly, how do you quantify it? A small anomaly might get a weightage of 20 points, versus a bigger anomaly at 80 points. So when the models run over a historical time series, we also look at what type of anomalies they are able to detect, smaller ones or large ones. That was another interesting learning on the journey.

With the revised ensemble, the accuracy gain went up to 80 percent, so now almost every alert we were getting was very, very relevant for us. The time to market really went down, because there was no user input of any sort. All you give us is a time series; we know we have to monitor it, we put it through the learning techniques, the models get tuned on their own, and the entire mechanism can be applied right away. You hand something to us, and next thing it is up and running. That is the benefit we got in time to market. Execution was again good, and detection time went down by a further 70 percent. One more thing we did for detection time: we give people the score for every alert, so if they have a limited amount of time, they know exactly which is the bigger anomaly and which is the smaller one, and they can give more weightage to that particular alert.

Now let me come to some of the key learnings from this entire journey. The first was supervised versus unsupervised. When we started, we asked: can we go with a supervised learning model here? (I will take questions toward the end.) What we realized is that the number of data points available for historical learning is very, very low. It is not as if a time series throws up 50 real issues; in reality there are hardly one or two issues that have happened in the last year or two, and knowing that issue exactly and then training your model on it was very difficult. So we went with unsupervised, but what we do now is take user feedback on every alert we generate: is it a relevant alert or not? If they say it is relevant, we feed that back into the system.
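Purely as an illustration of how per-series weights, an anomaly score, and the feedback loop could fit together, here is one possible sketch, reusing detectors shaped like those in the previous snippet. The F1-style weighting against historical feedback and the 0-100 score formula are my assumptions, not the patented method:

```python
import numpy as np

def fit_weights(history, known_anomaly_idx, detectors):
    """Weight each detector by how well it reproduces the (few) known issues
    on this series' history; user feedback on new alerts refreshes the labels."""
    truth = np.zeros(len(history), dtype=bool)
    truth[known_anomaly_idx] = True
    weights = []
    for d in detectors:
        flags = d(history)
        tp = np.sum(flags & truth)
        fp = np.sum(flags & ~truth)
        fn = np.sum(~flags & truth)
        weights.append(2 * tp / (2 * tp + fp + fn + 1e-9))  # F1-style agreement
    w = np.array(weights)
    return w / (w.sum() + 1e-9)

def score_alerts(series, detectors, weights):
    """Weighted vote per point, scaled by deviation size into a 0-100 score,
    so roughly 20 reads as a marginal anomaly and 80 as a severe one."""
    votes = np.vstack([d(series) for d in detectors]).astype(float)
    agreement = weights @ votes                       # 0..1 weighted vote
    center, spread = np.median(series), series.std() + 1e-9
    magnitude = np.clip(np.abs(series - center) / (6 * spread), 0, 1)
    return np.round(100 * agreement * magnitude)
```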
So that is one thing we are doing. The other part was explainability. What we realized is that you can definitely generate a lot of alerts, but can you really explain to a user what the problem is? Until you have good explainability for a given behavior coming out of the machine, it is of no use to people. You have to relate it to the actual business problem, with more contextualization around why it is happening. We are also exploring other tools and utilities that can provide the explainability in plain English, so all of this comes out as verbiage rather than the user having to work out on their own what is happening. The explainability of the whole thing is very important.

Multivariate versus univariate: we started with univariate, and that is working absolutely fine for us. One thing we learned along the way: suppose we are getting bureau data with 15 variables, and all of a sudden, one fine day, their missing percentage goes from 0.1 percent to 12.372 percent. I can now infer that these 15 variables are behaving in a similar fashion, because their missing percentages are correlated; it cannot be random that 15 variables suddenly have the same missing percentage to that precision. So we are able to correlate all these alerts together, and on that basis we were able to further reduce the number of alerts. A lot of the time, one upstream problem creates problems for 40 or 50 variables in one shot, but by correlating the percentage of deviation we can easily group those variables and reduce the total number of alerts.

In-house talent versus external: this was another interesting one. The good part for us was that we had a lot of strong in-house talent in machine learning; there was a good machine learning practice in different parts of the organization, and we were able to leverage that. I would also say we had some citizen data scientists on our side, plus we brought in some external talent from the market. By combining all of these we were able to take it forward. If you think of doing it purely with in-house talent, it becomes difficult unless they have that level of skill set, and bringing in an external perspective also speeds up the overall journey.

The other learning was to place small bets and fail fast. This was very interesting, because we never knew what was going to work and what was not. We tried many things; what I have shown is just a few examples, but we went through a whole series of experiments, some of which worked really well and some of which did not. We were never stuck at a given point; we were always experimenting, and that worked very well. We have failed many, many times in this journey, but the cost of failure was very low for us.

So I will conclude by saying that we were able to find our Waldo in some form and shape, and I am sure this is going to be a journey for all of you as well. With that, I am open to any questions you have.
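As one last illustrative sketch before the questions, this is one way the alert-grouping idea from the multivariate discussion could look: variables whose missing percentage jumps by (almost) exactly the same amount at the same time collapse into a single correlated alert. The tolerance and variable names here are hypothetical:

```python
from collections import defaultdict

def group_alerts(missing_pct_shift, tol=0.05):
    """missing_pct_shift maps variable -> jump in missing %. Variables whose
    jumps match within `tol` points fall into the same bucket, i.e. one alert."""
    groups = defaultdict(list)
    for var, shift in missing_pct_shift.items():
        groups[round(shift / tol)].append(var)
    return list(groups.values())

# 15 bureau variables all jump from 0.1% to 12.372% missing; one unrelated shift
shifts = {f"bureau_var_{i}": 12.272 for i in range(15)}
shifts["income"] = 0.4
print(group_alerts(shifts))  # one group of 15 bureau variables, one singleton
```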
We are using a family of algorithms for each of these models. We are actually filing a patent on this right now, and it is still under way, so we cannot disclose exactly which algorithms we are using, but I can give you examples. Among statistical functions, EWMA is a very good one that is giving us good results; for trends and seasonality, double exponential smoothing is also working well for us; and on the hypothesis-testing side, something like Grubbs' test is giving us good insight into whether a series is behaving well or not. These are just a few examples; there is a whole series of models we have worked through, and this is a constant journey: we keep learning things, applying things, and making constant changes.

I won't say we have tried anything from the deep learning standpoint exactly at this point, but we are working on a crowdsourcing model within the company right now. We are expecting some results in another month or two, and based on those results I think we will be able to improve it further. At least at this point, with the current set of techniques we are using, the results are really good. Any alert that comes through, people just cannot ignore, and the good part is that there is a score attached. If you see an alert with a score of 10 or 15, it is probably a marginal alert that you can perhaps ignore, but if anything comes in at 50 or 60, you know it is a real problem.

Basically, we have a mix of sources. We look at a lot of the knowledge available on the net right now; we follow what people are doing and presenting at conferences like ODSC, and we listen. Whatever we can get from the external perspective is definitely very useful. Second, as I said, we have a very big in-house machine learning practice, and we get some very good insight from that. And lastly, as I said, there is already a crowdsourcing project running within the organization itself, and we are awaiting some results, probably in a few weeks or so.

Sorry, unfortunately I have been asked to stop now. No, I am done, I was just taking a few questions. Of course, of course we are, and if you have not had a chance, please visit the American Express stall over here and play some interesting games. Thank you so much. Thank you.
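As a small appendix to the answer above, here is what two of the named techniques look like in their textbook form: an EWMA control chart and the two-sided Grubbs' test. The parameters are standard defaults, not anything tuned or disclosed by the team:

```python
import numpy as np
from scipy.stats import t as t_dist

def ewma_alerts(series, lam=0.2, k=3.0):
    """EWMA control chart: smooth the series and flag points where the
    smoothed value leaves mu +/- k * sigma_ewma."""
    mu, sd = series.mean(), series.std()
    limit = k * sd * np.sqrt(lam / (2 - lam))  # asymptotic EWMA std deviation
    z, flags = mu, []
    for x in series:
        z = lam * x + (1 - lam) * z
        flags.append(abs(z - mu) > limit)
    return np.array(flags)

def grubbs_outlier(series, alpha=0.05):
    """Two-sided Grubbs' test: is the single most extreme point a
    statistically significant outlier?"""
    n = len(series)
    g = np.max(np.abs(series - series.mean())) / series.std()
    tcrit = t_dist.ppf(1 - alpha / (2 * n), n - 2)
    g_crit = ((n - 1) / np.sqrt(n)) * np.sqrt(tcrit**2 / (n - 2 + tcrit**2))
    return g > g_crit
```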