So, I am a director of machine learning at American Express, and I wanted to give a brief talk on some exciting applications of generative adversarial networks on structured data sets. Normally you would have seen a lot of success for GANs on images and unstructured data sets, so we wanted to share our learnings on how to make GANs work on structured data sets. So, let us get started. American Express, I guess, should need no introduction to this crowd. We have 59,000 worldwide employees and 114 million cards in force. What that means is we continuously get information about 114 million cards and customers, and we are continuously making business decisions on them regarding their financial risk, their marketing prospects, and so on. So there is a lot of data at play here, and hence there is obviously a lot of scope for machine learning, and that is where our whole team comes in. We have a decision science team that spans the world across different locations, with a large machine learning setup in Bangalore here, 250-plus MS and PhD graduates. The idea is to write in-house algorithms as well as apply those in-house algorithms to business problems that are proprietary to us. So, what sort of problems do we normally solve? Our problem solving spans financial risk: what is the credit risk of a customer, will the customer be able to pay back or be willing to pay back, and what is the fraud risk, whether an incoming transaction is fraudulent or not. Those are the risk models. We have marketing models, customer service models, and so on, and then a set of line models: what is the ideal credit line to be given on a credit card, when should it be changed, how should it be managed, and so on.
And all these are automated scientific decisions powered by mathematical models, each with its own algorithm, and machine learning is entirely at play behind this. Today I just want to touch on one particular aspect, which is the customer management part of our business. What do I mean by customer management? Just a show of hands, how many of you here have an American Express card? So we are managing all of you as we speak. It is more of a virtuous cycle. When a new prospect applies for a card, the first decision we have to take is whether to issue a card or not and what the limit on the new card should be, deciding on the creditworthiness of the customer. But once a customer is on our books, a lot of decisions continuously happen around real-time credit management. Whenever you swipe at a restaurant or swipe in an Uber, we are deciding whether it is the genuine customer transacting or a fraudulent person, and whether this transaction is credit-worthy given the past behavior the customer has established, and so on. We also do offline credit management, which is basically managing your payments, whether the payments are coming on time, and so on. And if you want further financial instruments, more credit cards, more loans, we do further customer underwriting. This circle is what we call customer management. Before you are on our books you are a prospect, and once you are out you are a defaulter, but while you are actively on our books we want to manage the risk of the customers we have today. This obviously means multiple decisions: point-of-sale assessment, which is what I said, every time you swipe a card, whether to accept or deny that transaction; taking lending actions and assessing willingness to pay; managing payments; and so on.
But all these have to be counterbalanced by customer experience. We do not want to disrupt any genuine customer just for the sake of doing risk management, so we have to balance all these things. Any model we produce has to be extremely accurate, because the event rate of someone not paying us is going to be very low; we have to accurately identify that very small proportion of customers and take actions only on them. That is why we have invested a lot in machine learning over the last many years, and in deep learning more so in the last three to four years, and we have multiple successful applications running today on CNNs, RNNs, and GANs. This talk is about something we have worked on with GANs, and we wanted to share the learnings with you. Before I jump into the technicalities of the presentation, I just wanted to bring up a photo of the Airbnb I stayed in on my holiday last month. I was in Los Angeles, and this is a very nice place, and you see a lot of nice pictures. I am just kidding. Would you believe that this is an entirely made-up, fake Airbnb? Everything, right from the pictures, the pillows, the lights, the furniture, the flooring, to the text, the caption, the owner's face, the owner's name, and the description of the place. If you read it, it looks almost like something an Airbnb owner himself would write. This is entirely synthesized data; none of it is true. And there is this website, thisrentaldoesnotexist.com; if you go there and refresh, it will dish out a different fake Airbnb every time, from anywhere in the world. So how is this being made possible? Images, text, contextual captions, everything, a computer is generating on its own without any manual input needed.
This is being made possible by a wonderful algorithm, the GAN, which is probably taking the deep learning world by storm in terms of data synthesis and data augmentation, along with a lot of fancier and much more important applications in medicine and other retail industries. Okay, another quick show of hands: how many of you have already worked to some extent with GANs? Okay, fewer than the number of people who have credit cards, but still. GANs were introduced in 2014 by Ian Goodfellow; it was the best paper at NIPS, it took the world by storm as to what the potential opportunities with this algorithm could be, and the whole idea of adversarial training was born. Adversarial training itself is a very innovative way to use a neural network. Everyone here would know what a supervised learning network is: there is an input, there is an actual that you have to predict, and then there is weight training that you do using neural networks, be it an MLP, CNN, or RNN, to discover a function which reduces the error between actuals and predictions. But let us say I do not want to predict anything; I just want to make more of my inputs. A very interesting idea was to go with an architecture of two neural networks: you feed in a random input and make one neural network predict a vector of the same size as the input that you want, and this prediction is what we call the synthesized data. We pass the synthesized data along with the actual input data to a second neural network, which tries to guess whether the synthesized data looks similar to the actual data or not. So it tries to discriminate between the two data sets, not between x and y, and this loss is backpropagated to both neural networks. Basically, one neural network tries to differentiate between the two data sets.
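To make the two-network setup concrete, here is a minimal toy sketch of adversarial training on a single standardized feature. This is my illustration, not Amex's production code: the generator is a one-feature affine map g(z) = a*z + b, the discriminator is a logistic unit, and the two are updated with alternating gradient steps (non-saturating generator loss). The target distribution and all numbers are made up.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-max(min(x, 30.0), -30.0)))

# Real data: a 1-D Gaussian around 4 (a stand-in for one standardized feature).
def sample_real(n):
    return [random.gauss(4.0, 0.5) for _ in range(n)]

# Generator g(z) = a*z + b ; Discriminator D(x) = sigmoid(w*x + c)
a, b = 1.0, 0.0
w, c = 0.1, 0.0
lr, batch = 0.05, 32

for step in range(2000):
    zs = [random.gauss(0.0, 1.0) for _ in range(batch)]
    fake = [a * z + b for z in zs]
    real = sample_real(batch)

    # Discriminator step: ascend log D(real) + log(1 - D(fake)).
    gw = gc = 0.0
    for x in real:
        d = sigmoid(w * x + c)
        gw += (1 - d) * x          # gradient of log D(real)
        gc += (1 - d)
    for x in fake:
        d = sigmoid(w * x + c)
        gw -= d * x                # gradient of log(1 - D(fake))
        gc -= d
    w += lr * gw / batch
    c += lr * gc / batch

    # Generator step: ascend log D(fake) (non-saturating loss),
    # i.e. make synthetic points look more like actuals.
    ga = gb = 0.0
    for z in zs:
        d = sigmoid(w * (a * z + b) + c)
        ga += (1 - d) * w * z
        gb += (1 - d) * w
    a += lr * ga / batch
    b += lr * gb / batch
```

After training, the generator's offset `b` has moved from 0 toward the real mean of 4: the discriminator's feedback alone, with no labels, pulls the synthetic distribution onto the actual one.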
The other neural network tries to make better and better synthetic data sets which look more and more similar to your original actual data set. They are pitted against each other like adversaries, which is why it is called adversarial training, and at the end what you get, like the picture we saw, is something very close to the actuals and indistinguishable to the human eye. So GAN has seen a lot of success, as I said, in the unstructured world where you have images, audio, text, etc., in synthesizing newer data, but what does it have to offer for Amex? The question we asked ourselves was this: in customer management we have an even tougher, unique problem with new credit card customers. If someone is, let us say, just off campus, a student who has just entered the corporate sector, or a very new customer, someone who has just relocated to a country, we do not know enough about them. We have only just acquired them, so we do not have their payment behavior or their spending behavior, and it is very difficult to do risk management for these people. If you were to build any supervised learning model on the actual data we have on them today, which might be very limited and of low quality, it would suffer a lot due to low volumes, and the model would generally be weaker. So in our low-tenure segment we normally see a lot more customer experience disruptions, and we wanted that solved. What we thought was: what if we solve this volume problem by synthesizing, with a GAN, synthetic credit card customers and adding them back into our training data?
There are no such customers; I am just creating synthetic data points, and I am not using them to take actions, I am not denying credit to anyone, but I am using them to stack up my training data. Now I have much higher volumes; can I build a stronger model? That was the question we wanted to ask, and how did we go about answering it? The first challenge everyone has to recognize is that if you go read up on GANs today, the literature is all about images and unstructured data sets. There is not much literature on how to make a neural network like this work with structured data sets. And what I mean by structured data is everything we have on our customers: we do not have their photos or voice recordings, and not much text either; all we have is numbers: how much you are spending, how much you are paying, what your history is, what your credit score is, etc. These are all numeric and categorical information. So how do we train a neural network on this sort of information? A pixel is homogeneous; a pixel can always be explained by red, green, and blue, but the data we have is very heterogeneous and very high dimensional. What I mean by heterogeneous is: we have amounts, like your spending amount, which could be a decimal going from any negative value to a very high positive value; a balance; something like a credit score, which can only be a 3-digit value between 600 and 850; or whether you are delinquent or not, which is a binary indicator; and so on. Think of it: a neural network is a weight training method, so it is obviously going to do much better if all your data points, all your x_1 to x_n, are homogeneous, but if each of them is on a different scale, with a different meaning and a different connotation, it becomes much harder. So how do we solve the heterogeneity problem? These were some of the initial challenges we faced in using GANs on structured datasets.
So, we have gone through this journey for almost a year now, and we wanted to share six key learnings on how to work with GANs on structured datasets. The intention is that over the next three pages, in the next 5 to 10 minutes, when you go out of this room you will at least have a general outline of the possible pitfalls and the methods to follow if you have a structured dataset and want to apply a GAN to it. The first learning is true of any weight training method; people who have built a logistic regression would know it is the most important step, and it has come back in vogue with neural networks after the boosting and bagging phase: outlier treatment. Even among credit card customers there are extremely high spenders and extremely affluent customers, so there are variables which are going to have extremely high values, or if you are doing corporate credit cards there will be companies with extremely high revenue, high employee count, and so on. How do we treat them before doing weight training? If you are doing boosting or a decision tree process, these outliers do not matter at all; you can easily put them into a separate branch of the tree. But with weight training you have to do the truncation before any other step. And whatever I am saying here, these six key learnings, I would recommend you execute in order. So the first thing is outlier truncation, and the best way is not min-max or any other form of truncation but P1/P99 truncation, because you do not even want your weights to be moving towards the tails; you want to cut out the tails before you start the neural network training.
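That first step can be sketched in a few lines of plain Python. The spend amounts below are made up; the point is just clipping every feature at its 1st and 99th percentiles before any training:

```python
import statistics

def truncate_p1_p99(values):
    """Clip a feature to its 1st/99th percentiles so the tails
    never pull the network's weights during training."""
    cuts = statistics.quantiles(values, n=100)  # 99 cut points: P1 .. P99
    p1, p99 = cuts[0], cuts[-1]
    return [min(max(v, p1), p99) for v in values]

# Hypothetical spend amounts with one extreme spender at the tail:
spend = list(range(100)) + [10_000]
clipped = truncate_p1_p99(spend)
```

After clipping, the extreme spender is pulled back inside the P99 boundary while every ordinary value is untouched, so the feature's bulk distribution is preserved.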
The second learning is key for American Express in particular, because as I told you there are many different kinds of variables: there is an amount, a decimal which could go from negative infinity to positive infinity, but a credit score can only go between 600 and 850, and customer tenure can only start from 0 and count upward through a few months or a few years. A GAN, and for that matter any other neural network, suffers a lot if you just use these variables in their raw continuous form, each on its own scale. GAN works best when you bring them all to a single scale, which is why we see so much success with pixels: they are all on a 0-to-255 scale. If you do a Z-score standardization before anything else and bring them all to a 0-to-1 or a minus-1-to-1 scale, your GAN convergence will be much better as well as much faster. The third learning is something you will not see in the literature today: how do you do missing imputation for a GAN? We searched a lot and did not find anything. We do not have every piece of information about all our customers; for some we might not know their verified income, for some we might not know how much they are spending externally, and we might not know their credit bureau information. But if you drop all those customers from your analysis, you are already suffering from low volume, so your models will become even weaker. Without dropping those records, how do you treat the missing values?
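The scaling step, sketched in pure Python. The credit scores below are made up, and in practice you would fit the scaling parameters on training data and reuse them for scoring:

```python
import statistics

def zscore(values):
    """Standardize a feature to mean 0, standard deviation 1."""
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    return [(v - mu) / sd for v in values]

def to_minus1_1(values):
    """Min-max rescale to the [-1, 1] range the GAN trains on."""
    lo, hi = min(values), max(values)
    return [2 * (v - lo) / (hi - lo) - 1 for v in values]

scores = [612, 655, 701, 748, 790, 845]   # hypothetical credit scores
scaled = to_minus1_1(zscore(scores))
```

Whether the amount spans millions or the score spans 600 to 850, every feature now lands on the same [-1, 1] scale, which is exactly what pixels get for free at 0-255.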
The general way of doing missing imputation in neural networks, everyone would say, is a KNN-based imputation, but KNN-based imputation suffers big time in GANs, and we have seen that a median treatment works much better for a GAN than KNN-based imputation. The fourth learning is an optional point, but still very valuable: if your data has skewed distributions, if your variables have a large imbalance, the GAN will again suffer in modeling those variables and creating synthetic data points around them. If you can use a Box-Cox transformation to bring all the variables to a normal distribution, that again is very valuable. And I will stress once more, it is better to do this in order: first treat outliers, then standardize, then impute, then normalize. As everyone else in today's presentations has said, 90% of machine learning is just data cleaning and data processing, and this page tells you that the success of a GAN lies in cleaning your data; once you bring it to this point, running the GAN is just a click of a button. Point five: okay, now you have built the GAN and you have started seeing synthetic samples. There will be some Airbnbs which are absolute nonsense; instead of a house, one might have pictures of a castle or something not very realistic. So how do you pick the good synthetic samples and lose the ones which are not? Here we have defined a principle of orthogonal selection: you have to identify those synthetic samples which are orthogonal to your actual data points. If you already have data points of some nature, the whole point of data augmentation is to create synthetic samples that will aid your initial model.
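The imputation and normalization steps above can be sketched as follows. Note the Box-Cox lambda here is a fixed illustrative value; in practice lambda is usually estimated by maximum likelihood (e.g. `scipy.stats.boxcox` does this), and the inputs must be strictly positive:

```python
import math
import statistics

def median_impute(values):
    """Fill missing entries (None) with the median of the observed
    values; the talk's finding is that this behaves better inside
    GAN training than the usual KNN-style imputation."""
    med = statistics.median(v for v in values if v is not None)
    return [med if v is None else v for v in values]

def boxcox(values, lam):
    """Box-Cox transform with a fixed, illustrative lambda, used to
    pull a skewed feature toward a normal distribution."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]
```

For example, `median_impute([1, None, 3, None])` fills both gaps with 2.0, and `boxcox` with lambda 0.5 compresses a right-skewed feature toward symmetry.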
So you have to pick a strategy which ensures that your new data points are orthogonal. And the last learning: okay, now you have picked orthogonal samples, but do you want to use them as is? This again is something you will not see much in the literature: you have to treat the synthetic samples once more. It becomes a somewhat contradictory cycle: you want the computer to give you a synthetic data point, then you manually intervene and change that synthetic data point to look more like the actuals, and then use it. I know it looks fishy, but this is the way it works best. For example, a credit score can only go from 600 to 850, but if the GAN gives me a synthetic data point with, say, a credit score of 2900, that makes no sense to the model and will unnecessarily kill the model's efficiency. So you have to address out-of-range values before you can actually use those synthetic data points. Those are the six key learnings we wanted to share. Before I show my results, I just wanted to expose you to a potential pitfall of GANs called mode collapse; this is true for images and for any sort of GAN you build. When you are building a GAN, you might see that the results look good: your average synthetic values look very close to your actual averages. As we train the GAN over more steps, we want to see a nice, well-spread distribution of values, and when you compare the synthetic average with your real data's average it will indeed look good. But what you might actually see is that all the values collapse towards the mode: you are not getting a distribution of synthetic points, you are getting the same values again and again, yet when you take the average, the average still looks good.
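A sketch of those last two learnings. The talk does not spell out the exact orthogonal-selection rule, so the distance-based filter below is one possible reading of it (keep only synthetic points far enough from every actual point, so they add new information), and the feature names and ranges are illustrative:

```python
def clip_to_valid_range(samples, ranges):
    """Force each synthetic feature back into its real-world range,
    e.g. a credit score must land in [600, 850]."""
    return [{k: min(max(v, ranges[k][0]), ranges[k][1])
             for k, v in s.items()}
            for s in samples]

def select_orthogonal(synthetic, actual, min_dist):
    """One possible reading of 'orthogonal selection': keep only the
    synthetic points whose nearest actual point is at least min_dist
    away, so each kept point adds information the model lacks."""
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    return [s for s in synthetic
            if min(dist(s, a) for a in actual) >= min_dist]
```

So a GAN-generated credit score of 2900 gets clipped back to 850 before the sample ever enters the training data, and synthetic points sitting on top of existing customers get dropped.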
This is something you have to be very careful about. How do you solve it? To solve mode collapse you have to significantly increase the complexity of your generator and discriminator, add more dropout or more layers. There is no straightforward recipe; you have to solve it case by case in your problem. We have done all these steps and solved the pitfalls, and what we see in our credit card problem is that when we create new synthetic low-tenure card members, they look very intuitive next to the actual people. A synthetic defaulter is much more credit-hungry than a non-defaulter, is much more utilized on his credit limit, carries a very high balance, and has a much lower credit score. So the synthetic data points we get from the GAN help us mimic our portfolio much better and bring in more volume, which enables us to improve. When we added them to our models, we saw a significant jump in model performance, mainly in the precision and recall of our models, and without having to collect more data, with the existing data, we can give customers a much better experience. That is my last slide. What we are seeing is that GANs have a lot of value even on structured data sets, and GANs will provide a lot of incremental risk prevention to American Express as well as make our card member experience much, much better. Thanks everyone, I will take questions if any. I think the mic is coming. Yeah, we have time for only one question, sorry; we can speak outside afterwards. Question from the audience: you said you were working on the less-tenured customers using GAN. Is this before issuing the cards or after issuing the cards?
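The symptom described above, averages that match while the distribution has collapsed, suggests a simple diagnostic. This check is my own rule of thumb for illustrating the talk's point, not a method from the talk: flag collapse when the synthetic spread is a small fraction of the real spread even though the means agree.

```python
import statistics

def mode_collapse_check(real, synthetic, ratio=0.1):
    """Heuristic: means agree but the synthetic standard deviation is
    a tiny fraction of the real one, i.e. the generator is emitting
    (nearly) the same value over and over."""
    mean_ok = (abs(statistics.fmean(real) - statistics.fmean(synthetic))
               < 0.1 * statistics.pstdev(real))
    spread = statistics.pstdev(synthetic) / statistics.pstdev(real)
    return mean_ok and spread < ratio
```

A collapsed generator emitting the real mean over and over passes the average comparison but fails this spread check, which is exactly why comparing averages alone is not enough.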
This is after they come onto our books. We have already issued them a card; now we want to do active management of their risk, and that is when we do it. Guys, just one second: if you have any questions, meet us outside, or we have our booth here and I am going to be there all day, so please visit the booth and we can chat more. One more question from the audience: will GAN work for text or images or audio? Yes, that is the most standard use case; everyone has proved that GAN works for images, text, and audio. What we have shown here, on structured data, is something new. For images we can speak separately, but that is much more proven. Okay.