Hi, I'm Gabriela. I'm a machine learning engineer and tech lead at Legiti, and I'm going to be talking about how we apply machine learning models in production to prevent online credit card fraud and increase conversion for Brazilian e-commerce, and how we do it all on top of Kubernetes.

First, I'd like to give a bit of background on how credit card fraud works. In Brazil, where Legiti is based, credit card fraud is considered in some circles to be the national pastime. Brazil alone accounts for 45% of global credit card fraud cases. So let's just say there's plenty of demand for a good anti-fraud solution here.

Any company that sells any sort of high-value item online is a prime target for fraudsters. Half-priced plane tickets, discounts on pet products? Sure, why not? A fraudster has a stolen credit card in their hands, and anything they buy is essentially free, since they're not the ones paying the card's bill. That means they can resell whatever they purchase at a discounted price, because any money they receive is pure profit.

Many of you have probably been on the other side of this experience. While reviewing your credit card statement, you noticed a purchase, or a few, that you didn't make: somebody either stole your credit card or got access to your card information and made a bunch of purchases in your name. So what did you do? You called your credit card company. You explained that you never made this purchase. Maybe you even canceled your card. The bank tells you they're sorry about it, that they'll refund the charges, and that you'll get a new card right away. So that's the end of the story, right?

Well, not exactly. You see, credit card companies aren't exactly known as the most benevolent folks out there, and they definitely aren't the types to just give you your money back and absorb the loss themselves. The moment a credit card company refunds those charges to your card, it turns right around to the original merchant who authorized that purchase and demands the money back from them. All that money the credit card company refunded to you is actually coming straight from the merchant who let that purchase go through.

This process, in which the credit card company reverses a previously approved transaction and forces the original merchant to reimburse them, is called a chargeback. Chargebacks happen all the time and can be disastrous for merchants. Say your platform takes only a small percentage of the value of goods sold on it, say 10%. You originally took in $10 out of a $100 transaction, but if that transaction later comes back as a chargeback, you now have to pay back the full $100. Since you made $10 on that sale, you now need to make that same sale ten more times just to recover the $100 you paid out for the chargeback. You need a lot of sales to make up for the losses from one single fraudulent transaction. This can be devastating for a business, and controlling losses from chargebacks is obviously a major concern.

But one thing that's important to notice, and that we're trying to convey with this image here, is that stopping losses from chargebacks is only one part of the equation. The other side of this coin is conversion. Jokingly, we always say that stopping fraud is easy: if you just reject all of your customers' purchases, you'll never have fraud. That's your solution right there. Well, that's obviously not a real option.
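Just to make that arithmetic concrete, here's a tiny sketch using the illustrative numbers from a moment ago (a 10% take rate and a $100 ticket; these aren't real customer figures):

```python
def sales_to_recover_chargeback(ticket_value: float, take_rate: float) -> float:
    """How many additional sales of the same ticket value it takes to earn
    back the full refund the merchant owes for a single chargeback."""
    revenue_per_sale = ticket_value * take_rate  # what the platform keeps
    return ticket_value / revenue_per_sale       # equivalently, 1 / take_rate

# The example from the talk: a $100 purchase on a platform with a 10% take rate.
print(sales_to_recover_chargeback(ticket_value=100.0, take_rate=0.10))  # -> 10.0
```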
So the hard part is balancing fraud prevention with conversion, or, more concretely, rejecting as few valid customer purchases as possible while blocking as many fraudulent purchases as possible. You want to reduce your error on both sides: not just letting fewer fraudulent transactions through, but also reducing the number of legitimate transactions that you block.

That's where Legiti comes in. What we do is train machine learning models on vast amounts of credit card transaction data to deliver customer-specific anti-fraud models to our e-commerce clients. And we're pretty good at it: we managed to cut credit card fraud in half within six months of going live with one of our customers, and increased conversion of valid purchases by nearly 4%. All of this without a single human ever manually analyzing an incoming transaction. Our models are deployed in production, and when a customer of ours wants to know whether they should accept or reject a potentially fraudulent transaction, we respond in seconds with a high-quality decision by consulting the data we collect through our SDKs and APIs.

But it takes a lot of work to make that happen, and that's the part I'm going to dig into with all of you. I'm not terribly concerned that anybody here is going to copy our work, because, as I'm sure many people here know, it's pretty hard to set up a massively scalable data collection and ML model deployment system from scratch. Even when you know what you want it to look like, it's still really hard. And it all depends on the quality of the data you have. So you're all going to get a very open and honest look at how things happen behind the scenes at Legiti. Let's dive in.

As I mentioned, our system is fundamentally based on machine learning. Every time a user tries to make a purchase with any of our customers, which are the merchants that integrate their platforms with our system, our customers reach our API, and we consult machine learning models to give them a decision on whether to reject or approve that transaction. But machine learning is only as good as the data you feed your models with, so a huge portion of our system is actually focused on data ingestion and processing.

We have both SDKs and APIs for collecting data in real time, and with those we get access to transactional, user, session, and behavioral data. We also have lots of real-time data processing pipelines that get this data into our database within seconds of us receiving it. Here's one example of an API pipeline that we have: our customers reach us through our API gateway, and we pass the data through a few different Lambda functions. We use Kinesis and SQS to get the data in, and this is all happening within seconds. Then the data is available for us in our database, for which we're using Amazon RDS. (A rough sketch of what one of these Lambda stages might look like appears at the end of this section.)

However, collecting the data is only the first step towards being able to feed a machine learning model. After that, we still need to transform the raw data into something that is valuable to our models: what are called machine learning features. For that, we use both a feature store and a feature generation Python package. In our feature store, we keep features that can be pre-calculated, so that at usage time they can simply be fetched from the store.
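Here's a rough sketch of the final stage of such an ingestion pipeline: a Lambda handler that consumes a batch of Kinesis records and persists them to a Postgres table on RDS. The table, its columns, and the environment variable are made-up stand-ins, not our actual schema:

```python
import base64
import json
import os

import psycopg2  # assumes the driver is packaged with the Lambda


def handler(event, context):
    """Consume a batch of Kinesis records and persist them to RDS."""
    conn = psycopg2.connect(os.environ["DATABASE_URL"])  # hypothetical env var
    try:
        with conn, conn.cursor() as cur:
            for record in event["Records"]:
                # Kinesis delivers each payload base64-encoded.
                payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
                cur.execute(
                    "INSERT INTO events (event_type, payload) VALUES (%s, %s)",
                    (payload.get("type"), json.dumps(payload)),
                )
    finally:
        conn.close()
```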
What happens is that we have some EMR jobs running recurrently. These jobs fetch data from our database, process it, compute features from it, and store the results in two different locations: an online store and an offline store. The online store is a very low-latency storage solution; we're using a Redis cluster running on AWS's ElastiCache service. The offline store is a storage solution with less rigid latency constraints, but one that can hold larger amounts of data.

That's one part of the features we calculate. The other part is features that cannot be calculated beforehand, so we calculate those in real time. For that, we share code between model training and inference by making use of an internal Python package. By sharing that code, we also reduce the risk of training-serving skew in the feature values that we use.

Once we have those features, we can finally train our models. We use them to train a different model for each customer that we have, and we might actually have more than one model per customer, because our API is built in a way that allows us to run A/B tests with our models, and it also supports shadow-model functionality. So we end up with quite a few different trained models. Once we have those, we serialize them, that is, we basically save them to pickle files. And that's where they get integrated with our serving system.

What we have is our order evaluation API, which is how our customers request and receive our real-time fraud detection decisions, and we ship those trained models as embedded models inside this API. This is the API that we're running fully on top of Kubernetes. Here's an example of what a request to the API looks like: it's essentially a POST request carrying a lot of data related to that purchase, and with that data we can relate the purchase to many other transactions we've seen in the past.

At this point, one thing that's important to recognize is that this API is called at a critical point in our customers' sale flow: it gets used while they're trying to decide whether to approve or reject a transaction. This might be part of a synchronous flow, in which case all of this is happening while their customer is waiting for feedback on their purchase attempt, and any delays in this flow have a direct impact on that customer's experience. This means the API is under critical latency constraints, and we need a system that can scale without degrading the API's latency.

To achieve that and keep the API running in a scalable and reliable manner, we're using Kubernetes. We're on AWS, so we have a Kubernetes cluster deployed on EKS, which takes away some of the work that would be involved in getting the cluster up and maintaining it, abstracting away the lower-level cluster management tasks like managing the Kubernetes control plane and the cluster nodes. We're using EC2 instances for our nodes, which makes it very easy to define the size of the machines we want and to modify it should we want to change it. So it's a very flexible solution for us. We're also using Terraform to manage our infrastructure, so all of this is code-managed and version-controlled, and we can easily integrate it with the rest of our infrastructure and services. We built our API using Python, as it integrates well with the machine learning code that we have.
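Before getting into how the server itself is built, here's a minimal sketch of the serving-side plumbing just described: fetching pre-calculated features from the Redis online store, computing the rest on the fly, and consulting an embedded pickled model. The endpoint, key format, file path, and feature functions are all hypothetical stand-ins rather than our actual code; in reality the feature functions live in the internal package shared with training:

```python
import pickle

import redis

ONLINE_STORE = redis.Redis(host="my-elasticache-endpoint", port=6379)  # hypothetical

# One embedded model per customer, deserialized from its pickle file at startup.
with open("models/customer_a.pkl", "rb") as f:  # hypothetical path
    MODEL = pickle.load(f)


def realtime_features(order: dict) -> list[float]:
    """Features that can't be pre-calculated. Sharing this code between
    training and inference is what limits training-serving skew."""
    return [float(order.get("amount", 0.0)), float(len(order.get("items", [])))]


def evaluate(order: dict) -> bool:
    # Pre-calculated features: simply fetched from the online store.
    stored = ONLINE_STORE.hgetall(f"features:{order['user_id']}")  # hypothetical key
    precalculated = [float(v) for v in stored.values()]

    # The rest are computed on the fly, then everything goes to the model.
    features = precalculated + realtime_features(order)
    return bool(MODEL.predict([features])[0])  # True = approve
```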
So the server runs as a Flask application. The server is the "Kubernetes API" element in the middle of this diagram; that's where the Flask application lives. We then connect it to the rest of our system by using a VPC link to map requests from our API gateway to our EKS load balancer, which then routes requests to the Flask apps in the Kubernetes pods. We build this application as a Docker container and push it to ECR, which allows for seamless integration with EKS. The container build happens as part of our CI/CD system, so we're able to version the builds and trace them back to git commits.

There are quite a few things that happen inside this application, represented by all these different arrows pointing to and from the API. We parse and validate the content of the request we receive from our customers. We fetch some of the features from the feature store, which holds the pre-calculated ones. Other features get calculated in real time, by querying a database and computing features on top of the queried data. We consult our machine learning models. We might even consult a few rules our customers can design themselves, in the form of allowlists and blocklists. And we log all of our decisions, with information about which model was used, which is what allows us to monitor our metrics. We also log decisions for shadow or unselected A/B models, and that allows us to compare metrics even between our different models. (A sketch of what such a request handler might look like appears at the end of this section.)

All of this means we have heavy objects loaded into memory, namely the models, plus some intensive CPU usage to calculate features on the fly. As I mentioned before, all of this is happening under very strict latency requirements, and Kubernetes has been serving us very well here by providing a robust way to serve this API.

Now, we've got our server up and running, requests are coming in, features are getting evaluated and models consulted for decisions, we're properly sending back responses, and everything looks great. So our job is done, right? Well, what if out of the blue we start receiving an abnormal number of requests, and we then learn that one of our customers is running a promotion, which is what's accounting for the peak in requests? We don't have time to make any changes to our setup, we can't always foresee this kind of increased activity beforehand, and we can't let our performance degrade because of it.

That's where Kubernetes' auto-scaling functionality comes in really handy for us. We're using the Kubernetes Horizontal Pod Autoscaler, which allows for an extremely easy auto-scaling setup. We've configured it to be triggered on CPU usage, which is the main indicator of degrading performance that we have, as an overloaded CPU results in slower feature computation and hence in increasing latency. With that, we can seamlessly and automatically scale without degrading performance, and it really just takes these 20 lines of code to achieve that. (I'll sketch an equivalent configuration at the end of this section.) So it's a very easy solution that gets a lot of results.

However, our job wasn't done once we had the system first deployed, either. We're constantly making code changes to bring in improvements, add new features, or adapt to new customers. We're also constantly changing our models, because we're constantly retraining them, whether due to new data or to the modification or addition of features. And if deployments are going to happen that frequently, we need them automated.
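Here's a compressed, purely illustrative sketch of such a request handler as a Flask route, just to show the shape of the flow: parse and validate, build features, apply customer rules, consult the model, and log the decision. The helper functions are toy stubs standing in for the real components, not our actual code:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


# Toy stand-ins for the real components: feature pipeline, customer rules,
# the embedded model, and decision logging.
def build_features(order):           # feature store fetch + real-time computation
    return [len(order.get("items", [])), float(order.get("amount", 0.0))]


def customer_rule_decision(order):   # allowlists / blocklists; None = no rule hit
    return None


def model_predict(features):         # would call the unpickled customer model
    return features[1] < 1000.0      # toy threshold, purely illustrative


def log_decision(order_id, approve, model_id):
    app.logger.info("order=%s approve=%s model=%s", order_id, approve, model_id)


@app.route("/evaluations", methods=["POST"])
def evaluate_order():
    order = request.get_json(silent=True)
    if not order or "order_id" not in order:   # parse and validate
        return jsonify({"error": "invalid payload"}), 400

    rule = customer_rule_decision(order)       # customer rules take precedence
    approve = rule if rule is not None else model_predict(build_features(order))

    log_decision(order["order_id"], approve, model_id="customer-a-v1")
    return jsonify({"approve": approve})
```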
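And here's the autoscaling sketch I mentioned. The slide shows the actual manifest; purely as an illustration, in the same language as the other examples here, the equivalent object can be created with the official Kubernetes Python client. The deployment name and thresholds are assumptions, not our real values:

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster

# Scale the (hypothetical) order-evaluation deployment on CPU usage:
# add pods whenever average CPU utilization crosses the target.
hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="order-evaluation-api"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="order-evaluation-api"
        ),
        min_replicas=2,
        max_replicas=20,
        target_cpu_utilization_percentage=70,  # illustrative threshold
    ),
)
client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```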
To allow that to happen, we're using CI/CD, so that changes are automatically deployed every time there is a merge to our main branch. As I mentioned before, changes trigger new containers to be pushed to ECR, and then we just use Helm to upgrade our application release, which is what's happening in this picture here. This is an actual part of our CI pipeline: the Helm deployment step, shown here mid-update.

Helm allows us to redeploy our services live, with no downtime, and we make use of the readiness probe configuration to inform the deployment of when, and whether, new pods are actually ready to start receiving requests. If readiness isn't reached within a given timeout, the deployment is aborted without any effect on the currently running application. We're also using pre-stop hooks to prevent pods from being killed in the middle of serving requests.

Now, the bad thing about having this automated is that it's easier for errors to slip in, since no one is manually triggering and watching the deployments, right? Well, not in our case. We've built automated test suites that run not only at the unit and integration level, but also at the end-to-end level. That means each new deployment starts getting tested as soon as it's put out in our actual production environment. And should anything go wrong, rollbacks get automatically triggered to restore the application to its previous version.

Here we're also using Helm: specifically, Helm's rollback functionality, paired with S3 as a registry where we store the ID of the latest successful deployment. If our end-to-end tests fail, which is the case in this picture I have here, Helm just automatically triggers a rollback using the stored ID of the last successful version. And if the end-to-end tests pass, that last-successful-release ID gets updated to point at the one that's just been deployed.
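Schematically, that registry-plus-rollback logic could look something like this sketch, assuming a CI step with access to both Helm and S3; the bucket, key, and release names are all hypothetical:

```python
import json
import subprocess

import boto3

BUCKET = "deploy-registry"                       # hypothetical S3 bucket
KEY = "order-evaluation-api/last-good-revision"  # hypothetical object key
RELEASE = "order-evaluation-api"                 # hypothetical Helm release

s3 = boto3.client("s3")


def current_revision() -> str:
    """Ask Helm for the revision number of the currently deployed release."""
    out = subprocess.run(
        ["helm", "status", RELEASE, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return str(json.loads(out.stdout)["version"])


def after_e2e_tests(passed: bool) -> None:
    if passed:
        # Tests passed: record this deployment as the last-known-good revision.
        s3.put_object(Bucket=BUCKET, Key=KEY, Body=current_revision())
    else:
        # Tests failed: roll back to the stored last-known-good revision.
        good = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read().decode()
        subprocess.run(["helm", "rollback", RELEASE, good], check=True)
```

So that was it for today. We hope that with this talk we could show you how we use Kubernetes to build a real-time fraud detection system that brings great value to our customers, and that even a simple, mostly managed Kubernetes setup can get you a long way. Thank you all for your attention.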