Hi, good afternoon. Before I start, I'll introduce myself: I am Manas, and I work for a company called Episodes, where I lead the data science and engineering practice. We have been building clinical intelligence systems for the past few months, and this talk is mostly about how we are building those solutions cost efficiently and in a serverless manner. This is the brief agenda, and I'll breeze through it: we'll talk about the problem and the challenges, some of the architectural paradigms we considered and threw out, some caveats, and finally the impact we had with the architecture we deployed.

A brief about the company: Episodes is a healthcare risk management firm, which means we take data from insurance and hospital providers in the U.S. and make sure that clients are reimbursed properly by the U.S. government. That means we have to analyze the medical discharge summaries and claim records, understand which diseases have been identified, and determine whether or not they are reimbursable. The context is that Episodes started focusing on ML, and NLP more specifically, because pricing pressure was starting to build in the market; everyone wants automation and everyone wants the cheapest price possible, and that is where we come in. So we wanted to build a scalable information extraction engine using NLP and machine learning.

Now, there are multiple challenges. The first, of course, is that you have to have a scalable platform atop which all these data science products will sit. But a bigger challenge is that since we are dealing with patient care data, and we fall under the U.S. healthcare market specifically, we have to maintain HIPAA compliance, which means data has to be secured both at rest and in transit. That makes things even more challenging, because you can't use just any services.
You can't use every cloud provider, and if you were to go with bare metal, you would have to ensure that compliance and audit checks are done on a regular basis. That is the first primary challenge. The second challenge is that we have to be cost efficient. We are paid per chart, not per disease, which means that at the end of the day I need to make sure my company is profitable. This is something I keep saying again and again to many data scientists and peers out there: the job of a good data scientist, at the end of the day, is to ensure one of two outcomes. You can either make your company money or save your company money. Those are the only two outcomes, and it is your job as a data scientist to make your company look good. For us, the KPI is the cost per chart after the NLP has been done, and that has to be kept as low as possible. That is where the entire philosophy behind our architecture comes in.

The first few pieces, scalability, fault tolerance, cost effectiveness, and lean architecture, are things anyone building any kind of architecture will focus on, so let's not dwell on those. But we are also talking about immutable configurations and self-healing: what happens if a region goes down, or an availability zone goes down, while a process is running? You have to make sure the entire process is self-healing in nature, and that we are not setting up configurations again and again; that's where tools like Ansible and Docker come in. The last piece is, I'm sure, slightly controversial: when I started building this practice and hiring people for it, I did not want to hire people who were pure DevOps or pure ML people.
I wanted to build an MLOps practice where a person can create their own algorithm and deploy it as well. The obvious solution for that is serverless, because, at least in my belief, NoOps is the best DevOps: if I don't have to manage servers on a regular basis, apart from some maintenance, I think I'm doing the best DevOps. And there are multiple tools in the market nowadays that are quite popular, Docker, Ansible, AWS cloud scripting, CloudWatch, and so on, which give you far more control over the architecture you are trying to build. The biggest plus point for me was that I did not have any servers to manage. Anyone who has worked on servers will tell you about the headache it entails the moment a server goes down for even a minute, and that is something I actively wanted to avoid.

So what are the implementation points? The first piece was making sure my entire configuration is immutable. When I develop algorithms on my system, I should be able to test them in real time, so I use PyCharm as my IDE with Docker as my remote interpreter; that ensures my algorithms are tested in real time and there is no problem when deploying them to the servers. I use AWS Lambda, the serverless compute service of AWS, mostly for some ETL, data manipulation, and NLP tasks. We use Boto3 for generating security credentials on the fly: we don't want AWS secret keys saved in our code, so we generate temporary credentials with Amazon STS. And we use Ansible, which does about 80% of our heavy lifting: triggering the manual tasks, SSHing into the servers, setting them up, pulling data, attaching the EBS volumes, encrypting those disks, and so on.
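The temporary-credentials idea mentioned above can be sketched roughly as follows. This is illustrative, not our actual code: the STS client is injected so the logic runs without AWS (in production you would pass `boto3.client("sts")`), and the role ARN is hypothetical.

```python
# Sketch: fetch short-lived credentials via STS AssumeRole instead of
# hard-coding AWS secret keys in source code.

def short_lived_credentials(sts, role_arn: str, session_name: str) -> dict:
    """Return temporary credentials (key, secret, token) from STS."""
    resp = sts.assume_role(RoleArn=role_arn,
                           RoleSessionName=session_name,
                           DurationSeconds=3600)
    return resp["Credentials"]

class FakeSTS:
    """Stand-in for boto3's STS client so the sketch runs locally."""
    def assume_role(self, RoleArn, RoleSessionName, DurationSeconds):
        return {"Credentials": {"AccessKeyId": "ASIAFAKE",
                                "SecretAccessKey": "fake-secret",
                                "SessionToken": "fake-token"}}

creds = short_lived_credentials(
    FakeSTS(), "arn:aws:iam::123456789012:role/nlp-worker", "ansible-run")
print(sorted(creds))
```

The credentials expire on their own, so nothing long-lived ever needs to be written into the codebase or the Ansible scripts.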
And of course the final data pull from, and push to, S3 buckets. The final piece is the NLP part, where the algorithms themselves are taken care of. These are some of the libraries I mentioned; some have since been swapped out, for example we were using TensorFlow and are now moving to PyTorch. So that is our broad stack and the process flow for our entire system.

To give you a brief background, there are four important metrics in any data science product: accuracy, precision, recall, and F-score. Depending on your use case, some of these metrics may be important and some may not; accuracy, I believe, is an overhyped metric. In our use case, recall is the critical metric, because for us a false negative is lost money: if I don't identify a disease properly, I lose money. Hence, for me, recall is more important.

So the moment new data comes in, which is essentially my training data, or there is a new code push, say I pushed some changes to my algorithm, this whole piece is triggered: a Lambda spins up a set of servers and training happens. The results are saved and compared against the existing model's results, and if the new recall is better, I deploy the new model; otherwise I don't. I then log the model results for my purposes later on. That is the ML deploy piece. It is not AutoML per se, but it automates ML model deployment, and it is the first piece of serverless we built; for this, most of the pieces we use are, again, Lambda, Boto3, and Ansible.

But our most important pipeline is the document processing pipeline. Documents are uploaded to S3 by a client or by our own in-house people, and we trigger a Lambda after that. We are essentially telling Lambda:
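The recall-gated deployment decision described above boils down to a very small amount of logic. A minimal sketch (function names are my own, not from the talk):

```python
# Sketch: promote a newly trained model only when its recall beats the
# currently deployed model's recall. A false negative (missed disease)
# is lost revenue, so recall is the gating metric.

def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN)."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0

def should_deploy(new_recall: float, current_recall: float) -> bool:
    """Deploy only if recall strictly improves."""
    return new_recall > current_recall

# Example: the candidate model misses fewer diseases, so it is promoted.
current = recall(true_positives=80, false_negatives=20)
candidate = recall(true_positives=90, false_negatives=10)
print(should_deploy(candidate, current))
```

In the real pipeline this comparison runs after training completes, and both the decision and the metric values are logged for later analysis.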
"Hey, there was an S3 upload an hour back," or "there were a thousand files uploaded an hour back, why don't you start processing?" So I trigger a Lambda, which launches other servers in a master-worker configuration, where the data and tasks are pulled from SQS. The reason we have a master-worker configuration is that the data processing has to be done in a private subnet, and we need to SSH into the workers, so the master essentially acts as a jump box for us. Once the data is processed and everything is done, we push it back to S3, delete all the servers, and interact with the results through a secure API we have created.

Those are the good parts, but there are trade-offs, and the trade-off is very simple and holds across any serverless platform you'll come across. You have Google Cloud Functions, which is a competitor to AWS Lambda, and AWS Lambda on the other hand. AWS Lambda gives you a maximum memory allowance of 1.5 GB and a minimum of 128 MB; I'm pretty sure Google Cloud Functions has similar limits. On top of that, AWS Lambda allows a single invocation to run for no more than five minutes, which means there will always be this trade-off between your heavier and lighter workloads. The moment you want to process heavy workloads using AWS Lambda, it will cost you more money, and it's going to be way, way more expensive than having a server running 24/7. So this is the trade-off we have made: we have only ported some of the ETL tasks, and I'll mention those in the next slides, to AWS Lambda. That keeps memory low, we operate at 128 MB only, and it costs us around $35 for around a million invocations per month. That's the price point. However, it would run into $3,000 to $4,000 if I did the same number of invocations with more than 500 MB of memory.
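The cost cliff described above is easy to see with a back-of-the-envelope model. The per-GB-second and per-request rates below are assumptions based on AWS's published list pricing and may differ by region or over time; the durations are illustrative, not the talk's actual workload profile.

```python
# Sketch: why low-memory, short Lambda invocations stay cheap while
# high-memory, long-running ones balloon in cost.

GB_SECOND_RATE = 0.0000166667    # USD per GB-second (assumed list price)
REQUEST_RATE = 0.20 / 1_000_000  # USD per invocation (assumed list price)

def monthly_cost(invocations: int, memory_mb: int, avg_seconds: float) -> float:
    """Estimated monthly Lambda bill for a given invocation profile."""
    gb_seconds = invocations * (memory_mb / 1024) * avg_seconds
    return gb_seconds * GB_SECOND_RATE + invocations * REQUEST_RATE

# A million short 128 MB invocations stays in the tens of dollars...
small = monthly_cost(1_000_000, 128, 10.0)
# ...while the same volume at max memory and longer runtimes does not.
large = monthly_cost(1_000_000, 1536, 60.0)
print(f"${small:,.2f} vs ${large:,.2f}")
```

The billing unit is memory-times-duration, so doubling memory with the same runtime roughly doubles cost, which is exactly why only the lean ETL tasks were ported.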
So this is the trade-off you have to keep in mind whenever you're building, or trying to port to, any serverless solution; the trade-off is very real. So what are we using Lambda, or serverless architectures generally, for? Mostly minor ETL tasks, for example converting incoming data into text formats for ingestion into the NLP engine. We use it for SSHing into the master in the public subnet, setting up all the servers, and running the Ansible scripts. There is also what we call in-house a "serverless DB API", though I'm not sure that's the correct technical term: our data is finally stored as JSONs in S3, and the moment my Salesforce application wants to query the results of a process, it can query this API. What the API does is launch a Lambda, go to that S3 bucket, and retrieve the necessary data. So that's a serverless DB API backed by S3. And there are minor cron jobs for monitoring and logging that we use Lambda for. So those are the few tasks handled by AWS Lambda.

Now, serverless may not be for you, and I'll put this forth in pretty absolute terms today. Many people explore serverless solutions and then back off because they feel serverless is expensive. It will be expensive if you want to run proper machine learning model prediction on Lambda, which is not going to happen, because 1.5 GB is not feasible for some of the heavier NLP models. So if you have memory-intensive workloads, serverless is not for you; you don't want to go serverless for a memory-intensive workload.
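The "serverless DB API backed by S3" can be sketched as a small handler that fetches a result JSON from a bucket. This is illustrative, not the actual implementation: the S3 client is injected so the logic can run without AWS (in a real Lambda you would pass `boto3.client("s3")`), and the bucket and key layout are hypothetical.

```python
# Sketch: a Lambda-style handler that serves processed results out of S3.
import json

def get_result(s3_client, bucket: str, chart_id: str) -> dict:
    """Fetch the processed-result JSON for one chart from S3."""
    obj = s3_client.get_object(Bucket=bucket, Key=f"results/{chart_id}.json")
    return json.loads(obj["Body"].read())

# Minimal stand-ins for the S3 client, for local exercising of the logic.
class FakeBody:
    def __init__(self, data: bytes):
        self._data = data
    def read(self) -> bytes:
        return self._data

class FakeS3:
    def __init__(self, objects: dict):
        self._objects = objects
    def get_object(self, Bucket, Key):
        return {"Body": FakeBody(self._objects[(Bucket, Key)])}

fake = FakeS3({("nlp-results", "results/chart-42.json"):
               json.dumps({"chart_id": "chart-42", "codes": ["E11.9"]}).encode()})
print(get_result(fake, "nlp-results", "chart-42"))
```

Because the Lambda only runs when Salesforce actually queries a result, there is no database server idling between requests.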
If you have ultra-real-time response requirements, say you want sub-10 or sub-50 millisecond response times, you don't want to go serverless either: even though we get around 100 to 200 millisecond responses, including the API call and the launch overhead, that may not be feasible for other applications. And be careful if you have too many library dependencies. We use built-in Python functions, so we don't have to create a package and upload it, but if you depend on five or ten external libraries, you have to bundle them into a deployment package and upload it, and there is a 250 MB limit on what you can upload. So there are many caveats. And of course, serverless means you don't have tight control over monitoring, so if you need very granular monitoring, it's not for you.

Now for the challenges. These are the two broad challenges we faced with the entire serverless paradigm: what do I do if something fails, and how do I monitor each and every step? There are around 30 steps in my Ansible scripts; how do I ensure I'm monitoring every one of them, because I need to do that, and how do I ensure I'm logging and monitoring my ML scripts? Here is what we did. For fault tolerance, self-healing per se, we decoupled tasks using Amazon SQS; you can choose your own poison in terms of messaging queues and decouple these tasks. What we do is, the moment a task is launched, we do not delete it from the queue unless and until its output is pushed back to S3 or to our database. That way the self-healing properties are ensured, and after that we also launch Lambdas to re-initiate those failed tasks and run them through the architecture again.
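The delete-only-after-success pattern described above looks roughly like this. The sketch is illustrative: the queue client is injected so it can be exercised without AWS (with Boto3 you would pass `boto3.client("sqs")`), and the queue URL is hypothetical. In real SQS, an undeleted message simply becomes visible again after the visibility timeout.

```python
# Sketch: a worker loop that deletes an SQS message only after the task
# succeeds, so crashed or failed tasks are automatically retried.

def drain_queue(sqs, queue_url: str, process) -> int:
    """Process tasks until the queue is empty; return the success count."""
    done = 0
    while True:
        resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
        messages = resp.get("Messages", [])
        if not messages:
            return done
        msg = messages[0]
        try:
            process(msg["Body"])
        except Exception:
            continue  # leave it on the queue; SQS will redeliver it later
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
        done += 1

class FakeSQS:
    """In-memory stand-in: each message is delivered once per run."""
    def __init__(self, bodies):
        self._bodies = [{"Body": b, "ReceiptHandle": str(i)}
                        for i, b in enumerate(bodies)]
        self.in_flight = {}
    def receive_message(self, QueueUrl, MaxNumberOfMessages=1):
        if not self._bodies:
            return {}
        msg = self._bodies.pop(0)
        self.in_flight[msg["ReceiptHandle"]] = msg
        return {"Messages": [msg]}
    def delete_message(self, QueueUrl, ReceiptHandle):
        del self.in_flight[ReceiptHandle]

fake = FakeSQS(["chart-1", "bad-chart", "chart-3"])
def process(body):
    if body == "bad-chart":
        raise ValueError("processing failed")

processed = drain_queue(fake, "https://sqs.example/queue", process)
print(processed, list(fake.in_flight))  # the failed task was not deleted
```

The self-healing property comes entirely from *not* acknowledging work up front; nothing about the worker itself has to be reliable.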
For monitoring and alerts, we use CloudWatch along with custom logging from our Ansible scripts and our Python scripts, pushing metrics back to S3 buckets and CloudTrail. While this is not entirely foolproof, we are certainly able to collect and analyze our logs, and that's how we are currently doing it. As I mentioned on my last slide, if you are looking at very granular monitoring of a system, you are better off porting only some of the minor functions to serverless solutions and keeping the major parts off it.

So this is the impact we have had from launching servers in a serverless fashion. To process around a million charts a month, that's the volume I have shown here, roughly 500 GB to a terabyte of data per month, it would have cost me around $20,000 to $25,000 per month to process that data and send it back to our databases. With the serverless solution, we brought it down to roughly a tenth: the entire architecture costs around $3,000 per month on on-demand instances and around $1,500 on spot instances. The reason there is a range is that you won't always get a spot instance; demand may be high and your spot request may not be fulfilled, in which case you fall back to on-demand. That translates to roughly 20 paise (INR) per chart, where a chart can be anywhere from 30 to 50 pages. This is the cost efficiency we were able to bring in by focusing on a purely serverless architecture, not running servers 24/7, and porting the most important tasks. And the best part is that we have ensured the entire architecture is HIPAA compliant.
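The per-chart figure above checks out arithmetically. The USD-to-INR exchange rate here is my assumption (roughly 67 INR/USD around the time of the talk); the cost and volume numbers are the ones quoted.

```python
# Sanity check: $3,000/month on-demand spend over a million charts.

MONTHLY_COST_USD = 3000        # on-demand figure from the talk
CHARTS_PER_MONTH = 1_000_000   # volume figure from the talk
INR_PER_USD = 67               # assumed exchange rate

cost_per_chart_inr = MONTHLY_COST_USD / CHARTS_PER_MONTH * INR_PER_USD
print(f"{cost_per_chart_inr:.2f} INR per chart")  # roughly 0.20 INR, i.e. 20 paise
```

On spot instances ($1,500/month) the same arithmetic gives about 10 paise per chart, which is where the quoted range comes from.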
The data is secure both at rest and in transit, and it helps that AWS has many HIPAA-eligible services; that was a major win for us. So that kind of wraps up what I had to share about the work we have done at Episodes. You can surely reach out to me at these coordinates, and let me know if you have any more questions. That's it. Questions? I see one there.

A very basic one, I'm sorry, but is your architecture entirely serverless, or are you actually using Ansible to deploy stuff on servers?

Okay, so serverless is what triggers our entire architecture. Since it's a pure ML kind of product we are building, we can't deploy the entire solution on serverless. The way I look at it, serverless means I don't have to manage servers; that's my definition of serverless. If I can launch my architecture on demand and not run servers 24/7, I think I'll save a lot of headaches and money for my company. We are using Lambda to launch those servers, as well as for some of the ETL tasks and cron jobs that would typically run on a server.

Got it, thanks. And you also made a pretty strong comment about serverless and monitoring. I think it's pretty straightforward to have everything on CloudWatch and have alerts on it, so what is the gap?

Okay, so when you're doing it on AWS Lambda, you do get CloudWatch logs, but not all the metrics are generated. There are the free metrics that CloudWatch gives you, and there are custom CloudWatch metrics that you have to generate on your own, and you have to decide what it is you want to monitor. In a serverless solution, do I want to monitor my memory requirements? Of course not, because I've capped them. But I would want to monitor the time it takes to go from one function to another and see where my optimization opportunities lie,
and some requirements like where my tasks are failing. It actually happened a couple of months back: a task was failing at one specific Ansible step and we were not able to figure out why. That's where granular logging is very important, and that's where the serverless gap lies.

Question? You mentioned HIPAA compliance, so I wanted to know: if you're using Lambdas, are Lambdas HIPAA compliant?

No, they don't fall under the HIPAA BAA. But we are not processing, or rather not storing, any data on Lambda. On top of that, we even encrypt the file names while moving files from one S3 bucket to another, because sometimes PHI, patient-linked data, can also be written into the file names. So Lambda is not doing any data transformation on the content itself; the ETL tasks just do the conversions from one place to another in a way that keeps us HIPAA compliant. We also run it on private subnets.

Okay, got it. Thank you.

So it's running on our own subnets. I'm sure we have many more questions for Manas, but we're running out of time. Kindly take the questions offline. Thank you. Thanks, Manas, for your presentation.