Hi guys. I usually start by asking the audience to guess where I'm from, but fortunately this time I actually am from Chennai. Thanks to Shreyas and Jainab for bringing us here to Chennai, and welcome to Madras — I still like calling it Madras. Let me introduce myself: my name is Kiran Darisi, and I'm a co-founder and the director of technical operations at Freshworks, which you probably know as Freshdesk. Yeah, it's a fancy title, but what it really means is that if the site goes down, I'm the one who gets the call. Let me tell you how I've structured this talk. I'll tell you why we chose the cloud, how we architected for the cloud, what we've learned from seven years on the public cloud, what the future looks like, and what's cooking at Freshdesk right now. First, a glimpse of Freshdesk then versus Freshdesk now. The first slide is the happy scenario: a simple three-tier setup. You have a couple of application servers, your DB, and background processing — those were the happy days. The next thing you see is the far more complex architecture of Freshdesk today, with more layers, a lot of microservices, microservices talking to each other, and what not. Here are some metrics we have at Freshdesk right now — you can think of it as a show-off slide, how much we do on a day-to-day basis. The line I like best on this slide is that we serve half a billion requests a week. I feel that's pretty great. And roughly on a timeline, this is how much we did in 2012 onwards. I haven't included 2011, because we had only just launched in the middle of 2011. Right now we're running at the 2017 rate, and 2018 is shown as a projection.
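A quick back-of-the-envelope on that half-billion-a-week number — the conversion to requests per second and the peak multiplier are my own illustration, not from the slides:

```python
# Rough arithmetic on the "half a billion requests a week" figure
# mentioned in the talk.
requests_per_week = 500_000_000
seconds_per_week = 7 * 24 * 60 * 60  # 604,800

avg_rps = requests_per_week / seconds_per_week
print(f"average: ~{avg_rps:.0f} requests/second")  # ~827 req/s

# Traffic is never flat; assuming (my assumption) peak is roughly
# 3x the average gives a sense of what the stack must absorb.
peak_rps = avg_rps * 3
print(f"assumed 3x peak: ~{peak_rps:.0f} requests/second")
```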
This slide we put together in a bit of a hurry — it's just a word cloud, nothing more — but if you combine all the apps and services on it, that is the number of different pieces of software we use right now at Freshdesk. Okay, moving on to why we chose the cloud. The initial team was the six of us, and we're from a developer background — three of the first six were developers — and we had no ops knowledge. So we wanted infra that was easily repeatable, that we could reproduce multiple times, and that was easily configurable, because we didn't want a lot of ad hoc systems we'd have to go and hand-configure. So we chose the safest and easiest option: a PaaS, platform as a service. We chose Engine Yard. Engine Yard runs on AWS, but it gives you pre-built cookbooks, and you can just say: give me a couple of app servers, give me one DB instance and a replica of the DB instance for backup, and one background system for emails. You just select that on the UI and it's set up. The other point is that we were pretty new to hardware itself. When you architect, you need to know what hardware is best for that architecture, right? AWS has an array of instance classes — this much CPU, this much RAM, whatever — so we needed to experiment with which instance class to use, and to get that experimentation done at a cheap price, the public cloud is the one. And when you go to a physical data center, you need to know how many servers will be managed by one person. That's how they market it: for a thousand servers they'll have one person to look after everything — the machines, the power, the backups, what not. So you need a server-to-person ratio.
For a six-member team doing both development and ops, that wasn't feasible. And to be frank, we didn't have much money either, so pay-as-you-go was one of the great features of the public cloud for us. When we chose the public cloud, we went in with some basic principles. I call it the DevOps triangle — DevOps is an overused word these days, but I'll still use it here. You have cost, you have performance, and you have availability. I can throw an infinite amount of money at this and achieve a hundred percent of everything, but then I'd never reach the unit economics of a SaaS company, right? I'm running a SaaS business for a reason, so I can't just throw money at it. As developers and as product owners, we need to understand which parts of the system are critical and which are not. We need to define: my email system is okay at 99.9, but my phone system is so critical it can't go down — I need four nines there. You need to allocate like that; that is how you keep the cost in balance. It's like an equilateral triangle: you can't let one dimension increase exponentially on its own, it has to be enhanced equally on all sides. The same goes for performance. For a ticketing system like Freshdesk, how much time should it take to create a ticket? Say 200 milliseconds. You could even do it in 50 milliseconds, but is it worth it — worth the cost, or the developer time? Based on that, you say: this much cost I can actually spend, this is the performance I need, and these are my availability targets. Once we had these basic principles, we moved on to architecting the product for the public cloud. How many of you know this one? It's the famous Simian Army from Netflix, right?
It just goes and knocks off instances or switches off databases, and you still need to stay highly available. Though I don't know why Netflix spent all that time building it — AWS does that for you for free, okay? You can't trust shared infrastructure for anything. Any instance can just reboot or just go away, your EBS volume will stop working, your IOPS will drop. So first we thought about HA at every layer, whether it's a stateless application layer or a data storage layer. For example, for the database we chose RDS for the master, and we have Multi-AZ. And we use Redis in some parts of the system: in some places we use ElastiCache with fallbacks, and in some places we run Redis ourselves in cluster mode. And once you're on the public cloud — earlier I said you need to know which instance class to use, right? — when you run your load tests on a public cloud like AWS, the results all depend on who your neighbor is. Because you're running in a VM with another VM sitting next to you, at certain times of day you'll definitely see deteriorated performance. The performance you get is not proportional to what they promise — the same amount of RAM or CPU doesn't mean the same result; it depends on how much load the other guy is putting on the box. So we always did load testing on dedicated instances. Even though we had to spend about 30% more, we used dedicated instances — that's how you know that for this workload, this is the right instance class to use. And from day zero — well, day zero can be debated; in our case it was about two years after starting — we started templating the infra. As I said, we wanted the infra to be repeatable, and templates are the best way to repeat infra: if you want to bring up another layer, the template says exactly how to boot it. The templating started with Chef recipes, but then we went to CloudFormation.
Now we are on Terraform. From about two years in, we haven't written any bash scripts or one-off Linux scripts just to run our environment — we've used templates ever since those early days. And we thought of horizontal scaling at all layers. In the early days one of our investors asked me: why do you even need to shard? RAM is getting cheaper by the day, so are disks — why not just vertically scale the system? And just the day before, I had read a blog post from 37signals where Jason Fried said you should never shard your database, because it adds complexity at every layer of your application. But today Freshdesk is sharded across a large number of databases, because we changed that attitude and we sharded. You need to think of horizontal scaling at all levels: your database systems, your search systems — we use Elasticsearch for that. And for stateless systems like application servers, you barely need to think about it: you just put more servers behind the load balancer and scale. Once your product reaches some production scale — whether you measure that in money or in number of users — you need to think about DR for your data storage systems, because if your entire US East region is gone and you want to fail over to US West, the data storage systems are the most time-consuming things to bring back up. So — we're all in on the cloud and we're running in the cloud, right? But how do you actually measure it? When I say measure, it's not only performance and availability for the end customer; we need to take care of cost too. We use Netflix Ice and our own homegrown tools for costing, and we started allocating a budget per team. At this scale, Freshworks has 40 different AWS accounts.
We use AWS Organizations for handling all the AWS accounts we have. And we allocate the budget with respect to revenue. For some consumer startups it might be number of users instead — you can say, for this many users, this many dollars we can spend. And for response time and availability, we came up with our own idea: delight metrics. This one we derived ourselves. For example, take a ticketing system: there are multiple actions in the product for the customer — ticket create, ticket send, ticket update, things like that. For each action we keep a latency threshold, and every day we monitor whether we were able to serve at least 95% of the requests for that action within the threshold. That is one half of the delight metric. We combine it with the number of 500 errors — 5xx errors, right? How many 5xx errors did we serve? The combination of those two is our delight metric, and we follow it religiously. We don't show it to customers, because customers only care whether the app is up or not, nothing more, right? So we track it internally — there's a delight metric for each team. And how do you monitor all these things? Logs are where we send all the information. We emit a lot of event logs, and we send them all to Sumo Logic, where we monitor and attach alerts. A typical event log says: for this customer, for this action, it took this many milliseconds, and this was the status code — 200, 500, 400, whatever. That's how we monitor, and we set alerts in Sumo Logic for when a metric crosses above or falls below a threshold. When I say below a threshold: if your microservice suddenly isn't sending anything at all, that's also a problem, so you even need a threshold alert for when the volume goes down.
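The delight metric described above can be sketched in a few lines. The field names, the 200 ms budget, and the 95% target here are illustrative assumptions, not Freshdesk's actual schema:

```python
# Minimal sketch of a per-action "delight metric": at least 95% of
# requests within a latency threshold, combined with a 5xx count.
THRESHOLD_MS = 200      # assumed per-action latency budget
TARGET_FRACTION = 0.95  # "at least 95% of requests within the threshold"

def delight(events):
    """events: list of dicts like {"action": ..., "ms": ..., "status": ...}"""
    within = sum(1 for e in events if e["ms"] <= THRESHOLD_MS)
    errors = sum(1 for e in events if 500 <= e["status"] <= 599)
    fraction_ok = within / len(events) if events else 1.0
    return {
        "fraction_within_threshold": fraction_ok,
        "five_xx_count": errors,
        "delighted": fraction_ok >= TARGET_FRACTION and errors == 0,
    }

sample = [
    {"action": "ticket_create", "ms": 120, "status": 200},
    {"action": "ticket_create", "ms": 180, "status": 200},
    {"action": "ticket_create", "ms": 450, "status": 500},
    {"action": "ticket_create", "ms": 90,  "status": 200},
]
print(delight(sample))
```

Each event here mirrors the event-log shape from the talk: customer action, milliseconds taken, and status code.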
Now, once you start sending alerts, this is what wakes up all the SRE teams in the middle of the night. For every alert you page PagerDuty, you send a mail, or you place a phone call — someone has to come and act on each one. But once you get into a flood of alerts, you start getting alert fatigue, and you stop acting on the real alerts you have. For that we have some homegrown tools: say a DB is down and someone is already acting on it — they can just go and pause that alert and say, yes, I already know the DB is down, I'm working on it; snooze it for a while, and it starts firing again after the DB is back up. And we have another feature where we group alerts. When there's an incident, the developer or the SRE on call needs to know what exactly happened at that point in time, right? So we group the alerts: you got this many DB alerts, this many infra alerts, this many app alerts. For them, that definitely improves the MTTR. From day zero, Freshdesk has been a global company. Our development center is predominantly in Chennai, but we serve customers in the US and Europe, right? That's where the edge comes into the picture, so you can be near to your customer — it can be your ELB, your CDN, whatever. But after a certain point, customers also want their data to be near them. That is when data centers come into the picture. Germany is the latest of the three data centers we have. And data centers and all these things only work when you have standardization across all parts. You can't say that in the US I'll have chat but I don't have chat in Frankfurt, or I don't have email in Ireland. And you shouldn't have multiple branches of recipes or branches of cookbooks between the data centers.
You need the same recipes running everywhere, because otherwise you'll stop doing the right thing. When you have multiple branches serving multiple data centers, things fall apart; the only thing that should differ between regions is that the data stays resident in the local region. So, after seven years on the public cloud, we have a lot of learnings, and I'll share some of them. Being in SaaS and on the public cloud, you need to think of velocity — that is the only thing that moves your business forward. When I talk about velocity, it's how fast you can deploy, how fast you can know that something is wrong with a deployment, and how fast you can roll back. Those three things are what make being in the cloud and being in SaaS work; that is how you need to operate. As an ops person, you don't want any change to the system, because it's working and it needs to keep working. But as a developer, you need to keep pushing features, right? And when you start pushing features, that's when the bugs come. If you're already tagging your code, that's really good; if you're not, start doing it. And you need to start tagging your infrastructure too. We've reached an age where there is no difference between your infra and your code. It's a marriage that has happened, and it's not going to end in divorce. You need to version your infra in such a way that when you roll back your code, your infrastructure rolls back too. There's no point in infra just sitting there running a different configuration. And at Freshdesk we do blue-green deployments. The easiest and safest way to push out changes is incrementally.
For those who don't know blue-green deployments: you have a blue stack running your old code, and you bring up similar infra with your new code — call it the green stack — and send 1% of the traffic there, and just monitor the errors, the performance, everything. Once you're okay with everything, you push the rest of the traffic in stages, like 10, 25, 30, 40, 100. That's the safest way to do it. When I say similar infra, you need to think of even the cache systems being separate for the new stack — you can play around with namespaces and so on, but if you mean true infra, you're actually booting up everything. Everything apart from MySQL, that is. MySQL is a bit tricky with blue-green deployments: your code needs to be able to fall back to the previous schema — it has to work against the old schema — and that's the one caveat; everything else needs to come up, your background systems, your cache systems, everything. And intelligent proxies help us a lot in diverting the traffic between the stacks, as well as between microservices. When I say intelligent proxies — why do you use a proxy, right? For throttling, for communication, for connection pooling, for security, everything. We use nginx with Lua scripts in combination for our proxies, but right now we are evaluating Lyft's Envoy, which is written in C++ and has looked very good so far — it's just not in production at Freshdesk yet. You also need to spend generously on monitoring, whether it's APM, logging, or alerting. As developers, we all have this notion: when you see something you'd have to pay for, you think, can't I just build it? Okay when you're six people, but at ten or twenty people you really can't. You need to pay for it — even if it costs money, it's going to save you a lot.
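The staged blue-green cutover described above can be sketched as a simple loop. The stage percentages, the error-rate limit, and the single health signal are my assumptions for illustration; in reality you would watch real metrics at every stage:

```python
# Sketch of a staged blue-green rollout: widen traffic to the green
# (new) stack in stages, rolling back if green looks unhealthy.
STAGES = [1, 10, 25, 50, 100]   # percent of traffic on the green stack
ERROR_LIMIT = 0.02              # assumed acceptable error rate on green

def rollout(observed_error_rate):
    """Walk the stages; abort and roll back if green looks unhealthy.

    observed_error_rate stands in for the real monitoring you would
    consult at each stage (errors, latency, everything)."""
    for pct in STAGES:
        if observed_error_rate > ERROR_LIMIT:
            return ("rolled_back", pct)   # shift all traffic back to blue
        # ...otherwise tell the proxy layer to send pct% to green...
    return ("completed", 100)

print(rollout(0.001))  # → ('completed', 100): healthy new code
print(rollout(0.10))   # → ('rolled_back', 1): caught at the 1% stage
```

The point of the 1% first stage is exactly what the talk describes: a buggy release gets caught while it is still serving almost nobody.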
And if you're on AWS, start paying for reserved instances, which give you around a 30% discount for paying upfront. You can at least start with your data storage systems, like MySQL, which aren't going to change for a year or two. I'd say pay for one year, don't pay for three years of reserved instances, because AWS has the habit of releasing new instance classes every few months — that's their track record, and I don't know whether it's going to change in the next six months. And when I say there's no difference between your code and your infra, it also means the developers should be on the front line of production support. You can't say there will be a centralized DevOps team that takes care of availability and the developers don't need to care. That doesn't work anymore, because when you're writing the application code, you need to know whether it's highly available and horizontally scalable, and which systems to pick out of the bunch we have. So we don't have a centralized DevOps team — the developers themselves support their services — but we do have a small SRE team that's purely R&D. Their job is: you have something that does one X; can we make it do ten X? A lot of those things will fail, but that is the R&D work we do. And when you're scaling, getting revenue, getting funding, all of those things, one of the most prominent things we actually market is security. When you're running on the public cloud with a lot of developers building on it, security is not optional — it should be embedded while you're actually writing the code. We use AWS Trusted Advisor on all our accounts. It tells you two kinds of things. The first is security: whether any security incident has happened, or any permissions have changed and are open to the world.
It will tell you: okay guys, these things are open. The other thing it tells you is about cost — is there any system sitting idle? I actually file this under security, because security creeps in when you have code running in legacy mode, not updated for a while, not patched for a while. Think of a couple of developers spinning up two instances and leaving them running, with legacy code that never gets patched. Trusted Advisor will tell you: there are two instances running at 0% CPU for the last ten days, you might want to take action. It surfaces this as a cost finding, but for me it's more of a security practice. And even if you have the right developers and a security team and everything running inside your systems, you need to think of a bug bounty program. Trust me, you will learn a lot of things — it can be in the code or in the infra, everywhere. You will learn a lot once you start a bug bounty, and you need to pay the researchers for it. And we've had CloudTrail enabled from day zero in AWS — that's an event log too, right? We send all of it to Sumo Logic and alert when there's a change in a security group, or a change in the permissions of an S3 bucket, or any modification in the infra services we use. All of it gets categorized from the audit trail and we get alerts on it. And when your developers are supporting production, you can't make them first-class citizens of production and let them log in and touch it directly. You need some interface where they can safely go and operate the system. They shouldn't be able to run a DROP query on MySQL — you should have an interface that restricts them from running a DROP query or a DELETE query or anything like that, to safeguard your production environment when multiple people are touching it.
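The "safe interface to production" idea above can be sketched as a small query gate. The blocked-statement list is an illustrative assumption; a real gate would also handle comments, multi-statements, and per-user permissions:

```python
# Sketch of a gate that lets developers run ad-hoc read queries on
# production while refusing destructive statements.
import re

BLOCKED = ("drop", "delete", "truncate", "alter")

def allow_query(sql: str) -> bool:
    """Allow read-style queries; reject destructive statements."""
    words = re.findall(r"[a-z]+", sql.strip().lower())
    return bool(words) and words[0] not in BLOCKED

print(allow_query("SELECT * FROM tickets WHERE id = 42"))  # True
print(allow_query("DROP TABLE tickets"))                   # False
print(allow_query("  delete from tickets"))                # False
```

The design choice is the same one the talk makes: developers keep production access for velocity, but only through a layer that makes the dangerous operations impossible by construction.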
And this also comes up with PII data, right? Not everybody should be able to see PII data — you need to be certified, and all those controls need to be in place. And if you're running on AWS, you should run inside a VPC, no matter what. At this stage you have multiple microservices talking to each other, so you run them in different VPCs and do VPC peering between them. You need to take care of things like the CIDR ranges, but that's a minimal change — you can catch it in an architecture review or whatever, right? And never use AWS API keys. Once you start using keys, you'll be tempted to use them everywhere — you'll keep them on your machine, in email, in chat, what not. Start using roles instead. IAM roles are the way you can talk to the various AWS services safely. So those are the learnings we have. Now, what does the future look like, and what's currently cooking at Freshdesk? Once you reach some production scale — good scale in terms of revenue or number of users — you need to start thinking about which systems are critical and which third-party services you're going to rely on. I'll give an example: for a ticketing system, email is so important, right? We were using SendGrid, but we can't tell a customer, sorry, SendGrid is down, so we're down too, right? So for critical things, you need to move off third-party services. For non-critical services, or for something you've just shipped, a third-party service is fine, but once you reach a certain scale there is no escape: you need to come off them and run the critical pieces yourself. Email right now runs entirely from Freshdesk's own infra. And we need to start thinking about cloud-agnostic deployment. Some part of our code right now runs on GCP.
And as Imran said, there's no escape from this now — you can't just attach yourself to one cloud; you should have multiple clouds. We still need to figure out the latencies and what should be deployed in which data centers, but you need to have this. And think of homegrown monitoring tools. Freshdesk actually runs about 2,000-odd instances, and it's not an easy job for a NOC team or an SRE team to just look and see that everything is fine. The dashboard you're seeing in the top-left corner shows the real-world scenario: these are the multiple databases we have, in a single view, where a cell turns red when that database has a problem. You can extend the same thing to your Redis, to your EC2 instances, and so on. With one glance you can tell which of these hundred instances has a problem. And you need to think about — or we are thinking about — developer productivity: how fast a developer can push code. We built a deployment pipeline where the developer just finishes the code and clicks deploy, and it takes care of pushing the code, monitoring it, doing the blue-green, switching the traffic, everything. Only if it sees some problem does it notify the developer: you might want to go and check, there may be a problem with your deployment. And you need to start doing anomaly detection. In a B2B SaaS you can't predict the traffic. For an e-commerce company like Flipkart, when they have their Big Billion Day sales, they know there will be a spike in traffic. For us, if Flipkart is our customer and Flipkart has a bad day, we get the spike — maybe they pushed some buggy code and suddenly they have a lot of support tickets.
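That unpredictability is what pushes you from static thresholds toward statistical checks. A minimal sketch of the idea — the window size, the cutoff of three standard deviations, and the sample traffic are my assumptions, not Freshdesk's actual method:

```python
# Flag a traffic sample as anomalous when it sits more than K standard
# deviations away from the recent mean, instead of using a fixed threshold.
from statistics import mean, stdev

WINDOW = 12   # recent samples to learn "normal" traffic from
K = 3.0       # how many standard deviations counts as anomalous

def is_anomaly(history, current):
    recent = history[-WINDOW:]
    if len(recent) < 2:
        return False            # not enough data to judge
    mu, sigma = mean(recent), stdev(recent)
    if sigma == 0:
        return current != mu    # perfectly flat history: any change stands out
    return abs(current - mu) > K * sigma

# Steady traffic around 1,000 req/min, then a customer has a bad day.
traffic = [990, 1010, 1005, 995, 1000, 1008, 992, 1003, 997, 1001, 999, 1006]
print(is_anomaly(traffic, 1004))  # False: within normal variation
print(is_anomaly(traffic, 2500))  # True: sudden spike
```

A fixed threshold set high enough to survive normal peaks would miss the early part of a spike like this; the statistical version adapts to whatever "normal" currently is.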
Likewise with BookMyShow: if Robot 2.0 releases, there will be a lot of traffic, and that generally flows down into Freshdesk. So we need to think of anomaly detection. Threshold-based alerts are no longer enough for a system like this. You need some mathematical function that says: you normally have this kind of traffic, it's now going beyond what's expected, take a look at it. And it should give some parameters as context to the developers: okay, this is what's going on in your traffic. I think that's it from my side. Thank you — there are still a few seconds left, so we have time for a question. (Moderator: Nice, Kiran. Any questions for Kiran?) (A question from the audience about how we evaluated cloud options before choosing Engine Yard and AWS.) We hadn't really done any analysis. We wanted to use a PaaS, and Engine Yard was pretty much the only option — otherwise we would have needed our own data center or raw servers. And at that point, in 2011, there was no real competitor to AWS; there was only one public cloud platform. Only this year are we actually doing some analysis across multiple clouds.