Hello, everybody. My name is Gaurav. I head the cloud platform and services team at Snapdeal, and I have with me Arun, my colleague from FreeCharge. Today we want to take some of your time to give you a little insight into what we are doing at Snapdeal and FreeCharge. As you probably already know, the two companies have merged, and we are now working together on some of the challenges we had each been facing individually. While we were thinking about what to present, two thoughts came together: I wanted to talk about some of the things we are doing at a much larger scale at Snapdeal, and Arun wanted to share his journey of building the FreeCharge platform and struggling with all the nitty-gritty. We decided to put the two together, and that is how the title came about: the phantom of the DevOps. So I will start with the challenges, the ghosts of DevOps that haunt probably every organization trying to build an application at scale, what ends up happening in a DevOps organization, and what the real world looks like. Then I will give you a sneak peek into what we have learned over the last few years and how we are building our next-generation platform.

To start with, I like building teams, and I like building teams that solve a specific problem together. DevOps, to my mind, is exactly the culture for that kind of innovation, and it is what I have always wanted to inculcate in my teams. But over time, as the firefighting takes over, these thought processes bifurcate: DevOps becomes a separate team, development becomes a separate team, and they try to work with each other but the responsibility is split. I truly believe the right way to do DevOps is for developers and operations people to come together and build a platform they maintain together: they are jointly responsible for keeping it up, and for understanding each other's requirements.

From that philosophy, we want to create a platform where everything is a service. We can picture a data center as a collection of infrastructure components, and traditionally we have seen them packaged as IaaS, platform as a service, and software as a service. Those are paradigms all of us understand by now, and they are what DevOps and development organizations work with. But I think we need to go much deeper and build more services into our infrastructure. The most common one I want to highlight is database as a service. We have seen some tracks here about offering MySQL or Postgres as a platform or as a service that different applications use. But when we start writing an application and say, okay, I want MySQL as my data store, and, hey DevOps team, I'm opening a ticket on you, can you create me a database environment: how often do we actually communicate things like the SLA of that database environment? How many transactions per second am I going to do now, and how many do I expect as my application scales? What is the data size? Those communication channels usually aren't there when we are designing an application, because we are thinking small and we don't know how viral the application will become, and how soon.
It would be much more intuitive if these services were available as a service, with the DevOps team knowing the scale requirements of each and maintaining them for you. For example, the DevOps team creates an endpoint and hands it to the application team: go write to this endpoint, this is your database, and I will take care of the scaling, the backups, and the disaster recovery of your data. That is something we can build into the platform itself. The same goes for message queue as a service, backup as a service, disaster recovery as a service, and tools as a service. We all use CI/CD; there has to be a way for that layer to be available to every developer from day one, alongside the other services on the platform. As an application developer, I shouldn't need to think about how my CI/CD works or how my code gets deployed; the whole toolchain is provided as a service and simply used.

Why do we need all of this? Ultimately, we want to get here: at Snapdeal we are working on an architecture with a public cloud and a private cloud hosted in different data centers. The public cloud is out there, maybe AWS or similar; the private cloud is something we build ourselves, and there is a tunnel between them. It becomes one cloud, a hybrid cloud. The applications are agnostic of the platform; they are completely unaware of whether they are running on the public cloud or the private cloud. Workloads can migrate from one place to another, because the application does not know whether its database is local or remote. It is the platform that is intelligent enough to understand: this is a transaction-heavy application, it needs a certain number of IOPS, I can't provide that in the public cloud, so I need to bring it into my private cloud infrastructure. Software defines everything. That is the vision, and that is what we are working towards.

It goes well beyond the applications we touch and feel every day; it goes all the way to warehousing. We have a large warehousing capacity at Snapdeal, and we are automating the process of finding a particular item and the aisle it sits on. There might be sensors or devices on the aisles telling me, as I walk past with a mobile phone, that aisle 1C is where I will find the iPhone I am looking for. All of this becomes possible if you abstract the platform and expose everything as a service that applications and devices can consume. That is also why, when FreeCharge comes onto our platform, they can write their services on it and get up and running very quickly.

So that was a quick sneak peek into what we are working on. All the services I talked about are works in progress; some are in production, and some we are still expanding. The idea today is to share the vision with you and then share one story, the FreeCharge story: how quickly they grew, the challenges they faced in that growth, and the choices they made. That is what my friend Arun will walk you through.
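To make the "database as an endpoint" idea a little more concrete before handing over, here is a minimal sketch of what it can look like from the application side. The endpoint and environment-variable names are illustrative, not Snapdeal's actual platform API: the application only knows a logical endpoint the platform hands out, and the platform is free to scale, back up or fail over the backing database behind it without the application changing.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: the application never hard-codes a database host. It reads a
// logical endpoint that the platform team owns; scaling, backups and
// failover happen behind that endpoint without the application changing.
public class OrderDao {

    // e.g. DB_ENDPOINT=jdbc:mysql://orders-db.internal:3306/orders
    // (hypothetical value; whatever the platform hands out)
    private static final String DB_ENDPOINT = System.getenv("DB_ENDPOINT");
    private static final String DB_USER = System.getenv("DB_USER");
    private static final String DB_PASSWORD = System.getenv("DB_PASSWORD");

    public int countOrders() throws Exception {
        try (Connection conn = DriverManager.getConnection(DB_ENDPOINT, DB_USER, DB_PASSWORD);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM orders")) {
            rs.next();
            return rs.getInt(1);
        }
    }
}
```

The design point is simply that the SLA conversation (IOPS, data size, failover) moves to whoever owns the endpoint, instead of being re-negotiated per ticket.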
This is the FreeCharge journey, all the way from day one: how they migrated to the cloud, the choices they made, the problems they encountered, and how this kind of framework helps them. Thanks, Gaurav. So I thought I would play this video as a short demo of what FreeCharge is. As the video shows, our aim is to make FreeCharge the most convenient way to recharge, whatever the recharge may be. We are starting with prepaid mobile, but we will soon get to anywhere there is a payment: it could be any utility, BSNL, electricity bills, and so on. That is what our product is. Over the next few slides I will take you through the journey of our application and infrastructure, how it has grown from day one to where we are now, and some of the learnings we picked up along the way.

Yeah, so this is where we were initially, about as simple as it can get. In the first session someone talked about having a monolith app as an MVP; the thing is, we ran in that MVP mode for quite a while, close to two and a half years. We just threw some hardware at Tomcat and some hardware at the database, and that was the whole setup: no caching, nothing. Everything was stored in the database, the application was a Tomcat, and HAProxy sat in front as the web-facing proxy. Then, of course, as we started scaling up, the obvious question came: what's going on here? I will walk you through some of the iterations we went through.

The very first thing we did was to offload all our static content serving. Until then, all our static content, HTML, CSS, JS, images, was being served out of Tomcat, and as an added downside we were also incurring very high bandwidth charges on our data center, which was hosted in India. So the very first scale-up measure was to put CloudFront in front. It did several things for us. As a scaling benefit, we no longer had to worry as much while running campaigns, because the top of the funnel is where most of your load goes, and it is mostly static content. By top of the funnel I mean someone who is just visiting the site: you have to serve all the images, the HTML, CSS, JS, and so on. Once we offloaded that, we pretty much stopped worrying about top-of-the-funnel scaling. The second benefit, of course, was monetary: bandwidth usage went down drastically. You can see the effect on the graph; the circled region is when we deployed CloudFront, and compared to the previous days the bandwidth usage dropped by 90% or so.

Then we started caching things. All the configuration data and static reference information, operator mappings, master tables and the like, we had been reading from the database every single time. In retrospect it seems fairly obvious, but at the time it wasn't. We introduced Redis at that point, and we still use it. Apart from giving you very good latency, it frees up database load: the database gets back that many IOPS and that much headroom, which helps you scale. The next target for us was getting session storage out of MySQL.
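Before the session-store step, here is a rough sketch of the read-through caching just described, using the Jedis client. The key names, TTL and the operator-mapping lookup are illustrative rather than FreeCharge's actual code; the pattern is simply: check Redis first, fall back to the database on a miss, and cache the result so master tables are not read from MySQL on every request.

```java
import redis.clients.jedis.Jedis;

// Sketch of read-through caching for static/master data (e.g. operator mappings).
// On a cache miss we read from the database and populate Redis with a TTL,
// so repeated reads no longer hit MySQL.
public class OperatorCache {

    private static final int TTL_SECONDS = 3600; // refresh master data hourly (illustrative)

    private final Jedis jedis = new Jedis("localhost", 6379);

    public String operatorForPrefix(String prefix) {
        String cacheKey = "operator:" + prefix;
        String cached = jedis.get(cacheKey);
        if (cached != null) {
            return cached;                           // served from Redis, no DB hit
        }
        String operator = loadFromDatabase(prefix);  // slow path: MySQL master table
        jedis.setex(cacheKey, TTL_SECONDS, operator);
        return operator;
    }

    private String loadFromDatabase(String prefix) {
        // stand-in for the real master-table lookup
        return "OPERATOR_FOR_" + prefix;
    }
}
```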
Until that point, session data was stored in MySQL and served out of it. We moved it to MongoDB. That again freed up a lot of capacity on MySQL, both in bandwidth and in raw storage, and it has nice properties: you can expire data automatically, you can archive it, and so on. Expiring sessions was a huge pain on MySQL; we had to roll our own jobs and work with our DBAs to archive and purge the session store, and every time we did that we ended up taking a downtime, which was pretty bad.

As a next step: MySQL itself was turning out to be one of the bottlenecks, and we found that out in a pretty funny manner. We were gearing up for one of our first big campaigns. We had partnered with Pepsi and were giving away codes on two-litre Pepsi bottles, with a pretty big media campaign around it; Dhoni and others did the advertisements. Once we launched it, MySQL turned out to be the bottleneck. At the bottom of the funnel, when someone was trying to pay and then recharge, MySQL would just hang, it would stop. While debugging we figured out it was holding a lot of connections, but very few of them were from the actual application. In complete panic mode we started disabling things, and we found there were a lot of spurious users on the MySQL database, reporting users, and some individuals even had their own accounts. We started disabling them and the load went down by pretty much 90%. That is what led us to discover that pretty much all the reporting was being run against the MySQL master, even though we were running in master-slave mode. That is when we took the call to move all the reports, and all the reads, to the MySQL slave. There are different kinds of reads, of course. In-transaction reads, where you read something, hold a lock, and then do something else, still go to the master. The slave takes reads such as historical data, someone looking at their recharge history or payment history; those we started sending to the slave, driven by the app server, and all the reports went to the slave as well. Reads then became pretty much infinitely scalable, so to say: to scale them up, we just had to add one more slave server.

One of the big learnings for us at this point was that there is no single kind of storage that solves everything. That is when we started looking at our application's characteristics. At the very top we have what you could call the ACID store, which needs atomicity and transactional semantics but does not have to hold huge volumes of data; examples are the cart, payments, the wallet, and so on.
Those require very high availability as well as transactional semantics, and for them we use MySQL. Then we have the strongly consistent key-value store; the classic examples are the session store and the fraud profile of a user. While someone is making a payment, during the transaction, we try to figure out whether they are a fraudster by looking at their previous recharge and payment history. For those we go to MongoDB. And then there is eventually consistent data, for which we use DynamoDB. MNP, I mean mobile number portability: we have a feature where, if your number has been ported, we detect it automatically the second time around. And the results of some of the analytics, fine-grained data like coupon analytics or payment analytics, we also store in DynamoDB, because that does not need to be strongly consistent. This was quite a big learning for us.

The final step: at this point we were already using a lot of cloud services, SQS, DynamoDB, SNS and so on, and then we pretty much jumped into AWS completely, migrating all our servers and services there. Apart from elasticity and the ability to scale up and down, it also cut out a whole class of errors for us. Earlier we used to get hit by DDoS attacks and spurious load issues, and we would go through a whole series of debugging: look at the app server, it could be happening here, okay, it's not here, okay, where is it? Then we would go all the way down to routers and switches to figure out what was going on. Now we are fairly confident it cannot happen at that level, so a whole class of load issues simply disappeared. We no longer have to worry about whether the router, the firewall or the switch is the bottleneck; if there is a load issue, we pretty much know it is in our app layer.

Next, let me talk for a few minutes about the philosophy behind the capacity planning we did. Once we moved to the cloud, we were doing a lot of capacity planning: how do we set the number of Tomcat threads, how do we size connection pooling, how many queues, what queue sizes, and so on. Rather than walking you through the elaborate exercise, I will just share some of the tidbits. One very basic thing, even before you start capacity planning, is that you have to measure first. If you don't measure, you don't even know what is improving and where. By this point we had built a very good metrics collection infrastructure: we use Graphite, we use StatsD, and a simple Java library we built in-house collects data from the various machines and pumps it into Graphite. With the various dashboards, we know what is going on; it's like getting a blood report, as our founder says. Once you can see what is happening, you can make a change and clearly measure its impact: whether it made things better, made things worse, or changed nothing at all.
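This is not the in-house library just mentioned, only a minimal sketch of how application metrics typically reach Graphite via StatsD: each timing or counter is a tiny fire-and-forget UDP packet in the StatsD line format, cheap enough to emit on every request, which StatsD then aggregates and forwards to Graphite for the dashboards. The metric names are made up.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Minimal StatsD-style emitter: fire-and-forget UDP packets in the
// "name:value|type" line format that StatsD understands.
public class Metrics {

    private final DatagramSocket socket;
    private final InetAddress statsdHost;
    private final int statsdPort;

    public Metrics(String host, int port) throws Exception {
        this.socket = new DatagramSocket();
        this.statsdHost = InetAddress.getByName(host);
        this.statsdPort = port;
    }

    public void timing(String metric, long millis) {
        send(metric + ":" + millis + "|ms");   // e.g. recharge.latency:87|ms
    }

    public void increment(String metric) {
        send(metric + ":1|c");                 // e.g. recharge.success:1|c
    }

    private void send(String line) {
        try {
            byte[] data = line.getBytes(StandardCharsets.UTF_8);
            socket.send(new DatagramPacket(data, data.length, statsdHost, statsdPort));
        } catch (Exception e) {
            // metrics must never break the request path; drop silently
        }
    }
}
```

Instrumenting a call is usually just: note the start time, do the work, then call `metrics.timing("recharge.latency", elapsed)`; the "blood report" shows up on the Graphite dashboard from there.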
Because if you don't have data, there is very little you can do. So these are some of the pitfalls I urge everyone to avoid. The first, as I said: if there is no data and no measurement, there is not much you can do. Even if you throw three or four more servers at the problem, you don't know what it's doing. The second one, which the graph depicts, is that your average load is very different from your actual day-to-day load. While doing capacity planning you should not think just about average numbers; you have to think very clearly about how you are going to serve, say, 99% of your customers. As the graph shows, if you plan for the average, then even adding 10-20% headroom is not going to save you. And some of the systems we build, once they go into a negative spiral, you are doomed: if the load shifts onto the MySQL database, restarting MySQL while it is under load is clearly not fun, as a lot of you will have experienced. So don't plan just for the average; in our scheme of things, the average means very little. The third point is that you have to plan for spikes. Spikes have this nasty characteristic that one or two of them can take down the entire site. Again it is a matter of going into a negative spiral: failures start cascading and the bottleneck keeps shifting to lower and lower levels, the database, the hardware. So you have to plan for surges. Planning for surges does not mean you have to serve all of the spiky traffic; you could throttle, you could have some kind of queuing mechanism, but you have to plan for it. So this is what I call the rule of thumb for capacity planning: plan for the 95th or 99th percentile, then add some room for the surge. Then you are in a safe zone: at least you know what your capacity is and how much you are willing to take, and you can go to your marketing people and tell them, I can safely handle this much load.

This is how we did stress testing. There are of course various ways to do it, but our philosophy is that unless you stress with production traffic in a realistic production scenario, you won't get real data. So we used to run a lot of flash sales, like a buy-one-get-one offer for one hour, announced on our social media: today from five to seven, your recharge is free. We used to get some crazy traffic, numbers we had never heard of, and it revealed very interesting things. It turned out that our logging was a bottleneck: we were on non-SSD disks and the application was logging so much that the disk itself became the bottleneck. We moved to SSDs, and we changed a whole bunch of other things. The advantage of this over traditional stress testing is that it gives you realistic traffic; no matter what you do, you cannot simulate real traffic, so this was a huge advantage for us. Now I'll walk you through a couple of battle stories.
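Before the battle stories, here is a back-of-the-envelope illustration of that rule of thumb; the numbers are invented, not FreeCharge's. The idea: take the observed request rate, size for the tail (95th/99th percentile) rather than the average, add surge headroom, and convert the target rate into a rough thread count using the standard queuing-theory relation (concurrency ≈ arrival rate × latency) that comes up again in the Tomcat question later.

```java
import java.util.Arrays;

// Toy capacity-planning arithmetic: plan for the tail, not the average,
// then add headroom for surges. All numbers here are illustrative.
public class CapacityPlan {

    // Percentile of a sample of per-second request rates.
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int index = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, index)];
    }

    public static void main(String[] args) {
        // Observed requests/second over some window (illustrative data, note the spike).
        double[] rps = {120, 130, 150, 140, 160, 420, 180, 150, 135, 145};

        double average = Arrays.stream(rps).average().orElse(0);
        double p99 = percentile(rps, 99);
        double surgeHeadroom = 1.5;               // extra room for spikes
        double planFor = p99 * surgeHeadroom;     // requests/second to size for

        // Little's law: concurrent requests in flight ≈ arrival rate × latency.
        double latencySeconds = 0.200;            // ~200 ms per request (illustrative)
        double threadsNeeded = planFor * latencySeconds;

        System.out.printf("average=%.0f rps, p99=%.0f rps, plan for=%.0f rps%n",
                average, p99, planFor);
        System.out.printf("~%.0f concurrent threads across the Tomcat fleet%n",
                Math.ceil(threadsNeeded));
    }
}
```

Even on this toy data the average (~173 rps) is less than half the p99 (420 rps), which is exactly why planning for the average plus a little headroom fails.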
One fine day, sometime in the evening, we found that our site was down: Tomcat was running out of threads, and the impact was across all channels, the website, the Android app, everything. While trying to figure out what was going on, it turned out the Google Plus API was behaving erratically, with very high latency. We have Google Plus login as one of our login mechanisms, and the way we authenticate is by calling their API. That call was taking a very long time, which left a lot of threads hanging open, and Tomcat ran out of threads. The dots you see on the right-hand side of that chart are from Google's App Engine dashboard, acknowledging that they had a few downtimes.

The second story was very interesting. Compared to the previous day's traffic, towards late afternoon our traffic started going down, and we had no idea why. We looked at pretty much everything, logging, metrics, alarms; everything was absolutely fine, and we were still clueless. Then, all of a sudden, traffic picked back up to normal levels, sometime after 5.30 or thereabouts, and again we had no idea why. Finally one of the DevOps heads said: turn on the TV. Rohit Sharma was approaching a double century against Australia, and naturally not many people were interested in recharging their prepaid mobiles at that moment. So it turned out to be one of those natural acts of god, as you might call it. It was funny.

I'll leave you with the learnings from all of this. The first: while you are on the journey of scaling up, it is very easy to get muddled and change too many things at once. Let me throw more hardware at the database, let me add a cache, let me do this. I urge everyone to take it one step at a time; changing too many things might just worsen the situation for all you know, and then you don't know which change had which impact. Change one thing at a time. The second learning: don't trust external services. By trust I mean have correct timeouts and proper error-handling mechanisms; treat them with a healthy dose of suspicion. And finally, of course, where we started our journey: it makes no sense to serve static content yourself at all; push it to a CDN. So that has been our journey, brief in the sense that it played out over a period of about one and a half years, and this is where we are right now. I'm ready to take questions.

Hello, can you please share some of your experience with DynamoDB? Sure, so the question is about our experience with DynamoDB. DynamoDB is very, very good if your access pattern is always: given this key, get me this value. Even though it supports some where clauses and so on, if you have use cases like "give me all the users who satisfy a certain condition", don't go for DynamoDB. DynamoDB is not optimized for scanning; it is optimized for key-based lookups.
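To make that access pattern concrete, here is a rough sketch using the AWS SDK for Java (v1). The table and attribute names are made up, not FreeCharge's schema; the point is simply that every read is a GetItem on a known key, never a Scan.

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;
import com.amazonaws.services.dynamodbv2.model.GetItemRequest;
import java.util.Collections;
import java.util.Map;

// Sketch: DynamoDB used strictly as "given this key, get me this value".
// Table and attribute names are illustrative.
public class CouponProfileStore {

    private final AmazonDynamoDB dynamo = AmazonDynamoDBClientBuilder.defaultClient();

    public Map<String, AttributeValue> profileFor(String userId) {
        GetItemRequest request = new GetItemRequest()
                .withTableName("coupon_profiles")
                .withKey(Collections.singletonMap("userId", new AttributeValue(userId)));
        // Single-key lookup: fast and cheap. A Scan with a filter over the whole
        // table is the access pattern to avoid.
        return dynamo.getItem(request).getItem();
    }
}
```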
So, on the analytics point: we store only the final result of analytics in DynamoDB. When we want to fetch a user, whether it's their coupons profile or some other profile, we know the user ID or email ID, and the precomputed result is what we store against it. So don't use DynamoDB for scanning; if your use case fits the key-value pattern, it's fantastic. It scales pretty much infinitely, you can scale up and down programmatically, and the latencies are in the order of milliseconds.

How about scaling with Tomcat? Yeah, so the question is about scaling up Tomcat. The thing to keep in mind with Tomcat is that it works on a threading model, so what you essentially play around with is how many Tomcats you have and how many threads you configure per Tomcat. You can do that exercise in one of two ways. One is to actually look at your application code and figure out: this call takes this much latency, that call takes that much latency, and so on. Then there are fairly standard queuing-theory methods that will tell you: given this much latency and this expected traffic, this is the number of threads you should configure on Tomcat. However, beyond a point, what happens is that every Tomcat opens so many database connections, so many connections to MongoDB, so many connections to your caching layer, and those start becoming the bottleneck. Your Tomcats won't even be loaded: say you figure out that two is the optimum number of Tomcat servers, but just to be on the safe side you double it to four. Those four open so many connections to MySQL and hold them, and it won't be fun. So you can't scale the tiers independently, and at that point you are pretty much forced to move to independent databases: a microservices or service-oriented architecture, where individual services have their own databases. Okay, thank you.

How about fraud handling? Yeah, fraud is something we are still learning more and more about. What is the specific question about fraud: how we do it, or what technologies we use? Online frauds, like repeated frauds? Okay, yeah. I forgot to talk about this. The nature of our business is such that recharge is the category most abused by fraudsters, because fulfilment is instantaneous: you pay, you get a recharge, and you're done. And in India, prepaid balance is treated pretty much at par with money. So if someone gets hold of fake or stolen credit card credentials, the first thing they do is go to all the online recharge sites and try to recharge. For us it is a very, very severe problem, because we cannot delay fulfilment the way an e-commerce site can; they have a day or two, or at least an hour or two, of gap between payment completion and the initiation of shipment. We cannot afford that. We can't tell you, now that you have paid, I'll give you the recharge after a couple of hours. It has to be instantaneous.
So for us, the fraud mechanism has to be such that we figure it out even before the person attempts the payment. It essentially boils down to collecting the data and figuring out how a fraudster behaves. Typically, what they do is open an account and then suddenly try a 5,000-rupee recharge. No genuine new user does that; a new user is hesitant, using their card on the internet for the first time. So we build models: if you are a new user and the amount is more than such-and-such, and so on. Building the model is one part, but the trick is always how you collect the data. At the time a payment is being attempted we cannot evaluate the model from scratch; we cannot go out and trawl through the whole recharge history and payment history right then, because there we can only play around in terms of milliseconds. So the trick is to pre-compute as much as you can, and to store the final fraud profile somewhere you can access it within sub-millisecond latency.

Hi, how did you handle bot attacks? To be honest, we are not completely robust against bot attacks yet. Essentially we use Nginx, and we have some throttling rules: if the request rate from a certain IP exceeds a certain limit, block it, and so on. This brings us to a very interesting incident. Once, the site went down again and we were trying to figure out what was going on. We were getting tons and tons of requests from everywhere, geographically, from all kinds of IPs, and we could not tell whether it was a bot attack. Then it turned out that a card storage service we use was down, and all the clients, Android clients, desktop browsers and so on, were retrying infinitely. It was the day we got DDoSed by ourselves. It was a pretty bad scenario, and that is what taught us to have timeouts. You have to have timeouts while connecting to a service; otherwise it has the potential to bring your own service down.

Hi, two questions. One, can you shed some light on the logging practices you follow, since you are moving towards distributed services and moving a lot of things to the cloud? Secondly, on that note, when you serve static content through a CDN and turn a lot of those tools into services, how has that affected your local development process or the onboarding of new members? Is there some policy or fallback so they can work in a constrained environment, while production uses those services instead? Yeah, very good question. Can you repeat the first one? Could you shed some light on the logging practices? So, to be honest, we haven't got that completely right yet. We haven't found a way to enforce it or automate it; what we do is, through code reviews and design reviews, we make it a point that everyone logs at the right level. There are two things:
you have to log enough information, and it should not be too much information, otherwise you run into a different set of problems. So the process is essentially built around code reviews. When someone is reviewing code we ask: is this the right logging level? You're recording that you made a call to this service; should it be a warning, should it be info, should it be an error? We enforce the right level there. Then a bunch of automation scripts take over, pick up the logs and move them to a centralized location. I'm not sure that answers it completely, but that's where we are right now.

The second question is a very good one, because we have struggled with it ourselves: when you move to a CDN, local development becomes a very painful thing. The good thing for us was that, right from the initial days, our code base was configuration driven to a large extent. We never hard-coded an endpoint into the code, ever; it was always read from configuration. Say we are serving freecharge.in/css/something: the freecharge.in part was never hard-coded in any of the JSPs or anywhere else in the code. It was always read from a configuration file, and depending on where you deploy, the endpoint changes: in production it points to CloudFront, locally it is localhost. We have a file where we specify the whole set of URLs; in a local development environment everything resolves to localhost, in a development mode it goes to something else, in production it goes to CloudFront, and so on. So a new developer does not have to bother about it too much.

Hey, over here. So you are using AWS as your cloud provider, right? Okay. Say there is a genuine burst of traffic, all of a sudden huge amounts of genuine traffic; how do you deal with that? Is there any auto-scaling mechanism in place to handle that kind of load? Currently, no, we don't have auto-scaling yet, because of which we have ended up over-provisioning to an extent. As I said, we are configured for the 95th or 99th percentile of traffic, then we apply some heuristic, double it or triple it, and run that many Tomcats. So surge handling is still rudimentary for us. That said, we know auto-scaling doesn't work that well for surge handling anyway, because by the time auto-scaling reacts, the surge has come and gone. As I said earlier, you can take one of two approaches: try to serve all of the traffic, or serve as much as you can, and that is where throttling comes into the picture. You should have a very good throttling mechanism so that at least some portion of your users is unaffected. A brownout is better than a blackout.

Second question: most of the traffic you serve is from India, so which AWS region do you prefer? Singapore. Okay, thanks.

Hi, so currently you are using Tomcat as a container for... yeah, over here.
Okay, so currently you are using Tomcat as the container to deploy your web app, right? Have you tried other containers, like Node or Vert.x type technologies? And one more point on the same: putting the static content on a CDN, how much did it impact, did it show significantly on the graphs, and how does the cost work out? Okay, so the first question: are we trying different app containers? Definitely. As I speak, we are actually testing a Node.js deployment. Node.js is very important for us, and not just from a load perspective: we are trying to revamp our mobile website drastically, because some low-end mobile phones don't even have JavaScript enabled, so we cannot do any DOM manipulation on the client side. Whatever HTML there is has to be generated on the server side and transmitted over the wire. For that we are testing a Node.js deployment as I speak, and it could go anywhere from here. We are also in the process of moving to a microservices model, where individual teams experiment with application development platforms: we are experimenting with Dropwizard, someone is experimenting with Ruby on Rails, and so on. That is pretty much happening.

Are you also planning to use Vert.x? Node and Vert.x are in the same area of work; Node is more towards the JavaScript way of working, while Vert.x is more polyglot. Have you tried it, or heard of it? Okay, no. As I think the first speaker said, it is pretty much about what the team knows, and our front-end development team is very passionate about JavaScript. They said we'll use Node.js, and why not? Okay, maybe you can also try StrongLoop, that's again Node-based. Oh, sure, sure, I will.

Hi, I have a question about what your current deployment looks like: what kind of tools do you use, and how often do you deploy? Yeah, we deploy about thrice a week or thereabouts, but we are trying to move to a model where any developer can push code to production at any point in time. That is the philosophy we are trying to adopt, and we are getting there. The tools we use: Jenkins for deployment, and I think Puppet for configuration management. It is not completely automated; a human is needed to initiate a deployment, but after that it is pretty much on its own.

Hello, my question is around scalability. The first slide you showed had an application with a single database node, two Tomcat servers and one HAProxy in front. Looking at it, that design was not built to scale, so at some point you must have realized you needed to design for scale. What you have talked about is more like deploying at scale; so what is your design strategy, as in: this is how we changed our application architecture so that we can deploy at scale? Yeah, let me rephrase it so I'm sure I understand: the question is whether we are developing our application in a certain manner so that it becomes easier to deploy at scale. The answer is no, we are nowhere near that yet.
At this point we are still scratching the surface of scalability. If you look at our entire deployment, it is just a few servers; we are not running hundreds or thousands of servers, so scaling the deployment itself has not been a challenge for us yet. Going forward I'm sure it will be, and that is where what Gaurav talked about comes in. Let me just add a little colour to that. At Snapdeal we do have very large scale, and scaling is one of the primary use cases we are trying to solve; that is why the service-oriented architecture I talked about takes care of it, by separating your application from the actual deployment of whatever service it needs to scale. For example, you can start with just a two-node database setup, and as your traffic ramps up, the infrastructure adds a slave transparently, because you are not talking to a master or to a particular slave server; you are talking to an endpoint that does the load balancing for you. In the back end you can keep scaling up or down as your needs change. The architecture provides that at every layer: the database layer, the caching layer, and so on. So yes, in the back end we are designed for horizontal scale; that is why most components are fronted by some sort of proxy, so we can transparently add and remove things and also do maintenance. If you need to bring down a server, you do not have to incur any downtime, because that has been a requirement from day one.