My name is Rajesh, I'm the co-founder and CEO of Redmart, and my colleague Surya, our DevOps lead, will talk about the actual testing process itself. I'm just here to give you a bit of an intro. We also have our release engineer here, who takes care of parts of DevOps as well. Okay, we're here to talk about how we test microservices. We've just started breaking up our monolith into multiple microservices; we already have about 30-plus services, all of them running on AWS. We want to share how we're doing it, get some feedback, improve it, and see where else we can go from here. That's the real agenda: it's been a tough journey, and this is what we're really doing right now.

Just briefly about Redmart: it was founded in 2011 in Singapore, and we've raised about 70 million dollars through several rounds, the latest quite recently. We have about 500 employees. Obviously a lot of them are blue collar, drivers and pickers, plus 60-plus engineers. Besides the apps you might already be aware of, we rely heavily on tech for operations and dispatch, so this talk is partly to show where else we're using tech.

As I said, we started out as a single monolith. Most of you have probably done the same thing; it's the typical startup journey. We started with the monolith in July 2013 and grew 15 to 20 percent month over month, and the monolith was good enough to grow along with us. But along with that we were also growing the team: back then it was just four to six engineers working on it, now we have 60. The monolith was fine, we could deploy it easily, but it became harder and harder to make releases as we went on. So that's when we started breaking it up into smaller pieces, back in 2014. It's a known story; I'm just repeating it. But the problem with microservices, as you know, is that if your tooling isn't good enough, it takes a lot of effort to test, deploy, and maintain them. Instead of one big monolith you have 30, 50, or 100 services. How do you actually handle that? That's the big thing we were trying to solve.

So yeah, this is basically our story, and a quick intro to our setup. Everything is quite standard, nothing out of this world: GitHub, Travis CI, S3, and Chef on the infrastructure side. On the development side, whether it's the dev boxes or the test setup itself, it's a Vagrant-based setup, so anybody coming in is ramped up pretty quickly with all the necessary tooling. We used to use Git Flow and have been slowly switching to GitHub Flow; we're almost through with it. We're also adding SonarQube into the stack, so you can see code quality and all those things. We've used SonarQube for quite some time now; it has a lot of good plugins, and if you haven't used it yet, you should go try it out. It checks most code-quality issues and gives a lot of feedback directly on the pull requests, which is what we're excited about.

And very briefly, this is our infrastructure. We have three different stages. One is dev/test, which we call alpha.
Then there's a pre-prod stage, which we call beta, and prod, which is what everybody else sees. We have around 140 EC2 instances; we're 100 percent on AWS, including our third-party services, so everything runs on AWS itself. So: 140 instances, 30-plus microservices, plus the front-end apps and all those things. You can see the amount of complexity. We use Chef to manage our entire infrastructure. You might have heard the term "infrastructure as code"; Chef is the CM tool we use for that. You could be using Puppet or something else, it doesn't matter, but in our case it's Chef. We also talked about how we do our whole CI/CD at this same meetup some time back, and it has worked out quite well. But once we got into microservices, with loads of different services and multiple instances of each service, we wanted to do things more cost-effectively: instead of having a separate setup for every team, we wanted to carve out a small setup within the existing environment. That's what we're going to talk about.

One last thing, the regular stuff: we use Frisby and Mocha for endpoint testing of our APIs, mostly Selenium for the web apps, and Calabash for the mobile apps. Today we'll mostly be covering the setup for end-to-end testing itself; that's what we're here for. Okay, this is our setup. I'll let our DevOps lead take over; this is the man who actually built it.

Thank you. My name is Surya. I was hired as the first full-time DevOps engineer at Redmart. As Rajesh mentioned, we're moving away from the big monolithic app into microservices. But that kind of migration doesn't happen overnight, because it's such a big thing, and we have resource constraints and all that, so we're moving gradually. The big monolithic app we had previously is this layer here, our API. It used to be very thick; it processed a lot of things. We're slowly stripping those functionalities out of the big app into smaller services. On the front end we have NGINX, and from there requests are routed directly: services still handled by the API, where we don't have a microservice yet, go to the API box itself, while services we've already implemented as microservices are hit directly. All of this is done through NGINX and HAProxy routing, so you can imagine how intensively we make use of routing in NGINX and HAProxy.
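To make that split concrete, here is a minimal sketch of that kind of routing. The upstream names, addresses, and the /orders path are hypothetical stand-ins, not the actual Redmart config:

```bash
# Sketch only: route extracted services past the monolith, let the rest fall
# through to the old API layer. All names/addresses below are illustrative.
cat > /etc/nginx/conf.d/api-routing.conf <<'EOF'
upstream monolith_api { server 10.0.1.10:8080; }  # the old "thick" API layer
upstream haproxy_lb   { server 10.0.1.20:80; }    # HAProxy fronting the microservices

server {
    listen 80;
    server_name api.alpha.redmart.com;

    # endpoints already extracted into microservices go via HAProxy
    location /orders { proxy_pass http://haproxy_lb; }

    # everything not yet extracted still hits the monolith
    location / { proxy_pass http://monolith_api; }
}
EOF
nginx -t && nginx -s reload   # validate, then reload without dropping connections
```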
Now, the thing about microservices, once you have hundreds of instances running like you just heard, is: how do you actually do the testing? Say you want to implement some new feature in just one service. How do you make sure your testing replicates the alpha and production environments as closely as possible, without bringing up all the other services all over again? We could have developers set up the services on their local machines, but that would still not be close to what we have in alpha and production.

So for feature testing we built something that lets us plug in just the new service that implements the new feature, while reusing everything else we already have in the alpha environment. That also allows concurrent testing, because a lot of people are testing at the same time: some of them are testing on the stable alpha environment, which should not be disrupted while a new feature is being implemented and tested, and at the same time the new feature shouldn't cross over and mess with the stable testing environment. With the help of NGINX and HAProxy, we're able to do exactly that.

Here's how it goes. For stable alpha we have api.alpha.redmart.com, which hits our alpha boxes. When we have a new feature, we add feature-based routing on top: for a feature we create something like feature-x.api.alpha.redmart.com. The API layer is still one of the critical components, so any new feature usually requires some change to it, even if it's just a config change, which is why we create this. Everything is very similar to what we have in production, except that the URL itself is different.

What this environment lets us achieve is that developers can work in a feature branch without touching the develop branch. Previously we were on Git Flow, with develop and master, so without touching develop they can actively work on the feature branch, and at the same time they get to deploy into an environment that's very close to alpha. Only when they're ready do they push the code to develop. To accomplish this we had to make some changes in NGINX and HAProxy, which I'll show a little later.

Just to show you what the flow looks like: this is the normal alpha environment, without feature testing involved. The route is marked in green: all the green boxes are what we call the alpha boxes, running off the develop or master branch of the Git Flow. Take the order service, for example, which processes the orders coming in. It used to be part of the API, but we split it out into a microservice. So if you want to hit it, instead of going through the API, you go through HAProxy directly to the order service, whereas services still inside the current API just go straight there.

Then we have the feature testing route, say for feature X. Suppose the changes we're making are specific to the order service, plus some minor config changes in the API, so for feature X those two services need to be modified. To hit the feature API, we just put the feature name into the URL itself: when NGINX sees the feature in the path, it directs the request to the API deployed for that feature, whereas services that don't have the feature in the path go directly to the stable alpha boxes.
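As a rough illustration of the URL scheme described above (the hostnames, paths, and version numbers are reconstructions for the example, not the real endpoints):

```bash
# Stable alpha API vs. a feature build of it (feature name as a subdomain):
curl https://api.alpha.redmart.com/health
curl https://feature-x.api.alpha.redmart.com/health

# Services routed by HAProxy: in alpha the path carries the production version,
# in a feature environment the feature name takes the version's place:
curl https://alpha.redmart.com/order-service/v1.2.3/orders
curl https://alpha.redmart.com/order-service/feature-x/orders
```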
It will become clearer as we go through the next slides. As you can see here, we have two concurrent features being implemented, feature X and feature Y. If you want to hit feature X without bothering about feature Y, you can do that, and somebody working on feature Y can do the same. All the services that aren't affected just go to the green boxes, whereas all the affected services go to the orange boxes.

One more thing you can see here is in the path itself. In the normal alpha environment we just put a version in the path when we go through HAProxy, a version x.y.z for example. For feature testing we use HAProxy's path-based routing instead: if feature Y appears in the path, HAProxy knows it has to direct the request to the feature box instead of the normal service. You can see there's no version there, because this is a feature still being implemented, so we do away with the version, whereas the version we have in alpha is the same as what we have in production. Once the feature is tested, it gets merged from the feature branch into the develop or master branch, and from there, once it's released, it picks up a version again.

Okay, so how did we do this? Basically we created several server pools for the developers to use when they need to create a new feature environment, plus the NGINX changes and the HAProxy changes. And we realized that even with all of this in place, if it's too complicated it's hard to get everyone to move fast, so we built a tool to automate the whole process; later we'll see which steps are involved and what the tool takes care of.

What we have right now is server pools per team: the front-end team and the back-end teams, with about five to ten servers in each pool. The reason we came up with server pools is that we want to assign each team a certain responsibility, rather than letting each team create new instances without limit. The pool is shared within the team. The problem is that when people build something, once they're done with it they sometimes forget about it, for whatever reason; the cleaning-up part is usually the hard thing to get everyone to do. With pools, the team takes that responsibility, because if you use servers for some feature and don't clean up, the other teams or the next features can't be deployed once all the servers in the pool are used. The other reason is security: with AWS security in mind, we don't want everyone to be able to create instances themselves. With these server pools, instances just get started and stopped: when they're in use, we start the server, and developers only have access to start and stop the instances, so when they're done with testing they stop them again.

So what are the NGINX changes? Like I showed earlier, for normal alpha we have the version as part of the route, whereas for feature testing we have the feature name as part of the URL instead. As you can see, it's not a big change, just a minor one. On the HAProxy side, we make use of path-based routing. I think most of you are familiar with HAProxy, but not that many people make use of this part of it: you can specify a certain path in the URL and attach an ACL to it. If the URL doesn't contain that path, the ACL isn't activated and the request just goes to the default backend upstream; if it does, the ACL is activated and the request goes to the feature backend that's specified for it.
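A minimal sketch of what such an ACL looks like; the backend and server names are hypothetical, and the real config is templated through Chef rather than appended by hand:

```bash
# Illustration only: ours is generated from Chef data bags, not edited in place.
cat >> /etc/haproxy/haproxy.cfg <<'EOF'
frontend services_in
    bind *:80
    # ACL fires only when the feature name appears in the request path
    acl is_feature_x path_sub feature-x
    use_backend order_feature_x if is_feature_x
    # no feature in the path -> stable alpha backend
    default_backend order_alpha

backend order_alpha
    server alpha1 10.0.2.10:9000 check

backend order_feature_x
    server feat1 10.0.3.10:9000 check
EOF
```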
So what's actually going on when you create one of these environments? As you can see, there are quite a few moving parts. First of all, the developers themselves have to create the feature branch in their GitHub repo; it begins with that. The next thing is to create a separate Chef role, because we have all the CI and CD wired up: if you're not careful, a push to the develop branch could affect the servers being used for feature testing, so without a clear separation at the role level the feature boxes might get touched. We separate that with the role: in Chef we create a new role just for this feature. After that, we go into the server pools, check which server is currently free and not in use, and pick it up. The next step is to start that server; like I said earlier, when the pool servers aren't in use they're in a stopped state, so as long as we find one that's free, we pick it and start it. After starting the EC2 instance, we do the bootstrapping, the Chef bootstrapping. Remember that the first step was creating the feature branch: when we deploy to this server, it needs to pick up the artifacts built from that feature branch, not from the alpha branch, and we drive that through Chef. After that's done there are still more steps, because our HAProxy and NGINX are also managed via Chef: we have to upload the data bags used for HAProxy and NGINX, so that HAProxy knows, for this server, this service, and this feature, where traffic should go, and NGINX knows, for this feature route, where to direct it. So after we pick the server and bootstrap it, we still have to upload all of that. As you can see, there are a lot of steps involved, and we have so many systems, so many services, and the teams are growing so fast. Not only that: for each service there might be multiple features being implemented at the same time.
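Condensed into a script, the tool's job looks roughly like this. The knife and aws commands are the standard ones; the helper functions, file names, and pool names are hypothetical stand-ins:

```bash
#!/usr/bin/env bash
set -euo pipefail
FEATURE="feature-x"; SERVICE="order-service"

# 1. A separate Chef role keeps feature boxes off the alpha deploy path
knife role from file "roles/${SERVICE}-${FEATURE}.json"

# 2. Pick a free (stopped) instance from the team's pool and start it
INSTANCE_ID=$(pick_free_instance_from_pool backend)   # hypothetical pool-lookup helper
aws ec2 start-instances --instance-ids "$INSTANCE_ID"

# 3. Bootstrap it so Chef deploys the artifact built from the feature branch
knife bootstrap "$(instance_ip "$INSTANCE_ID")" \
    --run-list "role[${SERVICE}-${FEATURE}]"          # instance_ip: hypothetical helper

# 4. Register the box with HAProxy and NGINX via their Chef data bags
knife data bag from file haproxy "data_bags/haproxy/${SERVICE}-${FEATURE}.json"
knife data bag from file nginx   "data_bags/nginx/${FEATURE}-routes.json"
```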
And on top of all that there are deadlines; everybody wants to push their features to production as soon as possible. So how did we do it? We make life simple for the developer. If you look at the steps in red and green, red is the only step the developers have to take care of: they just have to worry about the GitHub repo and make sure they've created the feature branch. Then they run the tool, and the tool does all the magic and takes care of the rest.

Okay, before we move on, do you guys have any questions?

One from the audience: when you change the backend configuration you usually need to reload NGINX. Do you use anything special to handle that reloading, or do you just reload the whole NGINX process? We just reload NGINX directly. The config itself is generated and managed through Chef; that's why, once we make the changes, we have to register the server with HAProxy and NGINX, and all of that is done through the data bags. As you saw, the tool automatically updates HAProxy and NGINX; basically everything in green is bootstrapped through Chef.

So once the feature branch is detected in GitHub, the tool runs all these steps and bootstraps the environment for the user to test? Basically, yes: once they've made sure the feature branch is present, that the build is successful, and that the artifact is uploaded to the artifacts repo, which is in AWS S3, they can run the tool to create the feature test environment.

Another question: what's the cycle like, from feature to production? It depends on the complexity of the feature. If it's a simple change, it's usually very fast, probably just a few days. If nothing major is affected, especially things like payment logic, it usually takes a few days to maybe one or two weeks; certain big features may take longer.

And what happens to the active sessions during a rollout? In production, say users are active online and you're rolling over. How long does it take right now? About two to three seconds. In our case, most of the services are stateless, so it doesn't matter whether a request comes through instance 1 or instance 20; we can just stop one. But for critical services, for example payment processing and a couple of critical order-processing systems, we do have a graceful shutdown: we go one instance at a time rather than all at once.
So: graceful shutdown on the first instance, then deploy the new version, and move on to the next one. At the service level, we basically run sudo chef-client, which pulls the new artifact onto the box after the service has shut down gracefully. But for most cases, like browsing the catalog or whatever it is, restarting an instance in the middle of a session doesn't break anything, so normally we don't worry about it.

An audience suggestion: in your case Redis would be one of the options; you could keep the session state in Redis or in cookies, whichever you choose, so even if you restart the service the client doesn't lose connectivity. We do actually use a cache, but we also clean up the cache on deploys. We don't use Redis for sessions; the session tokens only carry state at login, the token itself is not stateful. There has to be somewhere that says "this is the token and this is how to deal with it", whether that's Redis or whatever else.

And just to add on: for those critical services that require this kind of graceful shutdown, you have to detach the node from HAProxy or NGINX first. We can do that through the data bags as well: we temporarily deprecate the server, so NGINX knows it will be down and stops directing traffic to it, and then we watch the logs to see whether any more traffic is coming in, and the same with HAProxy.
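For illustration, here's one common way to drain a box using HAProxy's runtime socket, as an alternative to the data-bag route the talk describes. Backend and server names are the hypothetical ones from the earlier sketch, and the socket must be configured with admin level:

```bash
# Drain the node, deploy, then put it back (requires "stats socket ... level admin")
echo "disable server order_alpha/alpha1" | socat stdio /var/run/haproxy.sock
# ... graceful shutdown, sudo chef-client to roll the artifact, health check ...
echo "enable server order_alpha/alpha1"  | socat stdio /var/run/haproxy.sock
```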
One quick question: so this doesn't start directly when they create a branch? They create a branch, do some work, implement a feature, produce an artifact that goes up to S3, and then they run the tool, which does all the steps and deploys? Yes. But you can obviously produce multiple artifacts from the same feature branch, version one, version two, version three and so on, before the feature branch goes into develop via a pull request. Maybe version 0.0.5 is the final one that implements the feature and passes all the tests and edge cases, and that's what gets merged back into develop. That's similar to questions we asked ourselves: we debated whether we should version the feature branches as well, and we decided not to, to reduce complexity. What that means is that if we ever need to roll back or trace something, we still have the commit ID.

Okay, so when I produce an artifact on a feature branch, I overwrite the artifact in place of the one I produced before? Yes; a feature branch is not going to break anything, so we're fairly liberal on that aspect. But we do have exactly what you described for the production environment, where we keep the five previous artifacts: if the new one breaks the production system for whatever reason, we can roll back. And just to add, the feature branch artifact is different from the one we have for alpha, so it will not affect alpha.

So when you merge, you don't just point to the new artifact; you merge and then rebuild the artifact? Yes: after you merge, it gets built and deployed to alpha. The idea is basically not to block the alpha test environment. I can test a feature for weeks in a row without blocking alpha, and only when I'm really happy with the feature, and all the services are fully tested, do I merge it into the develop branch. And keep in mind that when we say testing in alpha, it's not just the developers: all the consumer teams, when they're testing things like a new product or a new catalog, do that in alpha, so we have to make sure alpha is always good and never broken.

Are you using the same setup for A/B testing as well? Good question. We could potentially use this, and Google's tooling as well, but in production we're actually using SiteSpect, a commercial tool, for A/B testing. Something like Optimizely? Yeah.

How do you handle it if two services that talk to each other both need simultaneous changes? For the same feature? Yes, it will be the same feature. That's the case we illustrated: feature X changes the API itself plus the order service, so that's already two services, and of course there could be other services changed for the same feature X beyond that; they reach each other through the same feature routing. Is this one Git repo or multiple repos? We have hundreds of repos. Okay, so if I'm making a change across, say, three of your repos, I just make sure the feature branch is named the same in each: feature Y in one and feature Y in the other? True, that's the only requirement. Good point. So if I'm modifying something and Surya is modifying something else for the same feature, the branch name ties it together. That's true.

Actually, we tried to make this as simple as possible, which is why we developed the Vagrant tool itself. If I'm the only one creating the environments for everyone, it's very easy; it gets complicated when you want to put these tools into the hands of the developers, because you have to make sure they can all set the tools up on their own machines, and I had a few bad experiences there: one guy had a Chef environment working locally but not against AWS, and another guy had some other problems. To mitigate that, we decided to make it as simple as possible for them: we created a Vagrant template, so all they have to do is download the template and run an init script, and that script creates the Vagrant machine with all the necessary setup inside it.
That includes pulling the Git repos, because all of this is managed through Chef, and we have to make sure every developer is in sync. If two guys are working at the same time, we don't want the first guy to pick up a server and the second guy to pick up the same server; how do we make sure that doesn't happen? We make it as simple as possible: before they run the script, we do an automatic git pull, and once the Vagrant state is updated we automatically push it back to Git, so they don't have to worry about any of these things. These are the exact steps required, if you're interested; we can talk more about it later.

So, after the testing is completed: like I said, the key thing is making sure developers can clean up themselves, because you can imagine, with the number of teams and developers we have, what happens if everyone comes bugging DevOps for this. First of all you have to clean up the EC2 instance, because it will be reused for another microservice; we have to make sure the deployed artifacts are cleaned up so the next service deployed there doesn't get messed up. After that we put the EC2 instance back into the stopped state and return the server to the server pool so it can be reused. And then we have to make sure the routing entries are cleaned up, because otherwise our NGINX config would just keep growing. The cool thing is that we can do all of that with a single click. Too bad we didn't have time to prepare a demo today.
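Roughly, the single click boils down to something like this sketch; the paths, pool helpers, and data bag names are hypothetical, mirroring the creation script above:

```bash
#!/usr/bin/env bash
set -euo pipefail
FEATURE="feature-x"; SERVICE="order-service"; INSTANCE_ID="$1"

# 1. Wipe the deployed artifact so the next feature gets a clean box
ssh "$(instance_ip "$INSTANCE_ID")" "sudo rm -rf /opt/${SERVICE}"   # path + helper hypothetical

# 2. Stop (not terminate) the instance and return it to the pool,
#    so developers never need instance-creation rights
aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
mark_instance_free_in_pool "$INSTANCE_ID"   # hypothetical pool bookkeeping

# 3. Drop the feature's routing entries so the NGINX/HAProxy configs don't grow
knife data bag delete haproxy "${SERVICE}-${FEATURE}" -y
knife data bag delete nginx   "${FEATURE}-routes" -y
knife role delete "${SERVICE}-${FEATURE}" -y
```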
Question: do you destroy the instances, or just shut them down? We just stop them. Like I said earlier, we could destroy and recreate them each time, but then you'd have to give instance-creation permissions to each of the developers, and we don't want that.

So you deploy multiple services per instance? No, one service is one EC2 instance. Mostly they're t2.micro instances. But wouldn't it be more efficient to consolidate the services with something like Docker? Yes, that would be very useful, and we're moving to that soon. Like you said, we've realized that with a lot of these micro instances, even though they're just micros, the utilization is actually very low, so it makes sense to put multiple services on one box. But we want to make sure that to the developers it still looks like one clean box each, not one box shared across multiple services. We'll get there soon.

So, the things we still haven't automated: most of this is fully automated right now, but there are a number of things we can't do yet. For example, the queue-dependent services, services that pick up messages from a queue. When we deploy a feature environment, we have to make sure they're not picking up the messages meant for stable alpha, because the logic might have changed in the new feature implementation and it could disrupt the other flows. That part we don't automate yet. What we've been doing for these kinds of services is requiring the team to create a separate queue each time, for that specific feature implementation. We could have chosen between a separate queue and a routing key, but for now we mostly go with the separate queue, and once they're done with the feature testing, they have to clean up that queue as well.
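For a RabbitMQ setup, the per-feature queue could look like this sketch using the stock rabbitmqadmin CLI that ships with the management plugin; the queue, exchange, and routing-key names are illustrative:

```bash
# One queue per feature, bound with its own routing key
rabbitmqadmin declare queue name=orders.feature-x durable=true
rabbitmqadmin declare binding source=orders destination=orders.feature-x routing_key=feature-x
# ...and, as part of the feature cleanup:
rabbitmqadmin delete queue name=orders.feature-x
```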
Then there's the versioning itself, which Aslam is currently working on. We're moving to GitHub Flow and we want a central versioning system, so that, for a variety of reasons, somebody who just joined can see which services exist and which versions are deployed where, and when something breaks we can see which version to roll back to. Integrating that with the CI and CD we already have is not a straightforward matter, so we're currently working on it. And then we have the scheduled jobs: certain services just run on a scheduler, and that's also something we haven't been able to automate yet. So if you guys are interested, yeah.

Question: how do you handle database migrations, if a feature needs database changes? That's a very good question, and something where, if you don't plan it properly, especially with a large number of microservices, a lot can go wrong. If you upgrade the database, you have to upgrade the driver, and the data format, and all that; if you're not careful, some services end up on the upgraded driver while others are still on the old one, and you get a lot of compatibility issues. Ideally you treat the upgrade itself as a feature: say you're upgrading Mongo, you create a feature called "Mongo upgrade", upgrade the driver in all the services on that branch, and it has no impact on the alpha environment.

I was thinking of something simpler, like a schema change: a feature needs a schema change in some kind of back-end data store. Do you deploy a whole new database with that schema change as part of the feature environment, or how does that work? In our case it's more of a data migration, because we use MongoDB as the back end, so the structure lives with the data. But you have similar problems, right, because if some of your services expect data in a certain format, you effectively have to apply a migration to your objects. True. If it's a breaking migration, we make sure there's a data fix before we make any changes; that's one way we've handled it in the past. The other way is lazy migration, because if you have a collection that isn't touched all the time, there's no point running a migration over millions of records: you convert documents on the way out and convert them on the way in, and leave the rest alone.

I was thinking of doing exactly that; I was wondering whether other people do it the same way. Yes, it's lazy migration: whatever gets read or written comes back compatible with the latest code, and we commit the change back to the collection as well; we don't just convert it in the application and return it to the clients. And where there's a real need, we change it in place. For example, we changed password hashing from simple MD5 to SHA. We kept both for a short while: if the SHA hash doesn't exist for a user, we create it on their next login and then retire the old one. That's a classic case where you can't even do a bulk migration, because you only see the password when the user logs in next, so you have to do it on the fly and store it then. That's one way to do it as well.

Can the database changes be exposed as a service? The way we're doing it, we have a couple of types of microservices, and one of them is a core data service that owns a particular entity. For example, order is a pretty key entity for us, so we don't want just any service changing that entity or anything in that collection; we make sure there's a core data service that is the only way in. That's our discipline; it may not be enforced precisely everywhere yet, but that's the direction.

About the database you use: do you take a sample from production into all the environments, or is it a copy of production? It's close to a copy, but we mask away all the sensitive information. As you guys may know, the PDPA went into effect last year, and we don't want everybody to be able to access customer emails, phone numbers and all this stuff, so we basically mask all that information. It's quite close to production, and we refresh it on a regular basis.

If you have different versions, you could have a different version of the customer service without even needing a version number in the API: if you map each API to a version and store the mapping, then for every request you know which version it belongs to. That's definitely possible, and in fact that's a good point we didn't cover, along with what the other gentleman mentioned about containers. One other thing we're actually looking at is service discovery: then I don't have to go through this entire routing layer at all, we can look at point-to-point communication. But as of today we do the routing, and we make sure we reuse about 95 percent of what's already in alpha and just create a few small new instances, which is very cost-effective. That's all.

One more question: are the feature tests sharing the testing database? Yeah, we're just using the same MongoDB cluster for the feature tests. So when you do a cleanup, once the feature is done, do you do a data cleanup as well, or just leave the data in the database you test against?
We do that only if it's a breaking change to our particular data. Otherwise the feature testing and all the other testing share the same database. The thing is, with MongoDB we're mostly just using the basic driver, which treats a document as a dictionary: if I have one extra field, or one field fewer, the application doesn't break most of the time. We've had very few instances where we had to put a real data migration in place; the password hashing was a good example, where we changed the strength of the hashing, and there was one other for payment, but otherwise it's fairly lenient, I would say. It's a dictionary.

I'm just wondering: you have two features, one is X, the other is Y. At least in this tool, do you worry about possible race conditions? That really applies in the queue case we mentioned. What used to happen is that the stable develop branch would create orders on RabbitMQ, the feature environment would pick them up, and vice versa, so there was confusion: where did my order go, is something broken? That's the part we're addressing, either with a routing key in RabbitMQ or potentially with a separate queue itself.

Doesn't this feature-proxy setup cost a lot of money? Are you talking about money? In this case we just have two HAProxies; it's actually quite straightforward if you look at the changes involved. We deliberately didn't want to use Route 53 to do this. That's actually a very valid question, because recently people have started using Route 53 for this kind of thing. You can continue to use that, but there's a better way, which is what we were talking about: I'm sure you've heard of ZooKeeper, which I've played around with, but there's also something called Consul, which is what we're experimenting with. As you might have seen, it gives you cross-data-center support, it covers what you'd otherwise set up for DNS management, and it's a lot better at that than doing it through Route 53; you also get distributed configuration management and health checks. It's like ZooKeeper plus, I would say. And if you've used ZooKeeper, especially as a newbie, it's quite hard to set the whole thing up; with Consul you get the same kind of functionality but it's much easier to manage. That's what we're heading towards. Once we do that, I agree totally with you, we won't need this setup anymore. What we will still have are some of the concepts we used here: we'll take n-1 services from alpha and create just the one new instance, so it stays cost-effective in terms of the number of servers we're running. The whole layer you saw, the health-check proxying and all those things, will eventually be gone.
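To give a flavor of what that replacement looks like, here's how a caller could find a service through Consul's standard DNS and HTTP interfaces; the service name is illustrative, the ports are Consul's defaults:

```bash
# Resolve a service through Consul's DNS interface (default port 8600)
dig @127.0.0.1 -p 8600 order-service.service.consul SRV

# Health checks and distributed key/value config come via the HTTP API (port 8500)
curl http://127.0.0.1:8500/v1/health/service/order-service
curl http://127.0.0.1:8500/v1/kv/config/order-service?raw
```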
Question: for knowing which servers are up in your infrastructure, is there any logic you've developed? Good question, there are two parts. One is health checks: we're using a tool called Cabot, C-A-B-O-T, an open-source tool which is pretty good, slightly better than what you get out of CloudWatch, and you can test with it. The second part, auto-scaling, we haven't done yet. One thing we'll do if we get some time in Q3, definitely by Q4, is auto-scaling. The good thing is that we already have the necessary recipes to bring instances up and down; all we need to do is hook them up with the Auto Scaling groups that AWS provides, and that's a fairly easy step from here.

You're using a Chef server? Isn't that expensive? Chef is open source, it's free. But you run a Chef server? Yes, we do, but it's not the enterprise edition; the enterprise Chef server is the one known for cost, and if you're launching a bunch of servers the price is very hard to justify. I totally agree; it was the same argument for New Relic, which is even more expensive. That's why we started building things on CloudWatch, and we're also relying on a newer tool called Sensu, which is also pretty good. We used to use Nagios and other things in the past; Sensu is the new kid on the block, and that's what we use for monitoring.

But wouldn't it be simpler, for example, to go immutable with something like Docker, just bootstrap an environment in five minutes and test it? That's a very good question. As we discussed earlier, we'd like to get there and containerize the whole thing, but we don't want to rush into it yet; this is working fine. We're taking one step at a time: we wanted to automate this first, because our team suddenly grew and we wanted to see how to keep development moving for the team. The second step, as we mentioned just a while ago, is service discovery, exactly like the gentleman was saying, and the next step after that would be containers, exactly as mentioned. Then it becomes more like a template.

Do you use Docker in production? Do you use Kubernetes, and how big is your cluster? Not yet; the whole field is in quite a bit of flux, there's a new tool every day, and something that works today may not work a week later. That's why we're not rushing in. I know there's a lot you could do with it, but we're playing a waiting game; this is working out quite well for us right now, and the next step for us is service discovery. Consul is a good tool, yes.

As for MongoDB, we've actually been running this cluster since 2012, self-managed, and the good thing is we hardly have to manage it; we went through two version upgrades and it was pretty good. Everything is done ourselves. The other important thing we use is the MongoDB monitoring service, MMS; it's a free service, you should all get it, provided by the 10gen folks themselves, and it's awesome. You can also purchase backup and one-click upgrades and all those things, but those are paid options. What we have is a cluster with replicas, a replica set of size two or three, and we also have one hidden slave.
The hidden slave is mostly insurance for our data, kept away from the day-to-day access. It runs with a delay of a few hundred seconds behind the replica set, so if we fat-finger something and things go wrong, we can cut that server out of the replica set and promote it as the master. That's what we've done.

One other good question: what's the size of the data you're dealing with? We're definitely not talking about big data in the sense of large numbers. In terms of transactions we do anywhere between 2,000 and 2,500 orders a day; behind each order there are multiple catalog lookups, multiple cart operations and all that, so it's a funnel, and at the end of the funnel we have about 2,500 orders a day.

Any detail about the stack you're using for transactions? We've been using the Play framework for a long time, and it basically handles everything. Have you had any issues with it? You smile like there are issues you've run into. No, I love the framework. We've been using it since 1.2; most folks are now moving to Scala and all that. We have huge traffic spikes, like when we have sales or GSS and all that stuff, and it just works pretty well.

Okay, the feedback was good; what could be improved, is there any other feedback we can take?

We didn't talk about the OS you're using. Linux, yes, we're using Linux. How are you provisioning it, are you baking images and then provisioning? We use the bare-bones Linux AMI that Amazon provides, their stripped-down CentOS-based image, and we don't do much on top of it. We used to prepare custom images, a stripped-down kernel and all that, but these days we just use the off-the-shelf image. Do you update them? Yeah, we update them. How? Good question: we don't do it regularly. When somebody is deploying a new feature and finds, say, 48 updates pending, they basically apply them. We haven't automated that part yet.

Do you automate it already? From the audience: what we do is, we use Packer. We take the OS, CentOS 7, and we have a few types of images we want, database, web, whatever. On top of that we run Ansible jobs, with a bare minimum of configuration, and store the AMI in the AMI store. Then we've been experimenting with this thing called Terraform: we use Terraform to spin up the web stack, and then a second round of Ansible applies the remaining configuration. So we're close to the immutable strategy: we don't update the instance, we don't run yum or whatever, we just trash the instance and create a new one from the fresh image. That's one way we've been doing it for certain cases: trash the instance, then create a new one. It takes a lot off your plate, and in EC2 it doesn't take a lot of time; it's pretty easy to spin up a new instance. The good thing is that when a new one comes up, it's already at the patch level we'd like it to be, and you can test the image before rolling it out, to make sure. True, and using Packer and Terraform you can be doubly sure it works right from the OS level. In our case it's just a Java application, so there isn't that much at the OS level to worry about.
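The loop he described condenses to a couple of standard commands; the template file name is a hypothetical stand-in:

```bash
packer build web-ami.json   # bake: CentOS 7 base + Ansible roles, output stored as an AMI
terraform apply             # replace the stack with instances from the fresh AMI
```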
Well, I might challenge that when you're spinning up a whole set: say you have three services in a feature, how do you get the ordering right, how do you make sure they come up in the right order when one needs to connect to another? Do you just use timing or something else? We haven't done that; for now it's handled manually. Another thing we want to use is feature switches: a feature toggle, so we can deploy a feature dark, and also use the same mechanism for staggered rollouts. That way I can prepare an instance with the feature already on it, and only enable it at a moment's notice. And how do you make sure the configurations don't drift, for the Chef configuration? Right now we do it based on our test sets. I agree, totally agree, any process can drift.

Do you test your Chef cookbooks? Do you use anything like ServerSpec, or Test Kitchen? To be really honest, we're not doing that. The biggest advantage of a CM tool is exactly what you said: I can version the entire infrastructure and test it. To be honest, we tried it very early on in prototyping, so we know the tools and the necessary configurations are available, but we haven't brought it into our production practice. From the audience: I use it both ways; once I finish writing a cookbook, I run it through and it cleans up after itself. Does that process exist in your system at the moment? It's simple stuff, and that's a big advantage of a CM tool like Chef.

How many DevOps engineers do you have? Good question. We had one engineer for about six months; he built that original Chef infrastructure and all that, and then he wanted to become a product manager, so I lost the first DevOps engineer. It was only three or four months ago that we hired the new one, so that's one person, there's another person joining next week, so that's two, and Aslam joined a couple of weeks ago to take care of the release aspects, so it's about three right now, supporting 60-plus developers. The thing is, we've changed quite a lot of things and we have a lot of new features, and not just features: we're also building a lot of different things, more than what you see on the website, lots of different projects. So it's quite a tricky situation to be in, and that's what we're tackling through this lens.

It's doing okay, actually, and this is where tooling helps: a simple tool can change the whole dynamic. It used to be that there was always a message coming to DevOps saying "I need this", so we automated some of those things as well. We use IFTTT: whenever somebody wants anything, it just goes onto a Trello board, and we handle it once or twice a day instead of jumping on it immediately, which minimizes the context switching. That's one thing we've already done, and some of these tools, though they look simple, take a lot of load off the front line. We also have the Git comments coming into Slack itself, so if I want to comment I can just comment there; I normally watch the Slack channel for any kind of comment. We use Slack quite heavily, whether it's for this purpose or for DevOps requests
or anything from Patrick. That's good. Thanks for all the feedback and the questions; it definitely helps us make this better. As I said, this is just the first step in what we've been planning, and we didn't want to jump straight into all the shiny stuff out there, Docker or anything else. The most important next thing for us is service discovery, hopefully with Consul by Q4, and when the time comes we'd like to come back and share that; and if somebody else is sharing a similar story, we'd definitely like to come and listen too. Thank you.

There are drinks downstairs if you'd like; it's half past eight, so for the next half an hour we can have a discussion, here or downstairs. Thanks, everyone.