Hi everyone, my name is Surya and this is my colleague Aslam. Today, before we start the presentation, let me just ask you guys a few questions. First, how many of you here are developers? Okay, how many of you feel that right now it takes too long for any small change you make to find its way into production? Okay, and how many of you feel that you could be more productive if you didn't have to deal with a lot of merge conflicts and all those things? Okay, cool. So it means that I'm in the right room. Today we are not going to define what a perfect CI/CD process is; instead, what we are going to share is the lessons we learned when we built the CI/CD pipeline at Redmart that is right now supporting our high-velocity software development. So rather than talking about theories and best practices, today we are going to show you a demo of what it takes. Let's say you start from an idea: how does it flow through the pipeline all the way to production? You'll see it live. We have been scratching our heads in the past few days to think about what we could put up for the demo so that we can show something really making its way from the beginning all the way to production without breaking anything. So this is just a simple hypothetical story that we've created. Let's say, for whatever reason, we find that the greeting you see here when you log in to Redmart.com needs a change; we want to do something about that information. Right now, what you get when you log in is the email ID that you used for logging in, right?
So the estimated time for this story is, let's say, one hour, and what you're going to see today is the three parts of the pipeline: the feature development stage, how it goes into the pre-production stage, and all the way to live in production. So I'm going to show you right now what is going to happen. Due to time constraints there are certain steps that we have skipped, but I'm going to explain what those steps are in the later part of the presentation. We have created a feature branch for today's demo. What I'm going to show you is that this feature environment is now live, and as you can see, it is still showing me the email ID that I used to log in, right. What we're going to do next is make sure that we are on the right branch here, and the next step is to make the changes. Okay, it's on this line that we're going to insert this simple logic. And one important part that usually gets people freaked out is: how do I test any change that I make? So today we're going to show you how the unit testing is done for the feature that we are implementing. Okay, we are done with all the changes that we need. Now let's push that. Yep, it's now going into GitHub, where it triggers the CI/CD pipeline. While the build is happening, we're going to move on, and then we'll come back to look at what exactly happens there. So just to give you a quick summary of what the tech organization in Redmart looks like currently: first, we have an 80-plus-strong engineering team that is spread across two geographical locations, Singapore and Bangalore. What is interesting is that these 80-plus engineers are split into 20-plus autonomous teams.
One of the key challenges that we faced when scaling up, and I'm sure any organization going through that scaling-up phase will face, is that these 20-plus teams are not formed all at once, right. They are slowly being added. So one of the key issues is that different teams will be at different maturity levels, whether in terms of team cohesion itself or the knowledge they have about the whole infrastructure. When designing a CI/CD pipeline, that is, I think, one of the most important things to pay attention to, because we cannot assume that all teams will be able to pick up whatever we have immediately, right. These 20-plus teams are working on the 100-plus microservices that we have right now. And each of these teams is able to move at the speed they are comfortable with without getting blocked by every other team, because we do not want adding more teams to bring down productivity instead of increasing it. We also have four deploy environments to manage. The first one is the feature environment; that is where all the feature development and testing happen. Then we have the pre-production setup, which is an exact mirror of what is in production. Then we have an additional environment called beta. We do not use it often; we usually reserve it for bigger features. And then we have the production environment itself. Out of these 100-plus microservices, they are largely based on Java and Scala, and there are other languages as well, like Node.js, Angular, Python, and even Go. So the key is really how you design a CI/CD pipeline that takes care of all these heterogeneous languages. And right now we have about 20 to 30 rollouts happening every day.
Some of the key ingredients that we have in the pipeline: first, which I think most of you would already be on board with, is the GitHub flow itself. Instead of maintaining many, many branches, we keep a single master branch and multiple feature branches. The second part of the pipeline is continuous integration. That is where, as you saw in the code I demonstrated just now, every feature you implement has corresponding unit tests that come with it, and we have code coverage analysis and all that. The third part of the pipeline is continuous delivery. What happens in this stage is that the artifacts get built and are properly versioned according to standard semantic versioning. Once these artifacts are generated, they get uploaded to the corresponding S3 buckets. And the last part of the pipeline is continuous deployment. When we talk about continuous deployment, it can vary across different organizations. I'm not saying ours is the right or wrong definition, but the way we define continuous deployment is that any code change you make gets deployed to the right environment immediately. You can even take a simple code change and push it all the way to production directly; there's nothing stopping that. Usually, what comes in between is the business side of things. You don't want to push things into production before the rest of the organization is ready for the new features you are implementing. Once the business users say, okay, I'm ready to go with it, technically you can just say, I want to deploy that feature into production immediately. Okay. So let's see what happens now with the build. I'm not sure if the build has actually completed. It looks like it is still running.
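The semantic versioning the speakers describe can be sketched as a small helper. This is a hypothetical illustration, with a function name of our choosing, not Redmart's actual version manager:

```shell
# Bump a MAJOR.MINOR.PATCH semantic version string.
# bump_version is an illustrative helper; the real version manager
# described in the talk is not public.
bump_version() {
  version="$1"  # e.g. "1.4.2"
  part="$2"     # "major", "minor", or "patch"
  major=$(printf '%s' "$version" | cut -d. -f1)
  minor=$(printf '%s' "$version" | cut -d. -f2)
  patch=$(printf '%s' "$version" | cut -d. -f3)
  case "$part" in
    major) printf '%s.0.0\n' "$((major + 1))" ;;
    minor) printf '%s.%s.0\n' "$major" "$((minor + 1))" ;;
    patch) printf '%s.%s.%s\n' "$major" "$minor" "$((patch + 1))" ;;
  esac
}
```

For example, `bump_version 1.4.2 patch` would yield `1.4.3`, while a major bump resets minor and patch to zero.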
During this process, as you can see here, it is running a lot of tasks before it even gets to the point where it can generate the artifacts. It looks like there has been some delay in the build. This build usually takes about 15 minutes to complete, so in the meantime we can just move on, and then we'll take a look back at what happened. Okay. So this is how a typical software development life cycle happens in Redmart. It all starts with developers working on their local machines; they make the code changes there. What is unique in the Redmart infrastructure is that developers, even when they are testing locally, have two options. One is to tap into whatever infrastructure we already have: without any additional configuration, when they run tests on their local machine, it can be just the one service they are testing, while for the rest of the resources they reuse whatever is already there in the rest of the infrastructure. The second option is that they can mock all the resources, like the database (we're using Mongo), the cache, and the messaging queue. Once they feel that they are done with the changes, they push the code. What happens then is it triggers the CI/CD process that runs all those tests you saw running in Travis just now. If something breaks, it goes back to the developer and they fix it. But if the whole CI/CD process completes successfully, it automatically gets deployed into the feature environment. One noteworthy thing here is that instead of developers developing and testing the changes only locally, as early as this stage they can involve the rest of the team, like QA and even the end user, which is the business user in most cases.
And also, one important thing is that we want to give developers the comfortable feeling that, whatever change they are making, they don't have to worry that it's going to break anything, and they know it's going to get deployed into a production-like environment without worrying that somebody is going to come after them, screaming at them for breaking something, right? That is what our feature environment is created for: they can feel free to commit changes and break things there. And this is the usual loop; that's why it's thicker there. As early as this stage, developers can get feedback from the real users. It goes back and forth until QA and the stakeholders say, okay, I'm good with those feature changes, now we are ready to go into production. The next stage is the pre-production environment. This is really a close mirror of production. And once they say, okay, I have tested everything here and we're ready to push this feature to go live, all that can happen with just a simple script. Let's see. The build doesn't seem to be done; it looks like it is still taking a bit more time. So maybe I'll pause at this stage. Do you guys have any questions? Yeah, I have a question about your deployment. You mentioned the artifacts going to a repository in S3. How do you discover these new artifacts for deployment? Okay, that is actually covered in a later part of the presentation. Just to give a brief answer: there are locations in our artifact repository that are dedicated to each feature. Can you explain how the feature environment works? Do you have multiple feature environments just for one feature? How many boxes?
So the interesting thing about our feature environment setup, which I think is going to be covered soon as well, is that it is just a small delta of what we have in the pre-production environment. But to the real users, it really looks as if there are that many feature sites live. And the number of instances in a feature environment really depends on how big the feature is. It can be just one instance, if that feature touches only one service, or it can be multiple. It depends. Okay, so I think it's time. Just now you saw that even here it is showing my email ID, right? Sorry, I'm not Daniel, but yeah, it was showing Daniel. What you're going to see next is: instead of picking up that email ID, remember the code change that we did. We are just adding these few lines of code, and for this purpose we constrain it to make sure it only happens for this user, and we set the first name to Jack Demo. So at this point you have seen how it actually gets deployed to this feature environment without any intervention from the developers. And, oh no, something is... okay. So, going back to the feature environment: this is how the four deployment environments look. As you can see, on the leftmost is our feature environment. To the real user, or rather to the stakeholders, it looks like, if you have n features, there are n sites. But in actual fact, there is only a single full site, which is the pre-production environment. What we have created is a delta for each feature. Let's say feature A touches two services, and feature B touches another two or three services; those deltas are what we call the feature environment.
So the trick there is really to create feature routing that is smart enough to tell, hey, for this feature and this service, where am I supposed to go: the feature instance or the pre-production instances? We have extended and applied the same concept to the beta environment and the production environment. As I mentioned earlier, we reserve beta for critical features. For example, changing the payment gateway; let's say we need to support Alipay. Another example: let's say we need to change the entire flow of how an order is created, how it makes its way from the front end, when somebody places the order, all the way to the back end until it touches our finances and all that. Those kinds of features are what we usually put on the beta sites, where we let internal users, RedMartians themselves, have a go at it and place real orders there. As you can see, the difference between beta and pre-production is that the data in pre-production is just test data, whereas beta uses real data that it shares with the production environment. Again, for this beta environment we are just creating a delta. Let's say you need to change the way payment is handled; it probably touches a few services, like the checkout service and the payment service, so we just have that delta here, sharing the real production data with what is in production. Okay. So far, from the demo, it seems that we have reached only this stage of the workflow: it gets deployed to the feature environment, right? Now the next question is, how do we level it up to the pre-production environment? As we mentioned, after testing is all good, how do we get into that pre-production environment?
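The feature-routing idea can be sketched as a tiny decision function: a feature carries only a delta of services, and any service outside the delta falls through to the shared pre-production instance. The hostname patterns and names below are purely illustrative, not Redmart's real routing layer:

```shell
# Decide where a request for a service should go: the feature's own
# instance if the service is in that feature's delta, otherwise the
# shared pre-production instance. Hostnames are made up for illustration.
route_for() {
  feature="$1"; service="$2"; delta="$3"  # delta: space-separated list
  for s in $delta; do
    if [ "$s" = "$service" ]; then
      printf 'feature-%s.%s.internal\n' "$feature" "$service"
      return 0
    fi
  done
  printf 'preprod.%s.internal\n' "$service"
}
```

So a feature whose delta is only the member service would route member-service traffic to its own instance, while, say, checkout traffic falls through to pre-production.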
So I'm going to show you how that happens. This is done, so we'll just go to the repository, and what we do is simply create a PR, okay? The common practice here is that any code change that is to be merged into master has to go through a pull request, and at least one or two engineers have to review the changes before they are allowed to be merged into the master branch. Here I'm not assigning this PR to anyone, but that is the usual practice. So this is just a PR, right? Next, after the other engineers review the changes, what they have to do is merge the pull request. Usually the same engineer who raises the PR is not allowed to merge, but for this demo, let's just merge this thing, okay? With that, it sets its way into the pre-production environment, because once you merge it and it gets pushed into the master branch, it immediately gets deployed into the pre-production environment, which is what is shown here in blue, okay? Just to recap a bit: in the normal situation you don't have this beta routing in between, so from here it makes its way straight to the production environment. While waiting for the build to happen, we have actually prepared a bonus feature. Here is another story; Aslam will explain more about it. So guys, the demo which Surya just explained was actually demonstrating a back-end service, how we push changes to a back-end service. We thought it would be good if we could demonstrate the front end also, so this demo targets a front-end service. On the Redmart website, we have this need-help option, and in that, we have this FAQ. Our development team decided that this FAQ is no longer required; it is already covered inside the help.
With this demo, what we are going to present is that we push this change and it is reflected on our Redmart website immediately. The estimated effort is actually 0.5, which is mentioned here. And since you guys have already gone through the feature workflow and the pre-production workflow, which Surya already explained, I feel this one can go directly to production. But before actually going to production, let me just show you, because this feature has already been tested in our feature environment and pre-production environment. If I just go to that feature environment, which we actually call jackdemo, this is our feature environment, and in it you can see this option no longer has that FAQ. And because this has already propagated to our pre-production environment, if I just go to alpha.redmart.com, that option is not available there either. So now I'm going to promote this feature to production. Yeah, okay, I would like to invite you guys to just go to redmart.com, and you will see this option. If I go here too, you can see this FAQ option is there. So I'm going to make a production release. Don't worry, it should be fine. The way we push releases to production is just a single point of entry, which is calling one script. This is the script which we have configured, and it is just to make sure that every team member follows the same convention whenever they want to push anything to production. It is commonly applicable to all the services. And because we are following semantic versioning, we have to specify which version type we want to bump: minor, major, or patch. For example, if I just run this script without any argument, it will tell you that.
Based on that, it will automatically manage the version. So I'm just bumping the patch. It expects a short summary of the purpose of the release, so you can specify that; here I'm just putting "removing the FAQ option from the Need Help dropdown menu". Once you push, this automatically makes a commit into the service, which is our golden-grocer service, and then it triggers the CI/CD pipeline. If I go to our Travis, I can see the build is triggered. It will take a bit of time to push the changes to production, so in the meantime, I can continue. Triggered for that point. Triggered for what? For this one. Let me just show you. Okay. You want to show that? Yeah. So can you get the volume up? I'm sorry. Because I think you also want to trigger the release in the meantime. Yeah. Because earlier, remember for the member service, the back-end service that we have, we merged to master, right? By right, we should wait until it gets deployed into the pre-production stage before we do the same as what Aslam did for the front-end service. But to cut it short, because I don't think we're going to have time if we wait further, we'll skip that verification, I mean that certification from QA and all that. I'm still going to show you how it happens on alpha.redmart.com, but for now we'll just trigger the release to production for that member service and pray that nothing breaks. Okay. I have to first switch to the master branch. Okay, I'm on the master branch now, and what I have to do is a git pull; remember that we have just merged from that feature branch into the master branch. We'll do a git pull. Okay, so we have the changes that we pushed earlier. What we're going to do next is just trigger a prod release, patch. Okay. Just demo in action. Okay. That's it.
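As a rough sketch, a single-entry release script with the behaviour described (refusing to run without a valid version type, requiring a short summary) might handle its arguments like this. The function name, messages, and output format are our assumptions, not the real script:

```shell
# Hypothetical single-entry release helper: validate the bump type,
# require a summary, and report what would be released. The real script
# also commits and pushes; here we only model the argument handling.
prod_release() {
  type="$1"
  case "$type" in
    major|minor|patch) ;;
    *) echo "usage: prod_release <major|minor|patch> <summary>" >&2; return 1 ;;
  esac
  if [ -z "$2" ]; then
    echo "error: a short release summary is required" >&2; return 1
  fi
  printf 'release type=%s summary=%s\n' "$type" "$2"
}
```

Keeping validation in one shared script is what enforces the "same convention for every team member" property the speakers mention.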
And with that, you let go of the hook and let it go all the way to production, which we're going to take a look at later on. Let me see; it's still going on. Okay, so in the meantime, I'll continue with the slides. Here is how our CI/CD pipeline looks, what we are following in Redmart, and I think you already saw what we did. The main components we have, as you can see, are Travis for continuous integration and deployment, AWS S3 buckets for storage of the build artifacts, Chef to identify the specific nodes on which deployments should be triggered, Nexus for sharing library dependencies across the different services, and SonarQube for code quality checks, publishing the results of the builds to a dashboard. Apart from this, one of the things we are using is the version manager. This, I think, is where the earlier question comes in: it is responsible for all the manipulation of versions, how the version gets automatically bumped based on the trigger we pass from the script. As I said, we are following semantic versioning, so we have major, minor, and patch, and based on that trigger, it updates that particular part. The artifact gets generated and pushed to S3, and we have different buckets for different environments, so the version is properly updated and the artifact goes into the respective bucket. Since we did a production release, the generated artifact goes into the release bucket. If it is for pre-production, it goes into the master bucket; if it is for the beta environment, it goes into the beta bucket, and similarly for feature. This is how things get done.
And because our infrastructure is automated using Chef, it actually gets the list of nodes on which the artifact needs to be deployed. Let me see how the build is doing. It's still going on; I think we are close to finishing. Okay. As I explained, for example, for this member service which Surya already pushed, this is in S3. We have the same beta, feature, master, and release buckets, and in them, you can see we have the artifacts versioned with semantic versioning. Here you can see this artifact is meant only for master, which is pre-production, because this artifact has a pre-release suffix. Our convention is that any pre-release always gets deployed to the pre-production environment, but if the artifact has just a major, minor, and patch version, then it gets deployed to the production environment. So the CI/CD pipeline automatically identifies which artifact and which environment to trigger the deployment for. I think by now the other release is done. Yeah, the release is done. So let me just go; I think you guys can also try going to redmart.com, and if you go to this need help, that option is gone. Okay. This is actually one of the features that was planned for today's rollout, so we thought, let's make it part of our demo. And this is just one sample of how we publish code quality status to our SonarQube: a sample service, the inbound service, for which we have generated it. You can see the coverage details along with ratings, the issues, and those kinds of things. Using this, developers can analyze, go in, and do their fixing. Okay. So here I'm going to show you what we did.
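The convention just described (a pre-release suffix means pre-production, a bare MAJOR.MINOR.PATCH means production) can be captured in one small function. The bucket paths below are illustrative, not Redmart's actual ones:

```shell
# Map an artifact version to its destination bucket under the talk's
# convention: any pre-release suffix (a hyphen after the patch number)
# goes to the pre-production "master" bucket; a bare semantic version
# goes to the "release" bucket. Bucket paths are illustrative.
bucket_for_version() {
  case "$1" in
    *-*) echo "s3://artifacts/master" ;;   # e.g. 1.2.3-pre.7
    *)   echo "s3://artifacts/release" ;;  # e.g. 1.2.3
  esac
}
```

This is the piece that lets the pipeline "automatically identify" the target environment from the version string alone.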
I think you already saw how we trigger a production release. When we plan any production release in Redmart, we have this script, which is meant for this purpose, and in this script we specify two things: one is the release type, and the second is the deploy region. But generally, because right now everything is only in Singapore, our services are deployed here, so by default we just take the option SG. If our services were deployed in a different region, we could trigger the same release to go into both regions at the same time. That option is there for internationalization; we have that capability, and as soon as we plan to deploy in different geographies, we can move on, but the functionality is available with us. Okay. When a release has happened, after that, we do some notifications and updates as well. Some mechanism needs to be in place so that people know a release has happened and who did that release, those kinds of things. One mechanism is email alerts; this is just one sample. In it, it is mentioned that a release has happened: which version was released, the commit messages or commit history, as well as the summary that we typed. I think if we get the email, I can also demonstrate what we did for the golden-grocer just now. We have a Slack channel also; any production release gets posted there too, with all the references: a reference to the Travis build, the summary, the version, and the actual author who did the release. These kinds of things are updated in the Slack channel as well. And the last one is an update in GitHub.
So when we do the tagging as part of the release, we also generate the summary in it, the same thing we get in the email. So it's updated in all the places. Okay, so that's the demo part for the production release of the front-end service. Let's continue. Yeah, usually Travis doesn't take this long to complete a build; I don't know what happened today. It seems to take a little longer than usual. But anyway, going back to where we left off: remember that we hadn't even gotten to that stage yet; we wanted to see how the feature actually shows up in the pre-production stage of the pipeline, right? It is the same alpha environment that Aslam was showing just now. That new feature is going to kick in after I log out and log in again. Okay. As you can see here, did you guys see that? It is no longer showing the email ID; instead it is showing the message that we modified. Okay. I think it's going to take a few more minutes for the build to complete, so while waiting for that to happen, maybe I can take one short question. Yeah. Yeah, I mean, that's a very interesting question, and that is one problem that we end up facing a lot of the time. What you see on redmart.com is just the consumer-facing app, right? But to power the entire e-commerce business, we need a lot of internal services, internal apps rather, to support it. To manage customer feedback, order problems and all that, customer service has to have the proper dashboards to look at those things. That is where we face a lot of problems, usually because it means they have to coordinate between the front end as well as the back end.
And to us, we feel that we could add all the automation we want, but the key there is the coordination among the team members, because no level of automation can guarantee that when somebody pushes something at the back end, it won't break a front end that is not ready, right? And to us, at least until this point, maybe we will be proven wrong at some point, but we still see proper coordination between the front-end folks and the back-end folks as the key solution. Yep. Okay. How's the build? Okay, two more minutes. Okay. For that itself, right now we have the scripts as well: that trigger-prod-release is able to deploy a specific version that you want. In the buckets you have all the artifacts; they're not going to get overwritten, they're still going to stay there. So you just specify which version you want to redeploy and it gets deployed. Do you always follow the same flow even in some kind of emergency situation, like some error found in the production environment? Do you always follow the same flow, or do you have some other? No, for those, I would say hot fixes are exceptions. It depends on the severity again, but for hot fixes, correct me if I'm wrong, Alejandro, do you want to add something on that? We might skip the feature environment. For hot fixes we actually go directly to master: the PR is directly merged to master, and then the hot fix is verified there and then gets deployed. Yeah. For those cases we make exceptions, because, like I said, all the facilities are there; they can release any change to production at any point in time. It's just that usually they will follow the normal process unless it's a critical fix like that. Yeah. Is it possible to share the script? Sorry? Share the script? The script? Yeah.
We are working to open-source that. We'd love to. When we are ready, we'll open-source it. Okay, it's basically just a shell script that creates a git commit of sorts. Yeah, that's one thing, but as I mentioned, it's also about semantic versioning, right? It takes care of that and also which environments you're deploying to, all those things. Not yet. Not yet. Yeah. When you are actually deploying to production, are you creating a new stack of production, or what have you? Interesting. Right now, I would say that we are not at the ideal place; what we are working on right now is to introduce canary releases. Right now, we do not have that, so we deploy on each instance one by one, currently. Yep. So it builds again. Remember, that's not the PR; after that, you merge, right? And then when we trigger that script, what happens is that it pushes a commit with a pre-formatted commit message, and then Travis triggers the build, and we pick up, based on the commit message, whether it is supposed to be a production release or not. So there is a certain format. Yes. Yes. Yep. Any more questions? It's artifact versioning. It's artifact versioning. Nope. So, you said the release is pushed one instance at a time; in that case, is there any impact to requests that hit an instance while it is being released? I mean, there is a slim probability that will happen, because when we do that update, we don't switch off traffic to those instances, right? For critical ones, yes, we do: especially those related to payments and all that, we make sure that we stop traffic to those instances before we deploy. Yeah, but like I said, ideally we want to have all this; in fact, we do have it for the front-end services that we just demoed.
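The "certain format" mentioned here could be checked like the sketch below. The actual marker Redmart uses wasn't shown, so the `[prod-release:...]` prefix is an invented placeholder:

```shell
# Decide whether a commit message marks a production release, based on a
# pre-formatted prefix. "[prod-release:...]" is an assumed format; the
# talk only says the script writes a commit in "a certain format".
is_prod_release() {
  case "$1" in
    "[prod-release:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A CI job would run such a check against the latest commit message and branch into the production-release steps only when it matches.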
For those, in the same process, we actually remove the instance from the ELB so that no traffic reaches it, then trigger the deployment. Once the deployment is done, we attach it back and move on to the next instance. We have that for our internal app services: we detach the node from the ELB to stop traffic entirely, do the deployment, and once it finishes on that node we attach it back into the ELB, turn traffic back on, and then detach the next instance. Those steps are all done automatically.

I'll come back to your questions in a bit. So you have seen that in alpha, redmart.com was working fine. What I'm going to show you now is the production site. Hopefully nothing breaks. There, you can see the change is now happening in production itself. I could have applied it to everyone else, but I might get fired after.

So what you have seen today: we took one example, a simple story we wanted to work on, and showed how we propagate that story all the way to a proper test environment where it gets tested and certified with all the stakeholders, and how we allow developers to make the call that they're ready to deploy into production. Like I said, the definition of continuous deployment varies from organization to organization. In our case, for the front-end services, like what Aslam was just demonstrating, the rollout can happen 10 or 20 times a day, because those are shell or rather static pages and such, deployed without disrupting the business functionality.
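The detach-deploy-reattach loop just described can be sketched in a few lines. This is a minimal illustration, not the actual Redmart tooling: the `elb` object and the `deploy`/`healthy` callables are stand-ins for the real AWS ELB calls, deployment script, and health check.

```python
# Sketch of a one-by-one rolling deploy behind a load balancer, assuming
# hypothetical detach/attach, deploy, and health-check hooks.

def rolling_deploy(elb, instances, deploy, healthy):
    for instance in instances:
        elb.detach(instance)          # stop sending traffic to this node
        deploy(instance)              # upgrade while it is out of rotation
        if not healthy(instance):     # never reattach a broken node
            raise RuntimeError(f"{instance} failed health check; deploy halted")
        elb.attach(instance)          # restore traffic, move to the next node

class FakeELB:
    """In-memory stand-in for an ELB, recording detach/attach calls."""
    def __init__(self):
        self.log = []
    def detach(self, instance):
        self.log.append(("detach", instance))
    def attach(self, instance):
        self.log.append(("attach", instance))

elb = FakeELB()
deployed = []
rolling_deploy(elb, ["i-1", "i-2"], deployed.append, lambda i: True)
print(elb.log)
print(deployed)
```

The key property is that each instance is fully out of rotation before its upgrade starts and is only reattached after a successful health check, so at most one node is dark at a time.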
But for most other services, the features that touch the core business operations, which I think is the situation most organizations will be in, we hold back unless we get to a stage where we're confident enough for all the tests to be done by the machine without the other stakeholders involved. I'm still waiting for that day to come.

So, what we have seen: from an idea, it goes into implementation, with developers working on their local machines, and a push triggers the build pipeline. That is the continuous integration part: artifacts are automatically generated and pushed to the artifacts bucket. The later part is continuous deployment, where we continuously deploy into whatever environment it is supposed to be deployed to.

One of the things we skipped: we wished to show you these steps, but when we did the timing it just wasn't possible to squeeze this part in. Earlier I mentioned that we assumed the feature environment was already created; only the change hadn't been applied to it yet. The very first step developers take for a new feature environment is to go to our chatbot. Once the artifacts for the feature are ready, they tell the bot: I want a new feature environment with this service name and this feature name. The next step is that the bot checks whether we have all the prerequisites for this feature: are the artifacts ready? Once that check passes, it says, okay, I'm ready to bootstrap new EC2 machines. This is where the wait time usually comes in; AWS can take about three to five minutes to get the instance ready before we can do the bootstrapping. The next part is where the Chef bootstrap happens.
We run the bootstrap based on the service role and get the right artifacts deployed on that machine. The last part is where the feature routing gets set up. After that, what the developers get in the end is a test environment they can start sharing with QA and the other stakeholders. Hopefully with more time we could have shown this live. That's it. Questions?

How do you take care of database changes?

That is a question that gets raised regularly. Most of our services use Mongo, which is schema-free, so schema evolution is not a problem. The most recent services have started using Postgres, so there you obviously apply proper schema evolution, with Liquibase or whatever it is. That's something you have to apply for sure. But because so many of our services are on Mongo, we haven't actually faced much of an issue with schema evolution. Just to add to that: even if we introduce feature changes that add certain fields or whatever, we usually make sure they're backward compatible, at least for a certain period of time before the old shape gets phased out. So nothing breaks.

I noticed in your SDLC you have some testing parts. Is that manual? Which one? This part? Yes, this one. For the feature testing, we always have ongoing automated end-to-end tests that run through scripts, to make sure that if anything is broken it gets caught immediately. The other thing is that QA usually has to certify that the feature satisfies what the business requirements set it out to do.

You do 10 to 20 rollouts every day? It depends on which business domain you are talking about. If it's front end, it's much faster, like less than five minutes.
We have Nightwatch and also Selenium-based automated testing; you basically use one or the other of those. If it is iOS, it takes a little longer. I don't know if you have heard of this thing called Calabash; we use Calabash for testing, and bootstrapping and getting that going takes about 10 minutes or more, so we run those less often unless it is a major production release.

This part is actually all automated. If you look at the build itself, you can see what is happening. Oops. Okay. There is a whole bunch of tests that ran; this is the unit testing. As you can see, a lot of tests run before it can say: this change is good, nothing is broken, and we can go ahead with the artifact creation. If those tests fail, we stop and do not proceed with the artifact creation at all. Let me quickly go to Travis so you can look at those tests. We actually have a lot of mocks here, even in Travis itself: Mongo, then the messaging queue, which is RabbitMQ, and Redis, for which I think we also have options to mock.

You said you have Selenium automated testing, right? Does it increase maintenance overhead somehow, like if the UI changes? What happens if the automated testing breaks?

That's a good point. The way we do it is to involve QA in the sprint planning: they know what is coming and they prepare well ahead of time. Also, because the front end is all React components, you can do a lot of local testing, unit tests on the components themselves, so integrating that into a bigger piece does not really cause much of a problem. That hasn't really caused any issues for us.
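The mocking approach just mentioned, where external stores like Mongo are replaced in CI so unit tests never need a live backend, might look like the sketch below. The greeting logic is the hypothetical change from the demo story, not Redmart's actual code, and the `greeting_for`/`db.users` names are illustrative assumptions.

```python
# Sketch of unit-testing against a mocked datastore, assuming a hypothetical
# greeting function that prefers a first name and falls back to the email ID
# (as in the demo story). The Mock stands in for a Mongo client.

from unittest.mock import Mock

def greeting_for(user_id, db):
    """Return the login greeting; fall back to email if no first name is set."""
    user = db.users.find_one({"_id": user_id})
    return f"Hi, {user.get('first_name') or user['email']}!"

db = Mock()
db.users.find_one.return_value = {"_id": 1, "first_name": "Surya",
                                  "email": "s@redmart.com"}
print(greeting_for(1, db))

db.users.find_one.return_value = {"_id": 2, "first_name": None,
                                  "email": "a@redmart.com"}
print(greeting_for(2, db))
```

Because the dependency is injected, the same function runs unchanged against the real client in production and the mock in Travis.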
We also use the same set of scripts to test production every 15 minutes or so. It basically goes through the entire flow, almost all the way to checkout, though we don't place the order, obviously. And it does not take much time.

How did you decide you were mature enough as an organization to set up a CI/CD process from the very beginning?

That's a very good question; I was talking to Michael about that the other day. We started our initial CI/CD basically by automating the infrastructure, back in 2013. We had one developer who knew nothing but Ruby, and since Chef is all in Ruby, I said, why don't you try that? Let's start there. So we started with a bit of infra automation; that was the need of the hour at that point in time. But as we grew the team, the number of services grew too. Back then we had about five or six services; today, as you saw, it's about 120-odd services. So we started back then and slowly added multiple features to it. The one thing which remained is Chef, so we continued to build on top of that. We did experiment with a bunch of other things, Docker, serverless architectures, and so on, but we didn't go there yet. The big thing we are actually looking at right now is canary releases, like what Surya just mentioned. It's an evolution; it's not something that happened overnight. But the good thing is that, as most of you know, with microservices you need a lot of this tooling to deploy as quickly as possible: you have hundreds of services, heterogeneous environments, multiple environments, multiple stacks. How do you make sure everything goes harmoniously into production or whatever setup it is? That's one of the things I feel has worked out quite nicely for us so far. There's still a lot to be done, though.
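The periodic production smoke test described above, which walks the critical path but never places a real order, can be sketched as a simple step runner. The step names and the `run_flow` driver are assumptions for illustration; the real tests run through Selenium/Nightwatch scripts.

```python
# Sketch of a production smoke run: execute the critical-path steps in order,
# but skip the final "place_order" step so no real order is created.

def run_flow(steps, place_order=False):
    """Run each (name, action) step; skip 'place_order' unless explicitly allowed."""
    executed = []
    for name, action in steps:
        if name == "place_order" and not place_order:
            executed.append("skipped:place_order")
            continue
        action()
        executed.append(name)
    return executed

steps = [
    ("login", lambda: None),
    ("add_to_cart", lambda: None),
    ("checkout", lambda: None),
    ("place_order", lambda: None),
]
print(run_flow(steps))
```

The same step list can be reused in pre-production with `place_order=True`, which is what lets one set of scripts serve both the test environments and the 15-minute production check.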
So the question is what happens if two developers are working on the same service on two different features. That does not happen often, but when it does, both of them create separate setups. Each developer gets a dedicated setup, so nothing actually breaks.

Surya, can you go back to the slide? Which one? The one showing the different environments. This one? Yes, this one. So, say I create a feature branch, and you also create a feature branch on the same service. Both of us get our own environment here. We test to our heart's content, make sure it's working fine, and only then merge to master. The basic best practice we have is that changes headed for master should not sit unmerged for more than half an hour or so; if you don't follow that, you end up with lots of merge conflicts and all those things. As Surya pointed out, this stage is drawn thick for a reason: you can do a lot of iterations there. Don't do it on the master branch. Because it's a feature branch, it's up to you to break it as many times as you want and go back, and only once you're confident do you merge. So the hope is that it won't take long and won't stay out of master for long. That has worked out quite nicely for us. Any other questions?

Any advice on how to achieve such a full pipeline? Imagine that today we just do a manual push to production at the moment of release. Where do I start? There are a lot of steps, so it feels like a big thing to take on.

Yeah, that's true. Having a couple of dedicated people, that's one thing. But besides that, obviously, you don't want to do everything at once, right?
So you take one step at a time. I mean, we started almost four years ago and things have evolved; today you probably don't have to build so many of these things yourself, because a lot of open-source tools are also available. And in fact, like we were saying earlier, we want to open source some parts of this as well, especially the release and semantic-versioning parts that work across any kind of stack, whether it's Go, JavaScript or whatever. I feel that has come out quite nicely for us, so it's probably of some use to others in the community as well. So, going back to your question: yes, take one step at a time; just get Jenkins or Travis or something going. And I think what is important is to get the feature environment ready. That's the key thing.

Next one, please. This is the biggest differentiator in our infrastructure. With microservices, besides all the complexities which you all know, if you were to recreate a full working environment for every single feature you build, it would be costly in time, resources, and money, because you have hundreds of services. Even if you take just a subdomain, there are lots of services to create and wire up, and that takes a lot of time. So this is the best thing that happened to us in terms of setup, because you're actually reusing everything from the shared environment. In Surya's demo, he changed only one service, called Member Service, so we created only one instance, which means we are paying for only one instance, and the setup time is much faster. Testing, deployment, everything works out much, much faster.
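The release and semantic-versioning tooling mentioned here, which works off a pre-formatted release commit that CI inspects, might look like the sketch below. The message format `[release:<env>] <service>@<semver>` is an assumption for illustration, not Redmart's actual convention.

```python
import re

# Sketch of commit-message-driven releases: a release script pushes a commit
# with a pre-formatted message, and CI parses it to decide whether the build
# is a production release and of which version. Format is hypothetical.

RELEASE_RE = re.compile(
    r"^\[release:(?P<env>\w+)\]\s+(?P<service>[\w-]+)@(?P<version>\d+\.\d+\.\d+)$"
)

def parse_release(commit_msg):
    """Return (env, service, version) for a release-trigger commit, else None."""
    m = RELEASE_RE.match(commit_msg)
    return (m.group("env"), m.group("service"), m.group("version")) if m else None

def bump(version, part="patch"):
    """Semantic-version bump: 'major', 'minor', or 'patch'."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

print(parse_release("[release:prod] member-service@1.4.2"))
print(bump("1.4.2", "minor"))
```

An ordinary commit like `fix typo in README` fails the match and the build proceeds as a normal CI run rather than a release.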
This is one of the salient features, I would say, which we can probably talk about a bit more some other time; we should have a blog post on how we did it, and some of the scripts could potentially be open sourced as well.

When did you introduce this feature-environment system?

This came in about two years ago, right at the time we were about to jump onto microservices and things were growing exponentially. We started with one or two services, and it just kept growing. It's just about that time we came up with this idea, and it worked out quite nicely for us from a timing, resource, and cost perspective. This, along with some of the other tools and scripts we have built, takes care of which feature and which branch to deploy, and also of multiple locations: let's say you want to deploy to Singapore or Hong Kong or somewhere else, it takes care of that for you. One key thing I should mention: it's all on AWS as of today, but because it rides on top of Chef, it's pretty easy to use the cloud provider of your choice; just change the APIs and it should just work for you.

That's pretty cool. Actually, can you walk through the feature-node creation? As I saw on the slide, there's a Slack chatbot where the developer goes and says: okay, I need a new feature node for a service. Correct, there's a Slack bot. And then the chatbot identifies what artifacts it needs, creates an EC2 instance, and loads those artifacts there. Yes. And then when the developer is done with the service, with whatever development they're doing...
Okay, then somewhere in this node-creation routine, are you also registering the node with Chef? Like, when I'm deploying this, how does it identify which node to deploy to? I'm a little confused about that.

Oh, that's based on Chef, right? Chef is basically a configuration management tool. When the chatbot creates this, we create a new node based on the service name. Based on that, we already know what type of instance is to be bootstrapped using Chef, and we also know where it is supposed to go: because it's a feature environment, we basically create a new instance in this case.

And then when I'm doing a merge, again, how does it identify which node to deploy to? Oh, yeah, definitely. Two distinct things, right? One: when you create a feature environment, we basically create a new instance. That creation and bootstrapping take a while, which is what Surya was saying, and that's why we skipped this step in the demo. Barring that, it's basically similar to what we do for the rest of the environments. In the case of an existing environment, whether alpha, pre-production beta, or the production environment, the entire configuration is managed under Chef. So it knows, for the notification service, how many instances are running and where they are all located, and it basically takes one at a time and upgrades them. You can make it even better by using canary releases, blue-green deployments, all that stuff; that's something we are working on right now.
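The chatbot-driven node-creation flow described above (check that artifacts exist, launch an EC2 instance, Chef-bootstrap it with a service role, set up feature routing) can be sketched as a plain ordered pipeline. All of the step functions here are hypothetical stand-ins for the real artifact check, EC2 launch, Chef bootstrap, and route setup.

```python
# Sketch of the feature-environment creation pipeline, with each external
# dependency injected as a callable so the ordering is the only logic here.

def create_feature_env(service, feature, artifact_ready, launch_instance,
                       chef_bootstrap, add_route):
    if not artifact_ready(service, feature):          # prerequisite check first
        return f"error: no artifact for {service}/{feature}"
    instance = launch_instance()                      # EC2 wait happens here
    chef_bootstrap(instance, role=f"{service}-{feature}")  # role from names
    add_route(service, feature, instance)             # feature-specific routing
    return f"{service}/{feature} ready on {instance}"

msg = create_feature_env(
    "member-service", "greeting",
    artifact_ready=lambda s, f: True,
    launch_instance=lambda: "i-0abc",
    chef_bootstrap=lambda inst, role: None,
    add_route=lambda s, f, inst: None,
)
print(msg)
```

Failing the prerequisite check short-circuits the pipeline before any instance is launched, which mirrors the bot refusing to bootstrap when the artifacts aren't ready.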
Going back to the Chef configuration, this is where the different environments are configured. In Chef itself, each node has a tag that says whether it is a production, beta, alpha, or feature instance. It's managed through Slack as well; we can take a look at it. Our bot is called Minimi. You can take a look at some of the commands; what you see here is pretty limited, because we have access control that says certain commands can only be triggered from certain channels.

In this case, what you're interested in is, let's say, the cleanup-feature command. The service name we have is member-service and the feature name is jagdin. The bot checks whether it is allowed to run that command, and then says: I'm going to find a node that has this role. As you can see, and I think this might answer your earlier question, one way to do it, which is what we have done, is to name the role as the service name with the feature name appended at the back. Now, as you can see, it is already deleting that node and clearing the feature routes. That's all the developers have to do to clean up that feature.

Do you want to run the show-status command for the feature environments? Yeah, let's do that. Let me just check. Okay. We actually prepared a couple of possibilities for this demo, so you can take a look at the feature environment status as well, as far as the front end goes. For the jagdin demo, you can see we actually prepared quite a number of possibilities before we picked this one. It runs through the entire routing pipeline and checks; you can see it here. This is the API gateway that we have configured.
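The role-naming convention and cleanup command just described might be implemented along these lines: the feature node's role is derived from the service and feature names, so the bot can find and delete the node and its routes from just those two arguments. The in-memory lists below are illustrative stand-ins for the Chef server and the routing table.

```python
# Sketch of role-based feature cleanup: role = "<service>-<feature>", so the
# bot can locate the node and its routes without any extra bookkeeping.

def role_name(service, feature):
    return f"{service}-{feature}"

def cleanup_feature(nodes, routes, service, feature):
    """Delete the feature node(s) and clear their routes; return what was removed."""
    role = role_name(service, feature)
    removed = [n for n in nodes if n["role"] == role]
    nodes[:] = [n for n in nodes if n["role"] != role]   # drop matching nodes in place
    routes.pop(role, None)                               # clear the feature route
    return removed

nodes = [{"name": "i-1", "role": "member-service-jagdin"},
         {"name": "i-2", "role": "member-service-beta"}]
routes = {"member-service-jagdin": "/v1/member/greeting"}
removed = cleanup_feature(nodes, routes, "member-service", "jagdin")
print([n["name"] for n in removed])
```

Encoding the feature name into the role is the design choice that makes this a one-liner for developers: no registry of feature nodes has to be maintained separately from Chef itself.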
Whatever you see in green is an active feature route, which means that on this diagram that endpoint is a feature route: requests to it go to the feature instance itself. Whatever is shown in blue goes to the pre-production environment.

So now the cleanup of that feature instance is completed. Any other questions, guys? Anyone? If there's nothing else, then thank you all. Can I get the cable?
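The gateway behaviour described above, where a request hits the feature instance if an active feature route matches and otherwise falls through to the pre-production default, can be sketched as a small lookup. The routing-table shape and names here are assumptions, not the actual gateway configuration.

```python
# Sketch of feature-aware routing at an API gateway: "green" = an active
# feature route to a feature instance, "blue" = the pre-production default.

def resolve(route_table, endpoint, feature=None):
    """Return the upstream for a request: feature instance if an active
    feature route matches, otherwise the default pool."""
    if feature:
        upstream = route_table.get((endpoint, feature))
        if upstream:
            return upstream               # green path: active feature route
    return route_table[(endpoint, None)]  # blue path: pre-production default

route_table = {
    ("/v1/member/greeting", "jagdin"): "feature-i-0abc",
    ("/v1/member/greeting", None): "beta-pool",
}
print(resolve(route_table, "/v1/member/greeting", feature="jagdin"))
print(resolve(route_table, "/v1/member/greeting"))
```

Cleaning up a feature then amounts to removing its `(endpoint, feature)` entries, after which all traffic falls through to the default pool again.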