Hi, I'm Sriharsha. I work in the Platforms team at Capillary Technologies. Today I'll be talking about how we optimized the setup time of a 16-node production cluster from three days to two hours.

First thing: why was setup so slow? Earlier, there used to be four ops guys who would manually launch instances, create and attach volumes on the AWS dashboard, and configure individual instances. All the custom configurations required for the services were either copied from an existing cluster instance or hard-coded manually on the instance. This is not replicable: it works the first time, but if you want another cluster set up, it becomes really hard to get the configurations right. Setting up the database schema and inserting the seed data was also manual; when you have multiple clusters, maintaining the database schema across the clusters and figuring out the minimal possible seed data is again a huge task. Deploying and configuring the applications was manual effort as well: installing the applications on instances one by one, pushing the key files and other configurations required for them, and starting the services. It's a pretty repetitive task, and it is highly error-prone because it's fully manual.

As Capillary is growing as a company, our scale is also increasing year by year, so we have moved towards automation. The first steps in that direction were centralized configuration management, database versioning, and automating the deployment.

The first thing I'll talk about is centralized configuration management. Any file which is deployed on a production cluster at Capillary goes via a Debian package. All the custom configurations of the services are packaged and deployed as Debians, and we use Debian mechanisms such as diversions and replaces for conffile resolution and conffile conflicts.

Configurations may be static or variable across the clusters. Static configurations remain the same across clusters, like the Apache log directory or the ports a service listens on. Variable configurations differ across clusters, like instance-specific and region-specific URLs, or instance-specific settings which depend on the memory of the instance or on other configurations. Variable configs are managed using template replacement during post-installation, using a custom templating script: a Debian package has a postinst step, during which we do the template replacements. Cluster-specific metadata is maintained on each instance as JSON files, and our custom scripts can also fetch values from ZooKeeper nodes and CLI arguments.

An example: you have a JSON metadata file specified like this, and you create a template file in your project with the region URL as a placeholder. Then you use a templating engine, such as Mustache or any other, to do the template replacement. We have additional flags to fetch values from ZooKeeper or CLI parameters. We use Mustache because it has support for multiple languages and can be easily integrated.
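To make the templating step concrete, here is a minimal sketch of the idea using pystache, a Python implementation of Mustache. The file paths, key names, and template contents are illustrative assumptions, not Capillary's actual scripts:

    import json
    import pystache

    # Cluster-specific metadata maintained on each instance as a JSON file.
    # Path and keys are hypothetical, e.g. {"region_url": "https://in.example.com"}
    with open('/etc/cluster/metadata.json') as f:
        metadata = json.load(f)

    # A template shipped inside the Debian package, e.g. a line like:
    #   api.endpoint={{region_url}}/api/v1
    with open('/opt/myservice/conf/app.properties.tmpl') as f:
        template = f.read()

    # Rendered during the package's postinst step to produce the final config.
    with open('/opt/myservice/conf/app.properties', 'w') as f:
        f.write(pystache.render(template, metadata))

In the real setup, the same custom script would additionally know how to pull values from ZooKeeper nodes or CLI parameters, as mentioned above.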
The next thing I'm going to talk about is database versioning. Why do we need database versioning? Capillary has more than 25 databases and 400 tables, and it's pretty difficult for us to maintain schema uniformity across the clusters when there are more than 100 deployments in a month, and to maintain consistent and essential seed data. When you version your database, your application can also depend on a particular version of the database; you can express that kind of dependency. And on developer machines, the schema can be replicated easily, so the production schema and the developer schema will always be in sync: before a developer starts working, he can bring his schema in sync with the production environment.

The tool which does this is DB Migrate. DB Migrate logs all the schema alters and seed-data inserts and updates in the database itself, in a changelog table. A DB service is a logical set of databases on a MySQL instance used by a particular module; at Capillary, we expose each database as a DB service. The basic commands exposed by the tool are status, update, and log, for which I'll show you a small demo. It's in Python.

In the demo, demo is the database. If I run log, there is nothing yet. Running status, you can see the database is on release version 0 and local version 0. Release version is the version which is on prod. It also lists the following SQL files: these four SQL files are the ones which have been applied on prod but not on the local system. When you run an update to a version, it asks you to confirm that you want DB Migrate to update the demo database to version 2, and then it applies those changes. You can also run a plain update, and it will apply all the pending SQLs.

The DB Migrate deployment cycle: the dev queues a query in the queue directory; QA reviews and approves the query; then the DevOp prepares the release by running a script, the DBM release script, which moves the query from the queue directory to the release directory with an appropriate version number; and you run DB Migrate update to apply the changes. (A sketch of the changelog idea behind the tool follows at the end of this section.)

The DB Migrate development cycle: before a developer starts his work, he updates his database to the latest prod state. If there are any conflicts between his local schema and the prod schema, the tool shows a conflict, and the developer has to resolve it himself and then code.

The challenges we faced in building this were bringing the existing DB deployments into sync across the clusters, figuring out the minimum essential data to run all the applications, and creating the seed data for all the applications.

Next, we have used Fabric and Boto scripts to automate the cluster setup. This is a sample VPC architecture: we have a public and a private subnet in each availability zone. And we use a term called logical subnet, which is a grouping of instances of a similar type, so that instances of a given type get similar IPs. For setting up a new cluster, we run a Boto script which sets up the VPC and launches a private and a public subnet in each availability zone.

The next thing is launching instances. We have something called roles: a role defines what the purpose of an instance is, and we persist those roles in EC2 tags. Whenever a machine is launched, if you are launching an app server, you tag that instance as an app server under the EC2 roles tag. And when Fabric is running and performing operations, it uses Boto's EC2 API calls to figure out which machines it should run on, and it performs the actions on them.

This is a sample launch script. In the roles, we have a cluster prefix: if you look at the roles, "test" here is the cluster prefix. In a cluster, all the machines are tagged with the cluster prefix, so if you filter by "test", you get all the machines in that particular cluster, and you can filter further using other roles, such as app server.

Someone asked whether the tagging part is done by Boto or by Fabric. Actually, everything is wrapped under a Fabric script; it calls fab launch. What does fab launch do? It chooses a host type: in the configuration files we maintain metadata saying that if it's an app server, its tag should be app server, and which logical subnet it should be placed in; all the logical subnets are also tagged. Based on the instance type, the launch script also attaches the ephemeral drives, and it creates and attaches the EBS volumes based on the configuration. All this data is maintained in the configuration files themselves.

Whatever management tasks we run on the cloud, we use these Fabric scripts, with something called filters: role filters and host filters. Role filters select machines by the purpose of the machine, and we define custom Fabric tasks to run the management tasks on those machines.
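To make the launch-and-filter flow concrete, here is a hedged sketch using Boto 2-era and Fabric 1.x APIs, which were current at the time. The region, AMI ID, subnet ID, tag values, and service name are placeholders, not Capillary's actual configuration:

    import boto.ec2
    from fabric.api import env, task, sudo

    conn = boto.ec2.connect_to_region('ap-southeast-1')

    # Launch an app server into its logical subnet (IDs are hypothetical).
    reservation = conn.run_instances(
        'ami-12345678',
        instance_type='m3.large',
        subnet_id='subnet-0abc1234',
    )
    instance = reservation.instances[0]

    # Persist the machine's purpose as an EC2 tag: cluster prefix + role.
    conn.create_tags([instance.id], {'roles': 'test-appserver'})

    # A management task later discovers its targets by filtering on that tag;
    # EC2 filters accept '*' wildcards, so 'test-*' would select the cluster.
    reservations = conn.get_all_instances(
        filters={'tag:roles': 'test-appserver'})
    env.hosts = [i.private_ip_address
                 for r in reservations for i in r.instances]

    @task
    def restart_app():
        # A custom Fabric task run on every filtered machine;
        # 'myapp' is a placeholder service name.
        sudo('service myapp restart')

Filtering on the roles tag is what lets one Fabric codebase drive many clusters: the cluster prefix scopes a task to one cluster, and the role scopes it to one type of machine.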
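And returning to DB Migrate as promised: the tool itself is internal to Capillary, so the following is only a sketch of the changelog-table idea it is built on. The table name, directory layout, and connection details are invented for illustration:

    import os
    import MySQLdb  # MySQL driver of that era; any DB-API driver would do

    conn = MySQLdb.connect(host='localhost', user='dbm',
                           passwd='secret', db='demo')
    cur = conn.cursor()

    # Local version = highest version recorded in the changelog table.
    cur.execute("SELECT COALESCE(MAX(version), 0) FROM dbm_changelog")
    local_version = cur.fetchone()[0]

    # The release directory holds versioned SQL files: 1.sql, 2.sql, ...
    release_dir = 'releases/demo'
    versions = sorted(int(name.split('.')[0])
                      for name in os.listdir(release_dir)
                      if name.endswith('.sql'))

    for version in versions:
        if version <= local_version:
            continue  # already applied on this database
        # Assuming one SQL statement per file, for simplicity.
        with open(os.path.join(release_dir, '%d.sql' % version)) as f:
            cur.execute(f.read())
        # Record the applied version so status/log can report it later.
        cur.execute("INSERT INTO dbm_changelog (version) VALUES (%s)",
                    (version,))
        conn.commit()

The real tool layers the queue/release workflow and the conflict detection described earlier on top of this basic bookkeeping.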
End results: a 16-node production cluster, fully functional in just two hours; stable and frequent release cycles; hundreds of schema changes hitting multiple regions every month; and the maximum possible uniformity of configurations across the clusters.

Questions?

Q: Can you hear me? OK. About the DB migration scripts: do you execute those migration scripts before the code is written, or before the deployment is done? If I have a feature, at what point would you execute the DB migration scripts for it?

A: Suppose a new piece of code is going to prod and it depends on a particular schema. Then you mention in the ticket that this DB migration script has to be run first, and then the package has to be installed. So it's before the application is installed.

Q: And eventually these are executed automatically, right? Whatever your deployment process is, it executes the DB migration scripts you've mentioned and then does the deployment. Is that correct?

A: Yes: first execute the DB migration script, then deploy the application.

Q: OK. In that case, say you have ten servers behind a load balancer, and you take one server out and deploy to just that one to see how it goes; if it's successful, you do the same for the remaining nine. If the first deployment changes the database schema while the other nine are still using the older schema, would that not screw things up in production?

A: If it's a minor change, like adding indexes, the servers stay in production and it goes on; it won't make much of a difference. If there are alters which would completely affect the code and cause exceptions, then we take a downtime of an hour or so and do it in that window.

Q: OK, all right, cool. Thanks.

OK, thanks, Sriharsha. Thank you. Thank you.