Hello everybody. My name is Pankaj. I work at Flipkart. A little bit about myself: I have been at Flipkart since the very beginning, in Jan 2010. At that point in time we had about 3 servers; now we have 2 data centers with 5000 servers. I am also very accident prone, I keep breaking myself, and I like taking pictures sometimes. You can find me as Spoonman on the internet. Does this work for everybody? Alright. So I will talk a little bit about Flipkart and what we do. Flipkart is an e-commerce portal. We have lots of users, mostly all over India. We do around 4000 requests per second, maybe more sometimes, when we launch new products. We do one-click build and deployment. I spoke about this at PHP on Clouds, another conference we had some time earlier; you can talk to me about it later if you like, about how we do production pushes and such. There are a lot of teams at Flipkart collaborating together on Git and SVN. It's a large setup. Let's get to what we are really going to talk about today. Essentially, we are talking about building elastic infrastructures: why we should automate infrastructure, and how to scale infrastructure and applications in an automated manner. We will talk about an open source tool from Flipkart which helps us achieve this. If we have time, we will also talk about the philosophy of deep customization versus generalized tools. I believe that when you are trying to solve problems, you need to look at the problem and build a customized solution around it instead of using some general tool. Speaking of the problem, what is really the problem? Why are we building infrastructures, and why do we need elastic infrastructures? We will talk about some of the history. Essentially, in 2010 I was the only guy at Flipkart doing ops. Before Flipkart I was at Yahoo, where I had this cushy life where if I needed a server, I would write an email.
If I needed a DNS entry changed, I would write an email and somebody would do it. But when I was the only one running infrastructure, I had to do all those things myself. A lot of these things are really repetitive and boring, and also a pain in the ass to do. For example, let's say there is a new project happening and somebody comes to me and says, give me five servers to do this. I am sitting there installing Debian, running all the infrastructure setup, and ten minutes later this guy comes back and says, well, forget about those, give me these instead. Essentially you are constantly struggling to meet the demands of what is going live, because if you are a web company which is growing really fast, you need to be able to do things in a very, very fast manner. You need to get machines and set things up really quickly. There are also times when you don't know what hardware you have. If you grow from ten machines to fifty machines in a span of a month or two, you really don't know what is live and what is not. So you could dangerously reclone a machine or something. There are a lot of routine tasks that you need to do, like monitoring setup, and you could forget them if you are setting things up manually. Essentially the need was to be able to create infrastructure in such a manner that the obvious things figure themselves out; you don't have to do those repetitive tasks again and again. One of the challenges is to provision hardware as well as virtual machines in a click, without doing anything by hand. How do you configure DNS automatically? If you manually edit the zone files, you will end up making mistakes sometime or the other. You shouldn't edit the zone files; it should just happen on its own. Also, if you are growing really fast, you need to be able to set up lots of infrastructure pieces on a daily basis.
Set up MySQL clusters, set up monitoring and alerting, log aggregation, and all these things should just happen automatically. That's one of the key reasons why you should build infrastructure in an automated manner. One of the early learnings, three or four years ago, was that you need to standardize on hardware. You cannot have all sorts of different hardware running around in your data center. You say, I am going to use one, two, three types of machines. Let's say you are using VMs: you say I have a small VM, a medium VM, and a large VM, and these are the configurations of each, this much memory, this much CPU, this much disk space. All the problems you are trying to solve, you put them in one of those brackets. You also find out about remote installation and management. Most hardware comes with a remote management card of some sort. It has its own operating system and network interface, and you can manage the hardware from there: you can reboot the box, you can do anything with it. That was also an early win, finding hardware that allows remote management. Because if you are stuck with hardware where you have to get a remote-hands person to do things, every time you have to reboot a box, change the RAID configuration, or do anything with the hardware, you are dependent on a human being in the data center; you should be able to do it remotely. Virtualization obviously helps a lot. Another thing that is really important is to package software. You don't necessarily need to make RPMs or debs, even tarballs will work, but you need some standardized mechanism of deployment. You cannot just SCP code over and hope that it will work. Another early win for us was that from the very beginning we started using Puppet. Puppet is a configuration management tool; it lets you push configurations to servers. You can break your configurations down into two parts.
One is the base environment that is going to be the same for all the machines. Then there is machine-specific or application-specific information that you keep in another environment in Puppet. One of the most important things is to have a centralized inventory, a host database. We will talk about why this is so important. In the beginning, we were managing our infrastructure, the list of hosts we have, the host names, the rack positions, the network IPs, in a Google Sheet or something, which was not scaling at all. You tell the site ops guy, go to this rack, this unit, and do something; he has no idea what he is doing, he is just going to pull some wrong wire out. If you are managing your infrastructure in a spreadsheet, somebody may read the wrong column and reboot the wrong machine. You may allocate hardware that is not even being used and not know about it. All these problems were there in our infrastructure. How did we solve it? Essentially, we came to a diagram like this which shows the interaction between different components. The problem essentially is one of information, and of communicating that information to the various pieces which are taking action. We will look at this diagram again once we understand something. At the core of the whole problem is the host information. What is the purpose of a machine? You know the purpose of the machine, but does your infrastructure know the purpose, and how do you communicate that purpose to your infrastructure? For that, you need an information store, a host database, where you can keep the details of all the hosts. You have: this is a box, this is its physical location, this is the rack it is in, this is the network port it is connected to, and if it is a virtual machine, this is the box it is hosted on, because you need to know all this about the machine to take actions. You should be able to tag a host.
The ability to tag a host gives you a lot of leverage. What you can do, for example, is let us say you have MySQL databases, maybe 10 clusters, with some masters and some slaves. If you were able to tag all the masters, saying these are all the masters, they may be masters of different components of data, but if you can tag them, you can take action on all of them together. For example, you may want to say that you cannot write to slaves, so you will want to set read_only = ON on all the slaves. What you can do is tag all the slaves together and push a configuration specifically to them. All those things become really easy to do. Also, it becomes a single source of truth for everyone and for a lot of people to interact with. This is a screenshot of hostDB. Here you see the host name and the information about this host. This host is actually a virtual host, because you can see that it has these classes, FKEN, VVC, which is our pseudo name for a virtual machine. You know who created this host and when, you know the full name of this host, the IP of this host, and all this information is here. It is all fine; it is not a great deal, you store information somewhere. But what it allows you to do is get everybody on the same page. Now you can communicate with everybody in your organization using the host name. You say, do this to that host, and if there is a guy in the data center, he can go to hostDB and see where it is physically located. If you want to set DNS for the host, you could write a script which looks this host up and says, okay, this is the FQDN and this is the IP, and writes that into a file. Another simple idea which is very powerful is to have an API for this host database. If you have an HTTP API where you can say, give me all hosts which are in this tag, then you can do really interesting things.
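As a rough illustration of that tag-query idea, here is a minimal sketch in Python. The endpoint path and JSON field names are invented for illustration; the real HostDB API may expose tags differently.

```python
import json

# Pretend this JSON body came back from a hostDB-style call such as:
#   GET /v1/tags/mysql-slave/members        (hypothetical endpoint)
response_body = json.dumps([
    {"hostname": "db-slave-1.example.com", "ip": "10.0.1.11"},
    {"hostname": "db-slave-2.example.com", "ip": "10.0.1.12"},
])

def hosts_in_tag(body: str) -> list[str]:
    """Extract the host names from a tag-membership API response."""
    return [h["hostname"] for h in json.loads(body)]

print(hosts_in_tag(response_body))
```

With a list like that in hand, a small script can act on every slave at once, for example pushing read_only = ON to each of them.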
Because it is a single source of truth, everybody is acting on the same information. Even if it is wrong, it is wrong for everybody. It is not that somebody thinks this is the truth and somebody else thinks that is the truth, and that allows you to do even more interesting things. You can create decentralized pieces of software which interact with this centralized repository, for example to create a virtual machine on a particular physical box. You could go to hostDB and check: how many physical machines do I have, how many virtual machines are on each physical machine, which one is the most free where I can create this virtual machine. You can make really interesting decisions. For example, lots of people use Puppet here. How many people use Puppet? Quite a few. One of the problems with Puppet is getting a list of hosts and creating nodes. Puppet allows you to use an ENC, an external node classifier. If I am creating and destroying machines, let us say you are scaling up and scaling down at will, you do not want to be sitting there editing node files. What you can do is, as soon as you put that information in hostDB, write an agent which scans the host database and then creates these node definitions automatically. Even smarter, you can create an agent which adds DNS records, and an agent which adds automated monitoring to hosts based on specific things. We will get into how you can express that. Now we can go back to this diagram. In the middle, we have this host database which has an API. This is our orchestration layer; we call it cloud. It essentially just creates virtual machines. What you can do is have a piece of software here which creates a machine in hostDB. There will be an agent here which says get host information and get puppet information.
An agent will create a virtual machine, and when the virtual machine comes up, it will ask: what am I? What is my host name? What am I supposed to do? All this information you get from Puppet. You can express in Puppet that this machine is actually a web server and it needs these packages, whatever you want to express. You can do other things, like a monitoring service. Let's say you have Nagios; lots of people use Nagios. You have to keep adding hosts, keep adding them to clusters and all of that. Instead, you can just write an agent. Say we have web servers, and all web servers perhaps need the same sort of monitoring: if you are a web server for a particular part of the website, all your sibling web servers will need the same sort of monitoring. What you can do is put a tag saying web server for x, y, z, and put the monitoring information right there in the tag. What hostDB allows you to do is that all the hosts will pull their tag-specific information as well. Whatever you write in one tag is shared across all the machines that share the tag. It's a simple thing, but it is very powerful, and we will see what we can do with it. Essentially, as I mentioned, you have a single source of truth, and you define the life cycle of the host. If you have to delete a host, you go to the host database and say this host is no longer live, this is dead. Then the orchestration layer, the cloud layer, looking at the host database, sees that this one is dead, so it will go and kill that machine off and reclaim all the resources. How did we do it? It's essentially very simple. You take Puppet and you take hostDB, which is an open source piece of software that we have written. We have been using hostDB inside Flipkart for three years now. It's fairly mature software, and the author of hostDB is here today, so you can come and talk to us about it. What does it do? It's built to be highly available and reliable.
We didn't build hostDB to be really fancy, using the latest and greatest technology. We just wrote it in Perl, and the main purpose is to be available and reliable. It allows you to have namespaces. What are namespaces? You could have all the information about hosts which relate to your infrastructure in one namespace, and let's say there is another team doing something else with machines. You have your own tags, you are doing your own clustering, but there may be somebody who wants to cluster machines in tags differently. They can have another namespace. hostDB is backed by Git. All the information you have is saved inside Git, and you can roll back any commit made. If by mistake somebody changes something, you can always go back to the previous version. And you have an API. It has a web application, you saw the screenshot. It has an API. It has a command line interface, so you could write scripts that query it. There is a Perl module and a C library, so you could write programs that query hostDB directly if you don't want to use the HTTP API. Puppet is an infrastructure and configuration management tool. What you do with Puppet is essentially define all your machines in Puppet as nodes, and then define classes which say what the purpose of this host is, what all should go into this box. What we do with Puppet is segregate things into two environments. One is a base environment, with all the information like how to do networking, what default packages come in, what sudo privileges there are, what the user limits are, basic stuff that every machine should know about itself. That goes inside the base environment, and it is idempotent, which means there is an agent running on each and every machine which updates itself automatically. If you make any base change, it reflects on all the machines automatically.
Then there is another environment for things you don't want to run all the time; sometimes you just want to do things once. Let's say you are setting up a MySQL server, and you want to set up MySQL automatically so that a master comes up and a slave comes up and so on. There are some things that you only want to do once: you write a class and scripts that do things automatically, but you don't want to run them again and again. Those things go inside an app environment, which runs once or maybe once in a while. You can use hostDB as an ENC. Puppet has the concept of an external node classifier, which allows you to dump the host information in YAML. This is the exact configuration that we use: here we set external_nodes to a script, our script generates the YAML, and then Puppet just uses it to say these are the nodes. What we do is define our machines and virtual machines inside hostDB, run agents to dump that to Puppet, and in Puppet we define what the purpose of each box is. Now, some examples of what you can do and how you build elastic infrastructure with this. For example, auto-provisioning of hardware. How many people deal with real hardware, have data centers and rack machines and such? It's a really long process. First you acquire hardware, then you have to rack it, then you have to go configure RAID, then configure IP addresses, and you have to do a lot of things. Every time you need to reclone a box, you have to go to some really, really bad interface and click around, which takes like 15 minutes, so it's really painful. What we wanted was that as soon as you rack the machine, the infrastructure should know there is a new machine: this is a new machine, it has this much RAM, it has these many CPUs, and it should become this. That's what we wanted to achieve, and how did we achieve it? Essentially you write an agent which does auto-discovery.
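To make the shape of such an agent concrete, here is a minimal sketch of what a discovery agent might assemble and report back to the host database. Every field name here is invented for illustration; the real discovery agent and HostDB record format may differ.

```python
# Hypothetical sketch: a freshly booted, not-yet-installed box gathers
# facts about itself and builds the record it would send to hostDB.
def discovery_payload(service_tag: str, memory_mb: int, macs: list[str]) -> dict:
    """Build a 'discovered but not yet live' host record (illustrative)."""
    return {
        "service_tag": service_tag,     # each machine identified by its service tag
        "memory_mb": memory_mb,
        "mac_addresses": sorted(macs),  # normalize ordering for stable records
        "state": "discovered",          # not installed yet; purpose assigned later
    }

print(discovery_payload("SVC1234", 65536,
                        ["aa:bb:cc:00:00:02", "aa:bb:cc:00:00:01"]))
```

The point is only the flow: discover facts, write them to the central store, and let other agents act on the "discovered" state.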
Essentially, when the machines are racked, they have two interfaces. One interface is connected to the regular LAN, the other is on a management LAN. As soon as the machine comes up in the default VLAN, we just boot the box and do a discovery. We figure out how much memory it has, what the MAC addresses of this box are, and all the information that you need. Then you populate that information into hostDB. Inside hostDB, you now have all the machines that are not yet live; they are not yet installed, they have just been discovered. What you can do then is mark them and give them a purpose, and you can do these things automatically. When the machine comes up, in hostDB you say this is supposed to go inside the cloud, and that will express: install this Debian version, this is the IP address, set up DNS, and all of those things. You use DHCP to boot it into an image that gives it all that information. This is the hardware detection part: you say this is a new box that came up. When the box comes up, it doesn't have a host name or IP or anything, but you can express that. Each machine is identified by a service tag: if this is the tag, this is the IP. Essentially you can figure out that it is connected to this port of the switch, it has this MAC address, this physical location, this unit number. You can write scripts that interact with network devices to change the VLAN if you wish, to do different things with the box. Virtual machines are even simpler. You can just create a new machine inside the host database and write an agent which scans a tag saying new virtual machine and does whatever it needs to do. You can use KVM, you can use OpenVZ, you can use containers, whatever you want to use, and you can express what needs to be done after the machine is recognized. What we do is have these mother ships on which an agent is running.
The agent creates the virtual machine and starts the host up, and the first task of each host is to run Puppet. As soon as the host runs Puppet, it knows what it is. When you put the machine inside, let's say this is the box we are trying to build, you have put in this information, so when the Puppet agent builds the node configuration, it knows: I have to add these classes. It is really simple; Puppet knows what to do. When you need the DNS agent to do something, a new machine comes up, the DNS agent looks at these things and generates the zone file. Even automating DNS is really simple. The DNS agent looks at a tag and says, for all the machines participating in this tag, I need to build a zone file; for another tag, I create another zone file. It is really simple. Essentially, as I mentioned, tags. You have a tag section. For example, Nagios server is a tag, and these two hosts are members, one in one DC, one in another DC, so you have two Nagios servers. The Nagios server will have its own specific rules, which go inside the config part, and it will just use those rules to generate configurations. How do you automate backups? Essentially the same thing: you tag it with backup. I will give you an example of what you can do. This can be different for different hosts; this host has its own backup configuration. You can have this information specific to a host, or specific to a tag. If you have something that is common across hosts, you put it in a tag; if you have something specific to a host, you put it in the host configuration. Then you write an agent that just dumps the configuration out. This is the same way we automate everything. Monitoring on the host is also generated using information from hostDB. Now, scaling applications. Because we can create machines automatically, we can interact with hostDB using an API, which creates a machine.
And we can express what needs to go inside a box; even the Puppet rules that go inside the box are there. All you need to do is express the box, and then everything just happens. You can create machines at will and destroy them at will. Essentially, you can write a small script that monitors your requests per second. For example, a really silly example: monitor requests per second, and if it is increasing, just add another box. It is not that simple to scale, but you get the idea. You can do interesting things with it. For example, Unbound is expressed as its own class; whenever you need to add a new DNS server, you just create a machine with that tag and it builds itself. We can generate load balancer configurations using this: you express load balancer information inside the configuration, and a specific agent reads it and writes configurations into the load balancer. The same way, we build MySQL clusters. One of the problems is that building a MySQL cluster is a really painful, long procedure, which is repetitive and can be easily automated. You create a master, you set up a slave, you create another machine, you set that up as a slave. It is really simple to automate. You can set up a tag saying this is the master, and a tag which is a slave-of relationship. You can have rules to create partitions, add packages, set up automated backups, and all of these. hostDB is released under an open source license and is available on GitHub. We are committed to maintaining it. The deep conviction here, the reason this is sort of a framework, is that when you are doing something specific, you just need a bare minimum framework and you build around it.
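The deliberately silly scaling example above, watch the request rate and add boxes, can be sketched in a few lines. The capacity figure per machine is an invented assumption, and the create/destroy calls into the orchestration layer are stand-ins, not real API calls.

```python
import math

# Hypothetical sketch: decide how many machines a request rate calls for.
# rps_per_machine is an assumed capacity, purely illustrative.
def machines_needed(rps: float, rps_per_machine: float = 1000.0) -> int:
    """Machines required for a given request rate, rounding up, minimum 1."""
    return max(1, math.ceil(rps / rps_per_machine))

print(machines_needed(4000))  # roughly the 4000 rps mentioned earlier
print(machines_needed(4500))
```

A real autoscaler would then compare this number against the current tag membership in hostDB and ask the cloud layer to create or destroy the difference; as the talk says, real scaling is not that simple, but the control loop has this shape.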
If you look at hostDB and the agent model, it is really simple, but it has allowed us to do a lot of interesting things, because it lets you express whatever you please, since there is centralized information about hosts. That is about it. I am not preaching a way of doing things, but this works for us, and that is my job. You can follow Flipkart Tech for information about Flipkart, and you can follow me. Questions?

You built a tool to maintain hosts for the number of servers that you have. Puppet is kind of a given, not something you built, a third-party tool; it makes a lot of sense because your business is a little different. Is there a comparison you have done with other tools that maintain hosts, similar to what you are doing with hostDB, where you found those tools not beneficial? I am just curious.

When we started writing hostDB, this was around 2011. At that point in time, there were not many CMDBs. Now, I think there are a couple available, but I think the interesting thing is the ability to access information about hosts, cluster them together in tags, and have namespaces. We have built the tool; it is customized to our approach to solving things, but it is also a very generalized tool that can work for everybody. There are tools, but I do not think there is something comparable which can do what we do.

One other question. I wanted to understand the part where, once you rack your server, something gets automated and information goes to hostDB, because you are talking now at a hardware level, and I do not know whether you have an agent running somewhere else that picks it up.

I can explain. You have two interfaces: one is the regular interface of the box and one is the management interface. Using the management interface, you can reboot the box, you can change the boot order of the box. Essentially, as soon as any box comes up in the default VLAN, it is made to boot in PXE mode.
As soon as you boot in PXE mode, there is a default DHCP server which gives you an IP and a host name, and it also gives you a kernel to boot. Because you are in the default VLAN, you get a particular version of Linux. As soon as you boot into that, it is customized and it runs a discovery script. Essentially, that script then communicates back to hostDB and says, I found this.

Because when you mention a single source of truth: in our world, we do a lot of UCS, and there is that world that still maintains hosts outside of Foreman.

I do not want to do that. I do not want anybody to sit and enter that information, because I can discover it, and it is more reliable. Thank you.

Hi. You mentioned that you are using Puppet for configuration management. How about the orchestration part? Is that MCollective or something else?

Currently, for orchestration, what do you mean? To create machines?

Remote execution, running commands remotely.

Right now, we are in the process of adopting MCollective to do that; MCollective is really good at it. We do not use anything for large-scale remote execution today. If we have to do something like that, we put it in the idempotent base. Essentially, we have divided our configuration into base and app environments. Let us say you need to upgrade a package on all machines: you just go to the base classes and say this package should be at this version or above. And then, because it is idempotent and it is running every minute, it will immediately reflect on all the boxes; you do not have to worry about it. If it is not something that can be expressed as packages, we do not do it. That is one thing I mentioned right at the beginning: you have to manage your infrastructure using a package management system. If you are doing remote execution of commands, let us say you want to restart Apache on all the boxes, that is probably one of the use cases.
For that, you can use MCollective.

All of these are agent-based solutions. If everything is working fine with the agent-based solution, then it is good. But what happens when the agent fails? Either your auto-discovery agent fails, or whatever is running on the box to gather all the info fails. What happens then?

Then it fails. Essentially, because it is a single piece of software that is running everywhere, you test it and then you deploy it. If it has a bug, it has a bug; you discover the bug and then you fix it. Let us say there is an agent which is writing a DNS configuration file. This is a very important agent; if it fails, Flipkart will not run, because no host will be able to find another host inside our network. We write such an agent in a manner that takes enormous precautions: this is the specific thing you are looking for, and if it does not exist, do not do anything. There are a lot of rules.

Is it a single agent doing everything, or are there different agents?

There are different agents. That is why deep customization. For DNS, there is a specific piece of code doing it, a small piece of code maintained by one guy. To do something else, let us say monitoring for web servers, there is a small piece of code maintained by the guy who is managing the web servers for a particular department. The information is central, but what you do with it is up to you.

Hello. Hi. This question is related to hostDB: your machines are connected to a rack, and hostDB has some sort of agent, is what you said, which checks when a rack is connected and gets the data. Is there any concept of a messaging broker or a worker, where it listens on UDP or multicast and then gets the data, something like that?

Sure. We are not at that scale. For example, we have 5,000 boxes, and how many agents? Maybe 10, 15, 20, 30,000 agents.
And we can scale that by just having multiple boxes, and there is very little network traffic. I think a broker and all that is useful when you are doing large-scale network I/O.

But I still see that in Flipkart there is large-scale I/O.

There are things where we do large-scale I/O, and for those we use a request broker. But for this particular software, I don't think there is a need right now. If we grow to 50,000 or 80,000 machines, where millions of hostDB calls are being made, then obviously we will need to look in that direction.

Is it that the agents and hostDB are all in Perl, or are the agents in other languages?

You can write an agent in any language, because hostDB exposes a REST interface, so you can write it in any other language.

So you run it on different machines with scripts. Yes. Awesome, thanks.

No questions? All right. Thank you guys.