Yeah, alright. So again, my name is Pankaj, I work for Flipkart, and we are dabbling with virtualization. This is our first effort. It's called cloud. It's basically a way to create machines really fast using whatever hacks we could come up with. So let's look at it.

The aim was to provide a service that lets you add capacity really fast and reduces the time it takes to add a machine. If you're a startup and you're growing fast, you need to keep adding hardware to support your software; almost every single week some new component gets added, and you need to add capacity for it and let it grow. Another aim was to integrate seamlessly with what we already had, so we came up with this.

It's a simple-to-use interface that lets you create machines of well-defined sizes. For example, there is an option called small, which is our version of a core. Our version of a core is roughly 3% of a 32-CPU box, that is, one thirty-second of it. I will explain why it is like that. You have four hardware sizes to choose from, you can destroy the existing machines that you own, and you can look at the status of the machines you own.

We have a setup inside called staging, which is our integration environment. For almost a year we were using 3 servers for staging before we came up with this environment. Within a month of getting this environment up, the number of staging machines reached 139. That's 3 to 139.

The time taken to allocate 60 servers by hand is anything between 10 and 15 days, because if you have ever put a server in rotation, you know what it takes. You need to install it, you need to make sure it's up and running, and you'll see issues with it. Even if you use Kickstart or FAI or something similar, you'll always run into issues. Using cloud, it takes 10 to 15 minutes to allocate as many machines as you want. It takes the same amount of time for one machine or 100 machines; it doesn't matter.
So this is what it looks like when you create a machine by hand. You use some sort of tool to figure out where your machine is and put it in PXE boot. Then you have to log into the console, log into the BIOS, and change settings there. You have to boot it, wait for it to boot, and then it installs itself, and so on and so forth. It's a long process.

With the current setup, you can write a small shell script to create as many hosts as you want. You can also use the web interface, which has the same functionality. All you need to do is give the hostname, choose which size you want, and choose where you want it. We have multiple installations: for staging there is an installation where hardware is allocated, and when you create your VMs they come up there. The same goes for other environments.

The interface also tells you what is going on. Once the machine comes live, it gives you a status report for that machine: its IP, who owns it, what its status is, and which physical machine it is on. So any time you need to know anything about this box, you go here. It also tags the machine as a VM, and you can add whatever tags you want. For example, you can tag every machine you create with "pankaj", then go to that tag and say, "I want all the machines I created." Or you can tag machines by team, or by whatever takes your fancy, and you can always go back to them and access them.

So what do we use for this? LDAP is for authentication of users. We use Puppet a lot. The machines themselves are OpenVZ containers, and DNS is automated. How do we do it? That is the question.

So why OpenVZ? Because at Flipkart we mostly use Linux, and we're not going to change that anytime soon and start using Windows or any other OS for something.
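The tag lookup described above ("go to the tag and say, I want all the machines I created") can be sketched as follows. This is a minimal in-memory stand-in, not Flipkart's hostdb: the `HostDb` class, its methods, and the sample hostnames are hypothetical. It only illustrates the idea from the talk that host records are free-form and that a tag lookup returns both the member hosts and the tag's own properties.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class HostDb:
    """Hypothetical in-memory stand-in for hostdb: free-form host records plus tag properties."""
    hosts: Dict[str, dict] = field(default_factory=dict)      # hostname -> record
    tag_props: Dict[str, dict] = field(default_factory=dict)  # tag -> properties

    def add_host(self, hostname: str, tags: List[str], **details) -> None:
        # Any extra key=value details are stored as-is; there is no fixed schema.
        self.hosts[hostname] = {**details, "tags": set(tags)}

    def tag(self, name: str) -> dict:
        """Return the tag's member hosts together with the tag's own properties."""
        members = sorted(h for h, rec in self.hosts.items() if name in rec["tags"])
        return {"hosts": members, "properties": dict(self.tag_props.get(name, {}))}


if __name__ == "__main__":
    db = HostDb()
    db.add_host("stage-app1", ["vm", "pankaj"], ip="10.0.0.5", owner="pankaj",
                status="running", physical_host="ph1")
    db.add_host("stage-app2", ["vm", "search-team"], ip="10.0.0.6", owner="someone-else",
                status="running", physical_host="ph2")
    db.tag_props["vm"] = {"nagios_checks": ["ssh", "load"]}
    print(db.tag("pankaj"))  # the machines I created
    print(db.tag("vm"))      # every VM, plus the properties stored on the VM tag
```

Tagging by owner or by team is just another entry in the same `tags` set, so one lookup function covers every way of grouping machines.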
Since the requirement was mostly Linux, we decided to go for OpenVZ, because it works much like a jail and that matches what we want. We don't really want a completely virtualized environment, because most of the applications we're going to run run inside a JVM anyway, so we have that much control. All we need is a place to contain people and share the hardware. We use Puppet for automated roles; I will explain how this works. We built a host database for a lot of other things, but the cloud uses the host database, and that's really what makes it work. We use daemontools for all sorts of things, and there is a free-software OpenVZ web panel that we use as well.

This is an explanation of what happens. I don't think you can read it, so I'll read it for you. There's the front end, or the API, or the command line, whatever you want to call it. It interacts with every other component using an API. The most important component is right here: the hostdb.

When you create a new VM using the command-line tool or the web interface, it makes an HTTP call to a physical node. This node can be compared to a NameNode: it is the one that controls all the other physical machines that can host VMs. They all talk HTTP to each other. This node knows how many machines are available to host VMs; it can connect to one of them and say "create a VM for me", and that machine creates it and reports back: "I created it."

When you create a host, the front end (the software is called the FKVC maker) does two things. First, it creates a hostdb entry, something like this. It can create this entry because it knows what the IP is: it creates a service tag based on the IP, creates a hostname, adds other details, and also adds a tag called VM. Second, it makes a call to the name node and says, "Make a small machine for me."
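The two things the front end does on create (write a hostdb entry, then ask the name node for a VM) can be sketched like this. It is an illustration under stated assumptions, not the FKVC maker itself: the talk says the service tag is derived from the IP but not how, so the `vm-10-20-30-40` scheme is hypothetical, as are the function names, the request field names, and the initial `planned` status (borrowed from the Q&A, where new VMs are found by querying for that status).

```python
import ipaddress


def make_hostdb_entry(hostname: str, ip: str, size: str, owner: str) -> dict:
    """Build the record the front end writes to hostdb before asking for a VM."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on a malformed address
    return {
        "hostname": hostname,
        "ip": str(addr),
        # Hypothetical service-tag scheme: the talk only says it is derived from the IP.
        "service_tag": "vm-" + str(addr).replace(".", "-"),
        "size": size,
        "owner": owner,
        "status": "planned",  # assumed initial state for a new machine
        "tags": ["vm"],       # every VM gets the VM tag
    }


def make_create_request(entry: dict) -> dict:
    """Hypothetical payload for the name node; the field names are assumptions."""
    return {"action": "create", "hostname": entry["hostname"], "size": entry["size"]}


if __name__ == "__main__":
    e = make_hostdb_entry("stage-app1.example.com", "10.20.30.40", "small", "pankaj")
    print(e["service_tag"])
    print(make_create_request(e))
```

Writing the hostdb entry first means every poller (DNS, Nagios, Puppet) can already see the host before the container exists.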
The name node sends back "OK, I got it", and the front end tells you "Your machine is ready, chill out", along with all the other details. Say the front end told the name node to make a machine: the name node goes to one of the physical hosts and says, "You seem to be free, make a machine," and that host makes it.

When the VM comes up, it has a first-boot task associated with it. The first time it boots, it is told to execute a script, a post-install script called Flipkartify. It sets up the apt repository, downloads certain packages like Puppet, and sets up the SSH keys for root access. It does a lot of these tasks and makes the machine ready for you to use. Once all that is done, it also runs Puppet; I will explain what that does.

Meanwhile, hostdb has received the entry for the new host we are creating. The DNS server runs a script on itself under daemontools. That script basically calls hostdb and asks, "Do you have any new machines?" hostdb says, "Yeah man, I have some," and the script makes DNS entries for them. The same thing happens on all the other services. Nagios is continuously polling hostdb. Whatever service you have, it can just poll hostdb and say, "Give me information about this tag, the VM tag, I'm very interested in it."

hostdb keeps all the information about each host, and there's no limitation on what you can put in it. An entry is basically YAML, so you can shove anything in there. If you want something for Nagios, you put it in there. If you want something for your X application, you put it in there. Whatever information you want for a single host, you put it with the host; whatever information you want for a whole group of hosts, you put it in the tag, because tags themselves can also hold information. So that is what is going on, and this is an overview of what happens.
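The DNS-side poller described above ("do you have any new machines?") can be sketched as one polling pass plus a loop. This is a hypothetical sketch, not the actual script: `fetch_hosts` stands in for a call to hostdb's HTTP API filtered by tag, `publish` stands in for whatever writes the records, and the BIND-style record format and 300-second TTL are assumptions.

```python
import time
from typing import Callable, Iterable, List, Set


def a_record(entry: dict, ttl: int = 300) -> str:
    """Render a BIND-style A record; the 300-second TTL is an assumption."""
    return f"{entry['hostname']}. {ttl} IN A {entry['ip']}"


def poll_once(fetch_hosts: Callable[[str], Iterable[dict]], known: Set[str]) -> List[str]:
    """One pass: ask hostdb for VM-tagged hosts and return records for hosts not seen before."""
    new_records = []
    for entry in fetch_hosts("vm"):
        if entry["hostname"] not in known:
            known.add(entry["hostname"])
            new_records.append(a_record(entry))
    return new_records


def run_forever(fetch_hosts: Callable[[str], Iterable[dict]],
                publish: Callable[[List[str]], None],
                interval_s: float = 30.0) -> None:
    """Long-running loop of the kind daemontools would supervise and restart if it dies."""
    known: Set[str] = set()
    while True:
        records = poll_once(fetch_hosts, known)
        if records:
            publish(records)
        time.sleep(interval_s)
```

The same pattern fits Nagios or any other consumer: only the tag it asks for and what it does with each new entry change.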
The Puppet server also runs the same kind of task as a daemontools process. It says, "Oh, I got a new machine, I have to add it to the staging nodes," or, if it gets a prod machine, it adds it to the prod roles.

You have to do the basics first, right? First, you have to standardize the hardware. If you have non-standard hardware, you will spend a lot of time figuring out which physical machine can host which VM, and what you can and can't do with it. If your hardware is standardized, you know that a physical host has 32 cores, so you can create 32 one-core machines, or 16 two-core machines, and so on. You associate a fixed amount of RAM and a fixed amount of disk space with each core, and you go with that. We have a standard for this: our app-class machine is a 32-core, [inaudible] GB RAM machine, which we chunk up. We don't virtualize databases, so the database machine has its own spec, this one.

The important thing is to have a centralized host database. This is very important because it lets you keep all the details of a machine in one place. It should allow you to store any sort of data; it should not have limitations like "you must have this field" or "only these values are allowed". You should be able to store whatever you want. It should let you tag hosts, because once you can tag things, you can sort them and organize them in various ways, and that becomes extremely handy. It should also have an API: we are able to do what we are doing, writing individual components and plugging them into it, only because there is an API.

Again, here is an explanation of the host database. It holds all the details. This is a physical host: the first entry we saw was a virtual host, and this is an entry for a physical host. All the physical hosts go into the same database, all the virtual machines go into the same database, and everything else goes into the same database too.
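The capacity arithmetic behind standardized hardware can be written out as a small sketch. The 32-core host comes from the talk; the per-core RAM and disk weights are hypothetical placeholders (the talk's RAM figure is not recoverable). Because RAM and disk are tied to cores, checking cores alone tells you whether a VM fits, and one core works out to 1/32 = 3.125% of a box, the "roughly 3%" quoted earlier.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List

CORES_PER_HOST = 32    # standard app-class box from the talk
RAM_GB_PER_CORE = 4    # hypothetical weight, not from the talk
DISK_GB_PER_CORE = 25  # hypothetical weight, not from the talk


@dataclass(frozen=True)
class VmSpec:
    cores: int

    @property
    def ram_gb(self) -> int:
        return self.cores * RAM_GB_PER_CORE

    @property
    def disk_gb(self) -> int:
        return self.cores * DISK_GB_PER_CORE

    @property
    def share_of_host(self) -> Fraction:
        return Fraction(self.cores, CORES_PER_HOST)


def fits(placed: List[VmSpec], new: VmSpec) -> bool:
    """RAM and disk scale with cores, so checking cores alone is enough."""
    return sum(vm.cores for vm in placed) + new.cores <= CORES_PER_HOST


def max_uniform(cores: int) -> int:
    """How many VMs of a single size fit on one standard host."""
    return CORES_PER_HOST // cores
```

For example, `max_uniform(1)` gives the 32 one-core machines and `max_uniform(2)` the 16 two-core machines mentioned above.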
So this host has these tags. This is the entry for a Nagios server, I think. Yes, it looks like a Nagios server. The entry also records where the machine is, so at any time you can go here and see that this machine is in the fifth rack on this row.

This is an example of a tag. A tag can have details associated with it. You can say, "For all the machines in this tag, do this": you put that information in the tag, and it can be read for every machine in it. If you make an API call for a tag, you get all the machines in it and you also get all its properties. So again, this is what we do: data is stored as key-value pairs, hosts can be tagged, tags can have properties, and it's the source of truth. That is very important.

I don't think we need to go over this slide again. Okay, yeah.

As I said, this piece decides which physical machine to create the VM on. It has a simple rule engine that you can build on. You can set up rules like "no two machines of this class can be on the same physical host". You can set up rules like "if I'm creating web servers, put them on this switch", because you have all that information: you can just query hostdb to find all the machines on a given switch and say, "All these go to that switch."

Some examples: when a host comes up, it runs its boot-time task and then connects to Puppet. The Puppet server also polls hostdb and creates new nodes. It adds hosts to roles, and you can create roles too: you can create a role, put it into hostdb, and have the Puppet server pull it out and push it into Puppet. So you can give users a lot of freedom; they can create their own roles inside Puppet without ever editing Puppet files.

This is the lifecycle in Puppet. If you noticed, the machine has a status tag, right?
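The placement rules mentioned above can be sketched as predicates that every candidate physical host must pass. This is a hypothetical illustration of the idea, not the talk's rule engine: the `PhysicalHost` fields stand in for data that would come from hostdb (switch, classes of VMs already placed), and the rule names are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PhysicalHost:
    name: str
    switch: str
    vm_classes: List[str] = field(default_factory=list)  # classes of VMs already on this host


Rule = Callable[[PhysicalHost, str], bool]


def no_two_of_same_class(host: PhysicalHost, vm_class: str) -> bool:
    """Anti-affinity: reject hosts that already run a VM of this class."""
    return vm_class not in host.vm_classes


def on_switch(switch: str) -> Rule:
    """Build a rule that only accepts hosts behind the given switch."""
    def rule(host: PhysicalHost, vm_class: str) -> bool:
        return host.switch == switch
    return rule


def eligible_hosts(hosts: List[PhysicalHost], vm_class: str, rules: List[Rule]) -> List[str]:
    """Names of hosts that satisfy every rule."""
    return [h.name for h in hosts if all(rule(h, vm_class) for rule in rules)]
```

Keeping each rule a plain function makes the engine easy to build on: a new constraint is one more predicate in the list.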
It's either live, or dead, or running, or something like that. This tag is set by the FKVC maker. While the machine is coming up, it is in one state; once it is up, it is in another state. When it is dead or planned, for example if you destroy a machine, the process that runs the poll on the Puppet server just deletes all the certificates and everything else for that host, and deletes the node entries. When you put it in another state, it does whatever is necessary for that state: in the running state, for example, it updates from hostdb and rebuilds the configuration. This is important because when you add and remove machines, all those details need to go away; if you reuse the same hostname for something else, it's not going to work in Puppet. Again, you can write any application around it; it doesn't really matter.

This is an example of an even more complex hostdb entry. This is the tag that defines what the host does. This is its physical location: it's in rack number 191, and its NIC is connected to port 24. All those details are there, so you can always get to the machine if you need to.

DNS does the same thing. It polls hostdb, picks up the changes, and creates DNS entries and CNAMEs in the same manner. The system also has an HTTP API, so you can write code that monitors how your website is doing and, based on its performance, choose to destroy or create machines. If you add machines to a particular role, then when a machine comes up it runs Flipkartify, then runs Puppet, and gets added to that Puppet class, where everything the machine is supposed to do is defined. So when it comes up, it comes up ready to serve.

That's it, that's my talk. Now you can ask questions.

Pardon? Pardon? He's asking whether the Flipkartify script is available. It's not a very complex script.
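The status-driven Puppet lifecycle can be sketched as a reconciliation between two hostdb snapshots. This is an assumption-laden sketch: the action names (`clean_cert`, `delete_node`, `write_node`) are hypothetical placeholders for whatever the poller actually runs, and treating a host that vanished from hostdb like a dead one is a design choice made here, not something stated in the talk.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str]  # (hypothetical action name, hostname)


def actions_for(hostname: str, status: str) -> List[Action]:
    """Map a host's status to the Puppet-side work it needs."""
    if status in ("dead", "planned"):
        # Drop stale certificates and node definitions so the hostname can be reused.
        return [("clean_cert", hostname), ("delete_node", hostname)]
    if status == "running":
        # Regenerate the node definition from hostdb.
        return [("write_node", hostname)]
    return []  # other states: nothing to do in this sketch


def reconcile(previous: Dict[str, str], current: Dict[str, str]) -> List[Action]:
    """Compare two snapshots (hostname -> status) and act only on hosts that changed."""
    actions: List[Action] = []
    for host in sorted(current):
        if previous.get(host) != current[host]:
            actions.extend(actions_for(host, current[host]))
    for host in sorted(set(previous) - set(current)):
        actions.extend(actions_for(host, "dead"))  # vanished from hostdb: treat like dead
    return actions
```

Acting only on changed statuses keeps each poll cheap, while cleaning certificates on both `dead` and `planned` addresses the hostname-reuse problem described above.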
Basically, when the machine comes up, it adds an SSH key, so that even if the machine doesn't do what it is supposed to do, ops can still access it using a key they have. So it adds a key, and then it installs Puppet. All the juice, all the good stuff, is in Puppet. In Puppet you define your sudoers file, and you define your roles: this is a web server, these are the 15 packages it needs. Since we are building on the existing platform Flipkart has, we have our own apt repository, and all the packages we use in production are Debian packages. So to deploy in production, all you need to do is apt-get install. That is what it does: the role defines the packages you need, then you run Puppet, it installs everything, and you're ready to go.

Yes, so that is the front end, here. The front end makes that decision. The name node's only responsibility is knowing how to interact with the other machines, and that's it. It doesn't care whether a host can take 10 machines or 15 machines. The front end knows: it queries each physical node and asks how many machines it has room for, the node replies "I have this many", and we keep that information in hostdb too. It asks every node in turn, so it always knows where there is headroom.

Yes. There is something called the OpenVZ Web Panel. It's free software written in Ruby; you can just download it and install it. I have a deb package for it and install it on all the machines. It uses an API key: every machine is set up through Puppet, and when the machines come up they generate an API key, and you connect them to this central node with it. That way the central node can talk to any of the machines. Yeah, yeah.
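The headroom bookkeeping described in this answer can be sketched as two steps: ask every physical node how much room it has, then pick a node and debit the reservation. The talk does not say which node is chosen when several have room, so the "most free cores, ties broken alphabetically" policy here is an assumption, and `query_free_cores` is a hypothetical stand-in for the HTTP call to each node.

```python
from typing import Callable, Dict, Iterable


def refresh_headroom(nodes: Iterable[str], query_free_cores: Callable[[str], int]) -> Dict[str, int]:
    """Ask every physical node how many free cores it has."""
    return {node: query_free_cores(node) for node in nodes}


def pick_node(headroom: Dict[str, int], cores: int) -> str:
    """Choose the node with the most free cores and reserve `cores` on it."""
    candidates = [n for n in sorted(headroom) if headroom[n] >= cores]
    if not candidates:
        raise RuntimeError(f"no physical node has {cores} free cores")
    node = max(candidates, key=lambda n: headroom[n])  # first (alphabetical) among ties
    headroom[node] -= cores  # keep the cached view current until the next refresh
    return node
```

Debiting the cached headroom immediately stops two back-to-back requests from both landing on a node that only had room for one.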
So open VC doesn't have any limitations on what you can or cannot have. So if you can see here, we have an option of you have an option of so you can have Debian 64, Debian 32, you can have CentOS, Fedora, whatever you want. As long as you keep the users there, you can use them. Yes. It's in every physical needs to happen. Pardon? Four men. Four men. I have never used it. Does it, it acts like a database for Puppet instances. Then what? Pardon? It's based on files. So basically it's an implementation of SVN. You saw in hostdb, you can, it's based on SVN. You can basically go back a version. So it will always keep all the versions. You can do query from revisions also. So it will always query trunk or something. Or you can actually, oh you get the new servers because you just say, give me VMs whose status is planned. So it just gives you the new ones. Or actually what we do is we basically generate the entire file all over again. We just generate the entire file all over again. You know, we just generate the entire thing.