Hey, so this talk is going to be about our journey to a private cloud: the decisions we made and the goals we had for it. It's mostly going to be the philosophy behind it: why we decided we needed a private cloud, what the technical challenges were, how we went about them, what we achieved, and where we are so far in the journey. About myself: I work as a lead DevOps engineer at media.net; media.net is an ad tech company and we operate from Mumbai and Bangalore.

So, how many of you here operate from a bare metal data center? Most of this talk is going to cover things on bare metal data centers, but the philosophy is needed in a public cloud too, so there are a few takeaway points you could pick up here and there, things you could apply in a public cloud as well. At certain points you could stop and think about all the things you take for granted from a public cloud, like the high availability it offers people just out of the box.

As a company we started sometime between 2008 and 2009, and our products started coming out around 2010. Initially we operated from a single bare metal DC and then we scaled out to multiple DCs. When we started out on a single bare metal DC, most of the hardware we got was pretty uniform, something like 32-core boxes with 32 GB or 64 GB of RAM: you just get them and put the monolithic code on them. The whole dynamic application serving was a single monolithic PHP codebase which went into all your doc roots, all the directories, and you had 32 such web servers. And sometimes a 32-core box would not be used up by just the dynamic serving app, so people would put unrelated things together, and there would be multiple monolithic things running from a single server or a bunch of servers. This is how it started out, and while the company was around 100 people, things were all fine: just deploy your monolith, scale it up, buy new hardware whenever you want, and things kept working.

Then in 2012 the DevOps team first started, and we saw Linux containers (LXC) coming in. Earlier we were sharing infrastructure for the DevOps stuff with the production infrastructure team; now we got our own boxes to run config management, Puppet, PuppetDB, Nagios and everything, separate from the production infrastructure. We also wanted separate environments for each of these applications, so that, say, upgrading Ruby for Puppet would not affect the other applications. So we went with LXC containers, where you have a separate environment for each piece of the DevOps stack. This is where the DevOps team started moving to containers: all the apps started running as independent apps, and though they sit on a single bare metal box, they have isolated environments.
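To give a flavour of that setup, here is a minimal sketch of carving one LXC container per DevOps service using the python3-lxc bindings; the container names and the Ubuntu image parameters are illustrative assumptions, not the actual configuration from the talk.

```python
# Sketch: one isolated LXC container per DevOps service, so upgrading
# a runtime (e.g. Ruby) for Puppet cannot break Nagios or anything else.
# Assumes the python3-lxc bindings and the "download" template.
import lxc

for name in ("puppet", "puppetdb", "nagios"):   # hypothetical names
    c = lxc.Container(name)
    if not c.defined:
        # Fetch a stock root filesystem; dist/release/arch are illustrative.
        c.create("download", 0,
                 {"dist": "ubuntu", "release": "xenial", "arch": "amd64"})
    if not c.running:
        c.start()
        c.wait("RUNNING", 60)                    # block until the container is up
```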
Then as the scaling demand increased, and bare metal data center scaling is always difficult (it's not a transition you can do in a single day), and the company was growing exponentially and needed elasticity, we decided to adopt a public cloud as well. So we operated, and are still operating, in a hybrid environment: a bunch of data centers managed by us, co-located data centers, and AWS in multiple regions.

Once we started the migration to the cloud: how many of you in the cloud size your boxes exactly to the requirement of your app? Say you have one app which needs four cores or eight cores, a c4.2xlarge or a c4.xlarge instance: how many of you size exactly to that need? It actually takes some time to arrive at the number, to say this instance type will fit me. Since we had always operated monolithic code, we decided the 32-core instances were working fine for us, so we spawned closely matching instances in AWS and just moved our monoliths there as well. As long as you make sure your calls to DBs and all the transactions are properly handled, with a lot of safeguards on the DB side, you don't have to change the application architecture itself; we just moved. Things worked fine, there was no downtime, we were scaling, we got the elasticity. But you understand, right: you have thrown money at the problem, you haven't solved it efficiently, and this kept bothering the DevOps team.

The problem usually comes up because there is legacy code left behind in the team. Most startups don't have the legacy burden to carry; we have certain legacy stuff that's already written. But in any company with a legacy burden, there are also newer teams coming in, adding more intelligence and more features to the system. So the first call we made was that every new project coming up, whether it's detecting spam and clicks or whatever new feature it is, would automatically use the microservice infrastructure. That's where the microservices started. I'm not going to say microservices are the solution to everything; there are certain apps which do not fit microservices, and we will see that philosophy as we go further.

We started this microservice infrastructure in both the public cloud and the private cloud. The legacy infrastructure stays as it is; it still runs monoliths. And we started building our own private cloud, first at one of our bare metal DCs, with a plan to slowly move these monolithic things, whichever can be moved, to microservices. This is the path we have taken so far, and I'll try to cover why all these design decisions were made.

So, why a private cloud? It's not going to be about why public cloud, because everybody knows that. The first thing, as I said: people have a monolithic infrastructure on, say, a 32-core, 64 GB RAM box, and the monolithic code is not going to fill those 32 cores; even at its 99th percentile it's still operating out of just 18 cores, for example.
People might think we could reuse that headroom. In a typical bare metal data center, orders for servers are placed only after traffic planning, and it still takes at least a month to get those servers into your DCs, and then there is the time involved in racking and so on, which people in the public cloud apparently don't face these days. So for apps which have to be deployed quickly, people figure out which of our boxes are underutilized and put the service there. For example, static and dynamic content serving happens on a single box, or a bunch of boxes, even though they are completely decoupled things.

So what is the problem in having two decoupled services running from the same box? The problem is that you have no single point of ownership. If you go to migrate or upgrade your infrastructure, you need to talk to a bunch of devs; there is no single point of contact, because there are multiple services running on the box. And upgrading a single package on the system, which config management can go and do, can still break some other team, because there is no transparency about which team depends on which services. Instead, you could make a tailor-made service and give each team infrastructure that completely isolates them from other applications. That is one of the advantages you get when you go to a cloud or to microservices.

Efficient utilization of resources is a knock-on effect of the same thing. If you want to keep services decoupled on a bare metal DC, on bare metal systems, without any of this virtualization, then your resources are going to be underutilized; there is no way to make sure the resources are 100% utilized. The other one is quicker provisioning, by which I mean removing the delay that comes from procurement and racking of servers. A private cloud does not give you elasticity as such; a public cloud gives you elasticity because there are boxes lying around, already purchased, which run as spot instances while idle and move to on-demand as needed. In a private cloud your capacity planning still has to be done, because boxes won't magically appear; the one advantage you get is that your containers or VMs come up in a single click. But the capacity planning still has to be done, because buying a bunch of servers, keeping them idle and powering them on only when needed doesn't make sense economically.

On this slide, the green areas are where we are right now, and the white areas are what we still have to cover. Right now a portion of the DC still operates from bare metal servers without any virtualization, with multiple unrelated monolithic codebases running on them. Then there is the private cloud we started, which I'll cover from the next slide onwards; that gave us decoupled monolithic services, still monoliths, but running on KVM as decoupled services. And the microservices, which the newer teams are on, run on the microservice infrastructure. To make everything microservices we need to build a lot of supporting infrastructure around it, like collecting logs and so on. On a public cloud you can put those in S3 or something of that sort.
Here we are building our own object storage, and that's one of the challenges: microservices need a bunch of supporting infra. So we are building all the supporting infra a microservice would need. The newer teams are on microservices, and the legacy teams are mostly on decoupled monoliths; once the supporting infra is in place, and when those teams see the business sense to invest, they can refactor the code and jump onto microservices. Then there are a bunch of event-based systems, like a mail system or an SMS system that gets triggered on an event. For those, instead of putting up a pool of boxes, you use functions as a service, like Lambda. That is still at a POC stage; we are evaluating a bunch of ways to run functions as a service. In the public cloud we will continue to use Lambda; in the private cloud we will have a function handler to handle events.

Now, stack selection. We evaluated OpenStack and CloudStack for our private cloud; this is a very preliminary comparison, mind you. OpenStack as such is medium-to-difficult software to set up, but it has very good community support and a very good, modular codebase. CloudStack is a single piece of software which comes up on its own, so it's easier to set up and you get your first VM on the cloud very fast; but it's very monolithic, so you don't have the flexibility or modular support OpenStack has, and the community support is moderate; it's still buggy in places. In our case the team started out with CloudStack, and we kept working with CloudStack because this layer is going to be transient infra if you look at our pipeline: very few of the services will reside here, and most of them will move on to Mesos or Kubernetes. So we went with CloudStack, and there were certain challenges we had to work around, since there is no modular support as such, things that would have worked with OpenStack out of the box. I'll take you through those as well.

So this is the goal, and it's a pretty simple goal, anybody can see it: you have VMs to take care of the dynamic content. The CDNs pull your static content, with a Varnish or something in front taking care of it, so the static content doesn't have to live on any VM's storage; it can sit in an object store, and you just fetch it and hand it to the CDNs, while the dynamic content is served by the VMs. The storage has to be decoupled, the root filesystem, everything decoupled. All of this exists in Amazon already: that's your EBS, that's your S3, that's your EC2. And you need a tagging system, an inventory system, so you can figure out how much cost a team is incurring, how many instances a team is running, and so on.

Okay, so the three key components we are trying to bring to our private cloud are highly available block storage, object storage, and a proper inventory system. Object storage is still in our pipeline; we haven't finished it. Block storage we have done. So why is elastic block storage so important in a cloud environment? Think of it this way: you all know the AWS instance store, where when your instance goes down, your storage goes down with it. It's completely ephemeral.
HA storage which is decoupled from your compute gives you flexibility: you can migrate a VM from one host to another, and it happens immediately, because your storage is over the network rather than local to a box, so you don't have to copy contents from the local disk of one box to the other. Migrations are easy and flexible, but it has its own impact: if the network to the storage goes down, the whole DC is down; it's equivalent to the whole DC catching fire, or being flooded, or having no internet access. Storage is a single point of failure if you don't build it properly, so we took a lot of design considerations on it. I'll go through them fast because I'm running out of time.

The options we had for block storage were these; I'll go through each of them. A network file system (NFS) is where you have a big disk lying on some box and you expose it over NFS; all your hosts run their VMs off it, with each VM volume just being a qcow2 file, and it works on its own. Unless you use proprietary infrastructure like NetApp or something, NFS has its own issues: what if the NFS box goes down? Your answer will be to have another NFS box, and then you need DRBD or something replicating all the contents of the first NFS box to the second. Even then, the IP of the NFS has to float from one box to the other when a failure happens, and the floating IP takes time; there is a gratuitous ARP, there is a delay. And if there is even a small delay in reaching the disk, kernel panics can start happening on the VMs using it.

A shared mount point is again something similar: instead of NFS you have a disk exposed to all the boxes, and they run a cluster filesystem like GFS2 over it. All the problems listed for NFS stay here as well; plus the locks in GFS2 are, I find, personally more difficult to debug, though that is an opinionated call of my own.

Then there is Ceph, a piece of software we are quite happy with overall; we ran into some issues, but we are still investing our time in it, and I think it will definitely be a good open source choice for distributed block storage. Ceph is actually an object store, like S3, but it has an RBD kernel module which lets you use it as a block store as well. It works across multiple boxes: it has monitors which run health checks and update a map based on them. Each file is split into multiple chunks, each chunk belongs to some placement group, and the placement groups are distributed across multiple OSDs, so if one OSD goes down, another OSD can serve the placement group. This is about the best highly available block storage you can think of, and it works on ordinary boxes; you can put in your JBODs and bring it up. But we saw a degradation in performance whenever there was a failover and the placement groups got reshuffled, which is expected, just like what happens when you rebalance a Hadoop cluster. During that window we saw some kernel panics on the VMs, so we haven't gone to production with it, but it is something we are still putting time into.
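To make the Ceph piece concrete, here is a minimal sketch of creating a block volume on a Ceph cluster using the official python3-rados and python3-rbd bindings; the pool and volume names are hypothetical, and this assumes a reachable cluster described by /etc/ceph/ceph.conf.

```python
# Sketch: create an RBD image that a VM can attach as its disk. Ceph
# stripes the image into objects, maps them to placement groups, and
# spreads those across OSDs, so losing one OSD does not lose the volume.
import rados
import rbd

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("rbd")              # default RBD pool
    try:
        rbd.RBD().create(ioctx, "vm-vol-001",      # hypothetical volume name
                         40 * 1024 ** 3)           # 40 GiB image
        # A host would then map it (e.g. `rbd map rbd/vm-vol-001`) and
        # hand the resulting block device to the VM.
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```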
So in the meantime we went with HP 3PAR (don't mind the stock image on the slide), and to be clear, I'm not endorsing HP 3PAR as a product. HP 3PAR is one of those proprietary hardware systems, like NetApp. We got it because we acquired a company which had one, and we moved it to the data center where we were testing all this stuff. What HP 3PAR does is expose volumes as iSCSI endpoints, and from those I can carve out logical volumes (LVs) and give an LV to each VM. So what we do is expose one big volume from the 3PAR, we started with 48 TB, to all our host machines, and the host machines run clustered LVM, which carves an LV whenever a VM asks for a disk and hands it over. There are a bunch of things around this: the switch via which the hosts contact the 3PAR is made HA with Cisco Nexus technology, so even if one switch goes down there is always another path to reach it. And each host connects to the 3PAR over multiple network paths, eight in our case; usually it's round-robin, but if one path goes down, the others are used. So you have a highly available storage system; that's what I was aiming at with this approach.

CLVM is a Red Hat tool which supports carving LVs safely, because when you carve LVs from multiple boxes you have to make sure the metadata doesn't get corrupted: if the metadata is corrupt, the whole volume group is corrupt, and you simply lose everything. So we started with CLVM. CLVM uses Corosync and DLM, where DLM is a distributed lock manager. The trouble is its lock state is very opaque: it's very difficult to figure out which box actually took the lock when the cluster gets into a state where creating new disks is no longer possible because of a split brain, and since most of the locks are kernel-level, you have to reboot. Even after the reboot there is no guarantee the cluster comes back in the same state. So the reboots became a bit chaotic for the infrastructure: you put up a cloud, you ask people to move onto it, and then some time later the LVM layer is not stable. The existing VMs keep running, because their disks are already created, but you can't create new VMs, so you say "we are in maintenance, fixing things, we will get back to you". And if that maintenance mail goes out every day, people just lose trust in the infrastructure. That made us think of a way to fix the issue.

So we started writing our own LVM lock tool in place of the CLVM locking, because CLVM uses a mesh topology, where an instance talks to all the other instances before it gets the lock. Instead we set up three Redis instances, whose job is to issue tokens to the machines: whichever machine gets the token successfully gets to write the LVM metadata, and the other machines wait. This latency exists only when an instance is being spawned; after that there is no latency you will ever see from it. So whenever an LV create or LV remove call, any of those metadata calls, happens, the host machine where it will run tries to take the lock from all three Redis instances, which basically means writing a key successfully with its PID and the box name. If the write succeeds on an instance, the host holds that token; if not, somebody else already has it. If a host gets more than 50% of the tokens, then nobody else can effectively hold the lock, and only that host has the right to write the LVM metadata. It writes, and once done, it deletes all the locks.
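Here is a minimal sketch of that majority-token scheme, assuming three independent Redis instances; the endpoints and the key name are hypothetical, and this illustrates the idea described in the talk rather than reproducing their actual tool.

```python
# Sketch of the majority-token lock: a host may touch LVM metadata only
# while it holds the token on more than half of the Redis instances.
# The talk prefers staying deadlocked (and cleaning up by hand) over
# risking corrupt metadata, so there is no TTL auto-expiry here.
import os
import socket
import redis

ENDPOINTS = [("10.0.0.1", 6379), ("10.0.0.2", 6379), ("10.0.0.3", 6379)]  # hypothetical
LOCK_KEY = "lvm-metadata-lock"                     # hypothetical key name
OWNER = f"{os.getpid()}@{socket.gethostname()}"    # PID + box name, as in the talk

def acquire():
    held = []
    for host, port in ENDPOINTS:
        try:
            r = redis.Redis(host=host, port=port, socket_timeout=1)
            # SET key owner NX: succeeds only if nobody holds this token.
            if r.set(LOCK_KEY, OWNER, nx=True):
                held.append(r)
        except redis.RedisError:
            pass                                   # unreachable instance counts as a loss
    if len(held) * 2 > len(ENDPOINTS):             # strict majority: we own the lock
        return held
    release(held)                                  # lost the election: give back what we got
    return None

def release(conns):
    for r in conns:
        try:
            if r.get(LOCK_KEY) == OWNER.encode():  # delete only our own token
                r.delete(LOCK_KEY)                 # (a Lua script would make this atomic)
        except redis.RedisError:
            pass

held = acquire()
if held:
    try:
        pass    # safe to run lvcreate / lvremove here
    finally:
        release(held)
```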
The main thing is that we made sure there is no concurrent access due to split brain or anything of that sort; it's stress tested, and we gave more preference to remaining in a deadlock state than to corrupting the metadata, because under no circumstances can we have corrupted metadata. And a deadlock, right now, is not healed automatically: we check whether an instance crashed after taking a lock, and then give up the lock ourselves once we think it's safe to delete it. If one of those commands goes into an uninterruptible sleep (D state), we can't do much; we have to reboot the box, the same as with CLVM. But it's far easier to manage than a kernel-level lock we have very little control over and a mesh topology that tries to work things out on its own.

Last few slides, I'll wrap them up fast. Networking: storage is over the network, as we know, and the VMs should have their own data network as well. We are not a multi-tenant infra, so we don't want GRE tunnels for VM-to-VM communication; it's just us, it's our DC. So we have a single L2 bridge: the actual interface connects to that bridge, whichever VMs come up also connect to the same bridge, and they just make a DHCP call and get an IP from the VLAN; our DHCP server runs on the same VLAN. That's it. We have a separate network interface and separate VLANs for the storage, and a separate interface and separate VLANs for the actual data network.

Now, this applies to both the public and the private cloud: we follow the same auditing and accounting policies, where every team has to tag each instance they spawn with which team it belongs to, who the owner is, which product, and a bunch of other tags. Based on that, we figure out how much cost each team is incurring and whether it is within the permissible limit; each team has its own budget. Sometimes a team doesn't tag properly, which happens: if an instance is spawned by APIs, machines usually do the work better and spawn it with all the necessary tags, but if it's done by a human, humans miss tags. So we have this tagging system which people in our company dread: we just keep sending out mails until they fill in the tags. At some point it becomes good form and they just fill them in. If it's the public cloud, we push those tags through the AWS APIs; if it's the private cloud, these cloud stacks have their own APIs and we push through those. All the instances get tagged, all the volumes get tagged, and then we can do cost analysis and audit.
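On the public cloud side, that audit can be a simple sweep with boto3, AWS's Python SDK; the required tag names and the notification step here are hypothetical assumptions, not the talk's actual policy list.

```python
# Sketch: sweep EC2 for instances missing the mandatory tags; their
# owners get the reminder mails until everything is tagged.
import boto3

REQUIRED = {"team", "owner", "product"}            # hypothetical mandatory tags

ec2 = boto3.client("ec2", region_name="us-east-1")
untagged = []
for page in ec2.get_paginator("describe_instances").paginate():
    for reservation in page["Reservations"]:
        for inst in reservation["Instances"]:
            tags = {t["Key"] for t in inst.get("Tags", [])}
            if not REQUIRED <= tags:
                untagged.append(inst["InstanceId"])

# notify_owners(untagged)   # hypothetical mailer: nag until the tags are filled
print(f"{len(untagged)} instances are missing mandatory tags")
```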
So, I'm done. Thanks to the whole team that was with us all through the migration, through the ups and downs, and that worked specifically on the private cloud as well as the container infrastructure; to the system operations team, who take care of hardware procurement and racking and everything; and to the network operations team, for taking care of all the network-related infrastructure, the multiple switches and everything. And yeah, I stole all the images from the web. Does anybody have any questions?

Question: Is this on? Okay, so I was just trying to understand the rationale behind the CLVM and Ceph deployments and so on. If your VMs are inherently expendable, stateless VMs presumably, then why do you need HA storage at all?

So yeah, once people move to containers, as I said, people are also doing stateless things: people are experimenting with spot instances in AWS, where when the instance goes the storage goes too, but they're okay with it because the app can just be pulled and redeployed. But monolithic apps are heavy; they even have images and static content in them, since we still don't have the object storage part in place, and syncing 2 GB is easier than syncing 200 GB; sometimes the jar people deploy is huge because there is static content inside it. That's why having HA storage helps migrations right now. When we reach the point where most of the infra is containerized and stateless, which is what we are aiming for once we move completely to the container infrastructure, we will go stateless; all the newer teams are already stateless and use object storage for heavy content. Once the legacy teams have also moved over, this won't be a big requirement for us, and at that point we can just operate from local disks.

Question (paraphrased): The public cloud is already there, so is there a specific requirement or need where you would look at a private cloud?

So, at a certain point... I don't have a real cost analysis of public versus private cloud yet, but our thinking is that beyond a certain scale a private cloud can come out cheaper than a public cloud. And we have anyway invested heavily in our co-located data centers; we have all the bare metal lying around. Once we have an apples-to-apples comparison ready, probably by the next conference I can give you an exact update on whether we actually reduced cost, because right now we are aggressively looking at the cost comparison between the two clouds. The apps haven't completely moved to the private cloud yet; once they have, I can make that call. But our assumption is that we can get a better deal if we go with the private cloud, provided you have the scale to manage it. You should have enough site reliability engineers and DevOps people to throw at it; if you don't, then it's obviously a management overhead for the company. We have a 60-member site reliability engineering team and 20 more joining from college, so we have enough people to throw at this. So it depends on the scale people operate at and the employee cost that gets thrown at it.
But I don't have an actual number because we have just gone live. Once I have exact numbers, probably after three or four months, I can comment on whether we actually made savings. If we don't make savings, then obviously we will stick to the public cloud; if we do, then we take the call.

Question: Is this just for the DevOps team, or for everyone?

Everything: the production apps go here, the whole product will run from here. We serve a billion ad impressions every day, and all of those will be served from there.

Question: What is your primary motive behind trying out this private cloud?

So yeah, the primary motive is efficient utilization of resources, and ease and flexibility in doing migrations. We just did the migration from CentOS to Debian, and certain migrations become really difficult when decoupled systems run together on the same box: you have to go to each team and talk them through what happens when. When everything is virtualized, with containers on top, updating the host version needs nobody's permission; you just send out a mail, like AWS does: "hey, this infra is going to reboot your instance". People reboot the instance, it gets migrated to some other place, and the infrastructure gets updated. That's one thing. The other is utilization of resources: like I said, there are boxes with a lot of cores because someone thought they might be needed, but they are not actually used at all; the box just lies there with a lot of idle cores. Instead, if you can create environments for multiple teams to share that resource, and later, if the original team actually does need all those cores, have a way to migrate the other people provisioned there, you get the utilization back. That is the whole point of why we went to the private cloud.