Thanks for coming to this talk. Just before we start, I'd like a show of hands. How many of you have plans to run, or are already running, Kubernetes on public clouds? Oh, great. How many of you would like to save on OPEX costs? OK, so I think you've come to the right talk. My name is Bich Le, and here's Arun with me. We're from Platform9, and today we want to discuss ways to run Kubernetes clusters on public clouds more cheaply.

To motivate the talk, let me give you a quick introduction to what we do at Platform9. We are in the business of running open source software as a service for our enterprise customers, and we specialize in infrastructure management. This means we offer Kubernetes, OpenStack, and Fission as an integrated product. What makes us unique is that we run the control plane separately from the data plane, so our customers can run the data plane on the infrastructure of their choice, including bare metal or public cloud instances, while we host the control plane and run it as a service. We're currently in the process of migrating all of our control plane services from VMs onto containers running on Kubernetes on the public cloud. And just like any sane business, we are always exploring ways to be cost efficient without sacrificing our quality of service. In that journey, one of the best ways we've found to reduce costs is to take advantage of spot instances on AWS or preemptible instances on Google Cloud, which are roughly equivalent concepts.

In case not everyone is familiar with spot instances, let's do a quick review. Spot instances are essentially the same virtual machine instances that you can get regularly, with the same instance types, except that they are considerably cheaper on a per-hour basis. Our observations have shown that, on average, they can be 60% to 80% cheaper than the regular on-demand instances, so the potential savings are huge. However, they come with a catch: spot or preemptible instances can be terminated at any time for reasons that are beyond your control. For example, on AWS, if the real-time spot price exceeds your bid price, you can lose the instance. On Google Cloud, they can get terminated for a variety of reasons, but even if nothing happens, you're guaranteed to lose the instance after 24 hours by policy. What this means is that, historically, to really take advantage of spot instances as an application developer, you needed to understand the trade-off between availability and price and match your application to that trade-off. Typically, people have run specialized workloads on spot and have needed special tooling or scripting to handle node failure.

The good news is, fast forward to today, we have Kubernetes, and we think Kubernetes is going to make spot instances mainstream. Why is that? Kubernetes can hide the complexity and the details of running spot instances because it is built to handle node failure. Under the hood it will, for example, leverage public cloud resources like auto scaling groups that automatically spin up a new instance if one dies. And at the pod level, if you use the right constructs, like a Deployment or ReplicaSet, and your pod resides on a node that disappears, Kubernetes will reschedule another pod somewhere else for you.
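To make that last point concrete, here is a minimal sketch of what such a Deployment could look like. The names and image here are illustrative placeholders, not something from the talk; the point is simply that with several replicas behind a Deployment, pods lost when a spot or preemptible node is reclaimed get rescheduled onto the remaining nodes.

```
# Minimal sketch (illustrative names): a Deployment with multiple replicas,
# so pods lost when a spot/preemptible node is reclaimed are rescheduled.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: resilient-web
spec:
  replicas: 3                 # enough replicas to tolerate losing a node
  selector:
    matchLabels:
      app: resilient-web
  template:
    metadata:
      labels:
        app: resilient-web
    spec:
      containers:
      - name: web
        image: nginx:1.21
        ports:
        - containerPort: 80
EOF
```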
And Kubernetes has really taught us to design applications for failure, to be resilient, with constructs like ReplicaSets and Services, which can distribute requests across a pool of pods so that if you lose a bunch of pods once in a while, it's no big deal: your application continues to run. So we think that Kubernetes and spot instances are really a marriage made in heaven. But the devil is in the details, and today Arun is going to walk us through those details: how to understand spot instances and how to best take advantage of them within the context of Kubernetes.

Thanks, Bich. So let's dig deeper and see how to save money on your bills. I wanted to look at a blog post that Jeff Barr from AWS wrote in 2015, where he spoke about how you choose instances, or how you group them into pools, using a concept called a capacity pool. A capacity pool is a logical grouping of instances, whether standard on-demand instances or spot instances, that are of the same type, say t2.medium, in the same availability zone and the same region. Each capacity pool gives you the ability to launch an instance at any given time for a given price. When we choose such capacity pools for our cluster, there are some best practices to keep in mind. One is that when we build our applications, we can make them price aware: if our applications are not tied to a particular instance type, we can choose whichever instance type is cheapest for running them. Another is checking the price history of the instances, which can be either manual or automated; it's always worth going back through history, figuring out what the costs have been, and then deciding which type or which capacity pool is best to use. And one really important thing is that when we deploy a cluster, we should choose multiple capacity pools and compose the cluster of both on-demand and spot (or preemptible) instances, so that our application always runs and our service never goes down.

Let's quickly take a look at what Bich mentioned with respect to spot instances in Amazon. If you look at the graph, it's an m1.medium over three months; I pulled it very recently. You can see that the spot price is almost always about 80% less than the on-demand price: on-demand is around $0.09, and the spot price is less than $0.02. If I zoom in, though, the disadvantage is that every once in a while the spot market price spikes above the on-demand price. At that point it doesn't make much sense to be running things on spot instances; you could switch over to on-demand instances. In EC2, you attach a bid price to a spot request, and normally, if the market price goes above that bid price, the instance is terminated. Since Kubernetes handles node failure, Kubernetes works really well with such instances. In Google Cloud, it's a little different. It doesn't run on a surplus capacity market; they give you a flat discount on the price. The only catch is that the instance is always terminated within, or somewhere around, 24 hours. The good thing is, if it gets terminated within the first 10 minutes, I believe Google does not charge you. Let's look at what types of applications would be best suited for this kind of infrastructure.
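On the price-history point above, the check can indeed be automated. As a rough sketch, something like the following pulls recent spot prices for one capacity pool; the region, instance type, and dates are placeholders rather than values from the talk, and it assumes the aws CLI is configured.

```
# Rough sketch: pull recent spot price history for one capacity pool
# (instance type + AZ). All values here are placeholders.
aws ec2 describe-spot-price-history \
  --instance-types m1.medium \
  --availability-zone us-east-1a \
  --product-descriptions "Linux/UNIX" \
  --start-time 2017-09-01T00:00:00Z \
  --end-time 2017-12-01T00:00:00Z \
  --query 'SpotPriceHistory[*].[Timestamp,SpotPrice]' \
  --output text
```

Comparing that output against the on-demand price is essentially the manual version of the graph discussed here. Now, on to workload types.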
So, not all applications work well with spot instances. The most common use case is bursting applications: depending on the season or a special occasion, you get a burst of traffic, and that burst can always be offloaded to spot instances because it eventually goes away. Another use case is the HPC industry, where most of the work is number crunching; if those applications are more or less stateless, you can run them on preemptible or spot instances, and it's not too much of a pain, if an instance goes down, to restart from where the job left off. Another use case is highly available cluster apps. There are still people writing applications so that one copy runs active and another copy runs standby, waiting to take over if the active one goes down. In those cases the standby copy is mostly just syncing, and since it's not participating in an active-active configuration, it can be offloaded to a spot instance and cost us less money than it otherwise would. Another is node auto scaling. With elastic bursting you dynamically expand your app, and as the app expands you'll probably need to expand your infrastructure as well. Horizontal pod autoscaling combined with node autoscaling in Kubernetes gives you a very good mix to solve this use case, and it works great on spot instances.

Having said that, let's look at how to deploy clusters with spot or preemptible instances. We'll look at GKE, and we'll also look at AWS using kops. In GKE, the instance groups are called node pools. For a given node pool, you can specify an instance type and whether that node pool is going to be preemptible or not. The catch is that within a node pool you cannot mix preemptible and fixed instances; it's one or the other. So the best way to set things up is to have two node pools, one fixed and one preemptible. To start off, you can have just a fixed node pool, and then, as capacity needs grow, you can add node pools dynamically to the cluster. You can also have node pools with zero nodes in them, so that autoscaling kicks in when your application needs the capacity. Here's an example of a cluster I had on GKE, deployed with two pools: a fixed pool with preemptible nodes disabled, and another pool with preemptible nodes enabled.

If you go with kops, kops has various backends; you can deploy clusters across multiple cloud providers with it. I chose it since it's an open source tool. Let's see how you deploy a cluster using kops with spot instances. The way kops does it, the instance group is the basic grouping. An instance group consists of instances of the same type, and it can be either a spot pool or a fixed pool. When you create a cluster with kops, it creates multiple instance groups: one for the master components and one for the worker nodes. These are backed by auto scaling groups, so if a node within an instance group gets terminated, it gets recreated. How many of you have used kops here? Show of hands. And have you used kops with spot instances, by any chance? Perfect. So we'll see how to do that in a bit.
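Before the kops walkthrough, here is roughly what the two-pool GKE setup described above could look like on the command line. This is a sketch: the cluster and pool names, machine type, and zone are assumptions, not values from the talk, and the preemptible pool is created small and allowed to scale down to zero via the autoscaler.

```
# Sketch: a small fixed pool plus a preemptible pool that can autoscale.
# Cluster/pool names, machine type, and zone are placeholders.
gcloud container clusters create demo-cluster \
  --zone us-central1-a \
  --machine-type n1-standard-2 \
  --num-nodes 1

gcloud container node-pools create preemptible-pool \
  --cluster demo-cluster \
  --zone us-central1-a \
  --machine-type n1-standard-2 \
  --preemptible \
  --num-nodes 1 \
  --enable-autoscaling --min-nodes 0 --max-nodes 3
```

The GKE demo later in the talk uses the same idea: the preemptible pool stays small until the cluster autoscaler grows it under load.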
This is how it looks when you deploy a cluster with kops on AWS: you get an auto scaling group for the master and an auto scaling group for the workers. In our example, we'll create another instance group of the worker type that uses spot instances. The instance group YAML defines the role, which is either Master or Node, and the only difference between an instance group that uses spot instances and one that is a fixed, on-demand pool is the maxPrice field you can see there. As soon as you add this maxPrice key, the instance group defined by that YAML becomes a spot instance pool. With kops, you can add new instance groups and you can also edit them. One difference between GKE and AWS here is that on GKE, if you create a node pool with preemptible nodes, you cannot change it afterwards; you'd need to create a new node pool. With kops on AWS, you can switch an instance group from spot to fixed, or fixed to spot, and so on.

So let's quickly take a look at creating a cluster using kops and spot instances. What you do first is run aws configure and specify the access key, secret key, region, and the output format you want your commands to use. Once that's done, you can run kops commands. We just run kops get cluster; I already have a cluster deployed, because cluster deployment takes time, and we'll look at that in a bit. How do you create a cluster? You run the kops create cluster command and give it a zone and a name. It goes through a bunch of tasks, assigns a CIDR for the cluster, and then creates a cluster object with all of the resources locally. So it's still not created on AWS, and you can go look at it, manage it, and modify the configuration. Let's quickly look at the instance groups that the cluster create produced: there is a master instance group and a node instance group. Let's change the node instance group to have just one node in it, because we'll create another pool for the spot instances. That's done. Now let's create a new instance group using the kops create ig command. The only thing you need to do differently is specify a maxPrice, say something like 0.08, which is pretty high up there for a t2.medium instance. And that's done. If we now look at the instance groups we have, there's the spot pool we created, the nodes group, and the master group. We update the cluster with this change of configuration. All of this is still happening locally; it's just running through a check to see that the configuration you've done is correct, and as soon as you add --yes to the update command, it starts deploying things into AWS.

Let's quickly look at the AWS console. This is the cluster that I deployed earlier; it will take a bit, and you should see the new ones coming up here. If you look at the launch configuration for the spot pool, which is what's responsible for bringing up instances in AWS, it has a spot price set, and the others don't. That's the main difference when creating instance groups in AWS that use spot pricing. I think it should have created the other ones. There you go, it created another one with the 0.08 that we just set. It takes about five to eight minutes and the cluster is up, and then you should be able to deploy apps on it. We'll not wait for it.
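For reference, the kops flow from this demo looks roughly like the following. The cluster name, state store, zone, and price are placeholders, and since kops create ig opens an editor where you fill in the spec by hand, the instance group YAML is shown as a comment.

```
# Rough sketch of the kops flow from the demo (names, zone, state store,
# and price are placeholders, not values from the talk).
export KOPS_STATE_STORE=s3://my-kops-state-store

kops create cluster --zones us-west-2a spot-demo.k8s.local
kops create ig spotpool --name spot-demo.k8s.local --subnet us-west-2a
# kops opens an editor; making this group a spot pool is just a matter of
# adding maxPrice to the InstanceGroup spec, roughly:
#
#   spec:
#     role: Node
#     machineType: t2.medium
#     maxPrice: "0.08"
#     minSize: 1
#     maxSize: 3
#     subnets:
#     - us-west-2a
#
kops update cluster spot-demo.k8s.local --yes   # actually deploys to AWS
```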
We'll go with another demo. This one is on GKE; we saw AWS, so let's take a look at GKE. This demo shows how I have multiple pools, a fixed pool and a preemptible pool, and I deploy an app and generate load on it. A horizontal pod autoscaler runs, and more nodes get created to support the load, but those nodes are created in the preemptible pool and not the fixed pool; then, as soon as the load goes away, it shrinks again. So this is a cluster I have in GKE. Initially the cluster has three nodes: a fixed pool with just one node in it and preemptible disabled; another pool with preemptible enabled, which can be autoscaled up to three nodes; and one more pool that's just there. Let's go and deploy a PHP Apache server that's going to serve traffic for us, and we'll also create a busybox instance from which we can start generating traffic. Before that, I'm going to create an autoscaler and say that if CPU load goes above 75%, horizontal pod autoscaling should trigger, up to a maximum of eight replicas. The load is now running. If you look at the bottom right, you can see the load is running, the number of pods running for this is just one, but the load has increased to 223%. The horizontal pod autoscaler created multiple pods, but you can see that one of them is now in the pending state. That pod is pending because the cluster has already run out of resources: kubectl describe on that pod shows it couldn't be scheduled because of insufficient CPU. GKE will now automatically trigger the node autoscaler, which creates a new node from the preemptible pool, and that node registers as part of our Kubernetes cluster. We had one earlier, and it's two now. As soon as it's ready, you'll see the pods start getting created again, and the load on our app should slowly come down. The load is still there, so it's going to keep creating more pods until it reaches eight replicas, and since we set the autoscaler on the node pool to a maximum of three nodes, we now have three preemptible nodes there and everything is running. If you look at the target now, the load on the app is about 63%, so it's fine. Let's stop the load. I just stopped it and killed my load generator. You can see that the autoscaler's target quickly falls to zero, and as soon as it does, all of the app's pods are terminated; once the app pods are gone, the nodes from the preemptible pool also get cleaned up by GKE. You can see one went away, then two, and now we are back to the three nodes in our original cluster. So we expanded the cluster for our app during the bursty load and brought it back down. It takes about five to six minutes for GKE to clean up nodes that are not in use, so that's why this part of the demo takes a while.

Now we've seen how you design a cluster to use spot instances and autoscaling. Next, let's look at how you design an application to use spot or preemptible instances, because you don't want the application to die completely or your service to go down, right? So some of the application considerations are: is your application stateless or stateful? If your application is stateless, it's a very good match for this use case.
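For reference, the autoscaling part of that demo can be reproduced with commands roughly like these. The image is the standard hpa-example from the Kubernetes HPA walkthrough, and the exact kubectl run flags vary by kubectl version, so treat this as a sketch rather than the exact commands from the talk.

```
# Sketch of the HPA demo (based on the standard Kubernetes HPA walkthrough;
# exact flags vary by kubectl version).
kubectl run php-apache --image=k8s.gcr.io/hpa-example \
  --requests=cpu=200m --expose --port=80

# Scale between 1 and 8 replicas, targeting 75% CPU utilization.
kubectl autoscale deployment php-apache --cpu-percent=75 --min=1 --max=8

# Generate load from a throwaway busybox pod.
kubectl run -it load-generator --image=busybox -- /bin/sh
# ...then, inside that shell:
#   while true; do wget -q -O- http://php-apache; done

# Watch the autoscaler and the nodes react.
kubectl get hpa -w
kubectl get nodes -w
```

With that, back to the application-level considerations.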
What about the replica distribution of your application? If you have six pods backing your application, having all six of them run on the same preemptible pool may not be a good idea; definitely not a good idea, because that pool can go down. What happens when a node fails? Kubernetes automatically reschedules those pods to different nodes, and if Kubernetes reschedules them onto the preemptible pool again, you can end up with all of the pods in the preemptible pool, which is also a problem. There are also cases where applications require specific GPU processing, for example machine learning or big data analysis; there again, preemptible pools might be present in your cluster, but you might not want those apps to go and sit in the preemptible pool. To handle this, Kubernetes has a bunch of mechanisms. I'm sure most of you have used them; if not, I think you should. One of them is the node selector. Nodes come with labels, or you can apply labels to nodes, and you can use a node selector when you deploy your application to say whether your pods should go to a particular node or not. Now let's say, for example, you need to deploy an application on this cluster with preemptible instances and fixed instances. One way of deploying it is to have two deployment specs of the same app, just an Nginx server. I deploy one onto the preemptible pool by selecting on the gke-preemptible=true label, and I deploy the other deployment of the same app onto the fixed pool, using a node selector with the name of the node pool. When you create node pools, GKE adds labels to the nodes, and nodes that are preemptible come with a built-in label, cloud.google.com/gke-preemptible=true.

So what are the supporting mechanisms here? Let's look at another example: the HA app we were talking about. Say we have two node pools with a bunch of apps running on them, and we want to deploy this HA app. One copy is already deployed to the fixed pool, and now we want to deploy the other one, the passive, standby copy, to the preemptible node pool. The way you can do that is by using node affinity, and the type of node affinity you would use is requiredDuringSchedulingIgnoredDuringExecution. Basically it says: deploy this app only to the preemptible node pool and not the fixed node pool, or vice versa. If there's another app that I would like to run on my preemptible node pool, but I'm OK with it landing elsewhere if the preemptible pool is full, I would use the preferredDuringSchedulingIgnoredDuringExecution type of node affinity. And if you want to make sure your apps are spread across different AZs for high availability, there is another label you could use, the failure-domain zone label (failure-domain.beta.kubernetes.io/zone), which tells you which zone a node is running in, so you can have affinity or anti-affinity for your app against a particular zone. Let's quickly take a look at node selectors and application availability; basically, how you deploy an app to these clusters. What I do is take my application, the way you saw it, with two deployment specs: the fixed deployment spec and the preemptible deployment spec.
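As a rough sketch of what those two specs could look like: the app name, image, replica counts, and the fixed pool's name are placeholders, while the gke-preemptible and gke-nodepool labels are the ones GKE applies to its nodes.

```
# Sketch: the same app deployed twice, pinned to different pools via
# nodeSelector. Names, image, and the fixed pool name are placeholders.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-preemptible
spec:
  replicas: 2
  selector:
    matchLabels: {app: web, pool: preemptible}
  template:
    metadata:
      labels: {app: web, pool: preemptible}
    spec:
      nodeSelector:
        cloud.google.com/gke-preemptible: "true"   # built-in GKE label
      containers:
      - name: web
        image: nginx:1.21
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-fixed
spec:
  replicas: 1
  selector:
    matchLabels: {app: web, pool: fixed}
  template:
    metadata:
      labels: {app: web, pool: fixed}
    spec:
      nodeSelector:
        cloud.google.com/gke-nodepool: fixed-pool  # label carrying pool name
      containers:
      - name: web
        image: nginx:1.21
EOF
```

The node affinity variants mentioned above would replace the nodeSelector with an affinity.nodeAffinity rule, using requiredDuringSchedulingIgnoredDuringExecution or preferredDuringSchedulingIgnoredDuringExecution against the same labels.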
It's just going to deploy the Nginx server, and the main difference between the two specs is that one has a node selector for the fixed pool and the other has a node selector for the preemptible pool. I also have an init container that echoes a line into the index.html telling us which pod is replying, so we can tell them apart in the demo. There is a Service that fronts both of these: one Service, but two deployments. There are two Nginx deployments, one on preemptible and one on fixed, but just one Service fronting them both. That's how you can make use of the Service without worrying about where the pods are running. I also have a small busybox pod from which I can generate load or curl. Let's quickly deploy this. OK, there's nothing there yet, so let's deploy all of it. Now everything is running: I have one Nginx pod running on the fixed pool, and two others running on two different preemptible nodes. Let's look at the Service and start running some load against it: a while-true loop doing wget against the cluster IP, 10.55.248.133, with a sleep in there. Oops, typo. OK, so we're getting responses from both pods, the one running on preemptible and the one on fixed. Now let's terminate our preemptible pool; let's assume the preemptible pool just dies. Let's go to the GCE console and terminate the preemptible instances. You can see it takes a bit of time for the Service to reconfigure as the pods go down, but then you have the pod on the fixed instance still running and responding. And, well, live demo. As soon as the preemptible instances come back up, we'll start getting responses from the preemptible pool as well. I think I might have deployed the curler on the preemptible pool too; that might be why I lost it for a moment. That's fine. So that's the demo: basically, you use node selectors and node affinity to deploy apps in the right place.

Let's quickly go back. We ran an interesting experiment: a two-node cluster, split 50-50, one node preemptible and one fixed, running on GKE for 12 days with no active workloads. One observation was that the preemptible node goes away at around 24 hours as the median; sometimes it's a little more than 24 hours, but normally it's right around 24. And most importantly, the reason you're here: the costs. The total bill I got for the 12 days for the two instances was $16. If you replaced the preemptible custom instance cores with standard instances, the total would be around $24, so the cost saving is about $8 over 12 days, just for two nodes with no workloads. If you do a thought experiment and scale that up, the saving looks like around $13,000 a year for 15 fixed nodes and 15 preemptible nodes. That's the interesting figure we're seeing, and it will only grow as you increase the number of nodes or move more of them off fixed pricing. That's about all I had. Hopefully the one key takeaway is that using spot instances is beneficial, and it will reduce your costs, but there are two things you need to take care of: one is to architect the cluster, and the second is to architect your app.
And doing both of these well can get you quite a bit of savings on your bill, and Kubernetes is the best platform for this approach. That was it for me. Thanks. We can take maybe one or two questions.

Great presentation, thank you. Definitely a good introduction to this. I agree with you; I think Kubernetes really gives us a chance to take care of these spot and preemptible instances. It gets really complicated on AWS, and things keep changing. They just recently changed their pricing structure; they even went from per-hour to per-second billing. Every time we have a strategy put together, it seems like it changes again. That's actually what motivated me to try to encapsulate this into a service, so that people who don't have a team of 12 to go off and do it themselves can still get the benefits. But the key point is to make sure your application can survive. One other point I think you missed is that even though pods recover, you still don't want them going down a lot; at least for spot instances you can actually detect the upcoming termination and start draining, to move things off the node before it goes away. So that's another important piece. And the last one is to diversify across multiple instance types, just like a stock portfolio. But interesting, thanks for the comments.

So, I saw you used kops, which is cool. I'm not using kops. Are there any other tools for using spot or preemptible instances, or a service I can run on Kubernetes to monitor that kind of stuff? Good question. kops is pretty popular out there, so I used that, and it has spot instance support. At Platform9, we do spot instances. I would want to say kubeadm does it, but I'm not sure it does yet; if anybody with kubeadm knowledge knows, please speak up. But yeah, kops is the one I know that supports spot instances. OK, that's all we have time for. Feel free to talk to us on the side. Thanks.