So this is scaling Kubernetes nodes without breaking the bank or your sanity. I'm Nick Tran, and this is Brandon over here. We'll talk a little bit about what Spot is exactly and some of the best practices you can use for Spot. Then we'll go into Kubernetes and Spot, how they work well together, and then how to autoscale your cluster with Spot nodes. And then we'll hopefully go into a demo after that.

So what is EC2 Spot? EC2 Spot is spare virtual machine capacity, available at a steep discount from on-demand instances. These are the same underlying instances as on-demand; the only difference is that they're interruptible. What that means is that EC2 can reclaim these instances when it needs them for an on-demand customer, and it'll give you a two-minute notice before reclaiming the instance. Sometimes EC2 can also give you a rebalance recommendation to give you some more time; we'll talk a little bit about that later.

Some common workloads you can use with Spot are quick continuous integration — this could be something like Tekton — batch processing, stateless APIs, or dev workloads: basically anything that's tolerant to interruptions and that you can clean up quickly.

The Spot best practices we'll talk about are max price (and why not to set it), flexible instance type requests, and rebalance recommendations. These all maximize your instance runtime, minimize your costs, and maximize the chances that you actually get an instance when you ask EC2 for one.

Prior to 2017, the Spot pricing model was a bidding system. When you set your max price, that's what you would pay for the instance, and your instance runtime depended on how high you set that max price. What you would sometimes see with this bidding system is the price going over the on-demand price. So after 2017, EC2 changed this and made the Spot price based on long-term supply and demand, with max price just being the most you're willing to pay. If you don't set a max price, it defaults to the on-demand price. And when you do set a max price, that opens you up to another condition under which EC2 can reclaim your instance. So most of the time, what you want to do is simply not set a max price, so that you maximize your instance runtime.

Another best practice is flexible instance types. EC2 has a lot of VM capacity, and it's important to know how it's segmented across regions. This is the concept of capacity pools: a tuple of availability zone and instance type. Each of these orange squares here would be a capacity pool. And if you have resource requirements for your instances — let's say you need at least a 2XL, and the on-demand price here is, say, $0.44 per hour — you can actually request a 2XL and also a 4XL, an 8XL, and anything above, because with the Spot discount, even the 8XL can be cheaper than the on-demand 2XL.

So capacity pools are the amount of unused capacity in EC2. Here we have three capacity pools: m6i.large in each of the us-east-2a, us-east-2b, and us-east-2c availability zones. And there's something you can do when you ask EC2 for an instance, and that's using the capacity-optimized allocation strategy.
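To preview what that looks like at the API level — a hedged sketch of an EC2 CreateFleet request spanning several instance types so EC2 can pick from the deepest Spot pools. The AWS CLI v2 accepts this as YAML input via `--cli-input-yaml`; the launch template and subnet IDs are placeholders:

```yaml
# Sketch of an EC2 CreateFleet request with the capacity-optimized
# allocation strategy. All IDs are placeholders.
# Assumed usage: aws ec2 create-fleet --cli-input-yaml file://fleet.yaml
Type: instant                               # synchronous, one-shot launch
TargetCapacitySpecification:
  TotalTargetCapacity: 1
  DefaultTargetCapacityType: spot
SpotOptions:
  AllocationStrategy: capacity-optimized    # pick from the deepest Spot pools
LaunchTemplateConfigs:
  - LaunchTemplateSpecification:
      LaunchTemplateId: lt-0123456789abcdef0   # placeholder
      Version: "$Default"
    Overrides:                              # be flexible: any of these is fine
      - InstanceType: m5.2xlarge
        SubnetId: subnet-aaaa1111           # placeholder subnets, one per AZ
      - InstanceType: m5.4xlarge
        SubnetId: subnet-bbbb2222
      - InstanceType: m5.8xlarge
        SubnetId: subnet-cccc3333
```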
And when you ask EC2 for an instance with this strategy, given these capacity pools, it'll pick from the capacity pool with the deepest capacity — the most unused instances. Using the capacity-optimized allocation strategy really maximizes your instance runtime, because picking from the deepest pool means EC2 will be less likely to reclaim your instance for an on-demand customer. You can see here some instance types and their frequency of interruptions — it's important to note that these interruptions, when they do happen, don't happen all that often.

One more best practice is rebalance recommendations. This is a warning that an interruption is coming — though at worst it can arrive at the same time as the interruption notice. It's vended through the same service as the interruption notice, the instance metadata service. So if you have workloads that can't be cleaned up within two minutes, the rebalance recommendation might be useful for you.

Moving on: Kubernetes and Spot work super well together. Within your cluster, you'll want something programmatically configured to watch for these interruption notifications. Within AWS, Brandon has actually worked on the AWS Node Termination Handler, and you can configure it in your clusters to watch for interruption notices and rebalance recommendations.

There's also an important Kubernetes concept here called pod disruption budgets. These are used to ensure that a certain number of replicas of your pods are running at any one time. And if you use pod disruption budgets, you might run into problems when your Spot instances need to be cleaned up within two minutes, so we'll go through a little example here (a minimal sketch follows after this walkthrough).

The Kubernetes eviction API is what's used when you drain a node: the kubelet sends SIGTERMs and SIGKILLs based on the eviction API, and the eviction API respects all the pod disruption budgets within your cluster. So let's say you have the Node Termination Handler installed in your cluster, configured to watch for interruption notices and drain the node when it sees one. When it drains the node, the kubelet will first send SIGTERMs to the pods — assuming the pod disruption budgets won't be violated and the pods can be evicted. One pod might be saving data or shutting down; the kubelet sends it a SIGTERM, and once it's done, it's deleted. Another pod might be draining connections — something like batch processing. But sometimes pods take too long to clean up, and at the point when terminationGracePeriodSeconds is reached, the kubelet sends a SIGKILL to the pod. At that point it's forcefully deleted, and then the node is cleaned up. So that's how you can use Kubernetes and Spot to clean up your workloads programmatically.
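To make the disruption-budget piece concrete — the minimal sketch promised above (the name, label, and replica count are all hypothetical):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb                 # hypothetical name
spec:
  minAvailable: 2               # the eviction API won't take the app below 2 pods
  selector:
    matchLabels:
      app: web                  # hypothetical label
```

On the pod side, terminationGracePeriodSeconds (30 seconds by default) is the window between the SIGTERM and the SIGKILL, so for Spot workloads you want your cleanup to fit comfortably inside the two-minute notice.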
But sometimes we also need to provision our workloads — to get those instances in the first place. So we'll talk about how to autoscale your cluster. You can use HPA and VPA to autoscale your pods. HPA is the Horizontal Pod Autoscaler: when you horizontally scale your pods, that takes the form of scaling the number of desired replicas on your deployment. On the top here you can see the pods going from one replica to four replicas. Vertical scaling is similar: you could go from one smaller pod with, say, one vCPU and five gigs of memory to a larger pod with two vCPU and ten gigs.

But sometimes you won't have the nodes to hold these pods, so you also need to autoscale your nodes. A very common node autoscaling solution in Kubernetes is the Cluster Autoscaler. On AWS, this is a simple interface to EC2 Auto Scaling groups: it looks at the pods within the cluster, and if they can't be scheduled onto the existing nodes, it increments the desired capacity on the associated ASGs you've configured. So Cluster Autoscaler, in this sense, uses externally managed infrastructure. And if you need a lot of different capacity types, availability zones, and instance shapes, you can end up with a setup like this one, with six different ASGs — each of the yellow boxes. And that leads us to something else we'd like to talk about, which is Karpenter.

Yeah, so that's why we decided to build Karpenter. Karpenter is a new node autoscaler for Kubernetes — a groupless node autoscaler. Cluster Autoscaler takes the approach of using externally managed infrastructure, like Auto Scaling groups or virtual machine groups, that you increment a desired capacity on, like Nick was talking about. Karpenter is completely groupless and Kubernetes-native: we use a Provisioner CRD, our custom resource in Karpenter, to define the configuration for scaling out your nodes. Karpenter is familiar to use if you're coming from Cluster Autoscaler, because we do just-in-time provisioning: we look at pending pods that are unschedulable by kube-scheduler, and that's how we know when to scale up your cluster. Karpenter is designed to be completely vendor-neutral. Currently we only have the AWS cloud provider, but we're hoping to add more cloud providers in the future.

So let's look at an example here. You have a cluster, and some pending pods coming in. Just like with Cluster Autoscaler, kube-scheduler gets the first pass: if you have existing capacity on the nodes running in your cluster, kube-scheduler will just schedule those pods onto those nodes. If you have unschedulable pods — pods that kube-scheduler is not able to place on nodes because of scheduling constraints or just not enough capacity — that's where Karpenter comes in. Karpenter will look at these pods, look at your provisioner spec, and take the intersection of the requirements in the pod spec and the provisioner spec. Then Karpenter looks at the instance types available in the cloud you're launching capacity in and figures out the set of nodes it should launch to fulfill the demand.

So like I was saying, Karpenter is completely Kubernetes-native, so there's no external infrastructure you have to set up to use Karpenter, other than a few permissions. We use a custom resource definition that we call the Provisioner, and this is an example of a provisioner spec.
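The slide itself isn't reproduced in this transcript, so here's a representative sketch of what an early Karpenter Provisioner looked like — this is the v1alpha5 API from around the time of this talk (field names have since changed in newer releases), and the provider values are placeholders:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64", "arm64"]           # flexible across CPU architectures
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]        # defaults to Spot when both are allowed
  provider:                                # AWS-specific parameters (placeholders)
    instanceProfile: my-karpenter-profile
    subnetSelector:
      karpenter.sh/discovery: my-cluster
  ttlSecondsAfterEmpty: 30                 # scale-down: remove empty nodes after 30s
  ttlSecondsUntilExpired: 2592000          # scale-down: roll nodes every 30 days
```

The sections walked through below — the requirements, the provider spec, and the scale-down TTLs — all live on this one object.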
So the spec has some main sections, the main one being the requirements. The requirements use the same Kubernetes requirements API that you're probably familiar with if you use affinities in your pod specs, and we use it to layer on scheduling constraints. In this example, kubernetes.io/arch — a well-known Kubernetes label for the CPU architecture — makes this provisioner flexible to both x86 and ARM64 architectures. So this provisioner is allowed to launch nodes for pods that require either of those. And if you build your containers for both architectures, Karpenter will union together all the instance types in the cloud that support those two architectures and pass them to EC2 for EC2 to make the best decision, like Nick was talking about with the capacity-optimized allocation strategy. With Spot this works really well, because you can pass EC2 a huge array of instance type options and let it select the one that's going to run the longest with the best discount.

Similarly to CPU architecture, you can also be flexible across capacity types. Your pods, as always, can override the provisioner's flexibility — Karpenter does this layering intersection of constraints. But if you're flexible to both Spot and On-Demand in your provisioner, Karpenter assumes you want the cheapest node option, so we'll default you to Spot. And of course you can override to On-Demand if your pods aren't interruptible.

Another section of the provisioner spec is the provider spec. This is a generic section — in this example it's the AWS cloud provider, and this is where other cloud providers would also be implemented — containing just the cloud-specific parameters for launching your nodes.

So Nick was talking about the Spot best practice of not setting a max price, because that was the bidding era of Spot, at least on EC2. And the big one with Spot best practices is flexibility — that was one of the main reasons we built Karpenter, because we needed to be really flexible in the compute request, with all the instance types EC2 keeps releasing. And then rebalance recommendations: if you can't tolerate a two-minute interruption window, they can potentially give you more time to gracefully shut down your application. So we're going to look at how Karpenter helps with this approach.

These are the things I already mentioned — architecture flexibility: Karpenter builds this right in, where you can be flexible across different CPU architectures, and like I was saying, if your pod requires a specific architecture, you can override it in the pod spec. Capacity type works the exact same way with Spot and On-Demand: it defaults to Spot if you're flexible to both, and you can override it in the pod spec. And the requirements section isn't limited to just the labels I'm showing here. These are well-known labels in Kubernetes — although there's no well-known capacity-type label in Kubernetes, so we've used the Karpenter namespace for that one — but you can use your own labels here too, so you can make your own scheduling constraints if you need to.

Another well-known label is the node.kubernetes.io/instance-type label. So if your pods can't run on specific instance types, you can do an exclusion list here. And again, it uses the requirements API in Kubernetes.
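Concretely, a fragment of the provisioner's requirements doing that exclusion — the instance types listed are just hypothetical examples:

```yaml
# Fragment of a provisioner's requirements section; the excluded
# instance types are hypothetical examples.
requirements:
  - key: node.kubernetes.io/instance-type
    operator: NotIn
    values: ["t3.nano", "t3.micro"]
```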
So you just use a NotIn operator here.

Another interesting scheduling constraint you can use is GPUs — or extended resources, more generally. In this pod spec, you're making a resource request for an NVIDIA GPU. Karpenter knows how to look at your GPU resource request, query your cloud provider for instance types that support an NVIDIA GPU, and provision such an instance for this pod (a minimal pod sketch follows at the end of this section).

Here's a little more detail on the AWS cloud provider side and how we achieve the flexibility we've been talking about — this is more concrete on the APIs we're using. We use the EC2 Fleet API in what's called instant mode. Karpenter gets these unschedulable pods and computes the constraint intersection between the pod specs and the provisioner spec. Once it does that, it queries the cloud provider for all the instance types that match the scheduling constraints and the bin packing that Karpenter is doing for the pods onto those nodes. Once it computes this list of instance types, it passes that to the EC2 Fleet API with an allocation strategy: capacity-optimized if you're using Spot, and lowest-price if you're using On-Demand. You can potentially pass a huge list of instance type options, which obviously makes your request very flexible: you get better guarantees of getting capacity, and hopefully your Spot capacity will also live a lot longer. And all of this is taken care of with one provisioner spec — you don't need to make a bunch of provisioners to get a hugely flexible set of instance types. This one provisioner will support On-Demand, Spot, different CPU architectures, GPUs, all of that. We also support topology constraints — zonal constraints, hostnames, all the regular stuff you're doing with pod topology today.

So we've talked a lot about scaling up; now let's talk a little bit about scaling down. Karpenter has two main fields in the provisioner spec that control scaling down. The first is ttlSecondsAfterEmpty. This is for when you have a node up and all of your pods have been evicted, or have exited if they're jobs, and you need to spin that node down: Karpenter looks for empty nodes — excluding DaemonSets — and turns them off after this time-to-live expires. The second is ttlSecondsUntilExpired. This fulfills the requirement of rolling your nodes over on a specific time period — you don't want them to live longer than 30 days, or a week, or whatever. Karpenter enforces this expiration period for you and rolls your nodes over.

And whenever Karpenter rolls a node over, or turns it off because it's empty, it gracefully drains the pods from the node: it uses the eviction API and respects your pod disruption budgets. We even go a little further and install a finalizer on the node resources Karpenter provisions in your Kubernetes cluster, so you can actually kubectl delete a node that was provisioned by Karpenter, and Karpenter will call the eviction API to drain the pods — respecting your pod disruption budgets — and then terminate the backing EC2 capacity behind it.

Karpenter today doesn't natively support watching for Spot interruptions, but you can always install the AWS Node Termination Handler in your cluster, which is pretty easy to install, and it will pass that signal back so Karpenter can handle your node evictions for you.
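Picking up the GPU example from earlier in this section — the minimal pod sketch mentioned there (the pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-job                        # hypothetical
spec:
  restartPolicy: Never
  containers:
    - name: train
      image: my-training-image:latest  # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1            # extended resource; drives GPU node selection
```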
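And since the Node Termination Handler keeps coming up, here's a hedged sketch of Helm values for running it in IMDS mode — the chart lives in the aws/eks-charts repo, and these flag names are from memory, so verify them against your chart version:

```yaml
# values.yaml sketch for the aws-node-termination-handler Helm chart
# (IMDS mode). Flag names are assumptions — check your chart version.
enableSpotInterruptionDraining: true   # drain on the two-minute interruption notice
enableRebalanceMonitoring: true        # watch IMDS for rebalance recommendations
enableRebalanceDraining: true          # drain (rather than only cordon) on rebalance
```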
And this is a similar walkthrough to what Nick did, but with a rebalance recommendation. You'd use this if you need more time than the two-minute window an interruption gives you. The Node Termination Handler is installed in this cluster, watching for rebalance recommendations. It sees one and decides to drain the node. (Another way to handle a rebalance recommendation is to simply cordon the node — that's an option in the Node Termination Handler, if you just don't want further pods to schedule onto it.) Then the eviction process happens: the kubelet sends the pod a SIGTERM, hopefully the pod shuts down in a reasonable amount of time within your termination grace period, and then everything's good.

So what did we talk about today? We talked a little bit about Spot and how Spot can drastically reduce the cost of your cluster — discounts up to 90% on your nodes, with the same backing capacity. The only caveat is that you have to handle the interruption notice. But handling the interruption notice isn't that different from on-demand, right? You still have to handle pods shutting down, and hardware failures, on on-demand too — you kind of have to be able to shut down fast anyway in a lot of applications. So why not try out Spot? We looked at Spot best practices, mainly on the provisioning side: being super flexible across instance types, which lets your Spot instances run for longer periods and hopefully gets you better discounts too. Then we talked about how Kubernetes and Spot are really a match made in heaven: Kubernetes' node lifecycle abstractions make it super easy to handle Spot interruptions, and components like the Node Termination Handler standardize everything for you. And then we talked a little bit about how scaling up your pods requires scaling up your nodes, and how you can be flexible with Cluster Autoscaler, or you can try out Karpenter and use our CRD approach with a provisioner spec to do groupless node autoscaling.

And we did have a demo, but we had severe technical difficulties before — I think my laptop died right when I walked up here and wouldn't charge anywhere. So no demo. But if you catch me around and I get my laptop working again, we can definitely show you a demo of Karpenter in action. So yeah, that's all we have. Are there any questions? I think we have like four minutes left, maybe. Sure — I think we have mics up here if you want to.

Justin Garrison, MC: If I use Karpenter to scale my nodes, would you recommend only using Spot instances in production?

I mean, it really depends. There are certainly applications that just can't tolerate interruptions — you just can't do it. And if you can handle some interruption but need to guarantee some amount of capacity, you can do a topology spread across on-demand and Spot nodes, so you get, like, a 50-50 spread. Hopefully one day we'll be able to do weight-based topology spreads, where you could do, say, 10% on-demand and 90% Spot. So there are definitely workloads that require on-demand, and Karpenter will fulfill that use case as well. But it's really easy to use Spot with Karpenter.

And also, is it possible for Karpenter to get rebalance recommendations to use the—? Yeah.
So as long as you install the Node Termination Handler in your Kubernetes cluster — Karpenter knows how to shut down the empty nodes, and the Node Termination Handler will actually do the graceful draining part of the process when it sees an interruption or a rebalance. And then Karpenter will shut down the node for you.

Hello. You said that Karpenter can shut down nodes it sees are empty, excluding DaemonSets. What about moving pods that could be scheduled on remaining capacity, to defragment the cluster?

Yeah, exactly — defragmentation is definitely at the top of our priority list right now. We've done a lot of work on it so far; it's not released yet, but we're thinking about that problem a lot. There's a workaround today where you can use the expiration feature: you set an expiration, and if you have good pod disruption budgets set up, you can shut down instances that have expired. The pods then go back to pending, and they get packed more efficiently on the next round. So that's our current workaround, but hopefully defrag will be out soon.

Hi. Do you see Spot and Karpenter actually helping with the carbon footprint for companies — part of the green revolution, so to speak?

Right. I don't have any data on it, but I definitely see Karpenter as utilizing nodes and capacity more efficiently. And hopefully, moving forward, capacity will be more utilized, and therefore more efficient, and therefore more carbon-efficient as well.

Hi. Have you noticed any changes in the Spot markets over the last, like, two years? I would imagine companies are using EC2 a lot differently after the pandemic than before.

Yeah, I have not — I don't have access to that data, so I wouldn't know, to be honest. But hopefully more people are using Spot. I don't know if the pandemic had anything to do with it, or just — it's a good deal, right? People like discounts.

Hi. This is a great slide to ask the question on, because I see that you have R5 instances specified here. Does Karpenter support attribute-based node selection?

So that's an interesting question. We're working on an addition to this provisioner spec where you could specify attributes to constrain the list of instance types. But by default, Karpenter uses all instance types, and it looks at the pods to figure out what constraints would prevent it from launching certain instance types. But yeah, we're working on attribute-based filtering in the provisioner spec. Thank you. Hopefully that will be out soon too.

Hi. I also have a question about Karpenter. I mean, Spot instances are really interesting and, I think, pretty well understood, and Karpenter's interesting for a lot of reasons. You've talked about bin packing, at least on the metrics of CPU and memory. Have you also considered the constraint of volume attachments?

Yes, we definitely have. We support some emptyDir packing today, and we're looking at supporting more dynamic ephemeral storage attachment — scheduling and bin packing awareness. We are aware of persistent volumes, so we'll schedule in the correct zone if you're trying to attach an EBS volume. But yeah, that's definitely something we're working on currently.

Just because of the volume attachment limit, right? Right, exactly. Yeah. Make Justin run.

Hi. You said Karpenter doesn't work with auto-scaling groups. But how does it work with managed node groups? Yeah, so today, Karpenter doesn't work with managed node groups.
They're separate solutions: with managed node groups you'd use Cluster Autoscaler today, because a managed node group is backed by an Auto Scaling group, and Karpenter is completely groupless. OK, thanks. I think that's time. So thank you very much. Thanks, everyone. Thanks.