Oh, thank you for joining this session. My name is Rohini Gaonkar and I'm a Senior Developer Advocate at AWS. Today we are talking about Karpenter, an open-source Kubernetes cluster autoscaler. If you have more questions, feel free to reach out to me via LinkedIn.

So let's quickly look at the different ways we can do Kubernetes scaling. Remember, the goal here is to use the infrastructure efficiently, reduce waste, save costs, and keep the application highly available. There are three main concepts, so let's look at each of them.

With horizontal pod autoscaling, or HPA, you scale by adding more pods based on resource metrics, so it is pod-level scaling. You simply keep adding pods as your demand increases, and when demand decreases the pods are automatically stopped to free the resources. So you scale out and scale in as needed.

Vertical scaling, as the name suggests, means adding capacity to the same resource. The Kubernetes Vertical Pod Autoscaler (VPA) automatically adjusts the CPU and memory reservations for your pods to help right-size your applications.

And finally there is the Kubernetes Cluster Autoscaler, a popular cluster autoscaling solution maintained by SIG Autoscaling. It automatically adjusts the number of nodes in your cluster. When pods fail to schedule or are rescheduled onto other nodes, it is responsible for ensuring that your cluster has enough nodes to schedule your pods without wasting resources. It watches for pods that have failed to schedule and for nodes that are underutilized, and it simulates the addition or removal of nodes before applying the change to your cluster. The AWS cloud provider implementation within Cluster Autoscaler controls the desired capacity of an EC2 Auto Scaling group; EC2 Auto Scaling groups are the AWS feature that Cluster Autoscaler builds on.

Cluster Autoscaler works together with HPA. The Horizontal Pod Autoscaler changes a deployment's or replica set's number of replicas based on, let's say, the current CPU load. If the CPU load increases, HPA creates new replicas, for which there may or may not be enough room in your cluster. If there are not enough resources, Cluster Autoscaler brings up new nodes so that the HPA-created pods have a place to run. If the load then decreases, HPA stops some of those replicas, which can leave some nodes underutilized or even empty, and that's when Cluster Autoscaler terminates the unneeded nodes.

As you just saw, Cluster Autoscaler relies on the concept of node groups, backed by EC2 Auto Scaling groups, to manage cluster capacity. It assumes that the instance types in a given node group are all identical. So if you want to use a node group with, let's say, mixed instance types, you need to make sure each type has roughly the same amount of CPU and memory; otherwise resources may be wasted or insufficient during a scale-up. To support different instance types, you need multiple node groups. It's also recommended that each node group span only one availability zone, which means that if you want your workload spread across multiple availability zones for high availability, you need a node group per instance type per availability zone. Cluster Autoscaler was simply not built with the flexibility to handle hundreds of instance types across multiple availability zones.
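To make the HPA piece concrete, here is a minimal HorizontalPodAutoscaler manifest. This is a sketch, not something from the talk: the deployment name my-app, the replica bounds, and the 70% CPU target are all assumptions.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  # The workload whose replica count HPA adjusts (name is a placeholder)
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    # Add replicas when average CPU utilization across pods exceeds 70%
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

When HPA raises the replica count beyond what the current nodes can hold, the new pods sit in Pending until a cluster autoscaler, or Karpenter as we'll see, adds node capacity.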
It loads the entire cluster state (the nodes, the pods, and the node groups) into memory, identifies unschedulable pods in the cluster, and simulates scheduling against each node group. When you have lots of node groups, this gets very complicated, and when run at scale it can take up to five minutes to actually add capacity to your cluster. That can have a significant impact in use cases where the speed of capacity scaling is critical, and a real customer impact when customers cannot meet the commitments they've made to their end users. So it is hard to get high cluster utilization and operational efficiency. AWS customers have over-provisioned resources to ensure a consistent end-user experience; I've seen customers over-provision their infrastructure by 20 to 25% in some cases. And then there are use cases like machine learning or batch workloads where you need to experiment quickly, and having to get a node group configured first, and then the other resources, slows down the pace of innovation. That's where we need Karpenter.

Karpenter is an open-source, flexible, high-performance Kubernetes cluster autoscaler that helps improve your application availability and cluster efficiency. It launches right-sized compute resources, Amazon EC2 instances in our case, in response to changing application load, in under a minute. By integrating Kubernetes with AWS, Karpenter can provision just-in-time compute resources that precisely meet the requirements of your workload. What's that asterisk on the slide? Well, AWS is the first cloud provider supported by Karpenter, although it is designed to be vendor-neutral.

Karpenter works in tandem with the Kubernetes scheduler by observing incoming pods over the lifetime of your cluster, and it launches or terminates nodes to maximize your application availability and cluster utilization. When there is enough capacity in the cluster, the Kubernetes scheduler places incoming pods as usual. When pods are launched that cannot be scheduled using the existing capacity of your cluster, Karpenter bypasses the Kubernetes scheduler and works directly with your provider's compute service (for example Amazon EC2, instead of the Auto Scaling groups that Cluster Autoscaler uses) to launch the minimal compute resources needed to fit those pending pods, and it binds those pods to the nodes it provisioned. As pods are removed or rescheduled onto other nodes, Karpenter looks for opportunities to terminate the underutilized nodes as well. Running fewer, larger nodes in your cluster reduces the overhead from DaemonSets and Kubernetes system components and provides more opportunities for efficient bin packing.

The central concept in Karpenter is the Provisioner, defined as a Kubernetes custom resource, which is the modern, standard way to write controllers. A Provisioner is how you define how Karpenter will manage unschedulable pods and expired nodes. A Provisioner comes with some smart defaults, but these are fully configurable; the defaults cover instance type selection, launch template generation, subnets, security groups, and so on. You can think of two personas here: an administrator and an application developer. It is expected that a cluster administrator would install and update Karpenter.
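Installing Karpenter is typically a Helm chart install. A rough sketch under stated assumptions: the chart repository URL and the value keys for cluster name and endpoint are taken from the getting-started guide of that era and vary by Karpenter version, so treat them as placeholders and check the docs.

```sh
# Add the Karpenter chart repository and install the controller into its own namespace.
# Value keys (controller.clusterName, controller.clusterEndpoint) are assumptions
# that differ between Karpenter versions; consult the docs for yours.
helm repo add karpenter https://charts.karpenter.sh
helm repo update
helm upgrade --install karpenter karpenter/karpenter \
  --namespace karpenter --create-namespace \
  --set controller.clusterName=${CLUSTER_NAME} \
  --set controller.clusterEndpoint=${CLUSTER_ENDPOINT} \
  --wait
```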
They define Provisioners to segment the infrastructure as needed: by purchase option, capacity type, instance type, availability zone, and so on. The application developer, who deploys the pods that Karpenter will evaluate, writes the pod manifests. As long as the requests don't fall outside the Provisioner's constraints, Karpenter will look for the best match for the request, comparing against the well-known Kubernetes labels defined by the pod's scheduling constraints. Note that if the constraints are such that a match is not possible, the pod will remain unscheduled. The Kubernetes scheduling features Karpenter supports include node affinity and node selectors, and it also supports pod disruption budgets, topology spread constraints, and inter-pod affinity and anti-affinity.

So let's quickly look at a demo. For this demo, I have already set up a Kubernetes cluster with Karpenter installed; you can find all the steps in the Karpenter documentation, and I'll provide the link towards the end of this presentation. I've also defined a default Provisioner; this is something your administrator would do. In it, I've specified that any capacity launched should be Spot and should come from a particular instance family. It could also be pinned to a certain instance size, which I've commented out for now. I've also specified that the instances should be AMD64-based, and you can set a limit on the total number of CPUs you want. How does Karpenter know where to launch these instances? I've tagged the subnets and security groups so it can detect them; it will go ahead and discover that these are the subnets in which to launch the EC2 instances (the nodes) that you need. There's one more important setting here: ttlSecondsAfterEmpty: 10. What this means is that once a node is empty, with no pods running on it, Karpenter will wait 10 seconds before terminating that node, that EC2 instance. I've kept it low because it's a demo and I want to show things quickly, but you can set it higher for a production workload. A sketch of this Provisioner follows below.

I've already applied the default Provisioner. All good. What I'm going to do next is scale a deployment to create more replicas. Before we do that, you can see there are no pods running right now, there is only one node running, and these are the Karpenter logs. Generally I would start with one pod and escalate from there, but to save time, I'm going to ask for, say, four pods straight away. The moment I do, the four pods go into a pending state, and you can see in the logs, if we scroll up a little, that Karpenter created a node with the four pods requesting a certain capacity, and that it is now waiting for the EC2 instance. It has already launched that instance; you can see it was launched 23 seconds ago. What size is it? It's whatever would fit all these pods, and it's a c5, it's amd64, and if you move over a little here, you can see it is already Spot.
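For reference, the default Provisioner described in this demo would look roughly like the following. This is a sketch against the karpenter.sh/v1alpha5 API of that era; the instance-family label key, the discovery tag value, and the CPU limit are assumptions rather than the exact manifest from the demo.

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    # Launch Spot capacity only
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
    # Restrict to the c5 instance family (label key is an assumption)
    - key: karpenter.k8s.aws/instance-family
      operator: In
      values: ["c5"]
    # AMD64-based instances
    - key: kubernetes.io/arch
      operator: In
      values: ["amd64"]
    # A specific instance size could be pinned here; the demo keeps it commented out
    # - key: node.kubernetes.io/instance-type
    #   operator: In
    #   values: ["c5.2xlarge"]
  # Cap the total CPU this Provisioner is allowed to launch
  limits:
    resources:
      cpu: "1000"
  # Discover subnets and security groups by tag ("my-cluster" is a placeholder)
  provider:
    subnetSelector:
      karpenter.sh/discovery: my-cluster
    securityGroupSelector:
      karpenter.sh/discovery: my-cluster
  # Terminate a node 10 seconds after the last pod on it is gone
  ttlSecondsAfterEmpty: 10
```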
Back in the console, the instance is already running. At 39 seconds you can see the pods' status has changed from pending to container creating. As we discussed in the presentation, when Karpenter creates those EC2 instances it is not only creating the instances, it is also taking the scheduling decision: it bypasses the kube-scheduler and directly binds the pods to the nodes it provisioned. And you can see that within a few seconds (I think it was 58 or 60 seconds) all these pods are actually running on that EC2 instance.

Let's escalate a little. Instead of four, let's say I want 100 pods, and we'll see how quickly Karpenter computes how many nodes it needs for all 100 pods and launches the EC2 instances. You can see that within seconds it has calculated the EC2 instances it needs. Let's scroll up and see: it created a node with 85 pods, because it could fit a few pods on the existing EC2 instance, and placing those is something the kube-scheduler does quickly. If you want, we can check how many pods are in the running state right now: 15 are running on the node that was already ready, and the other 85 pods will be placed on the one that is not ready yet. It's been 75 seconds, and the EC2 instance will be ready in a few more seconds, a couple of minutes at most, before all those pods can be placed and reach the running state. By bypassing Auto Scaling groups and talking directly to EC2, we actually save 30 to 35 seconds when scheduling a lot of pods. And as you've just seen, in 108 seconds our EC2 instance was up and running. Let's check how many pods are running now: 20 pods are up, and some are in container-creating mode, downloading the image and getting ready. If you keep checking, you can see that number climbing quickly; within two minutes of the EC2 instance being launched, most of the pods have been deployed. That's how quickly Karpenter can get EC2 instances up and running.

Great, so all one hundred pods are up and running. What we'll do next is remove all these pods. I'm just going to scale the deployment to zero, and you can see how quickly it scales down. The pods go away instantly, and for the EC2 instances, if you look at the logs I've highlighted, they say a TTL was added to the empty node. Because that TTL was just 10 seconds, termination is triggered, and within 10 seconds all my EC2 instances have been deleted.

To make it more interesting, I can also patch the deployment and say: instead of AMD64, I want Arm-based EC2 instances. Once that's done, I'm going to ask for, let's say, two pods that need an Arm-based node. Now in this case, let's scroll down.
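The patch from this step would look roughly like this. The deployment name inflate is an assumption (it is the example workload in the Karpenter getting-started guide); kubernetes.io/arch is the real well-known architecture label.

```sh
# Require arm64 nodes via a nodeSelector on the pod template
kubectl patch deployment inflate --patch \
  '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/arch":"arm64"}}}}}'

# Then ask for two pods that now need an Arm-based node
kubectl scale deployment inflate --replicas 2
```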
We also got Arm-based EC2 instances. But you might ask: didn't I say the default Provisioner only allows AMD64? Well, I have also applied another Provisioner. You can see it here, one second. Okay, so you can see that it found a Provisioner for arm64; the request I just made matched the arm64 requirement in one of the Provisioners, and so it went ahead and launched an arm64 EC2 instance. So you can have multiple Provisioners, and these Provisioners can have different constraints and different requirements; Karpenter will automatically pick the one that matches. If there had been no Provisioner for arm64, the pods would simply have remained unscheduled, and the user would not have been able to deploy that particular application.

So that's it, that's the simple demo. Let's go back to the presentation and wrap up the section. The key takeaways: use the default Provisioner for diverse instance types and availability zones, and add additional Provisioners as you need them. You can also control scheduling with topology spread constraints, taints and tolerations, Provisioner constraints, and so on. Use HPA with Karpenter to scale in and out, and schedule your pods on Spot if you need to save cost. If you want to install Karpenter, play with it, or contribute to it, do check out the documentation and the GitHub link mentioned here. There are best practices we discussed for using it with EKS, and there are workshops if you want to go more hands-on with Karpenter; you can find all those details in these resources.

So that's it, that's me. Thank you for joining me for this quick demo and discussion about Karpenter. I hope it was insightful and useful, and I hope we all experiment and keep innovating in the way our Kubernetes clusters scale today. Thank you again, and see you next time.

Thank you, Rohini. Yeah, that's a great tool for when you're working with Kubernetes clusters.