Hello everyone, and welcome to the SIG Autoscaling update for KubeCon + CloudNativeCon Europe 2023.

So what does SIG Autoscaling own? We own a number of different sub-projects. There's the Balancer, which is new within the past year; Piotr will be giving you an update on it and its implementation. We also own the Cluster Autoscaler, for scaling your cluster horizontally in terms of nodes; Jayant will be giving you an update on one of its newer pieces of functionality, which helps cluster operators debug what the Cluster Autoscaler is doing and why. We also own horizontal and vertical pod autoscaling: the Horizontal Pod Autoscaler, which is implemented as part of the Kubernetes controller manager, and the Vertical Pod Autoscaler, a newer standalone component. Chen and Michele will talk you through a demo of a proposal for a new feature called the Multidimensional Pod Autoscaler, and how it could allow users to overcome some of the current challenges of using horizontal and vertical pod autoscaling together. And finally, we also own the addon resizer, though we won't be covering it this time round.

I've mentioned the in-depth demos and walkthroughs you're about to get; however, we also have a number of other updates that might interest you as a cluster operator or a user of a Kubernetes offering.

For the Horizontal Pod Autoscaler, container resource scaling has been available in alpha since Kubernetes 1.20. A member of the community has finally picked up the work to get it promoted to beta, hopefully in time for Kubernetes 1.27. That will let users of managed services, where cloud providers may block alpha features, make use of it, and as a service owner it has the potential to improve the granularity of your scaling behaviour (there's a short sketch of this at the end of this update).

We, as the owners of the Cluster Autoscaler, have also committed to an improved release process going forward. Current users may be aware that patch releases to this point have been a bit ad hoc. We're looking to provide far more clarity for Cluster Autoscaler users on when patch releases will come out, and therefore when cherry-picks of bug fixes and so on will make it back to supported releases.

And finally, on the vertical pod autoscaling side, we're beginning the work needed to take advantage of dynamic (in-place) pod resizing. This has always been intended as a feature for the Vertical Pod Autoscaler, but we've been dependent on work from other SIGs. That work is now progressing significantly and looks like it will make it into the Kubernetes 1.27 release, so we want to be ready to do everything required for the VPA to take advantage of it as soon as possible.

If any of this interests you, we as a SIG definitely need help. As mentioned, we own a number of different sub-projects, and we need your help as the community in all of these areas. We'd love to get your feedback via issues where our projects aren't working the way you think they should, or to see you progress up the contributor ladder via issue triage, reviewing PRs, or even contributing features or bug fixes that you think are important or are excited about. If any of that interests you, please get in touch with us as a SIG. We hang out on the GitHub repo that hosts most of our projects, and we also have weekly SIG meetings at 16:00 CET.
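As promised above, here is a minimal sketch of what container resource scaling looks like in the autoscaling/v2 HPA API: a ContainerResource metric scales on the usage of one named container rather than the pod as a whole. The deployment and container names are placeholders for illustration.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa            # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # placeholder target workload
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: ContainerResource   # scale on one container's usage, not the whole pod
    containerResource:
      name: cpu
      container: application  # placeholder: the container whose usage drives scaling
      target:
        type: Utilization
        averageUtilization: 60
```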
If the 16:00 CET slot doesn't work for you, please reach out and we'll try to arrange something. We're also reachable on Slack in #sig-autoscaling. Enjoy the demos!

Hi, I'm going to cover the Balancer, a new resource in the autoscaling space. It allows you to control how pods are distributed across similar deployments, and to autoscale them together. In a second I'll explain why that may be useful. We've introduced the Balancer resource and a controller for it. To use it, you need to install an optional component from kubernetes/autoscaler. It has only just been published, so it's at an early alpha stage.

So why would you want to distribute the pods of a workload across multiple similar deployments? There are a couple of use cases. Perhaps you have a regional cluster and want to ensure the workload is spread evenly across zones for availability reasons: you have multiple deployments, each responsible for a different zone, and when pods are added to or removed from the workload, you expect this even distribution to be maintained. Or perhaps you want to run your workload on both standard VMs and spot or preemptible VMs, which are cheaper but less reliable, and you would like to always have 25% of pods running on the less reliable VMs; again, you expect new pods to be distributed according to the defined ratio. Or perhaps you run your workload on different machine families, or there's some other slight difference in configuration.

Now that we know when the Balancer may be useful, let's explore how to configure it. Let's say I want to balance my app between two deployments in a 3-to-1 ratio, the use case we described earlier with 25% of pods on spot VMs. You can see the Balancer YAML to the right: apiVersion balancer.x-k8s.io/v1alpha1, kind Balancer. This Balancer defines two targets, which are deployments called my-app-a and my-app-b. We can define min and max replicas for each target. Lastly, we define the policy, in this case a proportional policy with ratios 3 and 1. With this config, if the Balancer controller detects that the replica counts of the deployments are no longer in 3-to-1 proportion, it will redistribute pods to restore the desired ratio. Also, when the scale on the Balancer object changes, so pods are to be added or removed, the Balancer controller will add or remove them from the right deployments in order to maintain the 3-to-1 ratio.

This combines well with horizontal pod autoscaling: we can define an HPA that has the Balancer object as its scale target. This HPA covers the whole workload, both deployments combined, and makes its recommendation by updating the scale on the Balancer object. The Balancer controller then distributes this new scale across its targets, deployments A and B.

Let's look at a different use case. I want to run my app primarily in deployment A, maybe because there's a particular zone I prefer; however, if there's no space there and max replicas are reached, overflow to deployment B, a different zone perhaps. As soon as there's space in deployment A again, rebalance pods back there. You can see that the policy part of the Balancer YAML at the bottom is now set to type priority, and the priority order is defined as my-app-a first, my-app-b second. We can add a further requirement: if pods cannot start in deployment A, even when max replicas is not reached there, we will still start pods in deployment B. Perhaps there are pending pods in deployment A because there's no space in the node pool. Both policies are sketched below.
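Here is a reconstruction of the two Balancer manifests described above, as a sketch only: the API version and target names come from the talk, but the exact policy field names are my approximation; consult the Balancer proposal in kubernetes/autoscaler for the authoritative schema. An HPA can then point its scaleTargetRef at the Balancer object to autoscale the combined workload.

```yaml
apiVersion: balancer.x-k8s.io/v1alpha1
kind: Balancer
metadata:
  name: my-app-balancer
spec:
  targets:                        # the deployments to balance across
  - name: my-app-a                # minReplicas / maxReplicas can be set per target
  - name: my-app-b
  policy:
    type: proportional            # field name assumed
    proportions:
      my-app-a: 3                 # 3:1 ratio, i.e. ~25% of pods land on my-app-b
      my-app-b: 1
---
# Priority variant: prefer my-app-a, overflow to my-app-b.
# The fallback stanza (explained in a moment) is included in the sketch.
apiVersion: balancer.x-k8s.io/v1alpha1
kind: Balancer
metadata:
  name: my-app-balancer
spec:
  targets:
  - name: my-app-a
  - name: my-app-b
  policy:
    type: priority                # field name assumed
    priorityOrder: [my-app-a, my-app-b]
    fallback:
      enabled: true
      startupTimeoutSeconds: 300  # assumed value: time after which a pod counts as blocked
```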
We can configure this fallback behaviour by adding a fallback section to the policy spec: we enable fallback and define the startup timeout, the time after which a pod is considered blocked. Hitting it triggers starting another pod in deployment B, and once the original pods finally start, the excess pods in deployment B are removed.

In summary, the new Balancer resource lets you define how pods should be distributed across similar deployments, and you can additionally define horizontal pod autoscaling on the Balancer object to autoscale the pods together. You can learn more at the GitHub repository page, kubernetes/autoscaler, and we're waiting for your feedback. Thanks.

Hi, everyone. How are you doing? Today I'm going to be presenting the Debugging Snapshotter, a tool that we have in the Cluster Autoscaler. Before we get into what the tool is, let me explain what we're trying to solve. The Cluster Autoscaler logs a lot of information about the decisions it takes — for example, that it scaled up, scaled down, or chose not to scale up — but we don't log what data these decisions are based on. CA internally simulates the behaviour of the entire cluster, and that's simply too much to log. When something goes wrong, we usually need to understand how these decisions were taken, not just what they were. Root-cause analysis of an issue usually takes time, and the internal state of the cluster changes once we mitigate the issue, making it even harder to debug.

The Debugging Snapshotter is a tool to visualise the internal state of the Cluster Autoscaler at a point in time, to help debug autoscaling issues. So before you go about mitigating the issue in the cluster, it may be easier if you have a snapshot of the Cluster Autoscaler's state at the time the issue was happening.

Let me quickly take you through some common use cases. Scale-up or scale-down not working, with a special focus on scaling from zero nodes, for example when you have an attached instance group with no nodes. A mismatch between scheduler and Cluster Autoscaler decisions, where the Cluster Autoscaler decides not to scale up, but the scheduler says it cannot schedule anything. Or a mismatch in resource availability on a node, where there might be extra resources on the node, but the Cluster Autoscaler assumes a different amount is available.

So what data does the Snapshotter capture for you? First and foremost, the node list — essentially all of the nodes in the cluster. Secondly, any unschedulable pods that the Cluster Autoscaler thinks are schedulable. There are also template nodes, which are simulated nodes for any attached instance group that has no nodes in the cluster yet. There is an error field, which is filled if the Snapshotter fails and there's an error generating the snapshot itself. And there are start and end timestamps; these help if you decide to take multiple snapshots, and also because generating a snapshot can take a non-trivial amount of time, so they encapsulate the complete timeframe over which a particular snapshot was generated.

How does it work? When it receives a request for a snapshot, it captures the internal fields we just talked about and returns a well-formed JSON document. After receiving the request, in the following Cluster Autoscaler processing loop, the Snapshotter snapshots the cluster state, limited to that single loop — no data crossing loops.
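Putting the fields just listed together, the snapshot response is a single JSON document along these lines. This is an annotated sketch, not literal output: the field names are my best recollection of the feature and may differ in your version, and the inline comments are not part of the JSON.

```json
{
  "NodeList": [],                        /* all nodes in the cluster, with their pods */
  "UnscheduledPodsCanBeScheduled": [],   /* pending pods CA believes will fit */
  "TemplateNodes": {},                   /* simulated nodes for empty instance groups */
  "Error": "",                           /* filled only if snapshot generation failed */
  "StartTimestamp": "2023-04-18T10:00:00Z",
  "EndTimestamp": "2023-04-18T10:00:02Z"
}
```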
Capture starts at the beginning of that following loop; the Snapshotter goes through all of the steps the Cluster Autoscaler goes through, collecting all of the data that is part of the snapshot, and finishes when that particular processing loop closes. The HTTP request blocks for the duration of snapshot generation, and the JSON comes back as the HTTP response. There are no files: you make a request, wait for some time, and get the snapshot back as the HTTP response. How do you make this request? You SSH onto the server running the leader Cluster Autoscaler, in case you're running multiple Cluster Autoscalers, and you curl the link shown: a local port on which the Cluster Autoscaler exposes a path.

Let's quickly go through the demo; it's super simple. We have a Cluster Autoscaler running, and a cluster with three nodes. We have the Cluster Autoscaler logs here, and an attached instance group, which is empty for now. Let's apply a deployment with a resource request that cannot be accommodated on any of the existing nodes; we should see a scale-up at that point. Let me go ahead and apply the deployment. Done. If you look here, we have a pod that is unschedulable, and we see a scale-up for an instance group. This output is GCE-specific, but the behaviour is similar on other cloud providers. As soon as the whole operation is done, let's go ahead and make a snapshot request, as you see here. In the logs we also have entries for the snapshot itself: it marks when the request was received, when data collection started, what data has already been collected, and the snapshot being flushed back as a response.

We made the curl request as soon as the scale-up happened, so we now see four nodes — but we made it before the new node registered, so we should see the new pod as something that can be scheduled but isn't yet. Let's go through the different items. We'll use jq, a common tool for parsing JSON. Here we have a bunch of items available as keys. Let's see how many nodes there are: four. We can also see the template nodes; there will be one template, because we have one attached instance group with a simulated node. We also have the unscheduled pods that can be scheduled. This is the pod that could be scheduled on the upcoming node, but isn't yet — it's in that in-between state. This field could also surface a bad or unschedulable pod that you might want to debug. If you look here at nginx-deployment-1, there is a bunch of information about it; this is how you can easily navigate the JSON.

That's it. Let me give you a quick rundown of how to get up and running with the Debugging Snapshotter. You just need to enable the Snapshotter on your Cluster Autoscaler by adding the relevant flag to the manifest. This is already available in production on all Cluster Autoscaler versions 1.24 and above, and hopefully it will also become available for older Cluster Autoscaler versions, for those who want more stability. There will also be detailed instructions in the Cluster Autoscaler FAQ, similar to what I've just walked through, that you can refer to for how to operate the Snapshotter.
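Here is a condensed version of these steps, with hedged specifics: the flag name, port, endpoint path, and JSON keys below are my best recollection of the feature and may differ in your version, so verify them against the Cluster Autoscaler FAQ.

```sh
# 1. Enable the snapshotter in the Cluster Autoscaler manifest (CA >= 1.24);
#    flag name assumed:
#      --debugging-snapshot-enabled=true

# 2. From wherever you can reach the leader Cluster Autoscaler's local port
#    (e.g. after SSHing onto its node), request a snapshot. Port and path are
#    assumptions; the call blocks until the next processing loop completes.
curl -s http://localhost:8085/snapshotz > snapshot.json

# 3. Explore the snapshot with jq (key names assumed, as in the sketch above):
jq 'keys' snapshot.json                            # top-level fields
jq '.NodeList | length' snapshot.json              # how many nodes CA sees
jq '.TemplateNodes' snapshot.json                  # simulated nodes for empty groups
jq '.UnscheduledPodsCanBeScheduled' snapshot.json  # pods in the in-between state
```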
We're looking forward to hearing any feedback you have, and any possible extensions to the Snapshotter that you think would make everybody's life easier. That's all from me. Thanks, everyone, and passing it on.

Hello, everyone. I'm Chen Wang from IBM Research. Today my colleague Michele and I will introduce our new enhancement proposal for the Kubernetes Autoscaler, called the Multidimensional Pod Autoscaler. I'm a research staff member at IBM Research, and my daily work involves enhancements in Kubernetes, including autoscalers, schedulers, and node resource plugins. I'm also actively working on cloud-native AI system platforms and on applying AI techniques to cloud platform management. I'm an open-source advocate, a Kubernetes contributor, and a regular KubeCon speaker. Today I will briefly introduce our motivation — why we propose the Multidimensional Pod Autoscaler — and the design of the MPA, and later Michele will show a demo of how to use it.

Currently, there are two autoscaling controllers available in the community: the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler. The Kubernetes Horizontal Pod Autoscaler, which we'll refer to as the HPA, automatically scales the number of pods in a deployment or ReplicaSet based on either a CPU utilization metric or other custom performance metrics. With the HPA, you can ensure that your application runs at optimal performance and an optimal CPU utilization rate by automatically scaling the number of pods out and in based on the metric.

The other controller is the Vertical Pod Autoscaler, which we'll refer to as the VPA. The VPA automatically adjusts the resource requests and limits of a container based on its actual usage, rather than the initial values set by the developer. It periodically analyzes the historical resource usage patterns of the pod and then recommends appropriate resource requests and limits to set. This ensures that the pod always has an amount of resources allocated based on its usage, and can help prevent over-provisioning.

The design of the VPA consists of three main controllers. The recommender analyzes the pod's historical usage patterns and recommends the resource requests and limits to set, based on a histogram of the resource usage observed over a previous time window, usually 15 minutes. The updater observes the difference between the recommended requests and limits and the currently set ones, and evicts pods if the gap is too big. The admission controller is responsible for updating the pods' requests and limits as the evicted pods restart, according to the values provided by the recommender. Those three controllers work together to provide automatic vertical scaling for Kubernetes pods, allowing them to dynamically adjust their resource requests and limits based on actual usage.

Because the HPA and VPA currently control their scaling actions separately, as independent controllers, configuring them to optimize the same target — for example, CPU usage — can lead to an awkward situation: the HPA tries to spin up more pods because CPU usage is above its threshold, while the VPA tries to squeeze the size of each pod because per-pod CPU usage drops after the HPA scales out. The final outcome can be a conflict, leading to a large number of small pods being created for the workload. A sketch of this conflicting setup follows.
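To make the conflict concrete, here is a minimal sketch of the setup being described, assuming the standard autoscaling/v2 and autoscaling.k8s.io/v1 APIs (the workload names are placeholders): an HPA and a VPA both keyed off CPU on the same Deployment. The HPA adds pods when average utilization is high; once utilization drops across the larger fleet, the VPA shrinks each pod's request, which pushes utilization back up and can trigger the HPA again.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60   # HPA adds pods above 60% average CPU
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: Auto             # VPA evicts and resizes pods based on usage
```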
When that happens, none of the objectives can be guaranteed. So there are several motivations for combined control of horizontal and vertical scaling. First, we sometimes want to fine-tune the timing of vertical versus horizontal scaling, prioritize one scaling dimension over the other, or synchronize the two kinds of action. Also, controlling vertical scaling based on usage and horizontal scaling based on performance doesn't by itself guarantee either performance or resource efficiency: the observed usage is a statistic over a time window, and a certain margin of resource over-provisioning is sometimes needed to handle workload fluctuations and guarantee a performance objective. Therefore an advanced, combined algorithm is needed to find an optimal joint control of vertical and horizontal scaling actions for a given application under a given load. Moreover, in some cases one objective is prioritized over the other; for example, when a deployment has only one replica, you may want only vertical scaling, scaling the resources down when utilization is low.

We therefore propose a multidimensional pod autoscaling framework that combines the control of vertical and horizontal scaling into a single action, but separates the actuation of those actions completely from the controlling algorithm. Similarly to the design of the VPA, the MPA consists of three controllers: a recommender, an updater, and an admission controller. And of course we define a new API as a custom resource, the MPA, which connects the autoscaling recommendations to their actuation. The multidimensional scaling algorithm is implemented in the recommender; the scaling decisions it derives are stored in the MPA object, and the updater and admission controller retrieve those decisions from the MPA object and actuate the vertical and horizontal actions. Our proposed MPA also lets developers replace the default recommender with their own customized recommender, so they can provide their own algorithms implementing advanced control of both kinds of action.

In detail, in the MPA API developers can specify the autoscaling configuration, including whether they only want recommendations from the MPA or want the MPA to directly actuate the autoscaling decisions. They can specify application performance targets, such as latency or throughput, and any custom metric can be used. Other autoscaling configurations, such as those available in the HPA and VPA, are also available in the MPA. The MPA API is also responsible for connecting the autoscaling actions generated by the recommender to the admission controller and updater. It is modeled on the custom resources provided by the upstream community; basically, it is a CR that keeps track of the recommended pod size and the recommended number of replicas. The recommender retrieves time-indexed measurement data about usage from the metrics API and generates the vertical and horizontal scaling actions; those actions are then written to the MPA object, and the autoscaling behavior also follows the user-defined configuration.
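The fields of an example MPA object are walked through below. Since the MPA is still a proposal, the manifest here is only a sketch: the API group, kind, and field names are assumptions pieced together from this talk, not a published API.

```yaml
apiVersion: autoscaling.k8s.io/v1alpha1   # assumed API group/version
kind: MultidimPodAutoscaler               # assumed kind name
metadata:
  name: my-app-mpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: Auto            # or a recommendation-only mode
  recommenders:
  - name: default               # or the name of a custom recommender
  # Vertical scaling configuration:
  resourcePolicy:
    containerPolicies:
    - containerName: '*'                 # which containers to resize
      controlledResources: ["cpu"]       # which resource types to control
      minAllowed: {cpu: 100m}            # minimum allowed resources
      maxAllowed: {cpu: "1"}             # maximum allowed resources
  # Horizontal scaling configuration:
  constraints:
    minReplicas: 1
    maxReplicas: 6
  metrics:
  - type: Resource                       # metric type driving horizontal scaling
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```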
The updater updates the number of replicas of the deployment and evicts pods eligible for vertical scaling, and the admission controller, just like the VPA admission controller, updates the requests and limits of the evicted pods as they restart.

The example MPA YAML sketched above includes configurations such as the update policy — whether you want recommendations only, or want to actually actuate the autoscaling actions — and the recommender, where you can name the default recommender or any customized alternative. There are also vertical-scaling-related configurations, such as the minimum and maximum allowed resources, the types of resources to control, and which containers to resize. The horizontal-scaling-related configurations include the minimum and maximum number of replicas and the type of metric the horizontal autoscaling operates on.

That's all for my introduction to the MPA framework; next I'll hand over to Michele for a simple demo of how to use it.

Hello everybody, my name is Michele and I'm from Italy. Today I'm going to show you the Multidimensional Pod Autoscaler; you can see it as a mixture of the vertical and horizontal autoscalers. I'm running on an IBM Cloud IKS Kubernetes cluster. Let me deploy the YAML objects that make up the autoscaler. As you can see, it's creating some custom resource definitions, cluster role bindings, and services. I'm mainly going to look at this particular deployment, which is the actual recommender, so let me get the name of its pod. There it is. It has no MPA object to observe yet, so I'm going to create one now. This object is the actual autoscaler, and as its target it has this deployment here, which I'm going to apply as well. So, as you can see, I've created my Apache application.

We'll wait until the recommender detects metrics from the application; at the moment, no metrics are available yet. Still no metrics here. As you can see, the recommended CPU is 100 millicores. Oh, there we go — the metrics have arrived, so the recommender should pick them up. Over here we have the limit and requested CPU for the only pod running. So here we have the vertical scaling: it has updated the CPU recommendation, but it has decided not to horizontally scale the number of pods, because, as you can see, the pod is basically just sitting there.

So now I'm going to put load on this pod: I'll load it with requests and see what the recommender says, and when it detects the load. As you can see, the metrics are still not being updated, even though the pod is serving all these requests, and the replicas haven't been updated either. Now the metrics have gone up — we have 157 millicores — so the recommender should notice this on its next run and scale up both horizontally and vertically. Let's see what happens. There you go: this is the new vertical scaling, and this is the horizontal one, because we now have four desired replicas, and in fact they have already been updated here. You can see the ReplicaSet and Deployment have been modified, and the limits and requests have been modified too. This is the recommended millicore value for the CPU: 182.
Now let's see what happens on the next run. OK, 163 — the value has changed slightly, but the replicas have now gone up to six, which is the maximum we allowed, as you can see.

Now let's see if it scales down. We're going to stop the load and interrupt it — Ctrl-C, there you go — so the load pod has disappeared. We'll wait until the MPA detects the lower load coming in from the metrics, and we expect an update on both the vertical and the horizontal side. Let's wait for the recommender to notice the change. What do we have here? Vertical is at 126 — yes, it's definitely going down; before, I think it was 182 — but the horizontal scaling still hasn't gone down. As you can see, average utilization is basically zero, and it's still not scaling; the vertical recommendation isn't changing either, even though the metrics are all below target, since it's watching metrics from all the pods. The vertical recommendation has gone down further, but horizontal is still at six. OK, I think we have something there: this is the new desired replica count. It has already scaled down, and we're basically back to the initial situation. It took a couple of minutes to actually scale down, but in the end it did.

OK, I think that's it for now. Thank you for watching this video, and hope to see you soon. Bye.