Hello everyone. Hi, my name is Ellie Berger. I'm the CTO and co-founder of PerfectScale. Prior to establishing PerfectScale, I spent many years managing DevOps and infrastructure teams, and in recent years I built large-scale SaaS systems, mainly based on Kubernetes. Today I'm going to talk from my experience about autoscaling in Kubernetes: how to simplify and understand what Kubernetes autoscaling is.

But before we jump into the mechanics, tips, and tricks of autoscaling, let's talk a little bit about what autoscaling is and why we would try to implement it in our environments. Basically, Kubernetes autoscaling is a problem of day-two operations. What are day-two operations? Any system starts with day zero: day zero is when we plan our system. On day one, we build our system. And day two is the continuous and endless effort of managing what we built.

Cloud-native systems come with a great promise for day two: flexibility. We can scale our environment up when it's needed, and we can scale it down when it's not needed and save some money. By that we can achieve the holy grail: the best possible performance at the lowest possible cost.

So let's dive into autoscaling. When we're talking about autoscaling, or scaling in general, there are two dimensions to think about: vertical scaling and horizontal scaling. Vertical scaling is when we add more resources to the existing instances, whatever they are; it could be a node, it could be a pod. Horizontal scaling is when we add more replicas of the existing instances. Kubernetes and the open-source community bring us a few tools for this. Those tools are widely adopted and well proven for horizontal scaling.
There is the cluster autoscaler, AWS Karpenter, and GKE Autopilot to scale your nodes automatically, and there are the HPA and KEDA to scale your pods horizontally.

With all those capabilities, for me personally, when I built my first system, it was obvious: let's put in the cluster autoscaler, let's put in the HPA, and everything will just work fine, right? That was my expectation: a very high SLA, a flexible system scaling up when it's needed and down when it's not, and my cost following those very nice seasonality waves. But the reality was a little bit different. What I found over time is that my SLAs were not so good during the spikes, and also my cost was constantly growing. This was the point where I decided to dive in and understand a little bit more about how it actually works.

So let's start with the simple mechanics of scheduling in Kubernetes, because everything starts there. For the example, we take a pod with a request of four cores of CPU and eight gigabytes of memory, plus some limit; the limit doesn't matter here. We will schedule this pod on a node, and for this purpose our node will have eight cores and sixteen gigabytes of memory. As you can see, when this pod goes to the node, it allocates a certain amount of CPU and a certain amount of memory, and that amount cannot be taken by any other pod.

Then we try to schedule the next pod. For the example, we take a pod with a request of twelve gigabytes of memory. It can't fit on this node, so the pod becomes unschedulable. The cluster autoscaler, Karpenter, GKE Autopilot: all of those are subscribed to unschedulable pods. They watch for unschedulable pods, and when they see one, they spin up a new node and bring it into the cluster, so we will eventually be able to schedule our pod. But here is the important thing: we haven't said anything about utilization.
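As a sketch, the pod from this example could declare its requests like this (the names and image are illustrative, not from the talk):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-api                   # hypothetical name
spec:
  containers:
    - name: api
      image: example.com/api:1.0   # hypothetical image
      resources:
        requests:
          cpu: "4"                 # the scheduler reserves 4 cores on the node
          memory: 8Gi              # ...and 8 GiB, regardless of actual usage
        limits:
          memory: 12Gi             # some limit; not used for scheduling decisions
```

The scheduler bin-packs purely on `resources.requests`; actual utilization never enters the picture.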
We are talking only about the requests. We requested eight gigabytes of memory, then we requested twelve gigabytes of memory. We haven't actually utilized anything yet, but we already have two nodes running.

Let's continue. The cluster autoscaler is responsible for provisioning and deprovisioning our nodes. As we discussed, the number of nodes correlates with the requests of our pods. What about deprovisioning, how does that part work? You can guess: it works the same way. It looks at the requests, and when a particular node's requested allocation goes below a certain threshold (by default it's 50%), it will remove that node, or at least it will try to.

Once we understand this, let's see how the HPA works. The HPA concept is pretty simple: we want to increase the number of our pods to increase parallelization and by that process more data. The basic, initial trigger is CPU or memory; this is the most common one. And again, it looks at and compares everything to the request. For example, if we take a pod with a request of one gigabyte of memory and a limit of two gigabytes of memory, and we set our memory trigger to 70%, the HPA will add an additional replica at 700 megabytes, which is 70% of the request. The HPA will remove that replica only when the average utilization across all the replicas goes below those 700 megabytes.

Another option to scale additional pods is based on a custom metric, like the number of requests per second or something like that. And KEDA brings very advanced capabilities. For example, you can scale by event: as your Kafka queue grows, it can trigger additional replicas.
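To make the trigger math concrete, here is a minimal sketch of the replica calculation the HPA performs. It mirrors the documented formula `ceil(currentReplicas * currentMetricValue / targetMetricValue)`; the function name and sample figures are mine, reusing the 1 GB request / 70% trigger example.

```python
import math

def desired_replicas(current_replicas: int, current_avg_mb: float, target_mb: float) -> int:
    # The HPA's core formula: scale proportionally to how far the
    # observed average is from the target, rounding up.
    return math.ceil(current_replicas * current_avg_mb / target_mb)

TARGET_MB = 700  # 70% of the 1 GB (1000 MB) memory request

# Two replicas averaging 900 MB each: above the 700 MB target, so scale out.
print(desired_replicas(2, 900, TARGET_MB))  # -> 3

# Four replicas averaging 350 MB each: at half the target, so scale in.
print(desired_replicas(4, 350, TARGET_MB))  # -> 2
```

Note that the target is derived from the request, not the limit: the 2 GB limit in the example plays no role in when the HPA scales.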
There is also a cron trigger that you can schedule on, and it's very convenient. For example, if you have a development environment and you want to scale it down during the weekends or at night to save some money, that's very nice. I also found it very useful in production where the load ramp-up is predictable, so I can schedule additional replicas in advance and not wait for the actual triggers.

So now we understand all these mechanics, right? And the basic building block: the requests. How does it all come together in an environment? In some cases it may play nicely; in some cases it may not play very well. It can create problems in terms of resilience, and it can also create waste.

It's also worth mentioning here the VPA, the missing part of Kubernetes: the Vertical Pod Autoscaler, the tool that promised to bring vertical right-sizing. However, it doesn't really work well. It doesn't support the HPA, it's not battle-proven at production grade, and it relies on a decaying-histogram algorithm. I definitely do not recommend using it if you have high seasonality waves in your environment.

So, I promised to simplify autoscaling. However, as you can see, it's not very simple. So let's try to look at autoscaling from a slightly different perspective: the perspective of what could actually go wrong with it.

First of all, let's talk about the pod requests. If we over-provision our requests, what will happen? We will simply waste a lot of money, and we will also create excessive CO2 emissions, and you know, at the end of the day we all share the same planet, and any impact we can reduce is good. If we under-provision the requests of our pods, we will get SLA breaches. We will get various performance problems. We will have out-of-memory kills.
We will have CPU throttling. We will have evictions. Our system will not be stable. And if we do not define the requests at all, as you now understand, we will simply break the entire orchestration: the scheduler will not be able to schedule pods as needed, and the cluster autoscaler will not be able to add the right nodes.

Now let's talk about the limits; we spoke all this time about the requests, so let's talk a little bit about limits. Limits act as a circuit breaker. They are there to protect our node: we would like to avoid the situation where one particular pod, a noisy neighbor, eats all the memory and causes a failure, which could also trigger a potential domino effect. If we over-provision our memory limits, we will find ourselves constantly firefighting dying nodes. If we under-provision our limits, what happens then? Then we will firefight dying pods, because our pods will not have enough memory during spikes, and during spikes they will just die: out-of-memory kills, throttling, and so on. And if we don't define limits at all, then it's actually hell, because we will firefight both the nodes and the pods and everything else.

Let's talk a little bit about common problems with the cluster autoscaler. The first thing to consider is the right size of your nodes. If you choose nodes that are too big, the result is a huge blast radius: the moment a particular node goes down, it takes a huge chunk of your cluster with it. If you choose small nodes, you create a lot of overhead: each node will run all the daemonsets, each node will allocate its IPs, and many nodes will create excessive traffic between them. So it's all about finding the right balance in your particular environment.

But practically speaking, DevOps is not only about technology. It's not only about metrics.
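The small-node overhead point can be put into numbers with a quick back-of-the-envelope sketch (the figures are illustrative, not from the talk):

```python
import math

def daemonset_overhead_cores(cluster_cores: int, node_cores: int, ds_cores_per_node: float) -> float:
    # Per-node agents (log shippers, CNI, monitoring, ...) cost a fixed
    # amount on every node, so more, smaller nodes means more overhead.
    nodes = math.ceil(cluster_cores / node_cores)
    return nodes * ds_cores_per_node

# 64 cores of capacity, daemonsets requesting 0.5 core per node:
print(daemonset_overhead_cores(64, 4, 0.5))   # 16 small nodes -> 8.0 cores lost
print(daemonset_overhead_cores(64, 16, 0.5))  # 4 large nodes  -> 2.0 cores lost
```

The flip side, as mentioned above, is blast radius: losing one 16-core node takes out a quarter of this cluster, while losing one 4-core node takes out a sixteenth.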
It's not only about how things are working; it's also about the combination of technology and methodology. And when we are talking about understanding the autoscaling of our clusters and their performance, we also need to think about the process of how we improve that.

First of all, let's talk for a second about the pets-versus-cattle paradigm. As DevOps managers, DevOps engineers, platform engineers, or SREs, we operate in terms of cattle: we oversee a lot of clusters with different workloads. However, each one of them is a pet. Monitoring solutions provide us with very good visibility into the pet: we know which one needs attention, and we can grab all the data. So think about how you can first evaluate your cattle, and then pinpoint the particular sick pet in order to fix it.

A few more things to think about. Prioritization: say you decided that a particular cluster is unoptimized and you want to improve it; how do you prioritize the work? Repeatability: your clusters have multiple deployments a day, there are seasonal waves, there are waves of load that come and go; it's a constantly changing environment, and this work needs to be repeated, you will not finish it in one action. And to get the best results, you will need to collaborate: with your R&D and developers, with FinOps, and you will also need to understand which cluster you're running, which cloud you're running on, and how it's built.

So, one last thing. This is a very, very simple dashboard. You can build this dashboard on your own, and you can also pull it from our repo. This dashboard brings you three values: the utilization, the requests, and the allocation. And it allows you to simply evaluate whether your particular cluster is optimized or not: do you need to invest in optimization or not?
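As a sketch, the three series on such a dashboard can come from a standard Prometheus stack with cAdvisor and kube-state-metrics (metric names may differ depending on your setup; treat these as assumptions to verify):

```promql
# Actual memory utilization across the cluster (cAdvisor)
sum(container_memory_working_set_bytes{container!=""})

# Total memory requested by pods (kube-state-metrics)
sum(kube_pod_container_resource_requests{resource="memory"})

# Total allocatable memory on the nodes (kube-state-metrics)
sum(kube_node_status_allocatable{resource="memory"})
```

The same three queries with `resource="cpu"` give the CPU picture.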
When you compare your utilization and requests, you expect that at least your utilization is covered by the allocations. You expect that your requests cover the utilization, and that only spikes go above the requests. You can also evaluate how much you are wasting in terms of memory and CPU: if you're using only a small chunk of your CPU and you have a lot of CPU sitting idle, you are simply overpaying and creating excessive CO2 emissions.

So, thank you very much, and it's time for questions, I guess. Questions, anyone?

[Audience question: what is the solution?]

That's a great question, I like it. So first of all, start with evaluating. Understand whether you need to invest in this at all. In some situations you may find that you don't have those seasonality waves and you don't need to invest; everything is working fine. In some cases you may come to the conclusion that you don't need the HPA at all, and it's better to keep, like, five replicas running all the time and not deal with additional replicas going up and down, because that creates a lot of noise. In some situations you will conclude that everything is fine, and in others you will conclude that you need to improve. And this knowledge is, like, half of the answer.

Any particular questions? Karpenter, cluster autoscaler, KEDA, HPA? Wow, all right. Thank you very much.