Hi, everyone, thanks for joining us today. Before we get started, we have a couple of questions for you. How many of you set the requests in your Kubernetes workloads? Okay, nice. How many of you know people who don't do it? Yeah, well, we all know people; I think I'm one of them, so it's no big deal. And how many of you sometimes set higher requests just in case? Because you want your application to run smoothly, you don't want problems, there's a deadline, you don't want surprises. Okay, so we might have some greedy developers here.

My name is Jesús. And my name is David. We work at Sysdig. We help our customers understand their Kubernetes clusters, and we are also the maintainers of PromCat.io, an open-source project that gives you curated lists of Prometheus exporters that are ready to go. You can go there, search for the technology you want to monitor, and it's very straightforward: it helps you configure your Prometheus jobs, your exporter, et cetera.

So there is a problem with greedy developers, right? Knowing the exact amount of resources that your application needs is a really complicated matter, so we normally don't know how to do it. In this talk, we are going to show you how we right-size the requests in our clusters, for a lot of benefits that we'll discuss later.

A greedy developer will typically set high limits and high requests. So what happens with limits? If you set limits too low, you might hurt your application: your CPU might throttle, and you might have some pods killed by the OOM killer. On the other hand, if you set limits too high, you might starve other applications in the cluster if usage rises. Something very similar happens with requests. If you set requests too low, it's the same: CPU throttling, memory kills. But if you set requests too high, your application runs smoothly at first, but you're making it harder for Kubernetes to schedule more pods, and you're wasting resources: you're paying for CPU and memory that you won't use. So you might be wasting money.

Jesús, let's first explain what limits are. Okay, a quick recap. The limits are the maximum amount of resources that your application can use, but those resources are not guaranteed: they depend on what's available on the node. And the requests are the minimum resources that are reserved for your application.

So how does that look in a real scenario? For this presentation, we are going to right-size a real application. Well, not real as in real, but this application existed in an actual cluster. We have set one gigabyte of memory and one CPU core for the limits, and the same for the requests, and we'll start from there. Of course, right now the application runs perfectly. If we take a look at our dashboard, we see four gigabytes in a single line. Why four gigabytes? Because we have four pods, four replicas of our workload. And why just one line? Because the limits and the requests are the same, so they overlap in the chart. A sketch of this starting configuration is included below.
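As a rough sketch of the starting point described above (the talk doesn't show the actual manifest, so the name and image here are purely illustrative), the workload could look like this: requests equal to limits at 1 GiB of memory and 1 CPU, across four replicas.

```yaml
# Hypothetical starting configuration: requests == limits, four replicas.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app        # illustrative name, not from the talk
spec:
  replicas: 4
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: example-app
          image: example-app:latest   # illustrative image
          resources:
            requests:
              memory: "1Gi"
              cpu: "1"
            limits:                   # equal to requests -> Guaranteed QoS
              memory: "1Gi"
              cpu: "1"
```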
Before we go on, we need some theory first: QoS, quality of service, classes. Your QoS class depends on the requests and limits that you set in your workloads. If you set the same amount of requests and limits, you get the Guaranteed QoS class: your pods will be the last ones evicted. If you set both limits and requests but they are not equal, you get the Burstable class: your pod may be evicted. And if you don't set limits or requests at all, your pod is BestEffort: those pods are evicted first.

Okay, so let's put an example. If your application runs in one pod — say, a database that is a single StatefulSet — you don't want that pod to be evicted by any means. You don't want your database to be down; you only have one pod. So the strategy here is obvious: set the same limits and requests, so if the cluster is under pressure, your database will be the last pod it evicts. But sometimes you need to optimize your limits and requests, and there are applications where you can take some risk. For example, an API gateway: if you have ten NGINX pods and two of them get evicted, it's not a big deal. We'll talk more about this later, but you can take the risk, because the benefit is more efficient, better-optimized limits and requests. So you could go with the Burstable strategy there. And BestEffort? Well, I don't know — any non-critical application, I guess. It depends on the use case.

So we're saying that some pods can be evicted. If Kubernetes evicts your pod, is it such a big deal? No, it's not. If your application runs in multiple pods, rather than a single one, it's normal: Kubernetes works like that; it's the natural thing in Kubernetes. And after any change you make to the requests or limits of a workload, the scheduler will try to rearrange your pods, so there will be evictions. Don't worry too much about where your pods are. Generally, setting limits and requests is a good idea.

Before talking about how to right-size your requests, let's look at the strategies for setting limits. For limit right-sizing, we can use two strategies. The first one is conservative: we trust the requests more than the limits. For example, for a database with several pods, we want to make sure we mostly stay within the requests, with the limits as headroom; we could set the limits around 25% above the requests. On the other hand, with the accuracy-first strategy, we trust the limits more: if the application has peaks, the limits absorb those peaks, and we save money with that — but your application might run out of resources in that case. A sketch of the conservative variant is included below.
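A minimal sketch of the conservative limit strategy just described, relative to the deployment sketch above — only the `resources` block changes. The specific numbers are illustrative assumptions, not values from the talk; the point is the ~25% gap between requests and limits, which also makes the pod Burstable rather than Guaranteed.

```yaml
# Conservative limit strategy: limits ~25% above requests (Burstable QoS).
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "640Mi"   # ~25% headroom over the memory request
    cpu: "625m"       # ~25% headroom over the CPU request
```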
So let's get started with the right-sizing strategies. But first, why are we doing this? What are the key benefits of right-sizing your requests? First of all, you'll have a better understanding of your application: how it performs and what it can do in the cluster. You might also discover performance issues that you didn't know about, because your application was running with plenty of resources available — maybe it has peaks of usage caused by a misconfiguration or a software problem, and now that you have lowered the requests, you'll see them, so you can fix them. Also, you'll make the most of your resources: maybe you're sharing the cluster with another team, and if you start right-sizing your workloads, that team will have more room for their applications. And of course, you'll save money, because you'll be buying fewer resources for your applications.

So we have a plan. First, we monitor. This is the core of what we are going to do today and the most important thing you should do with your applications: monitor them. Monitor your cluster, monitor your application. Then you need to figure out which requests to set. Then we perform the resize. And then you monitor again, because right-sizing your requests isn't something you do once. You have to do it quite regularly, because your application is a living thing: maybe today it's one number, but next month it's another, because usage varies. So please, monitor your applications. We tweak, and then we go back to monitoring.

So, monitoring. What are we going to use? Prometheus, of course. We love Prometheus; it's the standard for monitoring Kubernetes. We all use Prometheus, I guess. What information are we going to retrieve from it? First, we need to know what is happening inside the containers — how many resources your containers are using right now — and for that, we use the cAdvisor metrics. But we also need to know what happens inside Kubernetes: how we configured it, which requests and limits we set. For that, we need the kube-state-metrics exporter, KSM.

So how can we detect unused resources? To calculate the percentage of unused memory, we can take the memory request minus the memory usage, divided by the request: unused % = (request − usage) / request × 100. In a real scenario, we can use a query that does the same thing but without the percentage — the request minus the usage (I think in that query on the slide it's upside down, but that doesn't matter) — and find the workloads that leave the most memory unused. A sketch of such a query is included below.
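Putting those two metric sources together, the unused-memory formula might be expressed as a Prometheus recording rule like the following. This is a sketch, not the exact query from the slides: the metric names assume kube-state-metrics v2.x (older versions expose `kube_pod_container_resource_requests_memory_bytes` instead) and the kubelet's cAdvisor endpoint.

```yaml
# Sketch: percentage of requested memory left unused, per pod.
# unused % = 100 * (request - usage) / request
groups:
  - name: rightsizing
    rules:
      - record: namespace_pod:memory_unused:percent
        expr: |
          100 * (
              sum by (namespace, pod) (kube_pod_container_resource_requests{resource="memory"})
            - sum by (namespace, pod) (container_memory_working_set_bytes{container!="", container!="POD"})
          )
          /
          sum by (namespace, pod) (kube_pod_container_resource_requests{resource="memory"})
```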
Now, what if, instead of aggregating by workload, namespace, and cluster, we aggregate just by namespace? Then we might be creating the kubectl blame command, because this gives you a picture of which teams or projects are wasting the most resources. I'm just kidding — don't do this, because you don't want to fight anyone.

Okay, so we monitored, and we have a lot of information. How do we use it? The next step is to calculate the requests, and there are two ways of doing it. The first is the conservative strategy: take the maximum amount of resources your application uses over time, and make that your request. This has benefits and drawbacks. The benefit is that your application will have plenty of room. The drawback is that you might still be wasting money: if your application has peaks of usage and you set the request to the max, you'll be paying for those peaks even when your application isn't at one of them. Still, it's a good starting point for tweaking, because there is no exact formula here; there's some trial and error. There is another, more aggressive strategy: calculate the average consumption of your application. That way your application might still have some peaks of memory, but that's okay — the limits are there, so the limits absorb the peaks — and if you need to tweak the request up later, that's fine, because it's a living thing, as we said.

So let's resize the application we saw in the previous slides. We are going to halve the requests, both memory and CPU, and check the impact. In this panel, we can see the unused memory of the application: we are now wasting fewer resources. And in this one, we can see the percentage of memory used versus the request: that percentage has gone up. So we are fine, but we are not at our final point. It was a good starting point — we are wasting fewer resources, which is nice — but there's room for improvement. We monitor again, and we tweak again. What if we halve it again, from 512 to 256 megabytes? Let's check the impact. In this chart, we went even further, from 256 to 128. And here is the whole process: when we requested one gigabyte, we were using 20% of the resources, which means we were paying for 100% but only using 20%. Now that we set 128 megabytes, we are using about 100%, because we have the limits there. And this is where we want to be: using all the resources we reserve, so we don't waste resources or money. This is the same picture the other way around — the amount of wasted resources — and we went from a high percentage to almost zero, or even zero, after setting 128.

So this is it, right? We aren't wasting any resources, so we are happy? Well, we have to talk about the real thing here: the money. Everyone likes money — well, I do. Let's see how much we were paying before the resizing: about $100 per month for the requests of this workload. After resizing, we were paying $25 per month. So we saved $75 per month. And that's just one workload. Imagine doing this across the entire cluster; we could save a lot of money.

Okay, so what are the conclusions? We learned that our application was lighter than we thought: we assumed it was a heavier application, and it runs with fewer resources. We saved $75 a month on a single workload, which is great. But the most important thing, from my perspective, is that we now have an easily repeatable workflow to monitor resource usage. We created some dashboards with those queries and those metrics — depending on your use case, you can create different dashboards — but all the information is living there. Those strategy queries might look like the sketch below.
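As a sketch of the two request-calculation strategies just described — again assuming kube-state-metrics v2.x metric names and cAdvisor — the dashboard or recording-rule queries could look like this. The one-week window is an assumption; pick a window long enough to cover your application's traffic cycles.

```yaml
# Sketch: candidate memory requests per pod, from observed usage.
groups:
  - name: request-candidates
    rules:
      # Conservative strategy: the peak usage over the window becomes the request.
      - record: namespace_pod:memory_request_candidate:max_1w
        expr: |
          max by (namespace, pod) (
            max_over_time(container_memory_working_set_bytes{container!="", container!="POD"}[1w])
          )
      # Aggressive strategy: the average usage becomes the request;
      # the limit is what absorbs the peaks.
      - record: namespace_pod:memory_request_candidate:avg_1w
        expr: |
          avg by (namespace, pod) (
            avg_over_time(container_memory_working_set_bytes{container!="", container!="POD"}[1w])
          )
```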
So you can use that same dashboard for all your workloads, take a look from time to time, and see if your cluster needs some right-sizing. This will help you keep your applications in shape. That's all we wanted to show you today. Thanks for listening.

Have you ever been in a situation where a workload uses a lot more resources at the beginning but then goes down? How would you set requests and limits in that situation?

Yeah. The question is what happens if the workload uses a lot of resources at first but then usage drops — can we manage that situation? That's why we take an average over time. Even if the workload uses a lot of resources at first, that's what the limits are for: the limits absorb that initial peak. In that case, the real problem is the limits, so you have to get the limits right — you know what the peak is, so set the limits around that, and afterwards you know the average of your application, so the request follows from that. And since this is something you do regularly, it settles after some time monitoring. Does that answer it? Okay.

Hey, here. Do you think the amount of money and manpower spent on resizing is always less than the money saved after resizing? Excuse me, can you repeat? Resizing each workload is a lot of work, right? You need a lot of man-hours for doing that. Do you actually come out ahead on the savings after resizing? In other words: what if it takes you a week of work to save 100 bucks, but your salary is higher? Yeah, that's a good question, sorry. As we said, we built this dashboard once, and then we use it for all the workloads. So yes, at first you need to spend some time writing queries and learning how your applications work, but once you have that, you can replicate it any time. It depends on your use case, but I think it's worth it. Thank you.

Yeah, that's a really nice question too. The question is whether saying we use 100% of our resources could be misleading to our developers — they might think 85% is better. Well, applications never run perfectly flat, so there's a range, say 90% to 110%, and that's the sweet spot. The thing is, you have the limits there; the limits are for that. Using 100% is a good idea because you are making the best of your requests, and if something pushes your application further, you have the limits to absorb it. And if it goes even further and you hit memory issues, since you are monitoring this, you'll know and you'll be able to fix it. That's a good question, because at first I thought the same — 100% sounds like living on the edge — but no, you have the limits for that. Thank you.

Hi. Actually, my question is related to the same question he asked.
So I've got five or six different environments, and in each environment probably more than 100 applications running. If someone is an SRE sitting around thinking, okay, I've saved quite a lot of money by tweaking my apps, and then I go to bed and my pods get evicted — I have to make decisions: do I make my life easy, or do I save money? This problem has been going on for a while, and it's still a bit of a firefighting, manual approach. It starts well, and at some point you can just give up. Like someone said, initially the pods are fine, and eventually the load goes up, so you constantly have to keep watch on the metrics and on where the load is flowing. It can become quite a cumbersome job to constantly tweak your apps. And if I have more than 100 applications and four or five environments — that's 400 — where do I go? Thank you.

Excuse me, can you repeat the question? The microphone is a little low, I think. It's not really a question — it's good to start with this kind of right-sizing, but if you have, say, 100 workloads, 100 applications, and four or five different environments, that's four times 100, 400: how can I maintain this thing? It starts off fine, and over time it keeps changing.

So the question is: when you have a lot of workloads and a lot of clusters, how can you manage all this work? The thing is, with the graphs we saw, you can prioritize, because you know which applications are the worst — the ones wasting the most. You can focus on those, improve them, and continue from there. And like the earlier question about whether it's worth it: you can start with the namespace that is wasting the most resources, and you don't need to do it all in one night. It depends on each scenario.

Can you bring back the slide with the Guaranteed, Burstable, and BestEffort classes? Yes. A different scenario that wasn't up there: what if you set your requests but don't set limits? Are there any scenarios where you would advise doing that, or have you ever seen it done? You mean, if we set requests but no limits, which class would that be? If you set requests but no limits, it's considered Burstable, yeah. And if you set the limits very high, for Kubernetes it's effectively like having no limits, so it will be Burstable too.

And is there any scenario where not setting limits makes sense, like the database or API gateway examples on the previous slide? Well, yes, it depends on the use case. If you have an ideal scenario where all applications can be down, or can be evicted without any problem, you can set no limits, and every application that needs CPU will receive it. That is mainly for CPU, not for memory — for memory, it's not a good idea. A sketch of such a spec follows.
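As a small illustration of that answer (the values and names are hypothetical), this is what a spec with a CPU request but no CPU limit might look like — the pod is classified as Burstable and can consume idle CPU beyond its request, while memory stays capped:

```yaml
# Sketch: CPU request with no CPU limit (Burstable QoS); memory limit kept.
apiVersion: v1
kind: Pod
metadata:
  name: cpu-burst-example    # illustrative name
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
        limits:
          memory: "256Mi"    # only the CPU limit is omitted
```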
For CPU, yes, sometimes. It depends on the teams in your cluster: sometimes you have a lot of teams that don't talk to each other, and you don't trust the other team, so be careful, because maybe that team is the one causing trouble. But if you monitor your applications, you can answer your own question — in that scenario, you'll know what's happening. So I would go without limits and try it. This is for CPU. Only for CPU, yeah.

My question is similar to what you were just saying about the limits. How do you handle a case where you're almost at 100%, but your traffic is very bursty and you need a little overhead for short periods of time? Sorry, I couldn't see where you were — I was looking at the other end of the room. So, how do you handle bursty network traffic, where with a little peaking you want to be able to grow, because otherwise you run out of memory and your node or your pod dies, and it gets worse and worse?

So the question is: what happens if we go above that 100% — 100% of the request — for a period of time? It's not a problem, because that is what the limit is for. If you go above 100% of the request and you have problems with that, you should adjust the limit, and if you need a guaranteed state, you might want to increase both limits and requests. But it depends on the scenario, sorry — there's no formula for this — and if you monitor it, you'll have your answers.

First of all, really great talk; I really like how systematic your approach to this kind of inefficiency is. But the system you have involves manually iterating on the limits and requests. Do you know of any automation for this, so that rather than a human pushing commits to tweak the numbers, some operator would dynamically change them based on the metrics?

The question is whether there is an automated way of doing this, instead of manually tweaking the queries, the limits, and the requests. This is a very tough question, because this is very much a craftsman's operation: it depends on the scenario and the application. Even two teams both running NGINX could end up with completely different limits and requests. And like the first question — if there's a very big peak at the beginning and you automate it blindly, maybe you get it wrong. If you go to the Sysdig booth here in the hall, you can see how Sysdig, for example, does this: we have a feature called Kubernetes dashboards that gives you this information and helps you find the right request and the right limit without you doing a thing. But I don't think there's an off-the-shelf algorithm you can just deploy in your cluster. That said, we do have a few formulas — the average and the max — and we said those are a starting point: at first you have nothing, and with the first formula you have something. So you could automate the average formula, or the conservative strategy, which is the max. I wish I had a better answer for you. You could use, for example, KEDA — no, sorry, that would be the VPA. You could use the VPA, the Vertical Pod Autoscaler, for right-sizing the limits and the requests; a sketch of one is included below.
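For reference, a minimal sketch of a VerticalPodAutoscaler in recommendation-only mode, assuming the VPA components are installed in the cluster; the target name is illustrative and reuses the hypothetical deployment from earlier.

```yaml
# Sketch: VPA that only publishes recommendations, without evicting pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: example-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app      # illustrative target
  updatePolicy:
    updateMode: "Off"      # recommend only; don't resize or evict pods
```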
But that wouldn't be the most efficient way of doing this, because you'll have some... but yes, you can try it. The good thing about all of this is that the numbers are in Prometheus, so just take a look. Okay, I think we are done. Great. Thanks for joining us today. Have a nice day.