Welcome to Cloud Native Live, where we dive into the code behind cloud native. I'm Annie Talvasto, I'm a CNCF ambassador as well as a Senior Product Marketing Manager at Camunda, and I will be your host tonight. Very excited to have everyone join today. Every week we bring a new set of presenters to showcase how to work with cloud native technologies. They will build things, they will break things, and they will answer all of your questions. So join us every Wednesday to watch live; you can also watch on demand later if you miss an episode. This week we have Andy Suderman from Fairwinds to talk about building stability.

But before we get to the topic of today, another exciting thing happening in the CNCF universe at the moment: the KubeCon Europe co-located event CFPs are closing soon. So if you have any talk ideas, or you need to come up with one, you can go ahead and submit them now; soon it will be too late. And as always, this is an official livestream of the CNCF, and as such it is subject to the CNCF Code of Conduct. So please do not add anything to the chat or questions that would be in violation of that Code of Conduct. Basically, please be respectful of all of your fellow participants as well as the presenters. With that, I'll hand over to the speaker of today to kick off the presentation.

Great, thank you. So today I wanted to talk about resource requests and limits. Those who know me, or are familiar with some of the things we do at Fairwinds, know that resource requests and limits are kind of a pet peeve of mine, or just a thing that I commonly harp on. We spend a lot of time telling people to set their resource requests and limits on all their workloads, and we tell them different strategies for doing this. But what we don't talk about as much, at least out in the open, is what happens when you don't set them properly. What we don't get to see very often, except in real-life clusters, are some of the negative side effects that can happen. I've been wanting to do this for a while, so I put together some demos of the different things you can break. Hopefully today we'll get to break some stuff, see some things fall over, and get an idea of why that happened.

Perfect. So all the code I'll be using today — let me bring the screen share up. Sorry, I can't actually tell: can everybody see my screen here? No. There we go. Awesome. Thank you. All right. Everything I'll be doing today is in a GitHub repository that I made public this morning, so if you want to go tinker with this, you can. What I have here is a GKE cluster on 1.21 with n2-standard-2 nodes, so they've all got two CPUs and eight gigs of memory. I've enabled node autoscaling across the three zones, so by default I have three nodes, and I can scale up to two nodes per zone, giving me six nodes total.

And then I have an application running in this cluster. If you saw my last livestream a few months ago, I used the same app. It's kind of a fun little app: you can go to it and vote for where you want to have lunch, the counter goes up, we see how many page views there have been, and you see the name of the backend that we're connected to. All of this is stored in a database in the cluster. So if we take a look at our cluster in the yelb namespace, we have the app server, which is the backend, we have the DB, we have the UI, and there's a Redis server for caching as well.
All the code to deploy this into the cluster is in this repository, in the app directory. So if you want to deploy it to a cluster, you can just kubectl apply that app folder and get this app running. By default it creates a load balancer, so we have just an IP address here. Very simple setup, really easy to recreate. And right now this app is, as far as we know, functioning. It seems to work: I can click, I can vote for stuff, and it seems to be doing its job. We can see it's using relatively low amounts of CPU and memory right now — one millicore and 36 megs of memory going on here. So we have a happy app. That's great.

Now the first thing I'm going to do is show what happens when you deploy a noisy neighbor next to an application that's properly configured. We're going to focus on the app server today — that's the backend for this application. It tends to have the most dramatic swings in resource utilization as traffic comes into the application, which is why we're going to focus on it. If we take a look at it and scroll down to the resources section, we see we have a CPU request of 200 millicores and a limit of 200 millicores, and a memory request of 100 megabytes with a limit of the same. So what we have here is the Guaranteed QoS class, and what that means is that Kubernetes and the kernel are going to try their very best to give this pod all the resources we have listed here.

So what I'm going to do now is deploy a Helm chart — actually two different Helm charts, which work with an open source tool — that are going to spin up a program called stress, which is just going to eat up CPU and memory in the cluster. These pods are going to spin up and basically attempt to eat all the resources in the cluster.

And there's an audience question: what tool is used to show pod CPU and memory usage? Thank you to Moana Taran for the question. Yeah, no problem. What I'm using here is k9s. It's a TUI, or terminal user interface, for interacting with Kubernetes. Essentially what this is showing us is the same thing as if I do a kubectl top pods: I can see the current utilization of the pods in the namespace of my current context, which is the yelb namespace. The nice thing that K9s (or "canines," I'm not sure how people say it) does for us is that it also shows the percentage of the request and the percentage of the limit that we're using — percent CPU request, percent CPU limit, percent memory request, percent memory limit — which is a nice thing to see as we go through the rest of this demo. That's why I'm using k9s in this case.

Great. So I'm going to go ahead and install that Helm chart. It's going to create a namespace called stress and deploy those pods that eat all of the CPU and memory in this cluster. These pods are trying to eat up as much memory and CPU as they can, but we're not putting any requests or limits on them at all, so what they're going to do is hopefully succeed at just overwhelming the node.
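To make the contrast concrete, here is a minimal sketch of the two ends of the spectrum being described: the properly configured app server with requests equal to limits (the values mentioned above) versus one of the stress pods with nothing set at all. The pod names and images here are illustrative; the real manifests live in the demo repository.

```yaml
# Guaranteed QoS: requests and limits are identical on every container.
apiVersion: v1
kind: Pod
metadata:
  name: app-server-example      # illustrative name
spec:
  containers:
    - name: app-server
      image: example/app-server:latest   # illustrative image
      resources:
        requests:
          cpu: 200m
          memory: 100Mi
        limits:
          cpu: 200m
          memory: 100Mi
---
# BestEffort QoS: no requests or limits at all, like the stress pods.
# The scheduler has no idea what this pod needs, and it is first in line for eviction.
apiVersion: v1
kind: Pod
metadata:
  name: stress-example           # illustrative name
spec:
  containers:
    - name: stress
      image: example/stress:latest       # illustrative image
      # no resources block at all
```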
So we have a couple of different ways to look at this. We can do a kubectl top nodes and see the current CPU percentage and memory percentage utilization. We can also see this in k9s by going to the node view. Or there's another tool that I typically use, and that we'll use a couple of times throughout this demo, called kube-capacity. If we pass it the utilization flag and make this a little bit wider, because there's a lot of output — there we go — we can see the CPU request and limit totals for the nodes: what every pod on that node is requesting, the total limits of all of the pods on that node, and then the current utilization. And you'll notice that all three of these nodes are now pegged at 103% CPU usage and over 100% memory utilization as well. Hopefully our view down here will start to catch up with that. Yep, there we go: CPU's at 2000 — we have two CPUs on these nodes, so that would be full utilization — and it's 103% of the available CPU on the node.

So now what we're going to do is... well, first I'm just going to go click on the app and see if it's still working, because that's the easiest way to check. But another thing we have in place for this demo that's going to be useful is another file in the repo, a JavaScript file called load.js. I'm going to use a tool called k6, which is a load testing tool, and it's going to run a load test against this app. Essentially it's going to go click on those buttons. The default is set to 10 iterations, so it's going to go in, load the main page, click on each button, and do that 10 times. And right now I'm using ten of what they call virtual users, so it's going to use ten different, essentially, processes to do that, all happening in parallel. If we look at the request duration, the average HTTP request duration for this test was 78 milliseconds, and the average iteration time was about 4.8 seconds. This is the baseline of what I expect for this app — if I had run this before we started stressing the cluster, this is what we would have seen the app perform at.

So we can see that the nodes being fully utilized and completely overwhelmed by this other, improperly behaving application is not affecting the application that we're running in our cluster, because we set our resource requests and limits to that Guaranteed QoS class. This is why, for anything that is critical or important to you, I generally recommend you use the Guaranteed QoS class: set your requests and limits exactly the same, and set them to a reasonable number that you've tested.

Yeah, great. And I see a comment from Gary: "Sorry, I showed up late, not sure if I missed it — was there a link given for any of these tools?" I don't think we've given links so far, but we can see if we can add some during the webinar. Definitely, definitely. If we can share the link to the GitHub repository, there's actually a section at the bottom that has a list of tools used and links to them as well. So if we can just share that initial GitHub repository URL. That has been shared now — is it the GitHub Fairwinds resources demo? That's the one. So if you go to that page and scroll to the bottom of the readme, you should see most of these tools. Perfect. So there you go, Gary, you can get the tools from there. Thank you so much for asking a question once again also.
Yeah, thank you. So now we've seen — well, actually there's one more thing to show. Sorry. The other thing we're going to do is take a look at the pods in that stress namespace, because we're spinning up a whole bunch of pods that are attempting to use way more CPU and memory than is available, and they have no resource requests and limits set on them whatsoever. We're going to see something particularly ugly here: a whole lot of pods that have been evicted, because they're trying to use so many resources and because they have, essentially, the BestEffort QoS class. We haven't set any requests or limits; we've said "just try to run this, see what happens." So we see them get evicted as the node gets the MemoryPressure condition — we're running out of memory on the node, we need to find some pods to get rid of to make space for other things, and these are the first on the chopping block to get removed because they have no resource requests and limits set.

So now we've really seen, A, the detriment of not setting any resource requests and limits — you're going to see pod evictions, you're going to see potential issues with applications running — and then also the benefits of setting your resource requests and limits properly on your critical apps, so that they're not affected by other workloads in the cluster that may do bad things. Let's see, any other questions about that? I think we're all right. So I'm going to delete the stress namespace and we're going to stop stressing this cluster so much.

You may have noticed here in our node list that we have six nodes now. We've scaled up to our maximum number of nodes because of all the extra pods I've been attempting to schedule and all of the memory pressure on them. It's also interesting to note that with the cluster autoscaler in this case — if this wasn't a GKE node pool — it's possible we would not have scaled up the cluster, because there are no resource requests on the pods that needed to be scheduled, and so the pods may never have gone into a pending state, and the cluster autoscaler wouldn't have known what type of node to spin up. So in another type of cluster, this may have had even more detrimental effects by not allowing the cluster to scale up at all.

All right. So that's the first demo, and it's covered in the readme, which describes what I did and what the different effects are. We're going to go on to the second thing that can be a problem, which is not setting your CPU limits correctly. There's been a lot of debate in the community about CPU limits and CPU throttling, what you should set your CPU limits to, and Linux kernel bugs that resulted in more CPU throttling than you would expect. So I'm just going to cover what it looks like when you're experiencing a lot of CPU throttling. The first thing I'm going to do is put a little bit of stress on the cluster: I'm going to schedule some pods that use some extra CPU, just to create a little bit of extra noise in the cluster while we do this. This is the same stress app I was running before, but we're only stressing CPU, and we're not running nearly as many of the pods, so we don't get quite the same behavior. And then what we're going to do is take a look at the app server deployment, edit it, find the resources block, and turn it way down.
So originally we had a CPU request of 100 and a limit of 200 — or I think it may have been different from that, but that's all right. I'm going to turn those way down to a CPU request of 10 millicores and a limit of 10 millicores. And we're going to take a look at the pods in that deployment and see what happens. So we're in the ContainerCreating state and we're waiting; describe this — we're still pending. Let's pull it up again and see what's happening. It's pulling the image; I'm surprised this is taking so long. Yeah, I think it's the demo effect — it happens every time. Yeah, live demos are a dangerous thing. Sure.

All right, so we've started the container and it's running. We're waiting for it to go into a ready state, and now we start to see readiness probes failing. We're doing a GET request to the API endpoint — getstats — as our readiness probe, in order to tell Kubernetes when our pod is ready to receive traffic, and these are just failing. And the liveness probe is now failing as well; it's the same API endpoint, so I would expect that. If we try to get the logs for this pod — grab the previous — oh, there's no previous yet. Grab the logs, and it's not logging anything. Nothing's happening. So essentially what's happening here is that we have throttled the CPU down so far that this app can't even serve these requests.

So if you have random, intermittent failures of probes that you can't explain, or a pod that doesn't come up just because the probes are failing and there's no logging — or maybe there's some logging but it's very intermittent — this is usually evidence of CPU throttling. Now, you can go look at graphs from Prometheus or from Stackdriver and construct graphs, or whatever your monitoring tool is. But the first thing I always look at when I see unexpected probe failures, when I know the app should respond on that endpoint, is the CPU limits — specifically the limits, because that's what controls CPU throttling.

So we're going to go back to our deployment, edit it again, and turn this up to something a little more reasonable. Ten millicores is tiny; we originally had 100 millicores, so let's bump this up to 40. Maybe we just stood this thing up, maybe we expect it should only take 40 to 50 millicores, and we want to be conservative — we don't want to give it too many resources up front, because we want the most efficient cluster possible. So we'll see if we can get our pods to start with this new setup.
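For reference, this is roughly the shape of configuration that produces the probe failures just described: HTTP probes combined with a CPU limit far below what the app needs, so the container is throttled so hard that the probe endpoint never answers in time even though the application itself is fine. This is a sketch only — the probe path, port, and values are illustrative, not copied from the demo manifests.

```yaml
# Deployment container snippet (illustrative values)
containers:
  - name: app-server
    image: example/app-server:latest   # illustrative image
    resources:
      requests:
        cpu: 10m
        memory: 100Mi
      limits:
        cpu: 10m          # throttled to ~1% of a core
        memory: 100Mi
    readinessProbe:
      httpGet:
        path: /api/getstats   # illustrative path; the demo app uses a similar stats endpoint
        port: 4567            # illustrative port
      periodSeconds: 5
    livenessProbe:
      httpGet:
        path: /api/getstats
        port: 4567
      periodSeconds: 10
```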
If we look at the last one that tried to start, which is now terminating, we see it was well over its CPU limit and request, and it crashed twice — probably killed because of its failing probes — so it just wasn't in a good state. We're going to try this new one and see how it goes. We take a look at the logs on it, and it's actually starting this time, so we have something good going on there. It's running; now we're just waiting for that readiness probe to succeed — and there it goes. So we're going to finish the rollout, and I'm going to let it stabilize before we do anything else.

It's also fun to note that, thanks to the wonderful features of Kubernetes, our app has actually been functional this entire time. That new pod, because of our deployment strategy, just didn't come up, but our other two pods were still running. So if I had gone and clicked on the app or run my load test, we would have seen the app still fully functional. All right, so we have one running, two running — we should only have two. Right now I have the HPA pinned to — oh no, I haven't; hang on, let me fix that, I'm not supposed to do that until later.

And then we have a question from Mark about the Slack channel for the chat. I think it was linked or mentioned a bit earlier — yes, there we go, you can see it there, so you can join in there. But obviously you can also ask questions, as you already did, Mark, in the chat of your preferred streaming provider, so you're doing really well on that one already. Perfect. And then there's a question from Muhatham — again, sorry if I'm failing at pronouncing the name, by the way. They say it looks like the CPU still takes 100% of 40m, so it would be better to increase the CPU limit to 100m. And thank you so much to Jonathan for saying "awesome, great" — great that you're excited to be here.

Great, that's a great point about using 100 instead of 40 millicores as the limit. Right now we're sitting at 2% and 62% on our two pods, with, I would assume, very little to no traffic — unless everyone watching the livestream has gone and started clicking on this thing. And that is actually part of the demo, so I'm going to talk about it in a second. So now we're sitting at 40 millicores, our app seems to be running, we're passing the probes, everything has stabilized. I'm going to run my little benchmark here that we ran earlier, that k6 load test, and just take a look at the numbers. You may have already realized it's taking far longer than it did last time. And when we look, we see our average request duration was 500 milliseconds and our average iteration was 12 seconds, so we've more than doubled the amount of time it takes to run this test — and this is a very small test, not indicative of any real-life traffic or anything like that. If I ran this for a lot longer — turn up the VUs (we have it set to 10) and change the iterations to something like 10,000 — and just let it sit, we would see the app use 100% of that CPU and just sit there getting throttled over and over again.

This would be another good opportunity to go take a look at our metrics graphs and see that throttling in action. You have to be a little careful with those kinds of graphs, because sometimes they can be misleading: sometimes you see throttling and you're not sure if it's consistently a problem. What I prefer to do is take a look at actual latency — just look at your application performance, look at
those golden metrics, and see if your app is performing the way you expect. In this case, at 40 millicores, we know that our request duration of 500 milliseconds is way too high; that's just not right for what we expect our application to do. So the suggestion in the chat to bump up to a limit of 100 millicores is definitely a great idea, and we're going to do that. Let's find the resources block and bump this back up to 100 and 100, and hope our app starts to perform a little better.

It's also interesting to note that during that test we never actually saw the CPU spike. CPU throttling is a very complex mechanism — I've watched a few videos on it and I'm not sure I fully understand it, but I understand the effects of it. So if we see that our app isn't performing, our limits are too low, and we're seeing a lot of CPU throttling, turn those limits up, even if you're not necessarily seeing full utilization all the time. It's probably indicative of a problem with your resource requests and limits. So we're going to wait for this to stabilize again — looks like we have two new running pods; that happened a lot faster this time because we're not being CPU throttled quite so heavily — and let's run our test while we wait for that as well.

There's another question, from Antoine: is it a good idea to profile applications to know worst-case CPU usage and then adapt it? I think it's a great idea to profile your applications. Most people typically don't do that, or don't have the ability to, or just don't spend much time on it, but it's a great way to understand your application's performance. You are correct in the last comment as well — the comment being that the CPU request is not the same as the limit, and you can keep the request low and the limits high. That is definitely true, and I'm going to cover it a little later in the demos, so we'll talk about that in a few minutes.

So we ran our test and we're back down to 120 milliseconds — still a little higher than what we started with. I think we were at 200 millicores on the baseline, so I'm going to go back and change this again, and we're going to go up to 200 — probably more than I actually need, but not a huge deal for this demo; we're just going to double it and see what happens. And Antoine, you are very welcome. Thank you for asking questions — keep them coming, it's a lot more fun when people are interacting. All right, so hopefully we've got this back down under 100 milliseconds; let's see what happens. Yep, we're back down under 100 milliseconds, so 200 seems to be a pretty decent sweet spot for us. I'm going to leave it there for now and we're going to move on. I believe the default in the actual YAML we use to deploy this is 200 for both the request and the limit; it's set that way because we want to start with the best values, so if you go run this demo yourself or tweak it, you should start with that 200. So that's the end of the CPU throttling demonstration.

I'm going to move on to the third demo, much simpler than the last one: what happens if we just turn down the memory requests and limits? We're going to edit this again, and this one most folks are probably familiar with. It's a really common thing, and it's relatively straightforward, because the reaction to running out of memory is much easier to understand than CPU throttling, which, as I talked about, is fairly complicated.
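For reference, here is a minimal sketch of the kind of change being made in this third demo: the memory limit is turned down far below what the app needs even to start. The values are illustrative; the symptoms described next (OOMKilled, exit code 137) are the ones called out in the talk.

```yaml
# Illustrative: a memory limit well below what the app needs at startup.
# kubectl describe will show something like:
#   Last State: Terminated, Reason: OOMKilled, Exit Code: 137
resources:
  requests:
    cpu: 200m
    memory: 10Mi
  limits:
    cpu: 200m
    memory: 10Mi    # the app needs more than this just to start
```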
What happens when we turn the memory limit down is that we're just going to keep getting killed over and over again. If we go describe this, we see the last state was Terminated and we were OOMKilled — out-of-memory killed. The exit code for that is always 137. You may not necessarily see the reason shown as OOMKilled — you may just see Terminated — but if that exit code is 137, you know there wasn't enough memory there, and your pod is just going to sit there and crash over and over again.

So we're going to go back and edit this again — oops, wrong button — edit, and find that resources block. We started at, I think, 50, so let's bump this up to 20, because obviously 10 megabytes was not enough for this app to even start; just like in the CPU one, we weren't able to even get the app running on that. But let's bump it up to 20 and see if maybe it'll start on that. This is a little bit harder to dial in — nope, it's not even going to start. I believe this app uses the most memory at startup, so we're going to go back up to 40 here. Another great place to test this out in this app is the database — the database uses a lot more memory — but I'm trying to focus just on the app server today. All right, let's see if this new one can come up at 40 megabytes.

Yeah, there's another comment from Moral, I think — so bad with the pronunciation today. They say that if it's Java-based, the minimum requirement is 512 megabytes. You are correct, but this is not a Java-based app. As far as I know — actually, I don't remember exactly what it's written in; we can go find out. This is the repo, it's linked in my repo, and it looks like the front end is JavaScript and the back end is, I'm going to guess, Ruby, based on the GitHub language analysis here. I'm not going to dig through it because it's not that important to the demo.

So we're at 40 megabytes, and we see that just at steady state — well, I'm not running any traffic right now — we're using 70 percent of our memory limit. So I'm going to run that load test again and just let it run for a few minutes. A lot of times what we'll see is intermittent OOM kills: the app will start up, but under load it will start to fall over. That's an opportunity to either adjust our memory limits up and give it a little bit of buffer to surge into, or to turn on a horizontal pod autoscaler — and that's usually the better option, to scale horizontally rather than vertically. But as we'll see, this app isn't so much memory-bound as it is CPU-bound; it seems to use a fairly consistent amount of memory, so I'm not likely to get a ton of results out of this. We'll see what happens.

All right, so we're up to 85 and 83 percent on our two little pods here. We go take a look at how much traffic we've had — we're at 22,596 requests, so we've done another couple thousand or so, and the app is still running just fine. We can take a look at the stats and look for the HTTP request duration: we're at an average of 67 milliseconds with a p95 of 102, so it's actually running quite well. Another thing to think about here is that 92 percent memory utilization might be a good thing — that might be exactly where we want to be. If our golden metrics, our golden signals, aren't showing any issues, and our latency is still where it needs to be at 92 or 95 percent, I'm just going to let it run.

Great, and thank you so much for all the questions so far. And then there's another one from Antoine — thank you.
The question is: is there a way to reduce the delay between the moment more resources are required and the moment a new node is added to the cluster? Predictive autoscaling — that would be the golden goose, wouldn't it? I don't know of anything great out there. There are a couple of solutions for creating buffer space to reduce the amount of time it takes to scale up. The cluster overprovisioner is a common one: you essentially run a very low-priority pod that sits and holds those resources available, and then automatically gets evicted when the resources are needed by something more important, so essentially you keep those nodes pre-warmed. That can reduce the amount of time it takes to scale up. As far as predictive autoscaling goes, you really have to know the patterns of your incoming traffic to be able to do that, so it's a much more complex problem to solve, and I don't personally know of any great solutions out there for it. Generally the answer is just to scale more aggressively — turn your targets down, things like that.

So we're still running at 93 percent here, and we have an average request duration of 67 — or 69 — milliseconds, so we're doing quite well even at 91 to 95 percent memory utilization. I'm going to leave this; I think it's a great spot to be at. I don't really care that it's at 95, 96, 97 percent as long as it's not occasionally getting OOM killed. If we do start to see those occasional OOM kills, then we may want to increase it just a little, but as I said before, I don't think this app is memory-bound — it's very much CPU-bound — so I'm not going to worry about it. So that was the easy demo: OOM kills. Very simple; we all more or less understand them.

I'm going to move on to the fourth one, and this is my favorite, because we see it a lot in the clusters we run for our customers and that I've run for people in the past. It's very common to think: well, I know my app, I know my traffic is going to be bursty, I know my resource utilization is going to be bursty, so I don't need to request as much as my limits. So I'm going to set my requests below my limits, use that Burstable QoS class, and set them really far apart. I'm going to request 10 millicores, because that's what it needs to start with — that's really all it needs to get going, it's kind of what it uses at steady state. And I'm going to set my limits — I'm doing this backwards on the screen as I talk while I type — so I'm going to set my requests to 10 millicores, which we already know is far too little for this app from the earlier CPU throttling tests, but I'm going to set my limit to 500 millicores. So our limit is way up there; we're not going to get CPU throttling. But, you know, I don't need to request as much — it's a very common thing to do. And I'm also going to do the same with my memory. This isn't actually in the script, I'm going a little off script here, but we're just going to see what happens: I'm going to set the memory request to 10 megabytes and the limit to 100, which is way higher than what we had before. So I'm going to set that.

Where this really becomes a problem — because it's not so much a problem when you're running a static number of pods; it can be, but where it really becomes a problem — is when we start using horizontal pod autoscaling.
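To make that concrete, this is roughly the shape that was just configured: a Burstable pod with a very wide gap between request and limit. The values are the ones described above; the snippet shape is illustrative.

```yaml
# Burstable QoS with a very wide gap between request and limit
# (the "bad" pattern in this demo). The scheduler and the HPA both make
# decisions based on the request, which is far below actual usage.
resources:
  requests:
    cpu: 10m        # what the scheduler and the HPA budget for
    memory: 10Mi
  limits:
    cpu: 500m       # what the pod is actually allowed to burst to
    memory: 100Mi
```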
So I'm going to edit that HPA, that horizontal pod autoscaler, and let it scale way up: I'm going to say minReplicas 2, maxReplicas 200. If we take a look at that HPA, we'll see it's a CPU autoscaler and we're targeting 50 percent CPU utilization. The interesting thing about this is that horizontal pod autoscalers are based on the request, and scheduling is also based on the request. So by requesting only 10 millicores but allowing a limit of 200 — or whatever I set it to, 500, that huge number — we're going to see some probably interesting behavior in our cluster. And this is where I like to go back to that kube-capacity tool.

So let's just keep watching this for a minute. Actually, I'm going to keep running load against this, and I'm going to run a little more load — say we got quite a surge in traffic — so I'm going to use 30 virtual users instead of 10, and we'll see what happens. We're already sitting at 280 percent and 278 percent of our memory request on these two pods, so we're asking for not nearly enough. We've scheduled this pod on a node thinking it only needs 10 megabytes of memory and 10 millicores — the scheduler has made its decision to put it somewhere — but that's way off, because we're already using three times that amount. So the scheduler has already made a bad choice, because we told it to. And let's also — oh, not demo one, sorry — are we still... yes, we're still running some CPU stress in the cluster as well.

If we take a look at our HPA, we'll see that we are now at 1500 percent of our target CPU usage, and so what's going to happen is the number of pods is just going to explode — we're going to scale way up for really no good reason. If we look at our stats and take a look at our current request duration — let's see, nope, that's iteration duration, we're at seven seconds, so a little high right now, but not massively; remember that was about five earlier. If we look at the request duration, we'll see we're at a max of 900 milliseconds and an average of 200, so we're a little high, but not as high as we'd expect for the massive traffic spike we have. If we take a look at the number of pods, we now have 75. Take a look at our HPA: we're starting to settle down toward our target here, but we probably don't need this many pods — for the amount of traffic I'm running, I would not expect to need 70 replicas. That's going to go up again in a second, because of where we're at — we're using a hundred pods for a relatively small amount of traffic.

So the first thing we're going to see is problems with scaling: scaling happens too fast, because the percentage of our request is so far off from what we're actually using.
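For reference, a minimal sketch of roughly what that HPA looks like. The name and namespace are illustrative, and on a 1.21 cluster the API version may be autoscaling/v2beta2 rather than autoscaling/v2; the target values are the ones mentioned in the demo. The key point is that "utilization" here is computed against the CPU request, which is why a 10-millicore request makes it scale so aggressively.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: yelb-appserver        # illustrative name
  namespace: yelb             # illustrative namespace
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: yelb-appserver
  minReplicas: 2
  maxReplicas: 200
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # measured as a percentage of the *request*
```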
The second thing we're going to start to see is nodes getting overwhelmed. This is where that kube-capacity tool I talked about comes back in — created by an old coworker of mine, Rob Scott. It sums up your CPU limits per node: we look at all the pods running on a node and add them up. You can get this information from a kubectl describe node output, but this really sums it up nicely in an easy-to-read way. And you'll see that our summed CPU limits are currently 500, 400, 300 percent of the node capacity. So we have the opportunity to attempt to use 500 percent of our CPU — and I didn't really constrain on memory here, but I've seen this happen with memory too, where you get to three or four hundred percent of the memory available on the node. As soon as you get a large amount of traffic and those pods start to consume more and more resources, your nodes are going to get overwhelmed, things are going to start to fall over, and we're going to start to see evictions like we did in that first test where we spun up a bunch of stuff with no requests and limits. It can really result in a very unstable cluster.

I'm not saying that all applications have to be in the Guaranteed QoS class, but for your critical applications, use Guaranteed. And when you are going to use Burstable, be mindful of the entire cluster and the ecosystem you're deploying into, and understand that if your combined usage starts to get to a point where maybe you could hit six hundred percent of a node, that's probably not a great situation to be in. So be mindful of those summed limits and summed requests.

To see what's happening here, we'll just run the regular test real quick and see what our current numbers come back at. Looks like we're at 382 milliseconds average with an iteration of 10 seconds — so even with all these extra pods spun up, we're still not getting great performance. We're at four nodes; we haven't spun up any new ones. And we can take a look at the utilization output — way too wide — let's see: our memory utilization looks good, but all of our nodes are pegged on CPU, which is probably why we're seeing such high latency in our test, because at that level of utilization on the node we're going to start to see more and more CPU throttling.

Great, and then there is an audience question again — this is amazing. From Jonathan: is this part of chaos engineering over Kubernetes? I mean, you could kind of say that what I'm doing is a form of chaos engineering. If you really wanted to introduce some interesting chaos into your cluster, deploying something like that stress application to attempt to eat resources would be a form of chaos engineering. So sure, yes — thanks for the question. All right, well, that's actually my last demo. I know we're a little bit early on time. I'm sure I can come up with some other ways to break this cluster — are there any other questions from the Slack channel or anything like that?

Yes, perfect, there are a lot — usually when we're near the end, the number of questions just keeps increasing, which is amazing to see, by the way, thank you so much. So Dave asks: what do you prefer, k6 versus pytest, for endurance testing? Sorry, I didn't quite catch that — which channel was that in, the Cloud Native one? I can go pull these up. It was from Dave, from the YouTube side of things. Oh, from YouTube, okay — oh, I see: k6 versus pytest for endurance testing. I really enjoy k6. It's super easy to run, writing the tests is easy, their documentation is great, and their cloud product is actually pretty nice. It works for a whole bunch of different situations and I've been using it for a while. I don't know if I've used pytest directly for this type of testing myself, so I can't necessarily give a good recommendation there, but I'm a huge fan of k6 — it's very simple to get running. Great, keep the questions coming; if there are more, we have time to take them.
And then a LinkedIn user asked whether this is a Kubernetes-native application. No, I don't believe it is, originally. I will say it runs quite well in Kubernetes — I've had good luck with it as a demo application, all the different pieces are nice. I grabbed the YAML for deploying it from the repository and modified it pretty heavily, so it definitely runs well in Kubernetes; whether I could say it's originally Kubernetes-native, I'm honestly not sure. Great — and thank you to you, Dave, by the way, for asking the question.

Great. So, any other questions, or did you have some other way to break things even more? Oh, there are always fun ways to break things. Other scenarios we can get into with CPU and memory requests and limits — what's another common one that we see? I had an idea this morning and it just left me. While we wait for that inspiration to strike again, I have a question, actually. You've talked a bit about different projects and so forth, but do you have a favorite CNCF project, or another open source project, regarding stability for Kubernetes?

Definitely. I'm a little biased, in that Fairwinds has released some open source projects. Goldilocks is designed for setting your resource requests and limits — or setting a baseline for them — using the Vertical Pod Autoscaler project, which is part of the Kubernetes project, to get recommendations on how to set those things initially. Goldilocks gives you a nice way to view those on a single dashboard. And then as far as other stability issues that aren't necessarily CPU limit and request based, we have a tool called Polaris that checks for best practices in the configuration of your workloads: not just CPU requests and limits, but usage of HPAs and PDBs and other configuration things that are important — liveness probes, readiness probes, and so on. Those are our two main ones around stability. And then of course there are the tools I showed today, and all the Kubernetes-native tools — cluster autoscaler and metrics server are obviously in use here as part of GKE. So I think that's a good list of tools that I frequently use.

Great, perfect. And yes, thank you for hyping us up with the "great, awesome, thanks for the share" comments about this increased capacity over Kubernetes — thanks a lot, thank you for attending and thank you for asking questions, and good day and good night and so forth to you as well. Great, and then there's, I think, a question on the Slack side as well: it's clear about setting good requests, but what would be the best practice for setting good limits?

Hmm. Well, I feel like I touched on this a little bit. If you're going to set your requests and limits differently — if you're not going to use the Guaranteed QoS class — I would say a good guideline is to just not keep them too far apart. As a very generic baseline, you could start with setting the limit maybe no more than 10 percent over your request, depending on various variables; obviously I can't give a blanket recommendation that will work for every workload in every cluster, there are just too many variables at play. So if you're going to use Burstable, don't keep them too far apart, and be mindful of the whole ecosystem. But in general, if it's a critical workload, if it's your main application, if it's sensitive to being throttled or to any sort of disruption, then using that Guaranteed QoS class is going to be your best route.
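As a rough sketch of those two shapes — numbers purely illustrative and to be tuned per workload, not values from the demo:

```yaml
# 1. Critical workload: Guaranteed QoS, requests == limits, sized by testing.
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 200m
    memory: 256Mi
---
# 2. Burstable workload: keep limits close to requests (e.g. ~10% above),
#    not 10x or 50x apart, and stay mindful of the summed limits on each node.
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 220m
    memory: 280Mi
```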
So set your resource requests and limits the same, and set them high enough that your app can function and you get the lowest latency possible.

There's actually another tool out there that I forgot about, that I've used in the past to do this. It's much more complicated to use, but it's a really interesting concept. It was made by a company called Carbon Relay, which is now part of StormForge, I believe — forgive me if I get the names wrong. Essentially it allows you to run a series of experiments: you set some variables that you want to monitor — say, I want to look at my latency and balance that against how many resources I'm using, so effectively my cost — so I want to minimize latency and minimize cost, and I want to tweak the CPU requests and limits. You give it a set of parameters and a test to run, like a k6 load test or something like that, and it will make the change, run the iteration of the test, measure those variables, and do that over and over again, plugging the results into a machine learning algorithm. It essentially gives you a curve of: okay, if you take your CPU request and do this with it, how does that affect your latency? It's a really interesting tool and a fantastic way to get an idea of what's going on in the cluster around these two things, so that could be a fun thing for folks to look at.

Great, thank you for that great question. Yeah, there's been a bit of talk about the Slack as well. There was a comment there — if you send the CNCF on YouTube your email, Monatran, we will get you started on the Slack as well on that front. But yeah, those are the questions so far, so if there are any more, we do have a few minutes to take them — keep them coming if anything pops into your heads. Did inspiration strike, and did you remember a good way to break everything? It did not, actually — I was too busy answering questions. I think these definitely cover the primary scenarios we see most often. I'm sure there are dozens of other ways to break a cluster, but these are definitely the most common ones that we see.

Yeah, and to the viewer asking about the Slack: you can see the Slack channel on the stream itself, where it says "join our live chat on CNCF Slack." For the CNCF Slack itself, I think you can find it from the CNCF website, for example, which links to the Slack to begin with; it has also been linked in this chat, so you can maybe find it there. The CNCF website should have the link to the CNCF Slack, where you can then find the Cloud Native Live channel itself.

And then there's a comment saying: we've had long reviews on the requests and limit ranges, and we've come down to 20 in our particular case. That's great. Reviewing your CPU requests and limits is always a good idea — something you should definitely revisit on a regular basis. It's a balance between performance and efficiency, or performance and stability and cost, right? Because we can always get a very stable cluster by just turning our requests and limits way up and always using the Guaranteed QoS class on everything all the time — it will be very stable and we won't have any issues, but we're probably overprovisioned considerably if we do that. And so reviewing that over time,
looking at your metrics over time, and then making tweaks that don't affect your latency — or that keep you within a reasonable range you're shooting for with latency — is super important to do. So I think that's super great; everything's going smoothly over there.

And then there's another question about which channel this is on the Slack. It is still on the stream, so you should see it — I think in this direction: "join our live chat on CNCF Slack: Cloud Native Live." That is the Slack channel you should be joining. But obviously in here we also see comments from YouTube — so Steven, we do see your comments live from there as well, and from LinkedIn and the other places. But that is true, you can join the Slack as well and join the conversation on that side too. Thank you so much for joining, and thanks for all the great messages on that side. The Slack link — I think you can find it from the CNCF website itself, if I'm correct, so you can join there. There we go, there's the link to the Cloud Native Slack once again. All right, great, everyone's interested in continuing the discussion there. It's the same channel we use every week for these Cloud Native Live shows, so if you hop in over there, you can join in next week as well and see the chat in action, and the links to all of the live shows are in that Slack channel too. So that's a very nice tip for everyone here.

So we have three minutes left in our scheduled programming time. Do you have any final comments, recommendations, summaries, or anything? Nothing new — just make sure you set your resource requests and limits. I will continue harping on that, probably for the rest of my career. And thank you so much to everybody for all of your questions and for listening; it's been great having you all here.

Perfect, thank you so much for joining. Now that we have two minutes left, I think it's a great time to start wrapping up. We've had a lot of great interaction — thank you so much to everyone. So thank you, everyone, for joining the latest episode of Cloud Native Live. It was great to have Andy here talking about building stability, and we both really loved the interaction and questions from the audience. As always, we bring you the latest of cloud native code every Wednesday, so you can join in next week and the week after that as well. Next week we will have another great session from Jason Morgan, talking about more amazing topics. So thanks again for joining us today, and see you next week.