Hi everyone, and welcome to this talk on how to carefully replace thousands of nodes every day. My name is Adrian Tuyo. I'm an engineering manager at Datadog, and I lead the compute node lifecycle team. Today I have the pleasure of speaking with Ryan.

Hey folks, I'm Ryan McNamara. I'm an engineer at Datadog on the same team as Adrian, and I've been doing Kubernetes things for a few years now.

Datadog is an observability company. Every hour we ingest trillions of data points from our customers' applications and infrastructure. To process those data points we run hundreds of thousands of Kubernetes pods on tens of thousands of nodes, in many clusters, and to meet our customers where they are, we run on multiple clouds. To better control the performance and reliability of our platform, as consistently as possible across cloud providers, we run Kubernetes from scratch. Part of our duties as cluster operators is to replace nodes when needed, and at our scale this happens thousands of times a day; we do it without breaking applications. If you use a managed distribution, you may not fully control when nodes need to be replaced, but you are still responsible for protecting your workloads when that happens.

So today we'll first explain why nodes need to be replaced, and how it's done generally and, more specifically, at Datadog. You'll learn about some of the strategies that we use to protect our workloads when we replace nodes, and we hope to start a conversation on turning some of those strategies into Kubernetes enhancements.

There are many reasons to replace nodes. When we started running Kubernetes, we needed a solution first to quickly react to hardware failures, like bad memory or failing disks. At our scale that's not so rare, and those failures don't necessarily break the nodes completely, but they have a negative impact on performance, so we need to do something about it. We also wanted to anticipate VM retirements, which is when the cloud provider reclaims your virtual machines; we don't want those to come as a surprise. But these days the main use case, by far, of the solution we've built is upgrading machine images. We do that for Kubernetes upgrades and also for operating system security patches. As you may have seen this morning in the keynote presentation by my colleagues, we had a good reason to disable unattended upgrades, and we now rely exclusively on node replacement for operating system patches. More and more, we also replace nodes to move to faster and cheaper VMs.

Our applications are highly available, and they generally tolerate involuntary disruptions, which is the sudden loss of a node and its pods. But the reasons listed here simply occur too frequently to just kill the nodes, so luckily there's a better way.

You may be familiar with some node replacement solutions. The three major cloud providers automatically upgrade nodes and handle VM retirements with their managed node groups and node pools. For node auto-repair, AKS and GKE rely on an open source project called node-problem-detector, which can run as a DaemonSet or be shipped as part of the machine image. It detects some common problems, like bad memory or failing disks, adds a condition to the node's status, and that condition triggers node replacement.
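To make that condition mechanism concrete, here's a rough sketch of what a health condition on a Node object can look like. The condition type, reason, and message are made up for illustration; they are not the exact ones node-problem-detector or our controller emits.

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-a1b2c3                  # hypothetical node name
status:
  conditions:
  - type: DiskDegraded                 # hypothetical condition type
    status: "True"
    reason: SmartErrorsDetected
    message: "SMART errors reported on /dev/nvme0n1"
    lastTransitionTime: "2024-03-19T10:00:00Z"
```

A node replacement controller can watch for conditions like this one and treat them as a signal to cordon and drain the node.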
Finally, I'd like to mention Karpenter. It's an alternative to cluster autoscaler on AWS that also takes care of node replacement. Importantly, all of those solutions rely on the eviction API to protect your workloads.

Kubernetes doesn't fully handle node lifecycle: it doesn't start nodes and it doesn't stop nodes. That is delegated to the cluster operator. But one thing it does provide is that API, as a building block for creating a node replacement solution. The eviction API is basically a conditional pod deletion: it protects pods covered by pod disruption budgets, or PDBs for short. A PDB is a Kubernetes object that you can represent as YAML (there's a minimal example below); it covers pods that match a label selector. You can specify a maximum number of unavailable pods among those covered, and while enough pods are available, any one of them can be deleted. You can also specify a minimum number of available pods, or use percentages instead of integers.

Available means that all of a pod's containers pass their readiness probes. Readiness probes are defined in the pod spec; they can be HTTP calls, TCP checks, gRPC calls, or commands, and they're executed by the kubelet. The results of those probes cascade down to the PDB status. The primary use of readiness probes is actually service availability: when a pod is ready, its IP is registered as an endpoint of Services, and that readiness is reused as a healthiness indicator for pod disruption budgets. We'll see later how that is limiting in a way.

So, to replace a node gracefully, a solution must implement four basic steps: cordon the node, evict the pods (together that's called draining the node), terminate the old VM, and start a new one. You may want to start the new VM earlier to speed up scheduling, as we'll see later. To cordon a node, you mark it unschedulable, which is a spec field, or you taint it, so that no new pods land on it. Then you evict the pods, and that's a conditional deletion protected by pod disruption budgets, as I said. If there is no PDB, careful: the pod is simply deleted; it's an unprotected deletion. The pod disruption budget status is a reflection of the readiness probes of the pods' containers. Going back to the eviction: if enough pods are available, the eviction can proceed into a deletion, the containers are terminated, and typically the Deployment or StatefulSet controller, or whichever controller owns the pods, recreates the pods immediately.

When a node is empty, once you've evicted all the pods, the node replacement solution can terminate the VM, which stops the kubelet. There's a component called the cloud controller manager, a cloud-specific Kubernetes plugin or add-on, that detects that the VM is gone and deletes the Node object. Interestingly, Karpenter has an alternative way to do this: it deletes the Node, and the termination of the VM is handled by a finalizer on the Node. Then, or earlier, the solution can start a new VM. That starts a kubelet, which registers as a node; the scheduler sees the node and can bind pods to it, containers can start, and so on. Note the time gap in this diagram between the new pods' creation and their scheduling.
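To make both halves of that concrete, here's a minimal sketch of a PDB like the one described above, along with the Eviction body that a drain controller sends to a pod's eviction subresource. The names and namespace are made up.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
  namespace: demo
spec:
  maxUnavailable: 1          # at most one covered pod may be unavailable
  selector:
    matchLabels:
      app: web               # covers every pod carrying this label
---
# An eviction is a POST of this object to /api/v1/namespaces/demo/pods/<pod>/eviction;
# the API server only lets it through if the PDB above still has budget.
apiVersion: policy/v1
kind: Eviction
metadata:
  name: web-7d4b9c6f5-abcde  # the pod to evict (hypothetical name)
  namespace: demo
```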
That gap is why it's sometimes useful to start a new node before draining the old one. However, that's not always necessary: the new pods could find room on existing nodes and be bin-packed there.

I've mentioned a few examples of node replacement solutions that you may be familiar with and explained how they generally work, so I'll briefly talk about the solution we built at Datadog. For more details, I invite you to watch an episode of our "Datadog On" series that I recorded recently with another colleague, David. In short, we first considered using node-problem-detector, but we already have a DaemonSet that collects health data about nodes: the Datadog agent. So our node problem detector simply runs as a controller and transforms Datadog monitors into node conditions. In our solution, any reason to replace a node is transcribed as a node condition. To drain the nodes that have conditions, we initially used a tool called Draino, by Planet Labs, but we ended up writing our own, adding some node lifecycle hooks that Ryan will discuss later. From the very beginning, cluster autoscaler has been a key component of the solution for any scale-up or scale-down, including node replacements. And finally, we added a component that we call the disruption budget manager, which we use to enhance the eviction API and pod disruption budgets.

At this point, I'd like to emphasize the main keyword of this talk's title: carefully. To drain thousands of nodes every day we need velocity, but more importantly we need to be careful not to break applications, and that is true at any scale. So let's have a look at some careful strategies.

As I said in the introduction, our platform supports hundreds of thousands of pods, and we care about most of them. But when we started this project, we quickly realized that not all workloads came with PDBs, and without pod disruption budgets, evictions are unprotected deletions. So we needed to enforce PDBs. As with any enforcement issue in infrastructure, we could have added checks in CI and at admission and bugged our users, but we think it's always better to just do the work for the users. So we create PDBs by default, for each workload, when they're missing. If a custom PDB is not provided, we create one with maxUnavailable equal to one. That is the safest default, and it's actually fast enough in most cases.

Now, remember that PDBs select pods using label selectors, and we realized that if you try to reuse some of the existing pod labels, you can end up with overlapping PDBs. When a pod matches two PDBs, the kube-apiserver doesn't know which one to use; it's a configuration error, and it denies the eviction. So, to avoid PDB overlaps for default PDBs, we label the pods with the unique identifier of their Deployment, StatefulSet, or whatever controller owns them, which is a metadata field, and we select that label in the PDB.
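Here's a rough sketch of what such a generated default PDB could look like. The label key, UID, and names are hypothetical, not our exact implementation.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: default-pdb-orders-api          # hypothetical generated name
  namespace: demo
spec:
  maxUnavailable: 1                     # the conservative default
  selector:
    matchLabels:
      # hypothetical label carrying the owning Deployment's or StatefulSet's metadata.uid,
      # injected into the pod template so this PDB can never overlap another one
      workload-uid: 8f3c2a1e-0d4b-4c7a-9a55-1f2e3d4c5b6a
```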
Users can still create custom PDBs, and in that case we still monitor for overlaps so they don't block our node replacement campaigns.

Now I'd like to talk about the readiness probe and how it's not expressive enough for disruption budgets. As I said, the primary use of the readiness probe is to tell when a pod is ready to accept traffic: when a pod is ready, its IP is registered as an endpoint of a Service. And readiness is reused to express disruption budgets in terms of available pods. To replace nodes, we need non-zero budgets in general, but there are circumstances when disruptions should be delayed while, at the same critical time, all pods should keep receiving traffic. So we can't use the readiness probe to drive the budget. Let me give you some examples. An application is under pressure; at that point, losing a pod could push it over the edge. Or an application is undergoing maintenance, an upgrade, or some other type of operation; losing a pod could disturb the operation. Or there's an incident ongoing; it may or may not be related to the application, you don't know yet, but evicting pods and removing nodes at that time could delay the investigation, remove evidence, or make things worse.

To deal with those situations, we dynamically set disruption budgets. In particular, we reconcile them with Datadog monitors and an internal distributed lock system that we have, but any third-party state could do. For incidents, because we don't want to update all PDBs at once, we take advantage of the fact that evictions are a pod subresource, and so, like any Kubernetes API server request, evictions can be intercepted at admission using a validating webhook. During some incidents, we deny evictions at admission.
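As a sketch of that last idea: an eviction is a CREATE on the pods/eviction subresource, so a validating webhook registration along these lines can put a cluster-wide brake on evictions during an incident. The service name and path are hypothetical.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: eviction-circuit-breaker
webhooks:
- name: evictions.example.com
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Ignore                # don't block evictions if the webhook itself is down
  clientConfig:
    service:
      name: eviction-gatekeeper        # hypothetical service that answers allow/deny
      namespace: platform
      path: /validate-eviction
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE"]
    resources: ["pods/eviction"]
```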
So, I've talked about how to better protect your workloads from evictions, and Ryan will now discuss some ways to optimize drains.

Thanks, Adrian. As Adrian mentioned, Kubernetes delegates node lifecycle management completely to cluster operators. If you compare this to something like pods, where Kubernetes provides things like pre-stop and post-start hooks, Kubernetes there has complete control over the lifecycle. But nodes are just the reflection of unowned external state, namely virtual machines. For this reason, we've had to create our own node lifecycle hooks to make node scale-down more graceful.

Here's the problem we have: this is the distribution of pod scheduling latency when pods need a scale-up in order to be rescheduled. You can see that the p50 is a little over two minutes and the p99 is five minutes. That isn't terrible, but for our applications, and just to make things more graceful, we can do better. To solve this problem, we do what we call node pre-provisioning.

Here's a diagram explaining what happens without pre-provisioning. A drain controller decides to drain a node and evicts all of the pods on the node, so they're deleted, and then you can see that the kube-controller-manager creates replacement pods. But depending on the state of the cluster, these pods may or may not be able to schedule right away; they might require a node scale-up.

So what we do is, before we evict any pods, we create what we call a set of fake pods. The idea is that these fake pods are representative of the pods we're about to evict. That means that once the fake pods are scheduled, we can have high confidence that if we were to delete them and reschedule all the pods that are actually currently running, those would be able to schedule pretty much immediately, or as fast as the kube-scheduler can schedule them. In practice, this means our scheduling times go from minutes, like I showed in the previous slide, down to just seconds.

After the fake pods are created, the cluster autoscaler will scale up if needed. Once all of the fake pods are scheduled, we delete them, we start evicting the real pods, and they'll be scheduled more or less right away. I say they'll probably be scheduled because the state of pending pods on a Kubernetes cluster is relatively dynamic: it could be the case that there's some other scale-up happening at the same time, or another set of node drains. But if we do this for multiple node drains happening concurrently, what we'll get is the sum of the resources required; so if there are two nodes currently draining, we'll be able to schedule all of the pods that replace the pods from those two nodes.
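As a sketch, a placeholder pod along these lines mirrors the resource requests and placement constraints of a pod that's about to be evicted but runs a container that does nothing. The names, sizes, and priority class are made up.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: preprovision-web-0                        # hypothetical name
  namespace: demo
  labels:
    role: pre-provisioning-placeholder
spec:
  priorityClassName: placeholder-low-priority     # hypothetical low-priority class, so real pods can displace it
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.9              # does nothing; just holds the reservation
    resources:
      requests:
        cpu: "2"                                  # copied from the pod being replaced
        memory: 4Gi
  nodeSelector:
    kubernetes.io/arch: amd64                     # copied from the pod being replaced
```

Once placeholders like this one are all scheduled, they're deleted and the real evictions start.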
The next node drain hook we have is about persistent volume claims. At Datadog we use local persistent volumes, and suffice to say we do this for two reasons: performance and cost. Unlike the remote volumes you might be more familiar with, local PVs and their associated PVCs are tightly bound to a node until they are deleted, and deleting them is a representation of actually throwing that data away. So until we get rid of a node, the PVC associated with it and the local disk will persist.

Here we have pod A and PVC A, which are coupled together. When we evict and delete pod A, PVC A continues to exist. The StatefulSet controller will create a replacement pod B, but pod B has no PVC it can actually use, because the node that PVC A is on is cordoned. The fix is quite simple: when our drain controller evicts pod A, it simply deletes the PVC associated with it. And you can see why we care so much that our evictions are protected, because, like I mentioned, this is potentially a destructive action: we're going to lose the persistent volume on that node. That's fine because our databases are set up to handle losing a single database replica. Once the PVC is deleted, the StatefulSet controller can go ahead and create pod B. The kube-scheduler will see the new PVC and schedule the pod, and the local volume provisioner, which runs as a DaemonSet on the nodes, will provision a local PV that this pod can use. It's worth noting that before Kubernetes 1.27, the StatefulSet controller's creation of PVC B would not happen automatically. One of our team members, Raul, contributed an upstream fix so that the StatefulSet controller looks for missing PVCs at all phases of a StatefulSet pod's lifecycle, not just when StatefulSet pods are being created.

Our last drain hook is a bit more generic, and it allows applications to decide what logic they want to perform when nodes are being drained. The API for this is similar in spirit to how the pod readiness gate API works. Just for review: there, you have an external controller that decides when a pod is ready, and it does this by updating the status of the pod to open the gate and say that the pod is in fact ready. What we do for pod eviction gates is that the drain controller annotates the pod as a candidate. It's up to the app controller, running as a controller, to notice this and then trigger any logic associated with it. The contract here is that the pod is about to go away, so this is an opportunity for the app controller to do whatever kind of logic it wants to smooth the transition. Perhaps the current replica is a leader and we want to preemptively transition leadership to a new replica. Perhaps the replica is reading from a shard and we want to redistribute that shard to other replicas. You can imagine snapshots and other examples. Once whatever the app controller needs to do is finished, it annotates the pod as done, and at that time the drain controller can evict the pod and proceed. This is a recurring theme: it's something we do to make scale-down more graceful. We don't have a guarantee that we'll be able to do it, but when we can, it's a bit less of a burden on our applications.
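As a sketch of that contract, the gate could be expressed as a pair of annotations on the pod; these keys are made up for illustration and are not our actual annotation names.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: store-2                                        # hypothetical pod
  namespace: demo
  annotations:
    node-lifecycle.example.com/drain-candidate: "true" # set by the drain controller before evicting
    node-lifecycle.example.com/drain-ready: "true"     # set by the app controller once its pre-drain work is done
spec:
  containers:
  - name: store
    image: example.com/store:1.2.3                     # hypothetical image
```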
For our last topic, I'd like to talk about recent, ongoing, and possible changes to eviction and PDBs in Kubernetes. Evicting and deleting unhealthy pods is very important to us: it's how we recover from degradation, and it's important that we're able to do it at the scale we run at. Here we have two Deployments or StatefulSets, and they're running with our default PDB where we set maxUnavailable to one. You can see that the blue pods are ready and the gray pods are not ready. The question is: when you try to evict these pods, which evictions will succeed and which won't?

Prior to Kubernetes 1.20, evicting the unready pod in the left example would fail. The logic was simply: the PDB says maxUnavailable is one, there's already an unavailable pod, and so the eviction fails. After Kubernetes 1.20, the default behavior changed to what's called IfHealthyBudget. What that means is that an unready pod can be evicted as long as the healthy pods still satisfy the budget, so the eviction in the left example now goes through. That's great for us, and it's the setting we use. In Kubernetes 1.26, an additional option was added, called AlwaysAllow, and it's exactly what the name says: any time a pod is unready, you're able to evict it. That makes sense in some cases; however, it's not the option we use, so I'll give a couple of examples as to why.

When a pod is not ready, you don't actually have a strong guarantee that it's not doing something useful. You could imagine that in the right-hand case, where we have two not-ready pods, the first pod is doing useful work and continuing to operate, while the second unready pod is actually having a problem. And suppose the first pod is unready only because the kubelet is not able to heartbeat to the API server: if we evict the first pod, then we're actually going to create an issue where there wasn't one already. So we want to take the pessimistic approach and assume that's the way to go. The situation gets more important when you consider that these pods might be holding local data, like I mentioned earlier. Suppose one of the pods has corrupted data and the other one is in the state where the kubelet can't heartbeat: if you evict that second pod, you'll end up with two replicas whose data is effectively gone, and you might have data loss, depending on the database you're using.
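Concretely, this behavior is controlled by a field on the PDB spec in newer Kubernetes releases; here's a sketch with hypothetical names.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
  namespace: demo
spec:
  maxUnavailable: 1
  unhealthyPodEvictionPolicy: IfHealthyBudget   # the behavior we rely on; AlwaysAllow is the permissive alternative
  selector:
    matchLabels:
      app: web
```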
So here's a review of some of the suggestions we've made in this talk. We talked about a disruption probe, to decouple service routability from voluntary disruption handling. We talked about default PDBs, which give us the simplifying assumption that all evictions are protected. And I went over some node lifecycle hooks that make node draining more graceful.

For a last idea, I'd like to talk about voluntary disruptions and whether or not they should always respect PDBs. Spoiler alert: the answer I'm going to propose is yes, with some qualifiers. Pod preemption occurs when there's a high-priority pending pod and a low-priority running pod. Suppose the pending pod can't be scheduled anywhere and is configured with a priority class that preempts lower priority: in that case, the kube-scheduler is actually going to delete the existing low-priority running pod, and that is an unprotected delete. For all of the reasons I mentioned earlier, this is problematic for us, and indeed we had an internal incident related to it. What happened was that we rolled out a new version of a DaemonSet where we slightly increased its resources. On most nodes that this DaemonSet change rolled out to, everything was fine; we were just able to increase the resources slightly. But on some nodes, this effectively squeezed some pods off of the node, and when you do this across a thousand nodes, you start to run into problems. A lot of our applications were fine and could handle losing a single replica, but in some cases we would take down one replica of an application, then move to another node and take down another replica of that same application, and that's where we started to have problems. There's a Kubernetes enhancement proposal, already in progress and approved, to provide an option that guarantees PDBs are respected by using eviction, rather than deletion, when preempting pods. Currently it's done best-effort, but it's not a guarantee, like I mentioned.

The next case is taint-based eviction. The name says eviction, but it's actually deletion. The way taint-based eviction works is that a node acquires a NoExecute taint; the most common one we see is unreachable, which, like I mentioned a couple of times, happens when the kubelet is unable to heartbeat to the API server; the node gets an unreachable NoExecute taint after some time. The pod either does not tolerate the taint, or no longer tolerates it because its toleration seconds have elapsed, and at that time the node lifecycle controller in the kube-controller-manager deletes the pod. This is, again, problematic for all of the reasons we've mentioned. The proposal, then, is to add some configurability so that taint-based eviction can respect PDBs and use eviction. And if you allow yourself to imagine a little, you could also picture using taint-based eviction in order to drain. It could potentially replace the drain controller we've been talking about, and that would be great for us, because it would let us maintain fewer controllers and stay more in line with default Kubernetes behavior. This is a bit of an open-ended idea, and it probably requires something like promoting conditions to NoExecute taints, plus some configuration about when you give up trying to evict and move on to deletion.

Here's a non-exhaustive list of other places where deletion happens today and where we think eviction could make sense. We've talked about the first two. The third one is node-pressure eviction: it occurs when a kubelet doesn't have enough resources and needs to start getting rid of pods on the node to guarantee that it can keep operating correctly. This is probably a case where you can't always go through eviction, because the node needs to reclaim resources immediately, but it could make sense to start with eviction and fall back to deletion. The last one is rollouts. For example, when you roll out a Deployment, there's an update strategy associated with that Deployment, and it's completely separate from pod disruption budgets. So you could specify that you're okay with losing two pods during a Deployment rollout while your PDB only allows one, and in fact you'd see two pods deleted at a time, because Deployment rollouts, and controller rollouts in general, use deletion instead of eviction.
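As a sketch of that mismatch (all names and counts made up): the rollout strategy below lets two pods be unavailable at a time while the PDB only allows one, and the rollout wins because it deletes rather than evicts.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: demo
spec:
  replicas: 10
  selector:
    matchLabels:
      app: web
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2      # the rollout may take down two pods at a time
      maxSurge: 0
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: example.com/web:2.0.0   # hypothetical image
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
  namespace: demo
spec:
  maxUnavailable: 1          # respected by evictions, not by the rollout above
  selector:
    matchLabels:
      app: web
```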
So if you take one thing away from this talk, I would like it to be to consider using eviction instead of deletion. We find it to be a very simplifying assumption: it's guaranteed to be safe. Of course, it's not actually guaranteed; there are some qualifiers. But we think it's something that's useful for us and could be useful for the community in general as well. So thank you very much for coming to our talk. We'd be happy to answer any questions now, and if you see us around, feel free to talk to us.

Can you talk a little bit about the differences between maxUnavailable in the Deployment API versus a PDB?

Can you repeat the question? Sorry.

Can you talk a little about the difference between maxUnavailable in the Deployment API versus using an actual PDB explicitly?

Sure. Deployments have rollout strategies, and you can specify a maxSurge and a maxUnavailable there, and that's a completely separate track from what the PDB specifies. When you're doing a Deployment rollout, the Deployment's update strategy is respected, and when you're doing evictions, the pod disruption budget is respected. They're two completely divorced mechanisms, which is surprising.

When you're doing high-volume node replacements, how quickly are you comfortable going? How do you decide how many nodes you're replacing at one time: a static number, a percentage of your total node pool? How fast are you comfortable going during high-volume replacements?

We go as fast as needed. We basically have a configuration per condition, and some conditions are more urgent than others. If we need to go fast, we go as fast as the PDBs allow us, and we also have a mechanism of back pressure from the cluster autoscaler, so that we don't put too much scale-up pressure on the system.

I like that you're doing the default PDB stuff. Is that a reaction to people creating poorly configured PDBs in the past, and what do you do to mitigate that risk?

Yeah, I think beyond poorly configured PDBs, it's more that by default there just is no PDB, and when there is no PDB, an eviction is the exact same as a deletion. So this is basically a burden we can take away from our app maintainers and just handle for them. We found that the most conservative default, maxUnavailable of one, works pretty well.

I hope you don't mind me cheating and asking two questions. The first one: can you elaborate a little on the fake pods? Are they just dummy pods, or are they actually some kind of CRD?

No, they're actually just pods that run containers that do nothing.

Okay, and the other one: you talked about not trusting Kubernetes reporting that a pod is not ready, since it might still be doing something. Are there any thoughts or plans around getting to the point where you can trust that, or are we just sticking with that assumption?

I think you have to stick with that assumption. Bugs, whatever; it's entirely possible the kubelet has just died and is not going to be talking to the API server anymore. In that case, all of the pods on that node are going to be marked as unready, but who knows what they're actually doing. So I don't think there's really any way around it.

My question is around pod readiness gates. How do you manage them in a controlled environment, when you don't have access to patch the resources?

No access to which kind of resources? Patching a pod readiness gate?

Like if I want to update my pod readiness gate, how can I do it in a controlled environment?

You're talking about pod updates? Pod readiness gates? Oh, pod readiness gates, sorry.
Yeah, so I guess I'm not seeing how it ties into node draining. Pod readiness gates are a way to basically have some controller declare when pods are ready, beyond the normal Kubernetes mechanisms, and for us that just ties into pod readiness. We don't look at whether the pod readiness gate has passed or not; we just check that the pod is ready, and the pod readiness gate is included in that. Does that answer your question? Okay.

One question I had: we've seen issues when pod disruption budgets are misconfigured and, because of that, drains are not properly done. Some inexperienced developers or administrators then sometimes just delete the pods, which generally doesn't cause any issues, but can you talk about some nuances of the difference between eviction and deletion? What exactly is the difference between the two operations?

Can you repeat the very end of what you said?

Yeah, can you talk about the nuances between a pod deletion and a pod eviction, and what failure scenarios we can see when we delete instead of evicting?

Yeah, so we basically advocate for always evicting, unless the pod disruption budget doesn't allow you to remove the pod that you want to remove. But most of the time there's little reason to go in and manually delete a pod. Do you want to add something?

I guess the way we think about it is that the first step is eviction, and there's really not a concern with the way that we run with eviction: it's supposed to be always safe. So that's the starting point. If that's not going to work, because maybe you lost all of your nodes at the same time, and your workloads run on multiple nodes and you lose all of those nodes, well, then it's going to be maintenance, and you're probably going to need someone to come in, look at it, and start manually deleting things to try to get back to a good state. But by default, eviction for us is just safe, so we try to default to that. And if you're deleting, be sure of yourself, right? You have to have a good reason. Deletion is a hard delete, whereas eviction is more graceful because there are additional checks. A delete just starts the termination process of the pod, and it's irreversible. Once the pod starts terminating, there are a few things that are called on the containers, like pre-stop hooks and so on, but it's irreversible, and after the termination grace period the containers are killed. So if you start deleting a pod, you have to be sure of yourself.

Got it, thank you. Something I was interested in hearing a little bit more about: at the beginning of the talk you said you had disabled unattended upgrades. Can you go a little more into what happened and why? We're having the same fight at my company, and I'm curious to have more ammo in my corner.

Sounds like you did not attend the keynote this morning. No comment. We have a great blog post series, written by Laurent, who went on to present the keynote this morning, about what happened with unattended upgrades. But in short, it was a combination of factors that made us realize we didn't want to rely on unattended upgrades.
Yeah, the PDB being a lagging indicator, which you pointed out: we have suffered from the same problem as well. Do you think the pod eviction gate can help solve it?

Can you move closer to the microphone, please?

The PDB being a lagging indicator is something we have faced in production as well. Do you have a solution that you have thought about implementing, and can the pod eviction gate help there? And our second question: are you in discussion with the upstream teams about the pod eviction gate?

I guess the way I think about it is that it sort of has to be a lagging indicator. It's entirely possible that right when you go to evict, with maxUnavailable of one, two of your pods become unready at that very moment. It's basically a race condition whether you tried to evict before or after that happened. Do you have anything to add? And to your second point: yes, we're making contact with upstream teams to discuss some of the ideas that we have.

How do you work with long-running pods and batch jobs? Do you pack them onto certain nodes or certain clusters? What do you do to address that, given your goal of replacing nodes?

To do what in batches?

If you have pods that have to run for four hours or they're useless, how do you address longer-running pods that you can't interrupt for a long period of time?

Oh, yeah, jobs, basically: batch workloads. In a way we're lucky, in that most of our batch workloads have a reasonable runtime, so we take the simplifying assumption that we don't evict them and we just wait for them to run to completion. When replacing the node is not that urgent, that's okay. Another thing that we do is, when we're deciding which node to drain next, we run a simulation to see what we expect would happen. That's not a guarantee of what will happen, but it steers us away from draining nodes that have these long-running jobs first. We drain other nodes first, and hopefully by the time we get to that one, the job is done or closer to being done. So that's something that helps.

Hey, Sean from Uber. I think the current design for node-pressure eviction doesn't respect the PDB. Do you have any context: are we going to support PDBs for node-pressure eviction?

Sorry, I did not hear the end. Can you move closer to the microphone, please?

I'm talking about node-pressure eviction. For example, a node runs out of memory, or you set a memory threshold of, let's say, 80 percent, so whenever the node consumes more memory than that, the kubelet will start evicting. That kind of eviction doesn't respect PDBs right now. Have you ever seen that issue, and do you have any context about the discussion or proposals around it?

Yeah, so luckily this doesn't happen very often. When it does happen, it's often because of a misconfiguration: basically, people using too much memory on the node without requesting it. So it's isolated. But yes, you can theoretically break the contract of the PDB, because it's not respected there, right?
So we think it would be a safe change to respect PDBs, best effort, for node-pressure eviction, which is not the case at the moment.

So node-pressure eviction is not a big deal in your experience?

I don't think we've actually seen a case where it caused an issue, like a specific case where we had node pressure, we evicted this pod, and if only we had evicted that other pod, things would have been better. I don't know that we've seen a specific case; I think it's just another thing that you could imagine happening.

Okay. What's a reasonable host memory eviction threshold, in your experience? I don't know what you use.

Honestly, I don't remember exactly what we use as a threshold. But I think it's important not only to set a reasonable threshold but, probably more importantly, to reserve enough for the system and the kubelet. That's a separate setting, we do set it, and it varies depending on the size of the VM type.

All right, thank you very much.