All right, hello everyone. Thank you for attending the SIG Autoscaling feature and general update talk. I know it's a little bit late on Thursday, but I'm glad everyone's here. My name is Jonathan Innis. I'm an OSS maintainer of an autoscaling project called Karpenter, and a software engineer at AWS. And this is Maciej Pytel, a software engineer at Google.

A quick walk through the agenda for today. We're going to give a little bit of an overview of Cluster Autoscaler and Karpenter. The point of that is that we're going to talk about some work we want to do between Cluster Autoscaler and Karpenter to gain more alignment across features that look quite similar to each other, as well as some API alignment that would make migrating between the two forms of autoscaling much easier for anyone writing application specs. Then we're going to get into project-specific updates: Horizontal Pod Autoscaler updates, Karpenter updates, and finally Cluster Autoscaler updates.

So, a quick review of the two node autoscalers that exist within SIG Autoscaling today. Cluster Autoscaler and Karpenter solve very much the same problem, and in a broadly similar way. There are differences in how they implement things under the hood, and some difference in the scope of what they manage, but they solve a similar problem, at least for scale-up and scale-down. They provision nodes so that all pods can schedule: if you have a scale-up event with pending pods, and kube-scheduler marks those pods as failing to schedule because they don't fit on your existing capacity, both Cluster Autoscaler and Karpenter will react to that. They also remove unnecessary nodes to minimize your costs. We both have features for this that we call by different names (Cluster Autoscaler calls it scale-down, Karpenter calls it consolidation), but both systems will remove unnecessary nodes if we think we can reduce the cost of your cluster by tearing down capacity whose pods can schedule elsewhere.
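For reference, here is a minimal sketch of the shared trigger just described: a pod that cannot fit on existing capacity goes Pending, and kube-scheduler sets its PodScheduled condition to False with reason Unschedulable, which is the signal both autoscalers react to. All names and values here are illustrative.

```yaml
# Illustrative only: requests sized so that no existing node can fit the pod.
apiVersion: v1
kind: Pod
metadata:
  name: needs-capacity
spec:
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
      resources:
        requests:
          cpu: "16"     # assume this is larger than any node's free CPU
          memory: 32Gi
```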
That covers the shared behavior; there are some major differences. Cluster Autoscaler works within the construct of cloud-provider-based APIs. All of your configuration for Cluster Autoscaler generally sits within the cloud provider, typically in node groups: ASGs, MIGs, VMSSs. The lifecycle of those node groups is fully managed by the cloud provider, so if you have a new image that you want to roll out to those node groups, Cluster Autoscaler won't manage that for you; that's the responsibility of the cloud provider. Karpenter, for example, has a specific feature for rolling nodes once they're past a certain age, and Cluster Autoscaler doesn't manage that side of things. It only handles scale-up and scale-down of the existing node groups you have. All the configuration sits in those node groups in the cloud provider, and Cluster Autoscaler basically increments or decrements the number assigned to a node group through the cloud provider APIs.

Karpenter takes a different approach: it handles the entire node lifecycle. Like I mentioned, it handles node upgrades: if you have a new image that you're trying to roll out, Karpenter will roll the nodes through a feature we call drift. And likewise, all the configuration for Karpenter sits inside the Kubernetes cluster, as custom resources based on custom resource definitions that Karpenter provides. Those are the configuration differences between the two.

There is also a difference in implementation if you're looking at building a cloud provider integration for either project. Cluster Autoscaler effectively has all the information it needs to make decisions: your node groups say, traditionally, "here's the instance type, here's the AZ" assigned to each node group, and Cluster Autoscaler takes all of that and is able to make decisions with everything given to it. Karpenter's approach is generally "here are all of my options": it sends those options to the cloud provider, and the cloud provider makes the best decision based on the options it has. This is particularly beneficial for Spot scenarios, where Spot capacity information is not necessarily publicly available and cloud providers generally have more knowledge about which capacity pools are useful to launch into. Giving the options to the cloud provider, which has that information, helps with provisioning Spot capacity that doesn't get reclaimed right after you launch the node.

Then we get into the smaller differences. The first set of differences are ideological, maybe more philosophical, and those will probably never change. The smaller differences are what dovetails into the conversation Maciej is going to lead in a second, about how we can fix some of these things and better align them across the two projects. Very simply, we have different wording, which sounds silly, but when it comes to documentation I do think it's confusing, and one thing he's going to talk about is aligning documentation around concepts that are very similar across the two projects. Karpenter calls it provisioning, Cluster Autoscaler calls it scale-up; Karpenter calls it consolidation, Cluster Autoscaler calls it scale-down. It sounds silly, but those differences do create conceptual confusion when figuring out what means what across the projects. There are also things like pod annotations being different, which is annoying if you're trying to move applications from one project to the other. And there are some differences in choices around what you'd call actuation, the node drain implementation in particular: we have different taints between the two projects, all of which developed over time as a result of the two projects developing in isolation, and there's slightly different behavior in how pods get evicted, which is also confusing if you're moving between the two. The divergence will likely continue to increase if we don't start collaborating on these things early. So with that, I'll let Maciej talk a little bit about how that's going to look.

Thanks, let's see. Okay, the slides work, so I'll go through these.
So, as Jonathan already told us, there shouldn't be that much difference between running workloads on a cluster managed by Cluster Autoscaler versus Karpenter, or at least there shouldn't be that much effort needed to migrate them. Ideally, you shouldn't have to change your pod spec at all, and you shouldn't have to learn any new concepts, like the naming differences we mentioned, or really any differences that don't come down to some major change in behavior. We're definitely not there yet; there are all the minor differences that Jonathan listed. But I don't think we're actually that far from that point. The main API that triggers autoscaling is pods, and really the main way of specifying requirements is pod scheduling constraints, and those are already standardized. So if we can fix the minor differences Jonathan described, and obviously avoid adding any new ones, we're going to be in a pretty good place, especially from the perspective of just moving your workloads between clusters.

Now, there are the more significant differences, and those show up much more on the configuration side: how you set up your cluster in the first place is quite different between Cluster Autoscaler and Karpenter, and we don't really see a way to address that. So, as Jonathan already mentioned, we've started a unification effort whose goal is really pod-level, or workload-level, portability. We're not trying, at least not at this time, to unify the way the autoscalers themselves are set up or configured. In particular, we don't have any plan to adopt the Karpenter NodePool API in Cluster Autoscaler, or to build any other API of that sort shared between the two projects.

The plan we have right now basically starts from the list of minor differences and how we're going to address them. The first step is to build shared documentation describing node autoscaling concepts, and to use that as an opportunity to unify the naming between Cluster Autoscaler and Karpenter, and hopefully, if we can, to position these as Kubernetes-level concepts shared by any future autoscaler implementations. The next step is to unify the pod-level APIs, mostly annotations, like the ones in the example a few slides back. We want to introduce shared replacements for those and deprecate the project-specific ones, probably still keeping them for backward compatibility. We also want to go over annotations and similar APIs on pods or nodes that exist in one project and validate whether they're something we want to add to the other project as well, or whether they actually make more sense in just one of them. The project-specific annotations in question look like the sketch below.
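As a concrete reference for the annotation differences being discussed, here is roughly what the two project-specific "don't touch this" annotations look like today (these are the current names; the shared replacements mentioned above don't exist yet):

```yaml
# Cluster Autoscaler: keep this pod's node from being scaled down.
apiVersion: v1
kind: Pod
metadata:
  name: cas-protected
  annotations:
    cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
---
# Karpenter (v1beta1): opt this pod's node out of voluntary disruption.
apiVersion: v1
kind: Pod
metadata:
  name: karpenter-protected
  annotations:
    karpenter.sh/do-not-disrupt: "true"
```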
A necessary part of going forward, as Jonathan also mentioned, is regular communication between the projects. We're already collaborating to an extent on things like DRA, which I'm going to talk about in a second, but I think we need more of this, and hopefully we'll be able to keep going and really increase our cooperation. Those are the three short-term goals. I would say all of them are in progress to some extent; none of this is done yet, but we have some pull requests in flight and general agreement on how to do them. And then we have some future ideas: addressing more of the minor differences described, like how the drain logic works; maybe something around how we handle those operations; maybe building some new shared APIs. There is no specific plan right now, but it would be really nice if we can achieve this. And finally, and this is I think the most future-looking goal, evaluating whether we can take this beyond autoscaling and start looking into some level of unification at the node management level. But that's more future-looking. And that's all we had on the unification effort.

One other shared effort I wanted to quickly mention is in the area of dynamic resource allocation, DRA. This has already been talked about a lot at this KubeCon, so I'm not going to go into much detail, but the short summary is that DRA is a new API for requesting resources that can support more advanced use cases than existing resource requests. It was first introduced as alpha in Kubernetes 1.26, and the original design made it very hard to integrate with either Cluster Autoscaler or Karpenter. So there is a new KEP that is a collaboration between SIG Node, SIG Scheduling, and SIG Autoscaling; I hope I'm not missing anyone, but I very well may be. This new KEP addresses those issues. It has been submitted for 1.30, and there is implementation work ongoing in Kubernetes 1.30. Our goal is to support DRA in Cluster Autoscaler in 1.31. I think there is no specific timeline for Karpenter, but there is interest, right? Yeah, there's definitely interest, a lot of momentum from this KubeCon, so it's definitely something we're evaluating. I think the extended resource problem has traditionally been a "fun" problem for us, which is probably a good way to say it. But there's increasingly more support for that kind of thing, obviously, so we're definitely looking at it really hard.

Okay, do you want to take it now to HPA updates? Yeah, so now we're going to go into project updates: we'll quickly cover the HPA project updates, the Karpenter project updates, and the Cluster Autoscaler project updates. On HPA, the container resource metric type is hitting GA in 1.30, which is super exciting. Previously, if you were configuring HPA and choosing the metrics to scale on, you had to deal with the fact that utilization was basically computed against the sum of resource requests across all the containers in a pod, which is not ideal if you have containers with very different utilization patterns for CPU or memory. Now you can use the ContainerResource metric type, which lets you scale your pods based on the utilization of a single container rather than the pod as a whole. It's currently beta, and has been since 1.27, and it will hit GA in 1.30, so we're super excited about that.
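A minimal sketch of what that looks like in practice (the ContainerResource metric is the autoscaling/v2 API shape; the workload and container names here are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: ContainerResource
      containerResource:
        name: cpu
        # Scale on this one container's utilization, ignoring sidecars
        # that share the pod's resource requests.
        container: app
        target:
          type: Utilization
          averageUtilization: 60
```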
For the Karpenter project updates: v1beta1 graduation, everyone loves this topic. We were really excited to announce v1beta1 effectively at last KubeCon, KubeCon NA. We introduced v1beta1 in November of 2023, and it was the natural progression from our v1alpha5 APIs, which had been around for quite a while. We'd gone through various iterations of the alpha APIs and repositioned our resource naming, and the general API, to remove a lot of technical debt and to align with more upstream Kubernetes concepts, dovetailing into our acceptance into SIG Autoscaling back in November as well. We originally had resources called Provisioner, Machine, and NodeTemplate. We realigned these around NodePool, NodeClaim, and NodeClass, and they took inspiration from Deployments and also from storage concepts. From Deployments because NodePools, if you look at their actual manifest spec, have a section called template, and the reason is that NodePools template something called NodeClaims, and a NodeClaim is basically a request for a node resource, which is then provisioned by whatever cloud provider you're running Karpenter with. Likewise, NodeClass lets you define the flavor of NodeClaim you want to launch. NodeClass is effectively the cloud-provider-specific API today: it lets you describe which image I want, which subnets I launch into, what my security groups are, the kinds of things that are more cloud-provider-specific; some cloud providers are opinionated about them and others are not.

We also collapsed a lot of the disruption detail. There was, I would say, a lot of technical debt around the disruption sections we had in the Provisioner in v1alpha5. We introduced consolidation quite a while ago now, and it was mutually exclusive with a concept called emptiness; you couldn't set both at the same time, and that was kind of annoying. We generally thought that, conceptually, consolidation subsumes emptiness: consolidation already reasons about empty nodes, it reasons about underutilized nodes, it's effectively a superset of the emptiness behavior. So we collapsed it all into consolidation as a single thing, configured through a field called consolidationPolicy, and you can tune the aggressiveness of consolidation in general with a field called consolidateAfter. consolidationPolicy lets you say either "consolidate my nodes when they're empty" or "consolidate my nodes when they're underutilized in general," which means that when Karpenter sees your pods can reschedule elsewhere, it will scale you down, or launch replacements that are cheaper. All of that fell under a section called disruption, and it teed up additional work around disruption controls; we've done a lot of work on disruption controls in the last six months, and we're planning a lot more as we head towards v1 stability. We removed webhooks, which were another big pain point for a lot of users, and replaced them with CEL, and we graduated our drift feature to beta, which means it's enabled by default starting in v0.33.
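Putting those pieces together, here is a minimal sketch of the v1beta1 shape: a NodePool that templates NodeClaims and points at a cloud-provider-specific NodeClass (the AWS flavor, in this case). All names, tags, and requirement values are illustrative; note also that in v1beta1, consolidateAfter can only be set together with the WhenEmpty policy.

```yaml
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:              # templates NodeClaims, the way a Deployment templates Pods
    spec:
      nodeClassRef:
        name: default    # points at the cloud-provider-specific flavor below
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]
  disruption:
    consolidationPolicy: WhenUnderutilized  # or WhenEmpty plus consolidateAfter
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: AL2
  role: KarpenterNodeRole-my-cluster        # hypothetical IAM role name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster  # hypothetical discovery tag
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: my-cluster
```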
So yeah, disruption got better with budgets. In v0.34, we introduced a concept called disruption budgets, which are effectively node disruption budgets, similar to the PodDisruptionBudget concept. You can tell Karpenter how aggressive you want disruption to be, and you can effectively define maintenance windows on your disruption behavior, which was a heavily asked-for thing. If you look at the spec here: with disruption budgets, the most restrictive budget is the one that applies. This budget says you can't disrupt nodes during non-working hours. There's a schedule component and a duration component: Monday through Friday, from 5 p.m. to 8 a.m., don't disrupt my nodes; Saturday through Sunday, never disrupt my nodes. Otherwise, you can set a percentage or a numerical value on a budget without any duration or schedule attached: if I have over 50 nodes, don't disrupt more than five, because again, the most restrictive budget applies; if I have under 50 nodes, the 10% budget kicks in, because that's obviously more restrictive under 50. It's calculated from the number of nodes the node pool manages. So that was a huge win; a sketch of a budget like this follows below.
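As a sketch, the budget just described could be encoded roughly like this under v1beta1 (the budgets field is real; the cron expressions and durations are one possible, assumed encoding of "Monday to Friday 5 p.m. to 8 a.m., plus all weekend"):

```yaml
spec:
  disruption:
    budgets:
      # Weeknights: starting 5 p.m. Monday-Friday, allow zero disruptions
      # for 15 hours (i.e. until 8 a.m.).
      - schedule: "0 17 * * 1-5"
        duration: 15h
        nodes: "0"
      # Weekend: starting midnight Saturday, allow zero disruptions for 48h.
      - schedule: "0 0 * * 6"
        duration: 48h
        nodes: "0"
      # Otherwise the most restrictive budget applies:
      - nodes: "5"     # binding above 50 nodes (5 < 10% of the pool)
      - nodes: "10%"   # binding below 50 nodes
```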
What this also allowed us to do: because disruption is now user-configurable, both how aggressive it is and the parallelism attached to it, we could effectively be as aggressive as we want on the backend while respecting user-configured limits. Prior to this feature, we were doing one replacement of expired nodes and one replacement of drifted nodes at a time, and that caused a lot of pain for people with thousand-node clusters who wanted them all to roll; they were asking, can you please do more than one at a time? This allowed us to say: if a user wants to do 10, or wants to do 100%, go ahead. Don't do that in production, though; that's not a good idea. This behavior is interesting, and we definitely need to look a little more into why it is what it is; some of it has to do with our safety mechanisms around rescheduling, making sure pods are still schedulable across subsequent disruption operations. But effectively, what you see here is that before, we were doing one at a time, effectively 1.5 per minute, and after, with a wide-open disruption budget, we can do maximally 20 a minute, which is very fast, or at least quite a bit faster than before.

Scheduling also got quite a bit better over the last six months. We did a lot of work on scheduling performance by CPU-profiling it. We run scheduling benchmarks in the upstream repo; the benchmark now goes up to 5,000 pods, I think, but at the time it only went to 3,500. It generates diverse sets of pods, schedules them, and measures how long scheduling takes. At 3,500 pods, we were looking at about 30 seconds back on v0.28, and we're on v0.35.2 as the latest version right now. That got cut way down to about one second on v0.35.2, and on head, after more improvements over the last 20 or so commits, it's now around 10 milliseconds at that scale for that kind of scheduling simulation. Now, not all of those pods are, say, anti-affinity pods; there are ways the workload could be more expensive and more complex. But we effectively cut our scheduling time down something like 300 times. So that's pretty good.

And the last one, or maybe the second to last one: we improved our cloud provider support. AWS shepherded and built Karpenter, so AWS has had cloud provider support from the beginning. Azure began supporting Karpenter as of last KubeCon; they announced it, I think, right during KubeCon. We launched kwok provider support, kwok being "Kubernetes WithOut Kubelet"; that one's more of a toy cloud provider, if you're interested in just messing with Karpenter without having to run any capacity. It exists within our repo and is linked in the slides, so once you get the slides after this is done, you can mess around with Karpenter without having to pay for it, except for the capacity Karpenter itself runs on. And then Cluster API (CAPI) is also working on a Karpenter provider that would enable it to provision CAPI resources, which is kind of a cool concept. That one's coming soon; it doesn't exist right now, but there's been a bunch of work happening in the community and the working group around it.

I'm going to quickly run through this. Looking forward, what are we looking at over the next six to eight months? The v1 release is our North Star right now; we're looking towards stability. We're doing a lot of work to figure out what's on our v1 roadmap and what the v1 API looks like. If you're interested or involved in the community, look out for RFCs that will describe what we think these things should look like, and then obviously give feedback. A lot of this will include improved observability: we're going to improve our entire metrics story, which right now is, I would say, sporadic across the repo and not holistically well defined; we'll do a lot of work to better define that, and improve observability around status conditions, native Kubernetes objects, eventing, all of that. A ton of effort is going into it. We're also talking about doing more realistic benchmarking with the kwok provider I mentioned: the scheduling benchmark simulations we run today are useful, but maybe not as accurate in the sense that they don't mock real nodes, so we can go a step further and run scheduling simulations against mocked nodes, and measure disruption performance that way too, which we don't have benchmarking for today.

And then a lot more work on disruption controls. We're talking about a bunch of different features; not all of them may make it in, these are mainly examples, but I think a lot of them will most likely make it in in some form. Taint on disruption: this is one people want so that kube-scheduler stops scheduling pods to nodes we're going to take away soon, because that has caused various issues (there's a sketch of the idea below). Different ways you can control Karpenter's disruption mechanisms. A concept called disruption grace period, which means that if, for whatever reason, say a PDB is blocking my disruption and Karpenter hasn't been able to disrupt this node way beyond when it was first considered for disruption, you can say: proceed, maybe not force-kill it, but proceed and ignore my protections past a certain timeframe. Like, you have a CVE and you need to patch: a day past your expiration period, please proceed.
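To illustrate the taint-on-disruption idea, here is a sketch only: the feature was still being designed at the time of this talk, so the taint key and value below are hypothetical.

```yaml
# Hypothetical: a node marked for upcoming disruption gets a NoSchedule taint
# so kube-scheduler stops placing new pods on capacity that is about to go away.
apiVersion: v1
kind: Node
metadata:
  name: ip-10-0-42-7.example.internal
spec:
  taints:
    - key: karpenter.sh/disruption   # hypothetical key
      value: disrupting
      effect: NoSchedule
```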
Yeah, and then things like forceful, non-graceful termination, which is more about our drain behavior, and support for consolidateAfter together with consolidationPolicy: WhenUnderutilized, which isn't supported right now and which a lot of people want. So these are all things we're thinking about moving forward. And with that, I'll hand it off to Maciej to talk about CAS updates.

Thanks. So, coming to Cluster Autoscaler, we've been focusing on performance recently, and performance in a few different dimensions. One simple update is that we've done a lot of work to optimize the CPU and memory usage of the autoscaler. Between 1.27 and 1.29, we've seen more than a 30% improvement in both for most clusters we've been testing, and the improvements are actually bigger in larger clusters: in the 5K-node tests we run, we've seen more than a 2x improvement. That improvement is split between 1.28 and 1.29, so it's probably best measured from 1.27.

Another thing we've done is enable by default the parallel drain logic, which we first introduced in 1.26. This is essentially a complete re-implementation of the scale-down logic we had before, and it makes it both safer and, more significantly, faster. Depending on the configuration and the parallelism level you set, it can be many, many times faster. Previously, we would only drain one node at a time, so with a parallelism of 10, which I think is the default, you get essentially 10x faster scale-down, and you can play with the settings and see how far you can go in your specific environment. The improvements are especially visible in clusters where pods use long graceful termination periods, because those make for very long drains.
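For reference, this parallelism is configured through flags on the Cluster Autoscaler deployment. The sketch below uses flag names as they appear in recent CAS releases, but treat the exact names and defaults as assumptions to verify against your version's documentation.

```yaml
# Fragment of a Cluster Autoscaler Deployment spec (flag names assumed from
# recent releases; defaults vary by version).
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.29.0
    command:
      - ./cluster-autoscaler
      - --parallel-drain=true             # the re-implemented scale-down logic
      - --max-scale-down-parallelism=10   # how many nodes may be removed at once
      - --max-drain-parallelism=1         # how many non-empty nodes drain at once
```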
Another effort we're working on now, not quite done but with some progress already made, is optimizing scale-up, essentially by reducing the amount of one controller synchronously waiting for another controller. Let me go into a bit more detail on this one. This is honestly a bit of a simplification, but broadly speaking: if you create a deployment, let's say just a one-pod deployment for simplicity, what happens is that kube-controller-manager creates a pod (this actually has a few steps, but let's just call it pod creation). Then the scheduler observes that pod and marks it as unschedulable, and it's only this unschedulable pod that triggers the autoscaler; only at this point does the autoscaler notice it, and it goes on and requests more VMs. Those VMs obviously need time to boot, and finally, when the scheduler notices the nodes are there, it's able to schedule the pod. What we've already done is cut out one round trip to the scheduler: the autoscaler can now react to pods before they're marked as unschedulable by the scheduler, and that already provides some latency benefit. But our goal, hopefully in 1.31, we'll see, our end state, is essentially this model: we'd like the autoscaler to look directly at Deployments, Jobs, and similar controllers, look at the replica count, look at the pod spec. Essentially all the information we need is already there; we don't really need to wait for pods to be created. That way, the autoscaler logic and node startup can potentially happen in parallel with pod creation and any scheduler action.

As I said, this one is still being implemented. The motivation for these changes is obviously latency: it's just going to be faster if you don't have to go through so many steps, and some of them can actually be significant at large scale; we've seen pods queue up in the scheduler quite a bit. It also improves the quality of autoscaling decisions. If you create a thousand pods today, you'll see Cluster Autoscaler do multiple smaller scale-ups sequentially. This is because, as pods are created, the autoscaler already triggers small scale-ups, so each of those scale-ups only has a very partial view of what's going on, and it makes suboptimal decisions. If we're aware of all the pods that are coming, we can do the bin-packing simulation with all of them and use that knowledge to select the best shape of machine to use. One thing to mention: the change that skips waiting for the scheduler is available in 1.29, but it's opt-in for now, so feel free to test it, maybe not in production yet, but it should be working.

Now I have two very small changes I wanted to quickly mention, just because they potentially impact some people. One thing we've done is finally remove ignore-taint. This is something I've actually talked about at KubeCon in the past; it was a mechanism that caused a lot of confusion. Basically, ignore-taint let you mark a taint to be ignored by the autoscaler in its scale-up logic, which allowed the autoscaler to create new nodes as if the taint wasn't there, so in simulations it would know the pods would be able to schedule. This was really designed only for taints used for custom node initialization, like installing custom device plugins, which is the usual case. But it got used in many other cases, leading to very difficult-to-debug issues. So we split this logic into startup taints and status taints. A startup taint is what ignore-taint was before. A status taint is, I think, what a lot of people wanted ignore-taint to be: more generally, any taint that should not be taken into account in scale-up logic. There's a sketch of the distinction below.
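As a rough sketch of the startup-versus-status distinction (the taint keys here are hypothetical, and how taints are declared to CAS varies by version, so treat the details as assumptions):

```yaml
# Node spec fragment, illustrative keys only.
spec:
  taints:
    # Startup taint: present only while the node initializes, e.g. removed by
    # a device-plugin DaemonSet once it's ready. Scale-up simulation assumes
    # it will go away, so pending pods still count as schedulable here.
    - key: example.com/gpu-driver-installing
      effect: NoSchedule
    # Status taint: describes an ongoing condition, and is simply not taken
    # into account when simulating whether pods would schedule on a new node.
    - key: example.com/network-degraded
      effect: NoSchedule
```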
The final small announcement is that we're changing the format of our status ConfigMap. It's technically a backward-incompatible change, since it used to be a human-readable, impossible-to-parse format, and now it's going to be YAML. We're also putting more information there, especially about backoffs, which have been a challenge to debug in the past, so we hope this is going to help. So thank you, and do you have any questions?

Hello, thanks for your presentation. This one is for you, Jonathan. I'm using Karpenter a lot, and actually we have a little problem, or maybe a misunderstanding of how it works. We have a lot of pods in a pending state. Karpenter spins up VMs, and the pods start. A few minutes later, some of them are killed and new, slightly smaller nodes are created, so pods that haven't even finished starting are killed and moved to the new VMs. Do you know about this issue, or is it something else?

Yeah, that's interesting. Sorry, just so I understand: you're saying you have nodes launching, pods trying to schedule to those new nodes, and then, I assume, we kill those nodes and then launch smaller versions of them? Yes, that's correct. Yeah, unfortunately I'm not familiar with that, so we can talk after, and if there is a problem there we should definitely open an issue. Okay, thank you. Yep.

I have a question. On Cluster Autoscaler, there is a long-standing issue about scale-down and Cluster Autoscaler not considering the balance between multiple availability zones when a node group spans multiple of them. And if you look at the documentation of all the cloud providers, GKE, Microsoft, or Amazon, they all say: create one node pool per availability zone and use balance-similar-node-groups. But this complicates stuff, right? So I wanted to understand if this issue is ever going to be solved, because it's been pending since 2020, and also whether Karpenter does better, because I don't know Karpenter at all.

I'll take the Cluster Autoscaler side, and then maybe Jonathan can do the Karpenter side. As you mentioned, we don't have any mechanism right now for balancing between availability zones on scale-down; we only have balance-similar-node-groups, which only triggers on scale-up. There's no current ongoing effort to fix it, which I don't think means it's never going to be fixed. We're definitely open to contributions, and hopefully at some point we'll get to fixing it on the side of the Cluster Autoscaler maintainers. So we're definitely aware of this issue.

Yeah, and on the Karpenter side, I guess I'll answer the "which do I use" style question, because that's a question I think we get a lot; Maciej can add any detail if he wants. Like we said at the beginning, they each have their different trade-offs in terms of where you want your configuration to live and what you want managing your lifecycle. There's also a difference that wasn't covered, which kind of makes sense: there are only four cloud providers in Karpenter, so there's a difference in cloud provider support right now as well. You have to use all those things to evaluate which project is right for you, and maybe the answer is both. In terms of Karpenter's handling of multi-AZ scenarios: because all of the configuration exists inside the cluster and isn't tied to the cloud provider API (like I said, we send all that information off to the cloud provider and it makes its decision), multi-AZ requires basically one configuration surface in Karpenter, versus having to create multiple node groups like you traditionally would in CAS. Hope that answers your question, yeah.
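For context, the pattern the questioner describes, one node group per zone plus the balancing option, looks roughly like this on the CAS side (balance-similar-node-groups is a real flag; the per-zone group names and the static --nodes registration style are illustrative and provider-dependent):

```yaml
# Fragment of a CAS Deployment command; group names are hypothetical.
command:
  - ./cluster-autoscaler
  - --balance-similar-node-groups=true   # balance scale-up across per-zone groups
  - --nodes=1:50:my-nodes-us-east-1a
  - --nodes=1:50:my-nodes-us-east-1b
  - --nodes=1:50:my-nodes-us-east-1c
```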
Yeah, I'm curious: now you have two projects that are kind of doing the same thing, so maybe you can talk a little bit about the strongest motivations for not ending up with just one project. There's probably some reason.

I can start, and then we'll see. I think the major motivation is the major differences section we had before. They do the same job, and externally, as a user running pods, there may not be much difference, but how they work behind the scenes, and in particular how they integrate with the cloud provider, is very different. So I think a large part of the motivation is that, depending on the underlying APIs and behaviors of cloud providers, which are quite different, it may be that one or the other approach just works better. And I think we are far too divergent in our code bases to be able to just say it's one project with two different ways of integrating a cloud provider.

Yeah, I think maybe one example is if there are specifics of node upgrade behavior that you like about how your cloud provider handles it, like maybe an exponential rollout that starts slow and then moves forward very quickly. Karpenter doesn't support that kind of thing today, so potentially there are differences in how lifecycle management is handled on the cloud provider side that you might like more, and therefore you'd want to use CAS versus Karpenter. And then there's also the question, I guess more minor, of whether you want to manage the configuration surface through the cloud provider or directly in Kubernetes. Yeah, and I think they're so philosophically different at this point that it's really hard to say "oh, I could merge them." They do solve some of the same problems, but like I said, they are so philosophically different around how they actuate everything that it's really hard to say we should just converge them.

Thanks. I just learned about the do-not-disrupt annotations, which is good, but I was wondering what the differences in implementation between CAS and Karpenter are, especially when it comes to an existing annotation. For CAS in particular, does it just not scale down because the cloud provider handles that and you can't specify "hey, you can scale, just not this node"? I assume there's better support in Karpenter for scaling down a different node, but if you could elaborate, I'd appreciate it.

I'm not sure I fully understand the question; you mean scaling down nodes? Right, during scale-down or consolidation, if there is the do-not-disrupt annotation or do-not-evict annotation, what happens in CAS? Oh, I see. I think one kind of misconception, or one easy assumption to make about CAS, is that it just changes the replica count of the underlying node group, but that's not the case. It picks a specific node. It generally expects the underlying node group to have a function that allows reducing the size by deleting a specific instance, so it's always a specific instance that's selected for deletion. That makes so much sense, thank you. Is it the same for Karpenter? Yes, it picks a specific instance. Yeah. Thank you.

Hey, thanks for the talk. There's at least one other autoscaling implementation I'm aware of, Escalator, which I think was published by Atlassian. Do you know if there's any integration, or any initiative to also have them join the SIG? I'm not aware of any. We also have a SIG lead here, but it seems like that's a no. Yeah, that seems like a no. I think with Karpenter, we as a SIG were really approached by Karpenter, right? That was the direction it happened last time, but we'll see. Yeah, I think if there was some more effort from their side, we'd obviously evaluate them. Yeah. Thank you.

Hello, I use both projects and they are great, thank you for the talk. My question is about Karpenter: can I use the disruption annotations to turn down non-production environment nodes, to turn down the non-production environment?
Is that what it was made for, or was disruption budget created for some other reasons? So let me see if I understand: you want to turn down a non-production environment during non-working hours, for example? Oh, you want to spin down nodes during off hours? Yeah, spinning down. I see. do-not-disrupt was invented mostly for jobs, for AI/ML use cases, where you don't want your job getting disrupted in the middle. Obviously, for stateless applications it matters less; ideally you don't want your stateless application getting disrupted every five minutes, but if it's disrupted every now and then, you should be able to recover. It was more invented to say: okay, let this thing finish, and once it finishes, Karpenter can act on it. It's less for orchestrating scale-down, because realistically, what Karpenter does today is that if your pods scale down during non-working hours, then Karpenter, with consolidation, will just naturally scale down for you: it'll remove the nodes because they're no longer used, and you'll save money during non-working hours. Okay, thank you.

Thank you for all these updates, very nice. One got my attention: the node disruption budget you showed. At Datadog, we're using Cluster Autoscaler, and we have a dedicated layer to tear down nodes; especially when we want to roll out a new image and so on, we proactively drain the nodes, let's say. And one of the things we do is we have a big node pool, so think thousands of nodes. The restriction you've put in this node disruption budget applies to the entire node pool, whereas the constraints you may have can come from different applications running on that same node pool. So you may have different applications coming with different needs, working on different days, this kind of thing. And if you have 10 nodes out of 1,000 that say "I can be disrupted only on Monday," it's going to be applicable to everyone if the setup is at the node pool level.

I see, so you're describing a scenario where you might want to tweak your pod disruption budgets during certain hours to stop those pods from being disrupted? Is that kind of it? Yeah, the thing is that multiple applications are running on that node pool, and some may be very restrictive but represent only 10 nodes out of 1,000. I see, so you want to say "this application shouldn't be disrupted during this time, but everything else can be." Exactly; we tackle this kind of problem, and the thing is that we have this setting at the application level, not at the node pool level. You're saying you have it set at the application level? How do you orchestrate it today? We have a dedicated controller to do this part. Oh, I see, outside of the autoscaler. Got it. Cluster Autoscaler ultimately tears down the nodes when they're empty, but we're piloting the drain and everything with a dedicated layer. Yeah, I think one interesting thing we talked about when we were considering node disruption budgets was: why do pod disruption budgets not have a similar kind of semantic? Because I think that would probably solve the case you're talking about to some degree. Yeah, that's what we created here. I see, okay. So that discussion is worth having, I think.
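For reference, the application-level protection both autoscalers already respect is the PodDisruptionBudget, which, as discussed above, has no schedule semantics today. A minimal example (names illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-app
spec:
  # Blocks voluntary evictions of matching pods entirely, so neither
  # autoscaler will drain a node running them; but there is no way to
  # scope this to certain hours or days.
  maxUnavailable: 0
  selector:
    matchLabels:
      app: critical
```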
Yeah, we may have it at the application level, maybe not at the node pool level, if the node pool is shared by multiple applications. Yeah, I think, I mean, you can tell by the way CAS and Karpenter think about configuration that there are the two layers, obviously. The cluster admin node pool layer is important, because cluster admins generally need to be able to configure these things to protect their application developers to some degree, but the application layer obviously matters too, and we typically see a similar configuration surface at both layers because the use cases are very similar. So I think it's worth the discussion; I completely agree. And the limitations of PDBs are also something we've been talking about forever in the Cluster Autoscaler community, so maybe that's one more thing we can work together on.

Okay, I think at this point we're like 10 or 11 minutes over time, so in the interest of wrapping up: if anyone has a question, feel free to come and talk to us afterwards. Thank you.