Hi, everyone. Good evening. My name is Alex Meyer. And I'm Michael Dresser. We're software engineers at Kubecost, the creators of the Kubecost product and also OpenCost. We're super excited to share some interesting information we found in the course of developing our product, surfacing cost information for customers and helping them act on it to size their nodes and workloads.

A quick overview of what we're going to cover tonight. We'll start by defining this concept of node overhead, or just overhead for short: what it is, how you obtain it, and how it's calculated. We'll go over OpenCost very quickly, which is the open source component of Kubecost, and do a quick demo of how to use OpenCost to determine node overhead and measure your sunk cost, if you will. We'll then use this tooling to conduct a study, share the results with you, and summarize our key findings. And in the interest of giving you action items based on that study, we'll go over some node sizing algorithms: our existing node sizing algorithm, and then an updated one that incorporates these ideas of node overhead.

So, to start right off: node overhead is basically compute that is used to run Kubernetes itself. The cost of doing business, if you will. Nodes have capacity; I'm sure we're all familiar with that. It's the sticker price, what you're billed for in GCP, or AWS, or whatever you're using. It's what's in the pricing APIs, and it's what you're paying for. Then there is some subset of that which is defined as allocatable. Allocatable capacity is something that's passed to the kubelet through command line arguments when it boots up, so we're going to study this a little bit as a property both of Kubernetes itself and of managed Kubernetes. The difference between capacity and allocatable is overhead. Overhead includes things like the kubelet, any control plane infrastructure running on that node if applicable, and the container runtime, like Docker or containerd. In general, any software running directly on the node that isn't in a pod. Overhead does not include things like Prometheus, DNS pods, cert-manager if you're running it, or anything in kube-system, because all of that runs in pods. So that's out of scope of our definition of overhead.

You can actually see this immediately from kubectl describe, or any Kubernetes API request that tells you about the nodes in your cluster: the capacity, the allocatable, and the difference. This is a slightly cut-up response (I'm sure you've kubectl described a node before, there's a lot more than this), but these are the relevant fields. I'm running an n1-standard-2 here, and we have the capacity block and the allocatable block. Comparing the CPU capacity of 2 cores, that sticker price Alex just talked about, against the allocatable of 1930m, that's not too much of a loss. But if we go to memory, we see a pretty stark difference: roughly seven and a half gigs of capacity versus five and a half-ish allocatable.
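As an aside, if you want to pull those same numbers out of your own cluster programmatically, here's a minimal sketch of the capacity-minus-allocatable calculation just described. It's illustrative only: it assumes kubectl access to the cluster, and the quantity parsing only handles the unit suffixes shown in the example above.

```python
# Minimal sketch: compute per-node overhead (capacity - allocatable) from
# `kubectl get nodes -o json`. The quantity parsing below only handles the
# suffixes seen in this example (m for millicores, Ki/Mi/Gi for memory).
import json
import subprocess

SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}

def parse_quantity(q: str) -> float:
    if q.endswith("m"):                    # millicores, e.g. "1930m"
        return float(q[:-1]) / 1000.0
    for suffix, mult in SUFFIXES.items():  # memory, e.g. "7629056Ki"
        if q.endswith(suffix):
            return float(q[: -len(suffix)]) * mult
    return float(q)                        # plain integers, e.g. "2"

nodes = json.loads(
    subprocess.check_output(["kubectl", "get", "nodes", "-o", "json"])
)["items"]

for node in nodes:
    for resource in ("cpu", "memory"):
        cap = parse_quantity(node["status"]["capacity"][resource])
        alloc = parse_quantity(node["status"]["allocatable"][resource])
        overhead = cap - alloc
        print(f"{node['metadata']['name']} {resource}: "
              f"overhead={overhead:.2f} ({overhead / cap:.1%} of capacity)")
```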
That's a big loss there. Obviously this is a small node type, and we'll dig more into node types across the providers in just a moment when we show you the survey information, but I want to highlight that some of this information is accessible to you today in your cluster.

A brief turn to OpenCost, just to set up the rest of the talk. OpenCost is a CNCF sandbox project. The primary maintainer is our company, Kubecost, but we have many other excellent contributors that we appreciate, including Microsoft, Grafana Labs, and many others. What I want to highlight today is the REST API. OpenCost does come with a baked-in UI, but for the sake of the demo we're going to look at the REST API and how it combines the information I just showed you from kubectl describe with actual cost information about your nodes. There's much more you can do with OpenCost, of course, but we only have time for a small slice of it.

Cool. So, to dive a little deeper into how we calculate overhead within OpenCost itself, we have this flow diagram. It begins at the bottom right, where we surface those allocatable and capacity values we referred to a few slides ago as Prometheus metrics. These are standard kube-state-metrics metrics, so they would be emitted from your kube-state-metrics pod if you have that installed; otherwise OpenCost itself emits them. Those are then exposed to Prometheus. We won't go too far into that, because I'm sure many of us here use it and are familiar with it, but Prometheus scrapes these and provides our critical aggregation and query functionality.

On the left we have OpenCost itself, with the different components of the algorithm broken out. We begin the overhead calculation with the query to Prometheus. We'll follow just RAM on this side for brevity, but the CPU calculations are analogous. Here we make an average query to the Prometheus server, which returns the historical data we use to calculate the overhead. Then we proceed to the next step, which we call pre-processing: we compute the absolute value, the classic capacity minus allocatable, to get the absolute number of bytes being used for overhead, and then, to turn that into a proportion, we divide by the overall capacity. That gives you the percentage of your node being used to run Kubernetes itself. Our final step is what we call post-processing. The goal is to consolidate all of these components into one number that summarizes the total overhead cost. Here we perform a cost-weighted average between CPU, which we bring back in for this last step, and RAM, multiplying each by its respective cost, so that out comes one proportional number that accurately represents the proportion of the node's cost being used for overhead. For me, this is the critical addition that OpenCost, and of course Kubecost on top of it, provides beyond what the Kubernetes API already exposes to you.

So we're going to pray to the live demo gods here. I have an environment backed by OpenCost, and we're going to hit the assets API, which is what you use to access node information.
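Before the demo output, here's a rough sketch of the pre- and post-processing steps just described, expressed as Prometheus queries plus a cost-weighted average. This is not OpenCost's actual code: the metric names are the standard kube-state-metrics node capacity and allocatable series (recent versions), but the Prometheus address, averaging window, and the hourly cost figures are placeholder assumptions.

```python
# Rough sketch of the overhead calculation described above (not OpenCost's
# actual implementation). kube_node_status_capacity / kube_node_status_allocatable
# are standard kube-state-metrics series; PROM and the prices are assumptions.
import requests

PROM = "http://localhost:9090"  # assumed Prometheus address

def avg_over_window(query: str, window: str = "1d") -> float:
    """Average an instant-vector expression over a window via a subquery."""
    resp = requests.get(f"{PROM}/api/v1/query",
                        params={"query": f"avg_over_time(({query})[{window}:])"})
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

def overhead_fraction(node: str, resource: str) -> float:
    # Pre-processing: (capacity - allocatable) / capacity for one resource.
    cap = f'kube_node_status_capacity{{node="{node}",resource="{resource}"}}'
    alloc = f'kube_node_status_allocatable{{node="{node}",resource="{resource}"}}'
    return avg_over_window(f"({cap} - {alloc}) / {cap}")

def cost_weighted_overhead(node: str, cpu_cost: float, ram_cost: float) -> float:
    # Post-processing: weight each resource's overhead by its share of node cost.
    cpu_frac = overhead_fraction(node, "cpu")
    ram_frac = overhead_fraction(node, "memory")
    return (cpu_frac * cpu_cost + ram_frac * ram_cost) / (cpu_cost + ram_cost)

# Example: a node whose hourly cost is roughly 1/3 CPU, 2/3 RAM (made-up prices).
print(cost_weighted_overhead("my-node", cpu_cost=0.02, ram_cost=0.04))
```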
Relevantly, we're just filtering down to node types, and I'm only going to show you the overhead information, because I don't think we need to see a massive JSON blob and everything you can do with it. The live demo gods have been kind to me. We can see I have four nodes in this environment. We have the CPU-specific fraction on this node; it's about three and a half percent, not a big deal. RAM is more like 24%, which is pretty substantial. These are pretty small nodes, and we'll see why that matters shortly. And then we also have the overall overhead cost fraction. This is the key bit that OpenCost adds: 15% of the cost, according to OpenCost, is being spent on Kubernetes overhead and is not usable in my environment to schedule workloads. The only other thing I'll point out is that I have a couple of other nodes here where the CPU overhead looks like 50%. That's really, really bad. That's a particular outlier that we're going to highlight in a couple of slides.

So we wanted to use this tooling to conduct a study. We have this nice new API capability in OpenCost, and we wanted to use it to gain a better understanding of overhead, in a couple of different directions. This is an empirical study. It's worth noting that some cloud providers (I know for a fact AKS does this) publish formulas or guidelines for their overhead, but we wanted an empirical study to determine how overhead behaves as node size increases, whether the node family makes any difference, and of course to compare and contrast the different providers. The end goal is to generate data to feed into your decision-making process when making the extraordinarily crucial decision of how to size your cluster.

A quick overview of our methodology. We call each cloud provider's API endpoints to obtain both the node sizes available in a given region and their prices. We then take batches of 10 of those at a time, each node type in its own node pool, and deploy them to the actual cloud provider. After that, we install OpenCost and its accompanying Prometheus on those clusters and obtain the allocatable and capacity values, which actually come directly from kubectl itself. We use the OpenCost installation to obtain the weighted overhead cost metric we went over a little earlier, plus any node metadata for later analysis. We save all of that in a JSON file, which is the input to the analysis pipeline that generated the graphs we're about to go over.

We started by putting together a couple of exploratory graphs, looking at some of the smaller nodes to get an idea of what kind of data we were dealing with. We'll start with CPU, just because that's where I started, and I immediately found something very odd. On the left graph there's a bunch of dots down towards the bottom, but way up at the top, near 50% CPU overhead, is an e2-medium node. And we can see that reflected again on the right, where we have three nodes (sorry if it's quite small, but those are e2-small, e2-micro, and e2-medium) that all have that big red 50% bar.
This is a particular outlier that we excluded from the subsequent graphs, but we want to highlight it now as a sort of case study, something interesting we found about this family of nodes in particular when used in Kubernetes. So what's going on here? We looked at these ones specifically, and we didn't really see this in any other instance types. Upon further investigation, the GCP docs tell us everything we need to know: these are special burstable instance types. We quote GCP here: these basically sustain two vCPUs at 50% of CPU time, effectively giving you one usable vCPU as a baseline, but letting you burst up to two vCPUs. Okay, that makes sense in the context of what we're seeing for overhead: about 50% CPU overhead, and one core effectively usable.

So how do the prices stack up? We see the burstable instance is about 3.3 cents in this region, and using GCP's custom machine type builder to build an equivalent 1 vCPU instance with the e2-medium's four gigs of RAM, about 3.5 cents. So you're not getting ripped off; you're actually getting a slightly lower price for something that can burst to higher utilization. Maybe GCP is doing some efficiency work on their side and passing the savings on to you. That being said, you aren't quite getting what you think you are, and at minimum this can be a challenge for capacity planning: if you try to schedule a pod whose request approaches two vCPUs, you're not going to be able to schedule it on these. So in the subsequent graphs we generally remove these as outliers because of this interesting situation.

Let's jump straight into the metric that I think is most interesting for most people: what percentage of the cost of a node is overhead, and how does that vary as cost increases? Increasing cost is generally a good proxy for increasing node capacity. We'll break it down by specific resources in just a moment, but I want to highlight the overall trend that we're going to see repeated multiple times: there's some range of price, or capacity, for which we're paying a pretty steep overhead penalty, and then, as the capacity of a single node type increases, the overall penalty we're paying to Kubernetes to manage that node for us decreases. That down-and-to-the-right trend repeats itself multiple times in our graphs. The other interesting trend is that AKS sits a marked step above GKE and EKS in terms of overhead; there's more work to do to understand why, and I'm not sure if that's simply AKS being more conservative or something else. I'll also note that on the Y axis here we're seeing a penalty of up to about 20%; more commonly it's five to ten percent, and it tails off towards almost, though not quite, zero at the extreme.

Breaking it down now by individual resource type, starting with CPU. Now we're away from the cost dimension and looking specifically at the amount of the resource itself: the percentage of CPU capacity reserved as overhead, as the absolute capacity increases. This is pretty low. We're talking single-digit percentages, tailing off to well under one percent. Not a big deal. We can see again that AKS pops a bit above the other two here; again, more to look into. But this points us towards memory as the main driver of overhead cost.
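To make the capacity-planning point from that E2 example concrete before moving on: the scheduler fits a pod's requests into a node's allocatable, not into its sticker capacity. A tiny illustration, using rough, approximate e2-medium-like figures rather than exact numbers:

```python
# Tiny illustration of the capacity-planning point above: a pod's *requests*
# must fit a node's *allocatable*, not its sticker capacity. The allocatable
# numbers below are rough approximations for illustration only.
def fits(pod_requests: dict, available: dict) -> bool:
    return all(pod_requests[r] <= available[r] for r in pod_requests)

e2_medium_capacity = {"cpu": 2.0, "memory_gib": 4.0}
e2_medium_allocatable = {"cpu": 0.94, "memory_gib": 2.9}   # approximate

pod = {"cpu": 1.5, "memory_gib": 1.0}
print(fits(pod, e2_medium_capacity))     # True  - looks fine on paper
print(fits(pod, e2_medium_allocatable))  # False - won't actually schedule
```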
As far as memory goes, here's where that cost is coming from. The Y axis is now in tens of percent. We see the same trend again: really high on the left side, then trailing off. There's definitely a sweet spot here; in this graph it's probably around the 48 gigs of capacity mark for the average provider, varying of course. The difference between AKS and GKE and EKS is marked here as well; again, we don't quite know why, more work to do. But I want to highlight the same trend, and the fact that memory, as far as overhead goes, is the main driver of the penalty you're paying.

We also included some views that look at the different families. On the left we have the EKS instance families. The specific instances aren't terribly important; just know that as you go to the right, they get larger. We can see the same trend we're seeing elsewhere: the smaller members of each family generally have higher overheads. Moving all the way to the right, we have the .metal instances, AWS's offering that essentially gives you the entire underlying machine, with generally the lowest overheads, which makes sense following on from our other charts. On the right we have GKE instances separated into high-CPU, high-mem, and standard. Again, the exact families don't matter; our big takeaway was that family type doesn't significantly affect overhead. Going to a high-mem instance won't significantly reduce or increase your overhead versus the high-CPU instance.

Finally, a couple of interesting graphs on the extremes. This first one shows the most expensive node types in terms of overhead as a percentage of cost; we'll flip in a moment to the least expensive. We've brought the E2 family back in here, and you can see that really substantial difference we saw both in the OpenCost API example and in what Alex explained; of course they dominate the figure here because they're unique. The other interesting trend you can see is that AKS and GKE really dominate this field, and as we saw in the other graphs, these are all small node types: two, three, four cores, and four, six, eight gigs of RAM is what most of these node types give you. EKS just comes in towards the bottom, under the 10% mark. I'm not sure if that's some target they aim for, but it is interesting.

Switching to the lowest overhead, the most efficient, EKS dominates the field here. Part of this is because EKS was the provider we surveyed most broadly, so we may have captured more of the largest node sizes on EKS. But again, we see the same idea: the percentages here are below 0.6%, which is basically negligible. The thing I'll point you to is simply the names of these instances: these are just about the largest instances in their class. That matches the trend we saw in all the previous graphs: the larger the node size, the smaller the penalty you pay towards overhead.

So, to summarize what we're finding: our top-line finding, which all the charts made painfully clear, is that the larger the node, the lower the overhead.
To summarize it in one sentence: overhead converges towards 0% asymptotically, starting rather high for small node types. An interesting finding is that there are a significant number of node types with 10% or greater overhead, which is a pretty significant number, especially as your cloud spend gets larger. We find that the total overhead cost, again borne out by the charts, is overwhelmingly driven by RAM; we didn't really see anything over 6% of CPU being reserved as overhead. We find that Kubernetes has an interesting way of handling the GCP shared-core instance types, as we saw with those outliers. Provider-wise, we generally see AKS with the highest overhead, perhaps due to conservative decisions or other algorithmic work, and EKS with, in general, the lowest overhead.

Just a quick note on the scope of this study: we didn't include node disk, which also has overhead components, and we didn't study GPUs. Node disk because it's not a very large driver of node cost relative to CPU and RAM, and GPUs we simply put out of scope for this portion of the study.

So how do we act on this information? I'm going to give you a brief overview (I'm not going to read the whole slide, I promise) of how Kubecost recommends nodes to our users. We'll then talk about some general trade-offs when doing node sizing, and then Alex will tell you how we incorporate overhead into this algorithm. The basic idea is that we gather some heuristics about the historical behavior of the workloads that ran in your cluster; the exact ones are listed here. We then look through all of the node types available to you in a given region, look at their costs, and ask: can this set of nodes schedule the workloads you need to run? How many of them do you need, and what is the cost? If you do that analysis for all the node types available to you, you pick the lowest-cost one and make that recommendation. The specific thing I'll point you to is towards the middle of the slide, if you can see it: vCPUs per node and RAM GB per node. If you're naive, like we were before we did a careful analysis, that's the sticker price, which we've just debunked as something you don't actually get all of for scheduling. You might work around this by simply saying, oh, there's some overhead, without really thinking about it. But in terms of the sophistication of picking nodes for your cluster, if you're not taking overhead into account, you're not picking the right nodes to schedule your workloads.

So how do we go about adding overhead to this algorithm and making it more aware of reality, I guess you could say? The biggest challenge is that we have forces in tension here. You naturally don't want to choose the biggest node you can if it's going to sit idle; that pulls us towards smaller nodes, because you don't want to pay for what you're not using. But pulling in the other direction: with smaller nodes you're paying a higher cost of doing business, for capacity that isn't even available to you. So we have a few thoughts here. You can't take this to the extreme and say, okay, we're going to go to an 800% larger node just to save a 10% overhead penalty; unless you can use it, you're just losing money.
So, to extrapolate this: unless you can actually use that capacity, bigger isn't automatically better. But if you are running, say, dozens or hundreds or thousands of small nodes, you could potentially consolidate those into fewer, larger nodes, subject to your availability requirements, which are an important consideration here, cost aside. If you're able to do that successfully and still meet your availability constraints, you can see significant savings. Even a few percent can make a huge impact on the bottom line, because node costs are generally the largest driver of Kubernetes expenditures we've found. And in general, when picking node sizes, it may be better to bias towards larger nodes, the reasoning being: if I have to pay for this capacity anyway, I might as well have it usable and available to me if I need to scale quickly or grow naturally, versus going with smaller nodes and simply giving that capacity up.

So here's how we revised our algorithm to incorporate these findings; we've highlighted the key changes from the algorithm Michael reviewed earlier. What we do now is, before we even begin calculating optimal node sizes, we use the data obtained through this study to fit a trend line, the inputs being number of CPU cores or bytes of RAM, and the output being the expected amount of overhead: a line of best fit. We compute the same set of requirements as before: the maximum individual pod requirements, our total workload requirements, and any daemon sets that need to run one per node. Then, for each node type on a given cloud provider, we first compute the expected overhead using that best-fit line. It's a heuristic, but it gives us the amount we're going to discount the resources by: we pass the sticker, the nameplate capacity, into the function, and out comes the expected overhead. We also have an overhead penalty built into the algorithm. It's configurable (we default it to 5%), and it discriminates against higher-overhead nodes by amplifying the impact of the overhead, for efficiency. We then reduce the node's available resources by that calculated overhead amount. Now, when we consider the largest pod size, we can use the actual allocatable capacity to make the recommendation, so we won't attempt to fit a pod requesting exactly two vCPUs onto an instance with two vCPUs of capacity. Then, as in the prior algorithm, we determine the minimum number of nodes required to satisfy your total workload requirements, and we have inputs for any availability concerns here; we generally rely on the user to supply those, because it's so application dependent. And then, as before, we compute the total cost and return that to the user, but this time with additional information: we surface the total amount of overhead for this hypothetical cluster if they were to pick this node size. Again, visibility is sort of the first step; just being aware of it is important and helps with decision-making.
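Pulling that together, here's a condensed sketch of the revised selection loop just described. It is illustrative, not Kubecost's actual implementation: expected_overhead() stands in for the trend line fitted to the survey data (its coefficients here are made up), the 5% penalty matches the default mentioned in the talk, and the way the penalty amplifies overhead is one plausible interpretation.

```python
# Condensed, illustrative sketch of the revised sizing loop described above.
import math
from dataclasses import dataclass

@dataclass
class NodeType:
    name: str
    cpu: float          # nameplate vCPUs
    ram_gib: float      # nameplate RAM
    hourly_cost: float

def expected_overhead(cpu: float, ram_gib: float) -> tuple[float, float]:
    """Stand-in for the fitted trend line; these coefficients are made up."""
    return min(0.5, 0.1 / cpu + 0.02), min(0.5, 1.2 / ram_gib + 0.02)

def recommend(node_types, largest_pod, total_req, daemonset_req,
              min_nodes=3, overhead_penalty=0.05):
    best = None
    for nt in node_types:
        cpu_oh, ram_oh = expected_overhead(nt.cpu, nt.ram_gib)
        # Amplify measured overhead by the configurable penalty so that
        # higher-overhead node types are discriminated against.
        cpu_oh = min(1.0, cpu_oh * (1 + overhead_penalty))
        ram_oh = min(1.0, ram_oh * (1 + overhead_penalty))
        # Discount nameplate resources by overhead and per-node daemon sets.
        cpu_avail = nt.cpu * (1 - cpu_oh) - daemonset_req["cpu"]
        ram_avail = nt.ram_gib * (1 - ram_oh) - daemonset_req["ram_gib"]
        # The largest single pod has to fit on one node's usable capacity.
        if largest_pod["cpu"] > cpu_avail or largest_pod["ram_gib"] > ram_avail:
            continue
        count = max(math.ceil(total_req["cpu"] / cpu_avail),
                    math.ceil(total_req["ram_gib"] / ram_avail),
                    min_nodes)
        cost = count * nt.hourly_cost
        if best is None or cost < best["hourly_cost"]:
            best = {"node_type": nt.name, "count": count, "hourly_cost": cost,
                    "overhead_fractions": (cpu_oh, ram_oh)}
    return best

print(recommend(
    [NodeType("small-2x8", 2, 8, 0.07), NodeType("big-16x64", 16, 64, 0.54)],
    largest_pod={"cpu": 1.5, "ram_gib": 3.0},
    total_req={"cpu": 40.0, "ram_gib": 120.0},
    daemonset_req={"cpu": 0.2, "ram_gib": 0.5},
))
```

With these made-up numbers the larger node type wins, which mirrors the trend from the study: the per-node overhead penalty shrinks faster than the idle-capacity risk grows, as long as you can actually fill the nodes.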
So finally, to close out: the main objective here was first just to raise awareness. We know that many very sophisticated users of Kubernetes at scale totally understand this problem; they have all sorts of other configuration problems they run into, but they know about this one. A lot of people don't, though. You tell them, oh, the kubelet needs space, and they say, yeah, that makes sense, but they don't really put a dollar figure or a scheduling penalty on it. So first, we wanted to highlight the problem. We were also very curious about the results of the empirical study; I found them very enlightening personally. The overall conclusion is that nothing is free: you do have to pay a cost for running Kubernetes, but there are real ways you can optimize it. We've seen a 50-plus percent penalty, though that was an outlier; we more often see something like 10 to 20% on the smaller node sizes, and as you get larger and larger, you get to single-digit percentages of overhead, which is mostly negligible, though at scale it can matter. Obviously we want to optimize and reduce this waste. So you might say, well, let's just pick, as Alex said, one or two very large nodes. There are availability trade-offs there that you should be aware of and careful about: a larger number of nodes can be more resilient to degradation or outright failure of individual nodes. It's always important to weigh that in your environment, but there are real cost savings available at larger node sizes. We also briefly went through our revised node sizing algorithm and how it incorporates overhead. And of course I have to plug our product in a single line, which is: just install Kubecost. It's free, it's a Helm install, you can try it out, and then use the algorithm Alex and I just talked about. Thank you for your time. You can come see us at the Kubecost booth, M10, tomorrow (I know it's kind of late today), and there's also an OpenCost booth at F34. Thanks, everybody. We can take questions now; I know there's a microphone, or you can come up if you like.

Hi, I had a question about how exactly you measure this overhead. For example, you mentioned it could be host OS overhead. But if I ask for one gig and I'm only using 500 meg, the OS is free to use the other 500 megabytes to actually help me, keeping page cache, et cetera. So it may seem like I have 50% overhead, but actually there's no overhead. I'm curious: if I have a workload, how exactly do I measure the memory overhead?

So the workload itself, which in Kubernetes land is a pod, doesn't have overhead. The workload uses resources on the node; those cost you money, and OpenCost can tell you how much. Overhead is a trait of the node itself. We can't tell you how much of it is the host OS, and we can't tell you how much of it is the kubelet; that's set, as Alex said, as a runtime parameter to the kubelet. If you're interested in digging into that: say you have a managed provider, GKE or EKS or AKS, it doesn't matter which, they're choosing what that kubelet parameter should be, and that's what determines what the kubelet tells the cluster its available capacity is. Most people are setting requests above usage, I hope, for stability purposes, unless they're really redlining it. And that allocatable space is really what matters to the scheduler, and thus to things like autoscalers. Does that answer the question? Yeah, yeah. Thank you.

There's a recommendation by instance family, right? Oh, would you just put the mic up, yeah.
So there's a recommendation by instance family, right? And there are some types of instances where the workloads cannot run, say Graviton instances, where maybe the workloads aren't compatible. Is there a way we could use just a subset of instance family types and get a recommendation on those?

Sure. So, if I can rephrase: is there a way to look at only a subset of families when recommending cluster sizes? Yeah. That is something Kubecost does not currently do, but it sounds like an excellent idea, because when we're looking at a graph like this, broken down by families like that, it's very general: hey, all CPU cores are equal. We know that's not true. People want particular instance families, whether due to specific hardware availability, specific CPU architecture, whatever it may be. So that becomes a real factor, and we'd like to do it, but we do not do it today. Thank you.

Hey, great product. So the question is: we mostly run on-prem, but we're slowly making our journey to the cloud. Is there a model you have to say, hey, analyze my Kubernetes clusters on-prem and make a recommendation, this is the family of nodes you'd best run on if you go to EKS?

That's a really interesting idea. I have not heard of that use case before. I don't think it's something we support out of the box in the product, but I bet we could figure something out for you, is all I can say. Come say hi afterward. All right, thanks. Yeah.

Okay, yeah, I wanted to clarify, because for resources on the node, back in the deployments you can have the limits and the requests. I wanted to know which one you took into consideration in the overhead calculations during the study.

I think we generally think in terms of requests when it comes to overhead. Overhead is a property of the node itself; workloads have requests, nodes don't. But when you talk about the scheduler, the first thing you usually think about is requests, because the scheduler says: I need to place a workload on a node that has the allocatable capacity to fit this pod's request. That's where we look first. I'm a novice on the scheduler's exact behavior, and I know there are some things that take a limit into account, but most people, when they think about scheduling workloads, think about requests, and the way the scheduler generally looks at it is: is there allocatable space to fit the request of the pod? So the overhead is what reduces the capacity, that sticker price, into what's actually allocatable by Kubernetes for your workloads. Requests are like a minimum, as in I need at least this much to run, whereas a limit is more of an aspirational maximum, so we went with the lowest common denominator there. Okay.

And as a follow-up, because you noted that you also provide node recommendations: is that also based on the requests or the limits, for the scheduling aspects?

That's based on both requests and usage. If your usage exceeds your request (this is not always the case, but it's the general assumption) you probably actually need that capacity, even if you're not requesting it. For example, with memory: if we were to super-tight-size your cluster and you had a bunch of workloads using memory above their requests, they would die.
So we look at both requests and usage and pick the larger of the two, to make sure there's not only space to schedule the workloads but also to run them without failure, based on their historical behavior. Okay, thank you. You're welcome.

I'll preface this by saying I might have missed something, but do you have a way with OpenCost to factor in daemon sets as part of the overhead of a node?

This is a very interesting question. If you look at something like kube-system, it falls into the same category: it's a cost of doing business, as Alex described it, but it doesn't fall under that specific capacity-minus-allocatable category. So in both OpenCost and Kubecost, the way we usually treat this is as a shared cost. Some teams like to allocate all of kube-system, and everything under it, to a specific team, maybe their infra team, and some share it among all the tenants of the cluster. That's up to you. But we don't think of it in terms of overhead as we've defined it in this talk: it is certainly overhead of doing business, but it's not node overhead in terms of capacity minus allocatable. So it falls under a different bucket and it's tracked as pods, and you can do Kubernetes-native reporting on it rather than something more complicated like node overhead. You also generally have more control over what's running in the kube-system namespace. When you spin up an EKS cluster with managed nodes, the allocatable is set by whatever the platform team uses to calculate it, whereas with kube-system you can play with those resource requests, at your own risk, but at least it's available to you as an API user.

Yeah, I guess the use case here is, I'm running Datadog agents, EFS CSI drivers, EBS CSI drivers, and a whole bunch of other stuff, and all of a sudden it's, oh, how much of my actual instance is still left for my actual workloads and not just operational overhead?

Oh, for sure. When it comes specifically to recommending nodes, when we move from simply measuring overhead to our algorithm for recommending what node you should be running, we now take out not just overhead, as Alex described, but we have always been taking away that daemon set capacity as well, under the assumption that those daemon sets run on every one of your nodes, because that is effectively unallocatable capacity. We just measure it differently, because they're pods, not this fixed overhead thing based on node type. Thank you. You're welcome. Any other questions?

Hey, thank you. I'd like to do the same kind of study on-premise. Would you be willing to share the methodology you used on the cloud providers, and the raw data, and maybe we could also plot the results of an on-premise analysis and see how on-premise performs compared to the cloud providers?

So we do intend to open source the scripts and things we used, and probably the experimental data as well, very soon. We had every intention of having it done by this talk, so yes, there's a lot of support for that. And it would be very interesting to see the findings on an on-prem system, because largely you're in control at that point: you're the person passing the arguments into the kubelet, and you decide what your overhead is.
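For that on-prem case, the kubelet derives allocatable from its reservation flags (--kube-reserved, --system-reserved, and the eviction threshold), so validating your assumptions is roughly a matter of checking that arithmetic against what you observe. A tiny sketch with placeholder numbers, not recommendations:

```python
# Tiny sketch for the on-prem case above. Per the Kubernetes node-allocatable
# model, roughly:
#   allocatable = capacity - kube_reserved - system_reserved - eviction_threshold
# The example numbers below are placeholders, not recommendations.
def allocatable(capacity: float, kube_reserved: float,
                system_reserved: float, eviction_threshold: float = 0.0) -> float:
    return capacity - kube_reserved - system_reserved - eviction_threshold

mem_gib = allocatable(capacity=64.0, kube_reserved=1.0,
                      system_reserved=0.5, eviction_threshold=0.1)
print(f"allocatable memory: {mem_gib} GiB "
      f"({(64.0 - mem_gib) / 64.0:.1%} overhead)")
```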
It might even be more interesting in that on-prem use case to validate your assumptions about what your underlying system is actually using versus what you think it needs to run. Okay, thank you.

Do you know if the overhead increases with the number of pods running on the node? Is it some sort of fixed reserved capacity that's going to increase eventually, or does it scale more linearly?

It's fixed at startup time. When you spin up a node and then spin up the kubelet on that node, as Alex described a little while ago, you pass it flags that say, this is the allocatable capacity you should report. So it's not going to scale with the number of pods on the node; as the node gets busier, it's not going to increase the amount of overhead it has. It's just fixed, presumably under the assumption that it will reach whatever max pod capacity the kubelet is configured for. I know the default is something like 110 pods per node, and that's configurable too if you want to have fun. But yeah, it's fully fixed for the lifetime of the kubelet process. Thanks. Yeah, you're welcome.

Back again. So when you show the efficiency and the dollar amount, how much is actually used, for a specific workload or by namespace, and in the recommendation: is it based on the average of CPU and memory, or the max, whichever of the two is higher? Because oftentimes, say I take an instance type like a c5, where there's less memory, versus something like an m5, where there's a lot more memory relative to the CPU consumption: there's more memory wastage, but I'm maxed out on CPU, so taking a max or an average might not be very accurate.

So what this chart shows is cost-weighted: if your node has a lot of RAM, the RAM cost will be significantly higher than the CPU cost, and so your RAM percentage of overhead is amplified accordingly, because RAM is such a big component of the node's cost. It's not really an attribute of the Kubernetes namespace or the workload; it's an attribute of the underlying node, one level lower than that. But in terms of profiling the workloads to recommend node sizes, most of the time we're looking at max information, because that's conservative. I know that's a little controversial. We like to prioritize availability in recommendations until you tell us more, which is why we tend to look at maxes over averages: averages hide spiky behavior, which is particularly dangerous with memory, where you run out and die.

Right, yeah. I think max probably makes more sense, but sometimes, say with m5-type instances, there's a lot more memory available than CPU, and if your applications are maxing out on CPU already, that's fine.

Yeah, so we look at the resource requirements independently: CPU is one resource, memory is another. So there might be some node types that are simply a better fit. We might give you a node recommendation with less memory and the same or more CPU simply because it's a more efficient use of resources. We saw that with memory in particular: there's a lot of overhead you're paying for it, so perhaps that's an optimization there.
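A small sketch of the conservative profiling described in the last couple of answers: per workload and per resource, take the larger of the request and the observed maximum usage, and treat CPU and memory independently. The structure and field names here are assumptions for illustration, not Kubecost's data model:

```python
# Sketch of the conservative profiling described above: per workload and per
# resource, take the larger of the request and the observed (max) usage, and
# keep CPU and memory independent. Field names are illustrative assumptions.
def required_resources(workloads: list[dict]) -> dict:
    totals = {"cpu": 0.0, "ram_gib": 0.0}
    for w in workloads:
        for r in totals:
            # max() keeps headroom for workloads that use more than they
            # request, which matters especially for memory (under-sizing
            # there means workloads get killed).
            totals[r] += max(w["request"][r], w["max_usage"][r])
    return totals

workloads = [
    {"request": {"cpu": 0.5, "ram_gib": 1.0}, "max_usage": {"cpu": 0.2, "ram_gib": 1.6}},
    {"request": {"cpu": 2.0, "ram_gib": 4.0}, "max_usage": {"cpu": 2.4, "ram_gib": 3.0}},
]
print(required_resources(workloads))  # {'cpu': 2.9, 'ram_gib': 5.6}
```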
Does the recommendation today show a homogeneous node type, or will it say, you should do two of this type and two of that type?

The primary recommendation is homogeneous, but there is also a recommendation that will pick two or three different node types, just to try a different mix. It's obviously not optimal in the sense of evaluating every possible combination, because that can get expensive computationally, but we do make a best-effort attempt to pick a heterogeneous set of nodes. For the scope of this presentation we just covered our single-node-type sizing algorithm.

We started using Kubecost pretty recently, so that's where we're coming from with this platform. Yeah, we'd love to talk more at the end if you'd like. Yeah.

Hey, great talk. I wanted to follow up on that cost-of-doing-business question from a couple of questions ago. You're looking at overhead as what it takes to run the virtual machine. I was wondering if you noticed any trends in what's running in kube-system, daemon sets that were applicable across distros. Does EKS consume more overhead in the pods it needs to run the system versus AKS versus GKE? Did you notice anything when you dug beyond just the difference between allocatable and what was available on the node?

We didn't; we didn't look at the actual contents of the kube-system namespace. We put that out of scope and made it more of a study of the underlying node itself. But that's a really interesting insight, and definitely worth consideration, because if a certain provider gets away with lower overhead as we measure it by moving a lot into or out of our definition of overhead, then yeah, you're not seeing the whole picture. We made a bit of an assumption that all the cloud providers, even though the services are different, were approximately equivalent there, which we know is not strictly the case, because everyone's managing their own flavor or derivative. I'm very interested; I'll log it under future work for sure.

And we also have choices to make, like which CSI driver or drivers we're using, because that daemon set will affect the amount of available space. Not allocatable from the kubelet's perspective, like what you're showing here, but what we can actually put on the node in terms of workload pods.

Right, and the nice thing is that's easier to measure, because you can use a tool like OpenCost or Kubecost, and because it lives in the cluster, all the native cluster monitoring tools are available to you. So it's a little easier to keep track of, but the unified picture of both kube-system and overhead is very interesting, whatever tools you pick.

And so in OpenCost, can you go in and say, all right, include the pods from these namespaces in what we count as overhead? Or is that not possible?

Yeah, so we would call that a shared cost, and you can share that cost across whatever aggregation you do. You'd say, I want to share the kube-system namespace among all other namespaces, for example, because maybe it's a shared cost of doing business, which is what I described earlier. Other teams will just say the infra team is responsible for kube-system and it comes out of their budget. It's an org-by-org decision, and the flexibility of reporting there is what's important. Thank you. Yeah, absolutely.

Howdy. Can you switch to the middle graph? Which one? The next one, I think. Yeah. Why is there a dip there in the green line?
So, I think it went from such a steep curve to such a shallow one that it's probably just a shortcoming of our curve fitting. So yes, there is no hidden 0%-overhead instance type in there; I think it's just trying to make a polynomial curve fit. Okay, okay. Yeah, I hit the fit-line button in the library I chose to graph things with, and that's the shape it made. Okay, cool. It looks nice. Yeah, because there's a green dot on the blue line just above it. Anyway.

Does Kubecost or OpenCost report the cost of Kubecost? Absolutely, yeah. OpenCost will report the cost of OpenCost, and Kubecost will report the cost of Kubecost. Cool, cool. Thank you. You're welcome. Any other questions?

When you were showing families for EKS, I wasn't able to see all of them; it's very dense. I was wondering: for GKE you said the reason the E2 family's overhead was so high is the way they calculate CPU availability, because it's burstable. For the burstable types on AWS, did you see something similar? Obviously the numbers weren't as high, but is that just an anomaly in the way GCP computes what's allocatable, or reports that back to the kubelet, or is it true across all burstable types across cloud providers?

I think it's the way GCP happened to be passing that through; the platform team is configuring the allocatable and capacity. If AWS's burstable instance types let you burst to some multiple above the baseline number of cores, we would expect to see more outliers at 50% or 75% overhead, and we didn't observe that. We studied a few hundred instance types, so we didn't do a direct comparison between the AWS burstable instances and the GCP ones, but it's definitely great follow-on work. And based on the numbers we're seeing even here, my suspicion is that AWS simply reports capacity differently for those node types, so it doesn't show up as markedly. GCP shows you a sticker price of two vCPUs and four gigs of RAM while the allocatable is obviously around one vCPU. As Alex said, you're not getting ripped off, but you might feel like you are. I suspect AWS is just reporting things differently at the OS or kubelet level, so it doesn't show up here, but it's worth following up on. You could say GCP's way is arguably more accurate, because limits will let you oversubscribe above your allocatable, so through this mechanism you can still leverage that bursting if your workload bursts. But again, it shows up oddly, and maybe AWS's product team simply made a different decision. Thank you.

All right, we'll call it there. Thank you for coming, everybody.