There we go. All right, welcome to today's CNCF webinar, Reducing Your Kubernetes Cloud Spend. I'm Libby Schultz, and I'll be moderating today's webinar. We'd like to welcome our presenters, Webb Brown, CEO at Kubecost, and Nico Kovachevic, CTO at Kubecost. Hope I didn't just butcher that again.

A few housekeeping items before we get started. During the webinar you are not able to talk as an attendee, but there's a Q&A box at the bottom of your screen. Please feel free to add your questions there, and we'll get to as many as we can throughout and at the end. This is an official webinar of the CNCF and as such is subject to the CNCF code of conduct. Please do not add anything to the chat or questions that would be in violation of that code of conduct; basically, please be respectful of all of your fellow participants and presenters. Please also note that the recording and slides will be posted later today to the CNCF webinar page at www.cncf.io/webinars. With that, I will kick off today's presentation and hand it over to Webb and Nico.

Excellent. Can you hear me okay, Libby? Yep, I hear you great. Excellent. Thank you so much for the introduction, and welcome, everyone. Thanks for joining us today. We're going to talk about one of our all-time favorite subjects, one that's near and dear to our hearts: helping you effectively manage and reduce spend when running workloads on Kubernetes. First we're going to present a general framework for thinking about different optimizations and opportunities to reduce spend, and then we're going to go into some very practical examples, or war stories, that we've picked up over the past several years working in this area.

Let me start with a little background on us. My name is Webb, and I'm joined by my esteemed colleague Nico. We're both part of the founding team at Kubecost. We build cost monitoring and cost optimization solutions for teams running applications on Kubernetes. We have more than a thousand teams using our product today, across all major cloud providers as well as on-prem, and we're going to share some of the lessons we've learned working directly with hundreds of them. A lot of what we'll cover is aimed at cloud environments, but a fair amount of it applies to on-prem environments as well.

So why are we here? First and foremost, we very much believe that Kubernetes as a platform presents an amazing opportunity to deliver applications more cost-effectively. We've seen this in many migrations and many production environments. But we also believe there are certain things that can nudge teams toward overspending if they don't focus on a few areas, and there are three core reasons why. First, when teams fully embrace Kubernetes, their decision-making process, how they actually deploy and update applications, often becomes more decentralized. This is amazing for developer empowerment, but it can make it harder to monitor everyone effectively. Second, teams often move with higher velocity on Kubernetes. Again, a great thing, but it leads to more moving parts to monitor and more dynamic systems.
That's due to faster release cycles, but also to things like autoscaling, which modify deployments programmatically. And third, developers are now empowered to spin up all kinds of resources whenever they need them; that can be hundreds of GPUs, in any region of the world, from any major cloud provider. Again, an amazing thing, but it also means that mistakes and oversights can be more expensive when they're not caught. Those are three things to keep in mind as we think about immediate optimizations, but also the ongoing governance of workloads running on Kubernetes.

Now that we have the overall problem framed, we want to present a high-level function, a framework, for thinking about optimizations. Any time we make an optimization, we're going to touch at least one of these variables: the amount of time a resource is provisioned, the quantity of that resource, or the price of that resource. Taking each in turn: the amount of time something is provisioned is the period your cloud provider is actually billing you for that resource. Something like cluster autoscaling lets you shorten or adjust that time down to just the period where you actually need those resources. Part two is the quantity of resources you're provisioning. Optimizing this is essentially a right-sizing exercise, which we'll talk more about: getting just the right amount of, say, RAM or storage for your particular needs, based on your applications. And part three is where the finance component comes into play: the cost of each CPU, the cost of each GB of RAM, and so on. This is the more financial-optimization part of the equation.

This function can also be abstracted one step further and thought of as resource efficiency times the price of resources. We'll talk more about efficiency in a second, but we think this is a really powerful framework for a quick read on how effectively you're provisioning and consuming the resources you're paying for.

Before Nico dives into that, I want to hit on a quote that speaks to the bigger picture here. It's a Thoreau quote: "the price of anything is the amount of life you exchange for it." It's meant to be a little tongue in cheek, but it carries two big points. One is that we're not just talking about the price of cloud resources; your time as an infrastructure engineer is also really expensive, so we want to think about getting the biggest impact for the time you dedicate. And second, these changes can be hard to estimate, both in the time it takes to make the optimization today and the time to manage it going forward.
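To restate that framework compactly (our notation, not from the slides):

```latex
\text{cost} \;=\;
\underbrace{t}_{\text{hours provisioned}} \times
\underbrace{q}_{\text{quantity of resource}} \times
\underbrace{p}_{\text{price per resource-hour}},
\qquad
\text{efficiency} \;=\; \frac{\text{resources actually used}}{\text{resources allocated}}
```

Since allocation is usage divided by efficiency, the same spend can be read as (usage / efficiency) times price, which is the "efficiency times price" abstraction mentioned above.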
So, with all of that in mind, here's our sense for difficulty estimates, knowing that there will always be contextual differences across environments, but also that there are common themes and common patterns when we look across the hundreds, even a thousand, environments where we've seen our product deployed. With that, I want to turn it over to Nico, who's going to touch on a super important precursor to optimizing your infrastructure: measuring allocation, efficiency, and so on. Nico, you want to take it away?

Sounds good. Yeah, thanks, Webb. I'll set this up really quickly, and then we'll step through a concrete version of the framework Webb just introduced and familiarize everyone with some of the metrics that our open source Kubecost project scrapes and computes, which give teams the ability to do some of this cost optimization.

Here we're looking at a dashboard of aggregated metrics from our Kubecost open source project, aggregated by namespace over the last day. I'd like to unpack the little heuristic equation Webb just presented, time times quantity of resources times price per resource, and also touch on this efficiency metric, because efficiency is one of the most important ways we think about this problem. Looking at this dashboard, an example of how you might step through it is noticing that we've got 2.2% efficiency here. As you might guess, we'd consider that pretty low. So we can step through why that is and get to the root of what metrics underlie it and how to think about them, and that will lead us, later in the talk, into how to resolve some of those issues, raise that efficiency number, reduce spend, and so on.

We're in the default namespace, looking at memory and CPU cost. If we drill down, we'll see that within this namespace over the last day we've got containers running in different pods. These three columns, plus the total cost, correspond directly to the equation Webb covered. First is time running: here we've got 24 hours for all of these over the last day, which means they've all been running the whole time. Then we've got CPUs, the specific resource we're looking at (we could just as easily be looking at RAM or another resource); this is how much is being allocated to each of these line items. Then price per CPU hour, the price per resource: predictably, this corresponds to the nodes these workloads are running on, but broken down to specifically how much you're paying for CPU for that workload. That yields your total cost.

So basically, these are your levers. If you want to pay less, you can reduce your hours running, which may or may not be possible; you can reduce the quantity of the resource, which may or may not be possible but is commonly a big one; or you can reduce the price per CPU hour, which is also a big one. We'll step through each of those later. For now, we can drill down one step further and look at a Grafana dashboard of the raw metrics for one of these: a test deployment pod that isn't really doing much. What we're seeing here, on the left, is CPU usage and request.
And on the right, memory usage and request. This brings us back to the idea of efficiency. When we talk about allocation at Kubecost, we're talking about the max of usage and request, because with the metrics we're emitting we're trying to show our users what they're actually being billed for. Looking at these charts, from the CPU perspective we've got a request but essentially zero usage, and from the memory perspective we've got around 30% usage. But CPU is the primary cost of this pod: if we go back to the top level, we'll see 14 cents has been spent on CPU and only one cent on memory. So even though memory is at 30% usage, it's such a minuscule share of what's being spent, and CPU usage is near zero, that overall we're hovering right around 2% efficiency.

That's the way we think about our core metrics, and it gives us the three levers: what actions can we take now, who can be alerted, and how do we start here and end up in a position where we're spending less? I think the plan right now is to pause briefly and take any questions before we move into specifics on how to fix some of these issues, so please feel free to drop questions in chat or Q&A at any point.

This is such a critical lead-in to the next part of the discussion. In the example Nico gave, looking at the default namespace, 14 out of every 15 cents goes to CPU, so it makes very little sense to start by optimizing memory, storage, or network in that namespace. By seeing the biggest cost drivers, you can steer your very costly time toward the areas where you'll have the biggest impact.

Right. So you've got a handful of questions here, Nico. First one is: how did you figure out the cost per CPU? Sure, and thanks for all the questions, definitely some good ones. If we go back to this screen, the question is about this column. The simple answer to the first question, and I guess the second as well, is that we integrate with cloud-provider billing APIs, and we're aware of which nodes your workloads are running on. So we know how much that node costs and which node the workload was on, and we can compute the price you were paying at the time for that workload, per resource. I believe a question also popped up about on-prem environments; in that case we let users input how much their hardware is costing them. You can input custom pricing and override this, or provide whatever makes sense for your on-prem situation.

Yeah, and Nico, maybe worth highlighting: we actually have two different pipelines for supporting that. There's a really simple pipeline where you can just specify cost per core, cost per GB of memory, and so on. And there's a more advanced pipeline for teams that have a lot of heterogeneous assets and want to go through and have an individual asset ID for each VM, disk, etc., and tag each of those with a cost for that individual machine.
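Circling back to the efficiency number on the dashboard, here's a minimal sketch of that heuristic as described in the talk; the naming is ours, not Kubecost's actual implementation:

```python
# Hedged sketch of the cost-weighted efficiency heuristic from the talk
# (our own naming, not Kubecost's actual code).
def allocation(usage: float, request: float) -> float:
    # You're billed for whichever is larger: what you asked for or what you used.
    return max(usage, request)

def cost_weighted_efficiency(resources):
    # resources: list of (usage, request, cost) per resource type, e.g. CPU, RAM.
    total_cost = sum(cost for _, _, cost in resources)
    weighted = sum((usage / allocation(usage, request)) * cost
                   for usage, request, cost in resources
                   if allocation(usage, request) > 0)
    return weighted / total_cost

# The dashboard example: CPU costing $0.14 at ~0% usage, RAM costing $0.01
# at ~30% usage. The cheap-but-used RAM barely moves the needle:
print(cost_weighted_efficiency([(0.0, 1.0, 0.14), (0.3, 1.0, 0.01)]))  # ~0.02, i.e. ~2%
```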
Right. All right, a couple more here; I think you've hit the first four. Next question: do you have a product or service business model, or both? So, everything we're showing is metrics from our open source project. We do have business and enterprise products with a lot of extra functionality: think SAML SSO, RBAC, multi-cluster views, things that are useful for teams in bigger environments. We also work really closely with our users to help them optimize their spend when they onboard to our product.

Next question for you, Nico: if the workload is distributed across multiple nodes, will you take the average? Right, this is a great question, and we get questions like this a lot. Part of our answer is that we look at the problem from a slightly different perspective. In a sense, yes, but really what we're doing is taking each instance of each running container separately: instead of averaging things and breaking them down, we think of it as aggregating. If we look at this example again, at price per CPU hour, you'll notice the values actually aren't all the same, even though we're talking about CPU in the same namespace. What's probably happening is that these are running on different nodes, and those nodes may have different prices associated with them. So at the top level, we've aggregated every running instance of every container and every pod in each of these namespaces, each with its own individual situation and pricing type, to arrive at this price. Given that an individual instance of a container can't run across multiple nodes, no averaging is necessary from that perspective. I hope that helps.

Yeah, that's great. One thing I'd add: that's exactly the model we implement, truly building from the container level up. Teams do have the ability to override that pricing with custom pricing if they want. Sometimes we see teams that want to create an internal economy that may not perfectly reflect what their cloud provider is billing, so we do have that capability; it's a really similar pipeline to what Nico just mentioned for on-prem environments.

A couple more questions here, all great ones; thank you, everyone. Next: is there a range of savings based on your experience with various customers? Which services have larger potential for savings, on AWS as an example? We're going to get into this more with five really practical examples after this, but it's not uncommon for teams that haven't focused in this area to be able to reduce spend by 70 or 80%. I'd say that's really common for teams that can devote real engineering resources to this optimization. We can talk more about specific examples, though.

Next question: could you compare Kubecost to FinOps efforts, complementary or somewhat overlapping? So, we're part of the FinOps organization.
We're actually a founding vendor with the recent launch. We're huge fans; we think it's doing great things, and we fully support the openness it's bringing from a training and certification perspective, as well as a general education perspective. So definitely complementary, and we're involved in what's going on there.

Next: in the case of AWS, are you considering only AWS Fargate pricing? I'm not sure I fully follow, but we will basically just reflect the cost of the node where these workloads are run. We also have, and we can share more resources on it, a column called external cost in the view Nico was showing. If, say, you're not running EKS on Fargate but you're running other workloads in Fargate, or you're running RDS instances, or you have S3 storage buckets, etc., we allow you to allocate those costs back to the actual Kubernetes tenant, so you get a centralized, unified view, whether that's Fargate or anything else outside of Kubernetes.

Okay, lots of great questions from Q&A, and I see some here in chat too. Maybe we can split these up, Nico; I can take the first one really quickly, because there's another one we're going to touch on in a second. The question: I can imagine how this would help downsize the cluster, etc., but real workloads are bursty, so isn't this headroom for quality of service? Super relevant. This is where not only quality of service comes into play, but also the nature of your workloads, specifically their usage patterns. As the question points out, the example workloads had very stable resource requirements. That can be true in production environments, but it's typically less true elsewhere. So this should absolutely factor into your decision-making when right-sizing, and it matters whether you're doing dynamic right-sizing with something like an autoscaler or a more static approach, which we'll talk about later in the presentation.

Next question: does Kubecost run as a separate component in the same cluster, or can you run it outside the cluster? Nico, you want to take that one? Sure. Today it runs in your cluster. We've found, from the teams we work with, that it's actually really valuable to run it entirely within your own cluster, because there's a whole slew of issues around egressing data, data privacy, and so on. Teams can run the product and get all the metrics without having to worry about privacy concerns. Some people have wanted to run this as a SaaS solution, but for now it runs in your cluster, right alongside your workloads.

And that enables a number of really interesting behaviors. As Nico showed you with Grafana, these metrics are written directly to a local Prometheus instance, so you can do a bunch of cool things: create custom dashboards in the cluster, set up Alertmanager for custom alerts from these Prometheus metrics, all, like Nico said, while owning and controlling your data and not having to egress any of it.
Alternatively, and a number of teams do this, you can take these metrics and send them to an external BI tool or a hosted solution, say Datadog, where they monitor their other infrastructure metrics.

A couple of others, and then we'll go through these quickly so we have time for questions at the end. Does the price take into account savings plans? Absolutely, and we'll talk more about this: it takes into account whether you're running workloads on spot or preemptible nodes, whether you're using RIs, savings plans, and so on. All of that is reflected in, as Nico mentioned, part three of the equation, the cost of those resources.

Next: what would be the best solution you recommend? I'm not sure I fully understand that one, but happy to come back to it if you can share more context. And on running it outside the cluster to use with multiple clusters: not to get too distracted, but our enterprise product uses Thanos or Cortex, so you can have all of this outside your clusters and get a totally unified view of all your different cluster environments.

Those are great questions. I haven't gone back through the Q&A, but I'll circle back toward the end of the presentation. For now I'll jump back and let Nico share some very practical examples of implementing optimizations, now that we have this newfound visibility and framework for cost allocation.

Cool, yeah, thanks, Webb. We're going to step through five of these; there are many more, and if we had all day we could keep going, but these are the top five. These are the anti-patterns for overspending that we routinely see teams either not know how to solve or not even be aware of until they start analyzing some of these metrics, at which point it becomes painfully obvious where the problems are. For each, we'll give our take, from a general perspective, on how we think it's best solved.

The first one is orphaned resources. We'd categorize this as pulling lever one in the equation: time running. This is actually a pretty easy one once you see the problem, which is simply that there are often resources in your infrastructure that aren't doing anything. They don't have an owner; they're just sitting there, and you're paying for them. This could be IPs; persistent volumes are probably the biggest one. We've had teams install our product and quickly figure out that they had tens of thousands of dollars of disks just sitting idle without owners. Load balancers are a big one too. With load balancers and IPs, it's really easy to think, oh, I'll just expose this, and then maybe the project gets handed off to a different team, or during teardown you eliminate the deployment but forget to eliminate the load balancer. Over the course of months or years that piles up, and you can find a treasure trove of things you can just eliminate and stop spending money on. So we consider this pretty easy in the grand scheme of things. The impact certainly can be high, probably not quite as high as some of the ones we'll get into, but we consider it easy because, by definition, these things just aren't being used, so normally the solution is straightforward.
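As one concrete illustration of detection, here's a minimal sketch that flags persistent volumes with no active claim, assuming the official `kubernetes` Python client and a configured kubeconfig:

```python
# Hedged sketch: flag persistent volumes that are no longer bound to any claim.
# Assumes the official `kubernetes` Python client and a working kubeconfig.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for pv in v1.list_persistent_volume().items:
    # "Released" and "Available" PVs have no active claim: cleanup candidates.
    if pv.status.phase in ("Released", "Available"):
        size = (pv.spec.capacity or {}).get("storage", "?")
        print(f"orphan candidate: {pv.metadata.name} ({size}), phase={pv.status.phase}")
```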
Speaking of solutions: the fix is basically having a mechanism to detect when orphaned resources cross a certain threshold, plus a notification system tied to a hierarchy of ownership within your organization. If something is sitting in a namespace, not being used, and exceeding a certain amount, someone gets alerted and can loop back around. It's a pretty straightforward solution, and the solution really is just: delete the resource and stop paying for it. How you implement that is up to you to some extent, but it generally revolves around identifying an owner and then communicating to that owner: you're spending money on this, you should probably check it out, and probably delete it.

All right, next one is abandoned workloads: as you see on the slide, workloads that do not provide real business value. We'll talk about heuristics for this, how we and some of the teams using our product think about it, and what we think should be done about it. Again, this is a category-one thing, time running: you've got a workload running on your infrastructure, chewing up resources, and it maybe is doing something. We're not saying usage is zero; usage could be through the roof. But if it's not providing a real solution, then for all intents and purposes it might as well not be doing anything. So it's one step more complex than orphaned resources.

What you're seeing here is a dashboard built on top of some of our open source metrics. This gets back to the Thoreau quote from earlier: your time is also something we're trying to help you optimize. At a quick glance at this dashboard, you'll notice we have basically one workload causing, what would that be, 92 or 93% of this $107 of overspend. Something we're really trying to help teams with, and that I think teams using our products and metrics have been successful at, is finding that low-hanging fruit that's a big win for not a lot of your time. The last one in this list is 25 cents a month; maybe you'd be fine letting that one run. But a hundred bucks a month you'll want to take care of.

To talk briefly about the heuristics here (we can field questions on this later if people are interested): the way we recommend teams measure whether something is abandoned is network traffic. If a pod is chugging away, chewing up resources, maybe it even has a request higher than what it's actually using, maybe not, but it's not egressing any data anywhere, we use that as a heuristic for asking: is this thing really being used? It might be computing things, but if it's not sending the result anywhere, that's at least a flag that you might want to revisit it. As we move into solutions for abandoned workloads, this is basically medium difficulty, because it's tougher to know. It's not as easy as "this thing doesn't have an owner at all"; we're talking now more like "this looks a little fishy."
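Here's a minimal sketch of that network-traffic heuristic, assuming a reachable Prometheus instance scraping cAdvisor metrics; the URL and threshold are our own placeholders:

```python
# Hedged sketch: use near-zero network egress as an "abandoned workload" flag,
# as described in the talk. Assumes Prometheus with cAdvisor metrics;
# the server URL and the 10 B/s threshold are illustrative choices.
import requests

PROM = "http://prometheus.example.com:9090"
# Average bytes/sec transmitted per pod over the last 7 days (cAdvisor metric).
query = 'sum by (namespace, pod) (rate(container_network_transmit_bytes_total[7d]))'

resp = requests.get(f"{PROM}/api/v1/query", params={"query": query}).json()
for series in resp["data"]["result"]:
    bytes_per_sec = float(series["value"][1])
    if bytes_per_sec < 10:  # arbitrary "basically no egress" threshold
        m = series["metric"]
        print(f"possibly abandoned: {m.get('namespace')}/{m.get('pod')} "
              f"(~{bytes_per_sec:.1f} B/s egress)")
```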
But if we contact the owner of something flagged like this, they should be able to justify it, and often they don't even realize it's still running. As you see listed here, common examples: deprecated deployments, maybe something where responsibility shifted from one team to another, the new team wasn't even aware it was still running, and the original team didn't tear it down. Dev environments are a huge one; we have some other open source projects related to cluster turndown that can address this, but essentially, if you don't turn it down, your dev environment on nights and weekends is sitting there with a request, you're paying for it, and it's really not doing anything. The general theme is lack of awareness, organizational changes, things falling through the cracks, but the impact from abandoned workloads can be huge.

The solution is very similar to orphaned resources: set up an alerting rule or dashboard with a point of communication, an owner. A common pattern is to have owners by namespace; then we can go in and say, okay, here are all the abandoned resources in this namespace, it's crossed the threshold of how much we're comfortable wasting on things that aren't doing anything, so send that owner an alert: hey, come check in on your namespace, I think you might have some things to clean up. Awesome, thanks so much. Yeah, I'll pass it back over to Webb for the last three.

I'll take it from here. So those are two of the five. Number three is kind of a catch-all; we've seen a lot of war stories and unfortunate circumstances here. We describe these as workloads behaving in unexpected ways. A common example is an application bug: in a pretty recent story, essentially an infinite loop autoscaled resources and cost tens of thousands of dollars. We've also had a user with a Bitcoin miner installed in their Kubernetes cluster, and that plus autoscaling led to a huge burst in resource consumption. These are the long tail of unexpected events that, when they happen, can be fairly costly even over a relatively short period. There's the problem of addressing the particular event, and then there's the problem of having monitoring and governance in place to minimize how often those events happen. When these are present, they're often meaningful; there's a little selection bias on our part, but when teams present them to us, they're often a real part of their spend. This definitely crosses into the medium, if not medium-hard, category, just because there's a really long tail of things to monitor for.

The solutions we've seen really center on monitoring for unexpected changes in spend, that is, spend anomalies. A common pattern is looking at, say, the moving seven-day average for the cost of a namespace or a cluster, and then having a mechanism to notify team members so they can take action quickly.
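A minimal sketch of that kind of check might look like this; the data shape and the jump threshold are our own assumptions, not a Kubecost API:

```python
# Hedged sketch of the spend-anomaly pattern: compare today's namespace cost
# to its trailing seven-day average and alert on a big jump.
def spend_anomalies(daily_costs: dict[str, list[float]], jump: float = 1.5):
    """daily_costs: namespace -> last 8 days of daily cost, oldest first."""
    alerts = []
    for ns, costs in daily_costs.items():
        *history, today = costs
        baseline = sum(history) / len(history)  # trailing 7-day average
        if baseline > 0 and today > jump * baseline:
            alerts.append((ns, baseline, today))
    return alerts

# e.g. a namespace that normally costs ~$40/day suddenly costs $95:
print(spend_anomalies({"default": [38, 41, 40, 39, 42, 40, 41, 95]}))
```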
One of the beautiful things about Kubernetes, and a real change here, is that with Kubernetes metrics you can have truly real-time cost monitoring and alerting, rather than waiting for a bill from your cloud provider, which may arrive days or many hours later. Kubernetes metrics, whether Prometheus metrics integrated directly with Alertmanager or another solution, can get you this visibility in real time or near real time.

Jumping ahead a slide: number two starts to get into the third input in the equation, managing the price of resources, and specifically usage type. By usage type we mean selecting across on-demand versus spot or preemptible versus making reservations, whether committed-use discounts, reserved instances, or savings plans. This is about going above and beyond basic on-demand instance types. It can be hard, because it really involves an effort of forecasting the future, and sometimes, in a bigger organization, finance will get involved, so managing it across teams can be difficult. But it often yields really big benefits for teams that have some predictability in their spend going forward.

This is a high-level visual we like to present because we think it's a really powerful framework: on-demand versus reserved, with a cluster autoscaler dynamically adjusting for workload demands and usage patterns in your product. The general idea is that as you gain more predictability about your base-level load, the load you know will always be there, whether compute and memory, GPU, or something else like data storage, and you have high confidence you'll maintain that base load for at least 12 to 24 months, you can layer on more of these reservations and see major savings. That, coupled with autoscaled on-demand nodes, is a super powerful framework that can yield 70-plus percent savings.

A very similar framework applies to spot and preemptible usage: again, stack on reservations as you gain predictability in baseline load, and then let spot availability scale naturally given those marketplaces. This is a really big one, and we consider it hard, because it requires architecting your Kubernetes workloads to be resilient to node termination and regular node failure. That touches on things like managing replicas, pod disruption budgets, and so on. But for teams where it's a potential fit, it can yield huge benefits.
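As a back-of-envelope version of that base-load reasoning, with illustrative prices rather than any provider's actual rates:

```python
# Hedged sketch: when does a 1-year reservation beat on-demand?
# Prices are made-up examples, not quotes from any cloud provider.
on_demand_hr = 0.096   # $/hr for a node on demand (example number)
reserved_hr = 0.060    # $/hr effective rate with a 1-yr commitment (example)

# A reservation bills 24/7, so it wins once expected utilization of that
# capacity exceeds the price ratio:
breakeven_util = reserved_hr / on_demand_hr
print(f"reserve base load you expect to run > {breakeven_util:.0%} of the time")
# Above that base load, let a cluster autoscaler add and remove on-demand
# or spot nodes to follow the bursty remainder.
```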
And then our last example is one we had a question about. It's really easy to come to Kubernetes, see so much complexity, and say: I'm just going to start by over-provisioning resources, so I minimize the risk of downtime, performance bottlenecks, and so on. That makes total sense, and we actually recommend that teams brand new to Kubernetes follow that pattern. It's only when you reach production at scale, where these dollars become meaningful, that it makes sense to make these investments and really go through the right-sizing exercise.

It's not an uncommon scenario, when we first start working with teams, for them to have up to 80 or sometimes even 90% idle or slack capacity. As mentioned, that's often just making sure they have ample headroom for bursts. But teams that go through the exercise of actually measuring peak usage, or say P99 usage, can often reduce this in a major way without any autoscaling at all, just doing it statically. That's part one. Part two is taking the quality of service, the SLA of the cluster, into account, which can have a big impact as well. We often see teams take a uniform over-provisioning strategy across dev clusters, staging clusters, prod environments, and critical or HA environments, when in reality you can apply that context. You can often be much more comfortable running at lower compute and risking some CPU throttling in a dev environment, because the impact there may be relatively low given your circumstances. Taking that into account, we've seen teams have major wins just by doing this exercise statically, factoring in the workload profile, and maybe programmatically, once a day or even once a week, making the adjustment and doing a bin-packing exercise to see where their workloads fit best given the instance types available from their cloud provider.

The other part of this is that if your environment is a good fit for autoscaling, autoscaling can be hugely valuable. Again, think about the SLA of those workloads and whether your architecture supports nodes coming up and going down efficiently. One big caveat: autoscaling involves a fair amount of complexity, and a lot of rules can impact its behavior, like pods using local storage or missing disruption budgets, so having a tool to manage those rules effectively can make a big difference in actually realizing the benefits autoscaling makes possible. This is something where we regularly see teams reduce Kubernetes spend by 50-plus percent. We do consider it medium, if not hard, because it can create risk if you're not careful: a bursty workload may arrive that you no longer have the headroom to support. So anytime you're going through this, we recommend not thinking about just the median case; the median is helpful for a high-level understanding, but once you start making optimizations, think about something closer to peak usage and what the impact of a right-sizing exercise would be at peak utilization, as in the sketch below.
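A minimal sketch of that static right-sizing idea; the percentile and headroom choices are ours, and they encode the workload's quality-of-service profile:

```python
# Hedged sketch: derive a CPU request from observed usage rather than guesswork.
# `samples` would come from your metrics store (e.g. Prometheus CPU usage);
# the percentile and headroom margin are tuning knobs, not fixed rules.
def suggested_request(samples: list[float], percentile: float = 0.99,
                      headroom: float = 1.15) -> float:
    ranked = sorted(samples)
    idx = min(int(percentile * len(ranked)), len(ranked) - 1)
    return ranked[idx] * headroom  # peak-ish usage plus a safety margin

usage = [0.12, 0.15, 0.11, 0.14, 0.45, 0.13, 0.16, 0.12]  # cores, sampled
print(f"dev profile (P85):  {suggested_request(usage, 0.85):.2f} cores")
print(f"prod profile (P99): {suggested_request(usage, 0.99):.2f} cores")
```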
We have just seen, time and time again, that avoiding these patterns can reduce spend by 80% or more, and when done right, we think you can do it without creating any performance or reliability concerns. Often it's a useful cleanup exercise as well, because you get rid of, as Nico mentioned, abandoned workloads that consume resources and may even create security risk without providing any value. But again, when you pursue these, try to focus on the biggest bang for your buck, given how valuable your time is, and start with the allocation piece so you know where the biggest opportunities for spend reduction are. We love talking about this stuff; reach out to us any time at team@kubecost.com if we can help. And with that, we've got a little time for questions. I see we've got a couple more here.

Yeah, y'all take it away. This is awesome, and we have about seven minutes, so I'll leave it open to all of you: fire away in the Q&A box and we'll get to as many as we have time for.

Awesome. So, a question here: does VMware or Oracle have a similar cost optimization tool? I'm actually less familiar with the offerings of either, but I do know VMware has the CloudHealth product, which does provide cost optimization solutions.

The next question is about stateful applications, with a workload where, it sounds like, about 1,000 customers arrive at one time: how many pods, and how much cost, is required to support that? Do you want to take that one, or shall I? Yeah, I'm not entirely sure what the question is; let us know if there's more context you can provide. I do think the exercise Nico went through for measuring efficiency and doing pod right-sizing can be super valuable here, and combined with something like HPA it can be great, if that stateful application is architected to support it.

I'd add that, in general, comparing your usage and request is a good exercise. Doing that, you'll see either that your usage sits well below your request for long periods, or perhaps that it's actually too high; we see that sometimes too, in which case you're risking eviction and the like. So just use those two metrics from the Grafana dashboard earlier in the presentation. There are also ways of running statistical analysis, P99, P85, and so on, depending on whether this is something you don't mind getting killed or a high-availability workload you never want killed, to give you heuristics for what sort of overhead to maintain. To some extent it's up to you.

Yeah, that's great. The next question looks like a two-parter: in the case of applications running at, say, 60 to 80% idle, is it better to use a serverless solution, like the Kubernetes-native Kubeless?
And do we recommend that? Another example given here is OpenWhisk, which I'm less familiar with. But I think this gets at the broader picture: cost is one part of this decision, similar to the question of whether you should migrate cloud providers for cost. We think it's a very relevant input to the equation, but performance, availability, and functionality are often core parts of it as well. I will say we do see serverless as a useful tool for managing costs. We regularly work with teams that have medium- to very-high-complexity applications, and most of the time it's hard to move all of their workloads to serverless; but even moving some component can be super meaningful.

Okay, it looks like we have a question about the presentation link; check out the chat, I believe there's an answer there.

What is a generally allowable percentage of total capacity that has been observed empirically? I assume bin packing is not optimal all the time. It varies from team to team. In some parts of our application we use a notion of profiles, which is to say it depends on the priority of what you're running. We'd say probably as high as somewhere between 75 and 90% utilization if you don't mind getting evicted, if this is a dev thing where 30 seconds of downtime here and there is okay but you really have to squish down the cost. For high availability I'd use a definitely more generous overhead; my gut says something like 60% utilization, 40% overhead. Webb might have a different answer, but it depends on your situation.

Yeah, I think it's surprisingly common for teams to land at something like 35 to 40% overhead. It's a function of exactly what Nico mentioned, quality of service, but two other things come into play. One is variability of resource requirements: Nico showed workloads that were super stable, so if you just have a bunch of long-running batch jobs, you may be able to get to 90-plus percent utilization, because resource utilization is really steady. The second is predictability of resource utilization going forward; I've definitely seen scenarios where teams are in the 90s, but that's not the norm at this point.

We've got two more minutes and two questions here at the end. One is: recommended books or resources? There's a book called Cloud FinOps that I think is really, really good. One of its main authors, J.R. Storment, is one of the creators of the FinOps Foundation; definitely one I recommend. It paints a holistic picture of managing spend in cloud environments.

And the last question: is it a good approach to invest in autoscaling with ML techniques, especially in VPA cases? We absolutely think it can yield benefits, but we recommend starting with simpler solutions, for introspection purposes and for understanding why things are behaving the way they are.
We especially recommend that in production or critical environments. When you're fine-tuning down to the last dollar, we have seen scenarios where ML can be very useful. But often, when we first start working with teams, there are bigger wins in just investing that first round of time in right-sizing and all of the exercises we went through today. Yeah, I think it's fair to say we could be even stronger on that point: we would not recommend it if you haven't first gone down the path of understanding what's going on, or else you risk creating a second layer of misunderstanding, of a similar nature to the one we're trying to help teams solve in the first place. Yeah, definitely. And that can impact not just cost, but reliability and uptime, as well as general performance, with CPU throttling, for example.

All right, excellent. Well, we're out of time, but we want to thank everybody again for all the awesome questions and for joining us today, and I really appreciate Libby and the team at CNCF for making it happen. Of course, thank you both so much for a great presentation, and we look forward to seeing everyone again soon. Check back on the website later today and we will have all of this good stuff loaded and ready for your enjoyment. Talk to you all later. Thank you.