who's joining us today. Welcome to today's CNCF webinar: Kubernetes cluster performance, resource management, and cost impact. I'm George Castro, community manager at VMware and a Cloud Native Ambassador, and I'll be moderating today's webinar. We'd like to welcome our presenters, Elijah Ollian Kule, platform engineer at Replex, and Hasham Haider, developer marketing at Replex. A few housekeeping items before we get started. During the webinar, attendees are muted, but there's a Q&A box at the bottom of your screen; feel free to drop your questions in there and we'll get to as many as we can at the end. This is an official CNCF webinar and as such is subject to the CNCF Code of Conduct. Please don't add anything to the chat or questions that would violate it; in short, please be respectful of all your fellow participants and presenters. With that, I'll hand it over to Elijah and Hasham to kick off today's presentation.

Hello, everyone. Thanks a lot for joining us, and welcome to the webinar. The topic we're going to talk about today is Kubernetes cluster performance, resource management, and cost impact. My name is Hasham and I'm the developer advocate at Replex. I'll go through some slides to put the topic into context: what we mean when we say cluster performance, and which metrics are important in that context. We also have Elijah with us on the call, a platform engineer at Replex. He'll dig into the technical side: Prometheus, the queries we use to monitor cluster performance metrics, and a walkthrough of a pre-built Grafana dashboard for monitoring cluster performance. He'll also give an overview of the entire metrics flow from Kubernetes to Prometheus to Grafana and everything in between.

Just a couple of words about Replex before we get started. Replex is a governance and cost management platform purpose-built for modern cloud-native infrastructure. It gives finance and cloud managers comprehensive visibility into total spend across infrastructure, along with granular insights into the cost of individual applications and teams. At the same time, it empowers developers and operators to right-size resources for optimal spend without sacrificing performance.

Cool, so let's get started. To give some context to what we're going to talk about today, we have a couple of statistics on performance and utilization that frame the topic nicely. The first is that 40% of instances are one to two sizes bigger than needed for their workloads. When we say these instances are too big for their workloads, we mean that they are not being utilized efficiently: utilization is low. Low utilization translates into resource wastage, which in turn means wasted spend. You can see that in the second statistic: 50 to 75% of the money spent on these instances is being wasted. At this point you're probably thinking: this is a webinar about Kubernetes and containers, so why are we talking about cloud-provider instances?
Why should we be worried about instances, workload sizes, and utilization? Everyone can agree that Kubernetes makes it easier for DevOps teams to do their jobs. It abstracts away a lot of the heavy lifting DevOps teams usually do and makes it easier to provision and manage infrastructure, as well as to manage containerized workloads. But this also introduces a lot more complexity under the hood, and on the operational side of things. In the context of performance and costs, we don't only have to consider the utilization of the underlying infrastructure, the cloud instances Kubernetes is running on; we also have to think about the utilization of pods and workloads and how well they are performing.

One reason for this is that Kubernetes introduces the concept of resource requests and limits. Resource requests are the amount of resources reserved for a container; once reserved, those resources cannot be used by any other container or pod. Resource limits, on the other hand, are the maximum amount of resources a container is allowed to consume. This means that once you allocate resources to a container, they are set aside for that container, and we have to be careful about whether containers are using those allocated resources efficiently. If they're not, those resources are being wasted: they add costs to the bottom line of the business without actually being used (a minimal pod manifest illustrating requests and limits follows at the end of this section).

Now that we've looked at why it's important to monitor the utilization of Kubernetes clusters, let's look at a couple of other numbers in the context of performance and costs. $14 billion is wasted yearly in public cloud spend on idle or unused resources and over-provisioned instances; $5.3 billion of that is wasted on oversized resources alone. We see this as a huge problem: most of these workloads, and most of the infrastructure provisioned to run them, are not being utilized efficiently, which drives up costs without really adding anything to the bottom line.

In today's webinar we want to discuss cluster performance, and we're going to frame it in terms of three distinct aspects. The first is cluster utilization: how efficiently the underlying infrastructure is being used, the utilization of individual cloud instances as well as the cluster as a whole. We'll also look into under-provisioning and over-provisioning, and how we can lower cloud costs by targeting higher utilization. Then we look into workload efficiency.
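(As promised above, here is a minimal sketch of requests and limits in a pod manifest. The names and values are illustrative, not from the webinar slides; the figures match the scenario discussed later, where each container requests one CPU with a limit of two.)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app          # hypothetical name, for illustration only
spec:
  containers:
    - name: app
      image: nginx:1.25   # any workload image would do
      resources:
        requests:
          cpu: "1"        # the scheduler reserves 1 CPU for this container
          memory: "512Mi" # reserved memory, unavailable to other pods
        limits:
          cpu: "2"        # CPU usage above this is throttled
          memory: "1Gi"   # exceeding this gets the container OOM-killed
```

Note that Kubernetes schedules pods based on requests rather than actual usage, which is why over-generous requests waste capacity even when containers sit idle.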
By workloads we mean containers and pods, and as we discussed earlier, Kubernetes introduces this concept of resource requests and limits. So we'll be looking at the efficiency of containers and pods: whether they're using their requested resources efficiently, and if not, what can be done to ensure that they are, which can potentially give us better utilization. The third aspect we want to talk about is idle resources, mostly in the context of development environments. Development environments are often only needed on weekdays; they don't need to run over the weekend, or during off hours. So we'll look at how we can reduce the number of these idle resources so that they don't hit the bottom line. That's a quick overview of the three concepts; after that we'll move on to a technical deep dive, identify the metrics that are important for cluster performance, see how we can monitor them using open source tools like Prometheus and Grafana, and walk through a pre-built Grafana dashboard.

Cool, let's jump into the cluster utilization aspect of cluster performance. Take the example of a Kubernetes cluster with a worker node, say an m5.xlarge AWS instance with four CPUs. We've got four pods running on this instance, and every pod requests one CPU, with limits of two CPUs. What Kubernetes does in this case is guarantee one CPU for each pod; once reserved, that CPU cannot be used by any other pod or container. We also have limits of two CPUs, so if a container starts to exceed its limit, it can be throttled by Kubernetes, or the pod can end up on a new node that has enough resources to fulfill its requirements.

So what happens if our application starts to exceed the maximum CPU allocation? Kubernetes moves the pod to another node. Now the new node only has pod four running on top of it, utilizing only 25% of the CPU resources available on that node. And the utilization of node one has also gone down, since it's no longer hosting pod four. What this tells us is that the nodes are much bigger than what the pods running on top of them require. In this case, what we can do is resize that node and replace it with one that has a much smaller resource footprint. For example, we could replace it with a t2.medium AWS instance, which has two CPUs. Resizing node two gives us a much better utilization of 50%, and the smaller t2 instance also costs less. So we are reducing resource wastage, improving utilization, and cutting down costs. All right, that was cluster utilization. Next, we'll look at a mix of cluster and workload utilization.
Again, we take the example of a node, this time with six pods running on top of it. Each of these pods requests one CPU and has a limit of two CPUs, but the real resource usage of each pod is only 0.2 CPU. So the pods are only utilizing 20% of their allocated CPU resources. As we mentioned, those requested resources are reserved by Kubernetes; they cannot be used by any other pods and are therefore being wasted, adding to the cost. What we want to bring out here is that this is not very efficient: looking at the actual resource usage of the pods, we could give these pods a much smaller resource footprint. If we reduce the requested resources to around 0.25 CPU per pod, we get a much better pod utilization of roughly 75 to 80%, and doing so also frees up those resources to be used by other pods.

But what does this mean for the node? We do get much better CPU utilization for the pods, but the node still has the same CPU footprint, so it also makes sense in this case to look at node utilization. And as you can see, we only get a node utilization of 15%. The reason is that the actual usage of our pods is still the same, 0.2 CPU each, while the node keeps its larger resource footprint. So we can also reduce the size of the node: if we resize it to two CPUs, something like a t2.medium AWS instance, we get a much higher CPU utilization of 60% for the node.

Now, putting all of this in context and thinking of it in terms of public cloud compute instances: since most Kubernetes workloads run on public cloud providers, looking at the size of the cluster and its instances and their utilization, and making informed decisions to right-size clusters based on actual utilization, has huge implications for Kubernetes costs. Here we have three instances with different resource footprints and different price tags. Matching the size and usage of our workloads with the actual footprint of our instances, and choosing the one that gives us better utilization, is a great way to reduce wastage, improve utilization, and reduce public cloud spend. Of course, in real life it's not always as easy as that: actual production clusters run multiple instances, and organizations have multiple clusters. What makes sense in that case is to look at overall cluster utilization, take proactive steps to reduce, or maybe increase, the size of the cluster based on actual utilization and actual resource consumption, and make informed decisions based on that real-time data. Moving on to the next aspect of cluster performance: development environments. To put this in context, development environments account for 44% of compute spend.
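(Working through the numbers in that example: 6 pods x 0.2 CPU of actual usage = 1.2 CPU consumed, against 6 x 1 CPU = 6 CPU requested, i.e. 20% pod utilization. On a node with 8 CPUs that is 1.2 / 8 = 15% node utilization; after resizing the node to 2 CPUs, 1.2 / 2 = 60%. The 8-CPU node size is our inference from the 15% figure; it is not stated explicitly in the talk.)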
Essentially, these development environments do not need to run 24/7; they're not production workloads. Most of them don't need to run on weekends, and they don't need to be up and running during off hours. In this section we're going to look at some native Kubernetes abstractions we can use to proactively monitor the resource consumption of development environments, to control it, and to ensure that these workloads are not up and running 24/7, shutting them off during off hours and, in the process, reducing resource wastage and spend.

The first thing to do is to isolate development environments, and Kubernetes has a really nice abstraction for that: namespaces. We can use namespaces to isolate all the resources spun up as part of the development environment. We can create one monolithic namespace for the entire development environment, or, depending on the scope of development activities, we can also isolate developer teams: if we've got multiple teams, we can create a separate namespace for each individual development team and isolate the resources used by each team. And since we are isolating the resources, we can then control the resource consumption of the namespace using other native Kubernetes tools, and in the process control the resource consumption of the development environments or the developer teams as well. We'll take a look at some of these abstractions in the next couple of slides.

Once we've isolated the resources of the development teams by creating separate namespaces, we can create default CPU requests and limits, defined as part of a LimitRange object. Once this LimitRange object is created, Kubernetes ensures that if any developer spins up a pod or container in that namespace without defining requests and limits, those default values are automatically attached to the containers. We can do the same for memory requests and limits as part of a LimitRange object, and Kubernetes will ensure that the default values defined in the LimitRange are automatically applied to any containers that don't have these limits or requests defined.

With the LimitRange object we can also control minimum and maximum CPU constraints for individual containers. We define a LimitRange object for the developer namespace with the maximum amount of CPU that can be allocated to an individual container, and the minimum amount as well. Once we create this LimitRange object, Kubernetes ensures that the resource requests and limits of any container spun up in the namespace fall within the range we've defined. So essentially we are controlling the resource consumption of individual containers, as shown in the sketch below.
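(A minimal sketch of such a LimitRange, assuming a developer namespace called dev; the specific values are illustrative, not from the webinar slides.)

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-limit-range
  namespace: dev          # hypothetical developer namespace
spec:
  limits:
    - type: Container
      default:            # applied as the limit when none is specified
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:     # applied as the request when none is specified
        cpu: "250m"
        memory: "256Mi"
      max:                # no container may request or be limited above this
        cpu: "2"
        memory: "2Gi"
      min:                # ...or below this
        cpu: "100m"
        memory: "64Mi"
```

With this in place, any pod applied to the namespace without a resources stanza gets the defaults attached at admission time, and pods outside the min/max range are rejected.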
And since we've isolated these resources and we're talking about the development environment, we're effectively controlling the resource consumption of the entire development environment. The same goes for memory constraints: we can define minimum and maximum memory constraints for individual containers, and Kubernetes will ensure they fall within the defined range.

Another native Kubernetes object we can use to control resource consumption is the ResourceQuota. With a ResourceQuota we can control the total resource consumption of the developer namespace: we can define an upper limit on the total amount of resources that can be consumed by all the pods running in the developer namespace, and we can also cap the total resource requests of all the pods in that namespace. This works for CPU resources and memory resources, and we can do the same for ephemeral storage. As part of the ResourceQuota object we can also control the number of pods allowed to run in the developer namespace, and in the same way the number of services, persistent volume claims, load balancers, NodePorts, and replication controllers. Since most of these objects consume CPU and memory from public cloud-provider instances, or are abstractions over public cloud-provider services (NodePorts and load balancers, for example), they add costs on the public cloud-provider side. So by letting us control the number of these objects that can run in the developer namespace, Kubernetes is effectively letting us control the public cloud spend the development environment is allowed to generate (a sketch follows below).

And since we're isolating developer resources, either in one namespace or in separate namespaces for individual developer teams, we can easily monitor the resource consumption and costs of these environments and teams, and we can quickly spin them down based on need. Let's say on weekends or during off hours we don't need those resources: we can quickly spin them down, since they've already been isolated. All right, cool. Now I'll ask Elijah to take over from here for the technical deep dive. Elijah?

Okay. Thank you, Hasham. I'll be sharing my screen now. Hi, everyone. I'm Elijah, and we're now going to examine the metrics you can use to evaluate the performance of your Kubernetes cluster. First, we're going to look at where Kubernetes metrics come from. Then we'll run some Prometheus queries to do some analysis on our cluster, and then we'll go through a Grafana dashboard we've prepared for this presentation. So I have a Kubernetes cluster running now with three nodes, and I have some services.
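(Before the demo, a quick sketch of the ResourceQuota just described, again assuming a namespace called dev; the values are illustrative, not from the webinar slides.)

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev              # hypothetical developer namespace
spec:
  hard:
    requests.cpu: "10"        # total CPU all pods in the namespace may request
    requests.memory: "20Gi"   # total memory all pods may request
    limits.cpu: "20"          # total CPU limits across all pods
    limits.memory: "40Gi"
    pods: "30"                # object-count quotas
    services: "10"
    services.loadbalancers: "2"
    services.nodeports: "5"
    persistentvolumeclaims: "10"
```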
So, Kubernetes metrics. The metrics for your Kubernetes cluster come from two sources. One is the node exporter, which runs on every node in the cluster and collects low-level metrics about each node, such as available memory, network devices, and a lot of other information provided by the kernel running on each node. Then there are the container-specific metrics exposed by cAdvisor.

First, we're going to see some of the metrics exposed by the node exporter. I'm going to forward the port for this service to my machine, and then we'll examine some of the metrics. Okay, so if I visit this now, these are the metrics exposed by the node exporter running on my Kubernetes cluster. We can see a lot of metrics that have to do with the node file system. The metrics exposed by the node exporter are all prefixed with node_, so when you're going through your Prometheus metrics and you see a metric with this prefix, it's coming from the node exporter. We can see metrics about the node's file system, metrics about IPVS connections and packets, and then we come to some metrics that we'll find interesting. For example, we have memory metrics here: this one represents the available memory on this node, and we can also see metrics for the free memory and total memory on the node. There are lots of other metrics that have to do with network interfaces and so on. Prometheus is configured to scrape this particular endpoint on the node exporter, and all of these metrics become available to query.

Next, we're going to look at some of the metrics exposed by the kubelet. The node exporter exposes low-level information about each node, while the kubelet exposes metrics about the workloads running on it. I have three nodes in the cluster, and I've forwarded a port on one of my nodes to my local machine; the kubelet listens on port 10250 on each node. These are some of the metrics exposed by the kubelet, and they contain a lot of kubelet-specific information. The cAdvisor metrics, on the other hand, are specific to each container. Here we get really interesting container-level metrics such as CPU usage and memory usage for each container; cAdvisor is the component of the kubelet responsible for getting these useful metrics to us. For example, we can see the container CPU load average over the last 10 seconds, and the total time a container spent executing system (kernel) code. The main metric exported by cAdvisor is container_cpu_usage_seconds_total, which gives us the total CPU seconds used by each container in our cluster. We can see the labels on this metric, such as the namespace, the pod the container is running as part of, the container name, and the instance the container is running on.
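(For readers following along without the screen share, the raw exposition text looks roughly like this; the values and label values below are made up for illustration.)

```text
# From the node exporter (default port 9100), prefixed node_:
node_memory_MemAvailable_bytes 8.234786816e+09
node_memory_MemFree_bytes      2.11030016e+09
node_memory_MemTotal_bytes     1.6432816128e+10

# From cAdvisor (served by the kubelet on port 10250 at /metrics/cadvisor):
container_cpu_usage_seconds_total{namespace="default",pod="demo-app",
  container="app",instance="node-1"} 4273.54
```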
So these metrics are exposed by the kubelets running on each node, and our Prometheus installation has been configured to scrape the kubelets, the cAdvisor endpoints, and the node exporter. When you install Prometheus into your Kubernetes cluster, a lot of this is configured automatically through service discovery; you can see it scrapes a lot of other targets too, such as CoreDNS and the API server.

Now we're going to take some of the metrics we just looked at over HTTP and run some queries on them in the Prometheus portal. The first query we're going to run calculates the CPU time consumed per container. The metric we're using is container_cpu_usage_seconds_total, which we just saw exposed by cAdvisor. What this query does is aggregate the overall CPU usage over the past five minutes by namespace, pod, and container. When we execute it, we get our results: for every container in the cluster we can see the container name, the namespace, the pod it belongs to, and the total seconds the container spent executing over the past five minutes.

Next, we run a query to calculate the total memory available in our cluster, aggregated by node. Running this query now, we can see that we have about 72% available memory in our Kubernetes cluster, and we can go to the graph section to see the trend for this query over the past hour.

Next, we're going to look at a query to calculate the non-idle CPU usage of our cluster's CPUs. The node_cpu_seconds_total metric is exposed by the node exporter; we can see it here. For each node, it tells us how many seconds each CPU spent in each mode. The idle mode is time the CPU spent doing nothing at all; user mode is time spent running user processes; system mode is time spent running system processes; and there are a bunch of other modes the CPU spends time in. In all of the other modes the CPU is actually active, but when it's idle, it isn't doing anything, so you want to ensure your CPUs aren't idle a lot of the time. Because we subtract the idle fraction from one, this query returns the time the CPUs spent not idle. We can see that's about 11%, so this cluster has really low CPU utilization.

So we've now looked at three queries that give us some insight into the current utilization of the cluster. With queries like these you can build a dashboard to monitor the performance of your cluster at any particular time, and a very useful tool for that is Grafana.
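(The webinar doesn't show the query text on screen, so here are plausible PromQL reconstructions of the three queries, based on the metrics named above; treat them as sketches rather than the exact queries used in the demo.)

```promql
# 1. CPU seconds consumed over the past 5 minutes, per container:
sum by (namespace, pod, container) (
  increase(container_cpu_usage_seconds_total[5m])
)

# 2. Available memory as a fraction of total, per node:
sum by (instance) (node_memory_MemAvailable_bytes)
  / sum by (instance) (node_memory_MemTotal_bytes)

# 3. Non-idle CPU fraction across the whole cluster:
1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m]))

# The same broken out per node, as in the per-node dashboard panels later:
1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
```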
Grafana has built-in support for Prometheus. What we've done is build a Grafana dashboard running queries through the Prometheus data source, and we're going to take a look at some of the queries powering each of the charts you see here. The dashboard is divided into four sections: a general overview of cluster utilization, then a breakdown by pod, a breakdown by namespace, and a breakdown by node.

Taking a look at our cluster overview: the CPU utilization chart shows how much of our CPU is being utilized at this particular time. Memory utilization does the same for memory, while disk utilization tells us how much free disk we have on each of our nodes. The CPU request commitment chart relates to what Hasham pointed out earlier: one of the ways to manage cluster resources is by specifying CPU and memory requests for your pods. This chart shows how much CPU has been requested by the pods in our cluster as a percentage of the overall CPU we have, and the next one does the same for memory.

The query behind the CPU utilization graph is the one we just executed in the Prometheus portal: the total time the CPUs have spent in non-idle mode. Next, our memory utilization chart, and this is the query being executed. Just dividing free bytes by total bytes would not account for the memory the kernel is using for cache and buffers at the time, so we add all of that up; this gives a realistic figure for how much of our memory is being used at the moment, which we divide by the total memory of our nodes and subtract from one. That's how we get the total memory utilization at a particular time. Next, disk utilization. This metric is also exported by the node exporter: we divide free bytes by total filesystem bytes, and that's how we get our disk utilization. Next, for our CPU requests, this is the query we run to get this graph, and we also have a chart here for memory requests.

Next, the pod overview section. Here we display the CPU usage per pod in the cluster, shown as cores: this tells us how many CPU cores each pod in the cluster is using at a given time and plots it out here; this is the query we're running for that. Next, memory usage: this lists all the pods in our cluster and shows how much memory each is using at a point in time; this is the main metric driving it, and we aggregate by pod. Next, the CPU requests chart: this is the query we're running, aggregated by pod. And lastly, the memory requests chart, showing the memory requested per pod in the cluster.
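(Again as hedged sketches, panels like the ones just described are commonly backed by queries along these lines; the exact expressions in the demo dashboard may differ.)

```promql
# Memory utilization, counting cache and buffers as reclaimable:
1 - sum(node_memory_MemFree_bytes + node_memory_Cached_bytes
        + node_memory_Buffers_bytes)
      / sum(node_memory_MemTotal_bytes)

# Disk utilization per node, from the node exporter's filesystem metrics:
1 - sum by (instance) (node_filesystem_free_bytes{fstype!="tmpfs"})
      / sum by (instance) (node_filesystem_size_bytes{fstype!="tmpfs"})

# CPU request commitment: requested CPU as a share of allocatable CPU.
# kube_pod_container_resource_requests comes from kube-state-metrics,
# which the webinar doesn't mention explicitly -- an assumption on our part.
sum(kube_pod_container_resource_requests{resource="cpu"})
  / sum(kube_node_status_allocatable{resource="cpu"})

# Per-pod CPU usage in cores, for the pod overview section:
sum by (pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
```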
The namespace section has graphs similar to the ones we just saw, but aggregated by namespace instead of by pod: the same CPU usage, memory usage, CPU requests, and memory requests. Then in the node section we have similar charts again, and here you can see they're aggregated by node. We also have three extra panels for utilization aggregated by node. The CPU utilization one is a fairly complex query, but the main metric it uses is node_cpu_seconds_total: we again subtract the idle time from one, and we run a join against an info metric that maps each instance to a node. Combined, this gives us the total CPU utilization for each node in the cluster. We also have a memory utilization graph here, and then disk utilization.

So what this dashboard gives you at a very quick glance is insight into the current utilization and performance of your cluster over the past 15 minutes. This is provided by Grafana: you can increase the time range displayed by each graph, and you can increase or reduce the refresh rate of the graphs. To wind up this section: we've taken a look at the Prometheus targets that expose useful metrics in our cluster, namely the kubelet, the node exporter, and cAdvisor; we've taken a look at the Prometheus portal; and lastly, we've seen how these come together to give a really good overview of a Kubernetes cluster in a dashboard. With this, you can have a lot more insight into the behavior of your Kubernetes cluster. So that was it, and I'll hand over to Hasham to conclude the webinar. Thank you.

Thanks, Elijah, for taking us through Prometheus and the Grafana dashboards. To wrap up, here are a couple of dashboards from the Replex platform. This first screenshot is a Kubernetes optimization report from Replex, which gives us an overview of total Kubernetes costs and the potential savings we can make by optimizing our clusters. The costs are broken down by cloud and by cluster, and for every cluster we get real-time alerts and notifications with a set of actions that can help us improve utilization and reduce costs; essentially, optimizing cluster footprints in real time. The second dashboard is a finance dashboard, which breaks down Kubernetes costs by team and by cloud provider. One major concern we come across in conversations with customers is the inherent opaqueness of Kubernetes: users don't have visibility into who is using what on Kubernetes, and there's no easy way to allocate costs because of Kubernetes' shared-resource model. This dashboard breaks down Kubernetes costs by team, and we can do the same for other custom abstractions: we can break them down by project, by environment, by department or application, or by any other custom labels that are useful for an organization.
So yeah, just breaking down Kubernetes costs and getting more visibility into who is using what on Kubernetes. All right, cool. This brings us to the end of the webinar, and now I'll hand it back over to George.

Okay, we do have questions, and we have about 12 minutes for them. The first question is from Nikhil Jain: "Hi, how do I do the cost estimation of any Kubernetes architecture before actually putting it into practice?"

Yeah, I'll take that one. There are ways to do cost estimation for Kubernetes, but you would have to hard-code the numbers in, and to get accurate numbers you would also need to plug into cloud-provider billing APIs; that's really the only way to come up with accurate figures for Kubernetes costs. If you look on Grafana, there are a couple of open source dashboards that allow you to do some sort of cost estimation, but the numbers in there are fixed, hard-coded into the dashboards themselves. And with public cloud providers you have a huge range of instance types, storage options, and so on, so hard-coding all of that information in isn't really practical. To do genuinely accurate cost estimation, you'd need to plug into cloud-provider APIs, and with Replex you can do that.

Sudeep's entry seems to be more of a statement than a question: "I was already thinking about optimization of pods using various mechanisms like cgroups, maybe." But they go on to ask: "How do the resource requests and limits for CPU and memory translate to cgroups on the container runtime, like Docker, via the kubelet?"

Okay, I'll take that. Cgroups are a Linux kernel feature that lets you control the amount of memory and CPU that processes use; basically, you use them to modify the runtime environment of your processes. The Docker engine relies on this feature and exposes some flags for it, for example cpuset, which is part of the cgroups controller space and is used to assign individual CPUs to the tasks in a cgroup. So when the kubelet starts your containers, it specifies values for the cpuset and memory controllers, which are part of the cgroups v1 controller space. The memory controller, for example, limits the memory used by the tasks in the cgroup and also provides reports on the memory usage of each task. So the kubelet sets these cgroup values, which control the assignment of CPUs to the tasks as well as the memory, and that's basically how resource requests are translated to cgroups in the Docker runtime.

Okay, and that seems to be the last question. Last call for questions for those of you out there; I'll give it a few seconds, and if not, we'll go ahead and wrap it up. All right, great. Thanks, Elijah and Hasham, for a great presentation. That's all the questions we have time for.
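(To make the cgroups answer above concrete: under cgroups v1, the kubelet's translation of requests and limits is commonly described by the following mapping. The example numbers are ours, not from the webinar.)

```text
requests.cpu: 500m   ->  cpu.shares = 512             # 500/1000 * 1024; relative weight under CPU contention
limits.cpu: 1        ->  cpu.cfs_quota_us = 100000    # with cpu.cfs_period_us = 100000, i.e. at most 1 CPU
limits.memory: 256Mi ->  memory.limit_in_bytes = 268435456  # hard cap; exceeding it triggers the OOM killer
```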
Oh wait, there's one more question. Nikhil, I'll give you time to type it; it looks like you're typing as we're talking. Okay, he says they're planning on running on-premise VMs and asks how to do the estimation. I guess he means cost estimation, right? Yeah, we do provide that as part of the Replex platform, so it's definitely possible to do cost estimations for on-prem VMs. Since most Kubernetes workloads run on public cloud providers, we targeted this webinar mostly at public cloud providers, but it's definitely possible to do the same on on-prem infrastructure. If Nikhil needs any more information, we can definitely get back to him about that.

Okay, and that's all the questions we have time for. Thanks, everyone, for joining us today. The webinar recording and the slides will be online later today, and we're looking forward to seeing you at a future CNCF webinar. All right, have a great day, everyone. Thank you. Thank you very much. Thanks a lot. Thank you, everyone.