OK, hello, everyone. I'm here to talk a little bit about OpenCost, which is an open-source cost monitoring tool for Kubernetes and other cloud-native environments. A little bit about me: I'm Ajay Tripathy. I'm one of the co-founders of Stackwatch and one of the original authors of the Kubecost project. Before this, I was working on similar problems as an infrastructure engineer at Firebase, Google, and Yelp.

So before we get into monitoring costs on Kubernetes, I just wanted to talk a little bit about Kubernetes itself, in case you're not one of the 5 million back-end developers already using it. Kubernetes is a container orchestration system. Basically, it lets you take your applications to production faster, rapidly scale out with things like auto-scaling, release code to production more frequently with containers, and stay portable across multiple environments, whether that's on the edge, in data centers, or in one of your favorite flavors of cloud provider.

But with all of this flexibility and power comes the ability to operate inefficiently. There are a lot of different abstractions people can be using in Kubernetes, and just increased sprawl, right? Because if it's easy to launch a production service, it's easy to launch too many production services. I don't know how many of you are using Kubernetes today, but you've probably left a couple of clusters running right now that you're not using.

So to help address this problem, we built OpenCost, an open-source Kubernetes cost monitoring application. It's a community-built specification, as well as a software implementation of it. So who's in this community? Some of the contributors are practitioners like Adobe. Some of them are actual cloud providers like AWS and Google. Some are tooling companies like Kubecost, where I'm a co-founder, and companies like D2iQ and New Relic, along with consulting companies like Mindcurv. OK. Why are people excited about this?
Basically, OpenCost gives you the ability to track and monitor container resource costs in real time and lets you see how much of a VM a given container is actually taking up. There's native support for shared resources, things like load balancers or system applications like your DNS router. Tagging is supported, but the majority of the slicing and dicing you'll want to do across different dimensions can be done without tagging. And there's support for AWS, Azure, GCP, and on-prem. Actually, I wanted to add to this list: today there is a pull request for Scaleway support, which we're all really excited about.

And I think that leads into the next question, which is: why are we doing this open source? In addition to letting people bring their own cloud provider and extend OpenCost to support it, we wanted consensus around a common language for cost calculation. There's a notion of fairness if you're actually going to charge individual containers back for the price of the system they're running on. We want this to be interoperable, and we want OpenCost to be a data source for existing open-source tooling in the Kubernetes ecosystem, things like the cluster autoscaler, which can turn down nodes, and the horizontal and vertical pod autoscalers, which can effectively take cost data as an input for their scaling decisions. So it's important to us that cost becomes a first-class concept within Kubernetes.

So I'm going to do my best to demo here. I couldn't get my displays to mirror, so we're going to have to do it over the shoulder. Cool. So here's an example of an application that you might build with OpenCost data. You can see spend over the last week, and that spend can be broken down; the default view here is to break it down by namespace. Namespaces are folders for your applications.
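As a rough sketch of the shared-resource support mentioned above: one common strategy is to distribute a shared cost, like a load balancer or kube-system overhead, across namespaces in proportion to each namespace's direct cost. This is an illustrative example, not OpenCost's code; all names and numbers are made up.

```python
def distribute_shared_cost(namespace_costs, shared_cost):
    """Split a shared cost (e.g. a load balancer) across namespaces
    in proportion to each namespace's direct cost."""
    total = sum(namespace_costs.values())
    return {
        ns: cost + shared_cost * cost / total
        for ns, cost in namespace_costs.items()
    }

# Hypothetical direct costs per namespace over some window (USD)
direct = {"frontend": 30.0, "backend": 60.0, "batch": 10.0}
with_shared = distribute_shared_cost(direct, shared_cost=10.0)
print(with_shared)  # frontend absorbs 3.0 of it, backend 6.0, batch 1.0
```

Proportional distribution is only one policy; splitting evenly, or weighting by requested resources, are equally valid choices depending on what your teams consider fair.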
And then, because the OpenCost spec goes all the way down to individual container costs, which are the building blocks of Kubernetes applications, you can drill down and see data for all of those. And it's important, like I said, to be able to aggregate by a variety of Kubernetes concepts and the other first-class members of Kubernetes definitions: things like services, controllers, deployments, et cetera. Then, on the other side of the coin, we have the backing resources that Kubernetes itself is running on, although these aren't necessarily first-class Kubernetes concepts. The OpenCost spec tracks things like nodes, cluster management fees, disks, load balancers, et cetera.

Cool. Let's jump back if I can. So I just wanted to talk a little bit about the specification behind OpenCost, starting with clusters and cluster costs. Essentially, total cluster cost breaks down into your cluster asset costs, things like nodes, disks, et cetera, and overhead, things like licensing or management fees. The asset costs themselves break down into resource allocation costs, things like an hourly CPU cost or hourly RAM cost times a number of units, which you get charged for regardless of how much you're actually using, and actual usage costs, things like network ingress, egress, and cross-zone data transfer, which depend only on the amount actually consumed. Making that a little more concrete: resource allocation costs are things like nodes, CPU, RAM, and persistent volumes; usage costs are things like network egress; and overhead costs are the management fees.

So I think the next question is: how do you go from a node cost to speaking the same language that your containers and workloads are speaking, which is requests and usage of CPU, GPU, RAM, and persistent volumes?
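Before we get to that, the breakdown just described can be sketched numerically. The structure below mirrors the spec's categories; the dollar figures are invented for illustration.

```python
# Illustrative monthly figures (USD); the structure mirrors the spec,
# the numbers are made up.
allocation_costs = {   # charged per unit provisioned, regardless of use
    "node_cpu": 120.0,
    "node_ram": 80.0,
    "persistent_volumes": 25.0,
}
usage_costs = {        # charged only for what is actually consumed
    "network_egress": 15.0,
    "cross_zone_transfer": 5.0,
}
overhead_costs = {     # not tied to any single asset
    "cluster_management_fee": 73.0,
}

# Asset costs = allocation-based + usage-based components
asset_costs = sum(allocation_costs.values()) + sum(usage_costs.values())
# Total cluster cost = asset costs + overhead
total_cluster_cost = asset_costs + sum(overhead_costs.values())
print(asset_costs, total_cluster_cost)  # 245.0 318.0
```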
Here's a very simple sample analysis of how we arrive at those numbers from the total cost of the node supplied by your provider. For example, take something like a t2.medium and a t2.large. In this sample, a t2.medium costs $0.04 an hour and has two CPUs and 4 GB of RAM; a t2.large costs $0.08 an hour with two CPUs and 8 GB. So we can deduce that the cost of RAM is a cent per gigabyte, based on the difference in cost and the difference in the underlying resource. We do this analysis across a number of different providers to come up with a marginal unit cost for each underlying resource. That's just a simple sample of how we might do that.

So now that we're speaking the same language between the resource costs of nodes and workload costs, we can express a workload's cost as just the sum of all the resources that workload allocates. And when I say allocates, I really mean the maximum of the actual utilization and the request. The reason we think it's important to do it this way in Kubernetes is that when you request a set of resources, the scheduler blocks out that space so nothing else can run in it, so you should effectively be charged for it. And when your usage exceeds your request, you're subject to eviction, so you get charged for the full amount that you use.

OK, so here's an example of how we might actually put this into practice. Exact numbers and names have been changed to protect the innocent. You can see we go to a team, look at some data, and notice that the Prometheus server costs are pretty high. Then we can go say, hey, what's going on there? Because we know the namespace it's running in, we presumably know the team or the applications responsible for it. So you can look here and see that the network cost is high relative to the other workloads running in this sample cluster. Upon going in and doing some research, we saw a lot of cross-zone network traffic.
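As an aside, the pricing inference and the max-of-request-and-usage charging rule described above can be sketched in a few lines. The instance figures come from the talk's sample; the helper names are mine, and a real analysis would fit prices across many instance types, not just two.

```python
def marginal_ram_cost(small, large):
    """Derive the per-GB-hour RAM cost from two instance types that
    differ only in RAM (e.g. the t2.medium vs t2.large sample)."""
    assert small["cpu"] == large["cpu"], "types must differ only in RAM"
    return (large["price"] - small["price"]) / (large["ram_gb"] - small["ram_gb"])

t2_medium = {"price": 0.04, "cpu": 2, "ram_gb": 4}
t2_large  = {"price": 0.08, "cpu": 2, "ram_gb": 8}
ram_per_gb_hour = marginal_ram_cost(t2_medium, t2_large)
print(ram_per_gb_hour)  # ~0.01, i.e. a cent per GB-hour

def allocated_cost(request, usage, unit_price):
    """A workload is charged for max(request, usage): the request
    reserves space on the node, and usage above the request is
    still real consumption."""
    return max(request, usage) * unit_price

# 4 GB requested but only 3 GB used -> billed for the 4 GB reservation
print(allocated_cost(4, 3, ram_per_gb_hour))
# 4 GB requested but 6 GB used -> billed for the full 6 GB
print(allocated_cost(4, 6, ram_per_gb_hour))
```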
So a solution here might be, for example, to shard the application so it only serves traffic within its own zone. But once we address that and come back, we see the costs are still high. We notice there's a big GPU cost in the namespace, dig in a little, and find that somebody was running a Bitcoin miner, for example. Those are the simple, obvious cases of "hey, this thing isn't doing what I expected it to do"; it's a very simple monitoring problem.

Once we get through that first order of optimizations, we want to be looking at things like efficiency. The notion of efficiency for workloads in Kubernetes is relatively simple, although there are a few wrinkles: it's simply the amount of resources used over the amount requested. Oh, that's a typo on the slide; it should say resources requested. One wrinkle is that CPU is compressible: you'll get throttled, but you can go well over your request and be over 100% efficient. Exactly how to address that kind of over- or under-provisioning is beyond the scope of the OpenCost talk today, but I'm happy to talk more after.

Once you've got workloads right-sized and running efficiently, the next step is actually turning down their backing resources to save money; you can think of that as optimizing the cluster costs. So we have a notion of cluster idle cost: that's just the total cost of all the assets in the cluster minus the workload costs. And the efficiency is just the idle over the allocation. Tools like the cluster autoscaler can peg you to a certain workload efficiency, but by supplying them with actual cost data, you can get pegged to a cost efficiency for scaling your cluster up and down.

So, in summary, putting it all together: the idea is to monitor your applications, optimize your workloads, optimize the cluster, and then continue to monitor.
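The efficiency and idle-cost definitions above fit in a few lines. Again, the figures are hypothetical, and this is my sketch of the arithmetic rather than OpenCost's implementation.

```python
def workload_efficiency(used, requested):
    """Resources used over resources requested. CPU is compressible,
    so usage can exceed the request and efficiency can exceed 100%."""
    return used / requested

def cluster_idle_cost(total_asset_cost, workload_costs):
    """Idle cost: what the cluster's assets cost, minus what the
    workloads running on them actually account for."""
    return total_asset_cost - sum(workload_costs)

assets = 100.0                   # hypothetical total asset cost
workloads = [40.0, 25.0, 15.0]   # hypothetical per-workload costs
idle = cluster_idle_cost(assets, workloads)
print(idle)                       # 20.0 of spend is idle
print(idle / sum(workloads))      # idle over allocation, as in the talk
print(workload_efficiency(1.5, 1.0))  # compressible CPU: 150% "efficiency"
```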
And that's the baseline of what the OpenCost project can do for you. Cool. We can take questions.

Yeah, we're in the CNCF Sandbox, I believe — someone should confirm that — but it is a CNCF Sandbox project. We haven't put any logos on yet; this talk was designed before we were accepted into the CNCF process.

All right, another one. I don't know if it's Scaleway itself; there's a PR today, submitted by someone, for supporting node prices in Scaleway. There's an interface that you fill out, and it depends how complicated the pricing is on that platform; for a simple one, you could probably do it in a day.

Cool, I know this was a longer talk, but I'm happy to keep answering questions. If you want to talk — yeah, there's one back there. So the question was: is it just cost monitoring for Kubernetes, or are there plans for other things? I think there are a lot of interesting Kubernetes-adjacent projects, or projects built on top of Kubernetes. I'm going to blank on the name — AWS Fargate, I think, is one of them; it uses a Kubernetes-like system of resources and requests. The ideas are generally extensible to anything where you have an underlying asset cost, a request, and a usage. So yes, it's extensible to other container monitoring frameworks. You could certainly fill out the OpenCost spec for things like Mesos, because the ideas within the spec are all transferable. The specific implementation that I showed on screen is Kubernetes-only, but the spec, I think, is generally applicable to containers.

Yeah, the question is: is there a process for submitting a proposal to the spec? Yes, it's in the same GitHub repository as the implementation is today. I think we should get those links added, but if you just go on GitHub and look up OpenCost, you should find it along with the spec. I think that's all the questions, unless we've got anything from the stream.
OK, yeah, thanks everyone for coming. Thanks for tuning in on the stream.