 Appreciate you being here. If you're here for the OpenCost session, then you're in the right place. The title of the session was changed, or at least should have been changed. We introduced OpenCost a couple of weeks ago, and the abstracts for all of the sessions were due long ago, so that's why there's a little bit of a change in the title there. If you are interested in using open source tooling to help manage your cloud costs, and Kubernetes specifically within that, then I think we'll have a good session. My name is Jesse Goodyear. I'm from Kubecost. My friend and colleague is not Sean, who's listed on the agenda. But yeah. Hello. There we go. Sorry. I'm still Matt Ray, still a senior success manager. And if you have questions, I'm going to be like Jesse's sidekick. Take it away, Jesse. Thanks, Matt. So I'm the Solutions Engineer at Kubecost, which means I'm a technical pre-sales person, just to be very honest with you all. That said, I'm doing quite a bit of documentation for OpenCost and trying to make that experience better. One of the people who's most responsible for more of the documentation just joined us, so if you need any help, hopefully she can assist. I came from NGINX, where I was working a lot on the Kubernetes Ingress Controller, and that leads us to here. My friend Matt is from Australia, and we were joking around the other day about this map that I'm fond of, about how we measure temperature across the world. It turns out that there's exactly one country that both does not use Celsius and has sent people to the Moon. This conversation is all about measuring costs, though, so let's get to the agenda. The first thing, and this presentation will be uploaded to the site once we're done, is resources. We obviously have a GitHub repository with all of the code that supports OpenCost. The website generally has a few links that are good for our FAQs, as well as a Slack link to join. That's pretty active.
A couple of people a day ask questions, typically about the API, so I added some of the API calls to the session, and I'll post them on our GitHub to give everyone that access going forward. The other cool open source utility that we'll be using later in the presentation is kubectl-cost. It is a plug-in for kubectl based on Krew, K-R-E-W, and I think it's a much easier way to see your costs within Kubernetes. That'll probably be the highlight of the demo. So what is OpenCost, and where did it come from? It is what Kubecost, the primary tool out there for measuring Kubernetes costs, has always used for getting its base cost model from Kubernetes. Kubecost just builds upon that cost model by adding enterprise features that enable some scale, as well as some visualizations that everyone wants. OpenCost is a community-driven project. We've got many collaborators on the project. We even have a new pull request in our repo for, do you remember the name of the cloud provider? Scale, right? Scaleway. So it's community driven, really trying to simplify the process of getting costs for Kubernetes out of the cloud providers. We also support on-prem via CSV files, where you can give OpenCost your cost per CPU hour and cost per memory hour. So it's really useful even if you're not on a cloud provider, or if you want to run it on a minikube setup, for example; that works. It helps us answer the question: how much does this namespace cost? And that's primarily the thing that users are interested in seeing first when I talk to them. Kubernetes gives us this abstract term, the namespace, for grouping all of your resources. So we can aggregate by all of the controllers in a namespace: DaemonSets, Deployments, Pods, whatever is in there. And ultimately, we're looking at the containers and some network metrics in order to build that cost.
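As a quick aside, since kubectl-cost is distributed as a Krew plugin, getting it onto your machine is just a couple of commands; this sketch assumes you already have the Krew plugin manager set up:

```shell
# Install the kubectl-cost plugin through Krew (assumes `kubectl krew`
# is already installed and on your PATH)
kubectl krew install cost

# Confirm the new "cost" verb is wired up
kubectl cost --help
```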
And we'll get into how those costs are calculated, just so we're all on a level playing field here. The two main drivers of those costs within a Kubernetes environment are the nodes, and the nodes are primarily CPU and memory. We're looking at the marginal cost of the CPU and memory in order to find how much of those resources a given container is using. Beyond that, it's relatively straightforward: if you've got a persistent volume on general-purpose storage versus fast storage, we know exactly what those prices are in GCP, AWS, and Azure, so that's a straight calculation. The other side of that is network egress traffic, looking at whether the traffic is going cross-zone or out to the internet, and that will drive costs as well. You put them together, and you've got a pretty reasonable way of getting the total costs. Now, if you look at a deployment, it's got five pods, and then you look at a pod, and it's got a container. The container is what's actually driving the resource costs within the node. So this is a really good slide to show how we are accounting for it, and we call it allocation. That is measuring the difference between the container's requests and how much it's actually using. If your container in the first graph here is requesting more than it's using, it's inefficient, but it has room for burst capacity. Versus if it's using more than it's requested, it potentially needs that capacity, and if the node runs out, it's just going to run slow. In either case, the greater of the two is what OpenCost is going to use for that cost calculation. The great thing there is we can also start measuring efficiency, and that is very much a part of the tools I'll be showing in a minute. So you take that allocation of CPU and memory, and then you multiply it by the cost of the resource. And so, like I said, we have the marginal cost of a CPU hour.
And we multiply that by the hours of time that CPU has been used on average by the container, and then start doing aggregation all the way up to the cluster level. So OpenCost can show you costs from the entire cluster all the way down to the individual containers running on it. Another concept that you need to keep in mind when looking at the output of these tools is idle cost. That is resources on the nodes that aren't being consumed but that you're paying for. It's really common, when a user installs OpenCost or Kubecost, to see 80% idle capacity not being used across their entire environment. For some customers, this is hundreds of thousands of dollars a month of unused capacity that they may or may not have good visibility into. And when you start putting a dollar figure on that idle space, if you will, then it's like, well, I could save 30% of my cloud bill, potentially. Yeah. Yeah. I mean, a lot of times if you come in from the financial side, they call it waste. So it's not just, you know, us being polite and saying, well, that's idle. In a lot of the financial documentation, that is waste, because you've paid for it and you're not using it. And of course, there are really good strategies for reducing the amount of idle resources you've got with autoscalers. That said, many customers aren't using them effectively, and this still drives visibility into what's going on there. OK. From a high-level architectural perspective, and this isn't exactly accurate, but it gives you the idea of the resources that OpenCost uses in order to come up with that cost model. We do rely on Prometheus. You can use other Prometheus-like tools; we don't ship with anything specifically. So in the demo later, we'll show you using the Prometheus Helm chart to install it. We're using the metrics provided by Prometheus, the node exporter, kube-state-metrics, and cAdvisor.
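To make the arithmetic concrete, here's a toy calculation of that "greater of request versus usage" rule; the numbers and the per-CPU-hour price are invented for illustration, not real cloud rates:

```shell
# Toy allocation-cost math: OpenCost bills the greater of what a
# container requests vs. what it actually uses, times the marginal
# resource price. All numbers below are made up for the example.
requested_cpu=2      # container requests 2 CPUs
used_cpu=1.2         # but only uses 1.2 on average (inefficient)
cpu_hour_price=0.04  # assumed marginal cost of one CPU-hour
hours=24

awk -v r="$requested_cpu" -v u="$used_cpu" -v p="$cpu_hour_price" -v h="$hours" \
  'BEGIN {
     alloc = (r > u) ? r : u                         # greater of the two
     printf "allocated CPUs: %.1f\n", alloc
     printf "daily CPU cost: $%.2f\n", alloc * p * h # 2 * 0.04 * 24
   }'
```

Summing that per-container figure across a namespace, and then across namespaces, is the aggregation the talk describes; whatever the node charges beyond the sum of allocations shows up as idle cost.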
And then we're also a scrape target for Prometheus. That is critical: Prometheus must be scraping OpenCost. Otherwise, we will not have accurate cost information, and you'll see negative idle costs. That will be your first tip that Prometheus isn't working, and I'll show you some other troubleshooting steps in a second here. We're also looking at the Kubernetes API to get all of the constructs that we need: everything from labels on your containers and namespaces that we can do aggregations by, to the names of the pods and how much they've requested and how much they're consuming. So those are all the metrics that we're using to drive the cost model. Any questions on the high-level architecture before we dive into some demos? Cool. All right, so I apologize. The screen is 100 inches; I don't know how big that is, but we probably need bigger for some of the terminal-based stuff that we're going to be doing. It is recorded, and I've seen some of the other recordings previously, so hopefully, if something comes up in here, you can reference the recording. And I can make this bigger if you need anything. First, installing OpenCost. Very simple: two or three lines in the terminal. Then we'll get into the Prometheus GUI and finally get the data out with curl, though I'm actually going to use Postman because they're API calls and it's a nicer way to look at them. And kubectl-cost at the end, which I think is going to be the coolest tool to watch this with. OK, let me find my right window. The documentation that I'm showing you right now, which hopefully you can read, will be published on the OpenCost GitHub repository as soon as I create a pull request and someone approves it. Just to point out, for people in the back of the room, we're first adding the repo for the Prometheus Community Helm chart and then installing it. I've already added the repo, so I'll just copy the second line here.
And I'll go to a new cluster. I spun up a GKE cluster, I should say, a little while ago, and I will switch to it. If I do a kubectl get pods -A, you can see that there are just the basic pods that Google provides for cluster management running right now. So I go ahead and install my Prometheus Helm release in the prom namespace, as well as this extra scrape configs file, which, I'm not in the right directory. I'll even show you what that scrape config looks like. But again, this is probably the main thing that is missed when users do the install: adding this scrape config to Prometheus. Once that's done, the next step is to actually install the OpenCost pod, and we're going to put that in the cost-model namespace. You can call this whatever you want. I'm probably going to update this in the future, assuming it doesn't break anything, to actually just call it the opencost namespace and then change the pod to that. And I need to change directory here, or just add a couple of dots, and then apply all of the files. Oops. Oh, I did create the namespace. That's good. To show you what's going on there, it's a very standard deployment, where we've got a service account that we're giving rights to so OpenCost can read the Kubernetes API. You can look at the cluster role bindings to see what we're asking for: basically just get on a whole bunch of resources. This is up on GitHub in the OpenCost repo. One of the things to point out is that OpenCost is relatively new. It was contributed to the CNCF sandbox about a month ago, and so there's still some polishing happening in the GitHub. But we're getting there. And you'll see links to Kubecost, which is still recommended for the majority of users who aren't at the Open Source Summit. I assume a lot of you in the room are very good with open source tools and able to modify them to your needs, and so we want to make that very easy.
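For reference, the whole install dance above fits in a handful of commands; the release name, namespaces, config file name, and manifest path here are just what this demo happened to use, so adjust them for your own setup:

```shell
# Add the Prometheus community chart repo and install Prometheus into
# the "prom" namespace, passing the extra scrape config that points
# Prometheus at OpenCost (file/namespace names match this demo)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install my-prometheus prometheus-community/prometheus \
  --namespace prom --create-namespace \
  --set-file extraScrapeConfigs=extraScrapeConfigs.yaml

# Then create the namespace and apply the OpenCost manifests from a
# checkout of the repo (directory name is an assumption)
kubectl create namespace cost-model
kubectl apply -n cost-model -f ./kubernetes/
```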
That said, if you're a new Kubernetes user and you just want to get your cost information, then Helm installing Kubecost is generally going to be a much smoother path for most of those consumers. The big thing inside of this Kubernetes manifest is this line 33 that I'm on, which is pointing to that Prometheus server. If you already have Prometheus running in the environment, go ahead and use that; that's my point here. You just update this value to the service name and namespace of the running Prometheus server, and that would work great. So, switching over to checking whether Prometheus is running now. We'll go with an environment that I had from earlier, because I don't feel like doing the port forward and waiting for that. The easiest way to see what's going on with Prometheus is under Status and then Targets, and you can see that everything in here is green. As soon as something's not, that means you're going to have inaccurate cost data in OpenCost; we're completely dependent on this. There are some variables for setting timeouts on queries that can help if this Prometheus database were very busy, but generally, if you're running in the same cluster, it's going to perform well, and we're not pushing a tremendous amount of data. The OpenCost model only scrapes the cloud API, I think, every hour, and the Kubernetes API every minute. So it's not a lot of data. No. And the cloud APIs that we're calling are looking at, I should have mentioned this earlier, the marginal cost information for the nodes. So if you look at an m4.xlarge in AWS, you can see that it costs $1.59 per hour, something like that, don't quote me on it, and then that is what we're using to do the math on the container costs. That's why OpenCost is making external calls to the cloud provider. OK, so the number one thing you want to see here is this OpenCost target. That's not out-of-the-box Prometheus stuff.
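Since that scrape config is the thing people most often miss, here's roughly what the file can contain; the job name, target address, and port are assumptions matching this demo (an OpenCost service listening on 9003 in the cost-model namespace):

```shell
# Write the extra scrape config that tells Prometheus to scrape the
# OpenCost /metrics endpoint; target address and port are this demo's
cat <<'EOF' > extraScrapeConfigs.yaml
- job_name: opencost
  honor_labels: true
  scrape_interval: 1m
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
    - targets:
        - cost-model.cost-model:9003
EOF
```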
That was enabled when we passed that extra scrape config to the Prometheus Helm install. Prometheus itself has this nice graphing tool, so we can see it starting to gather data. There are a couple of test queries that we've got on the GitHub repository for getting some basic OpenCost information. This one is looking at both the CPU and memory of the containers out there and showing the top five by cost. You can see this top one here is $2.75 an hour, and it's a dnsmasq service in the kube-system namespace. OK, so let's look at Postman. I don't know how big this is going to be, but you can see this returning query here. The first query that I'm showing you is using the endpoint for the cost data model, and we've got a couple of parameters. The first one is the window of time that we're looking at. I've just done one day, but you could say one hour, or you could actually put in a start and end window; you'd have to get that specific time and date format, but there's an example out there that you could use, and it's just comma-separated. And I'm filtering to a specific namespace, so this is the cost of the Prometheus namespace over the past day. Another endpoint we have is allocation/summary. This one, I think, gives you a little cleaner output for this demo purpose. This is aggregating all of the costs for the cluster in this single JSON output, and so if you just wanted that one number, there's where you get it: how much money that cluster cost for that day. Now say you want to do namespace instead, so you can see which namespaces are driving the cost of that cluster. You just change the aggregation to namespace, and now you can see I've got that cost-model namespace doing most of the work, which is why it's at the top of the list, and then we've got the Prometheus namespace and its resources as well. Finally, there is allocation/compute.
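If you'd rather use curl than Postman, the same three queries can be sketched like this; the port-forward target and the exact parameter spellings are assumptions based on this demo's service, so check the API docs in the repo before relying on them:

```shell
# Reach the OpenCost API locally (demo's service name/namespace/port)
kubectl -n cost-model port-forward service/cost-model 9003:9003 &

# 1) Allocation data for one namespace over the past day
curl -sG http://localhost:9003/allocation \
  --data-urlencode 'window=1d' \
  --data-urlencode 'filterNamespaces=prom'

# 2) Cleaner summary output, the whole cluster rolled into one number
curl -sG http://localhost:9003/allocation/summary \
  --data-urlencode 'window=1d' \
  --data-urlencode 'aggregate=cluster'

# 3) Detailed compute breakdown, aggregated by namespace
curl -sG http://localhost:9003/allocation/compute \
  --data-urlencode 'window=1d' \
  --data-urlencode 'aggregate=namespace'
```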
And this is giving us some totals, as well as more detailed data on what is driving the costs within that cluster. And the CPU cost adjustment, you want to talk about things like that? Yeah, so you can edit the JSON files for the cloud provider in order to give OpenCost rights to read from your AWS environment and pick up any changes to node prices that you have in addition to whatever the cloud provider lists. Typically, if you're running at any sort of scale, you've negotiated a better rate than the list price, and so we can pick up those adjustments to your cost. They don't kick in immediately, because depending on your cloud provider and how you're consuming it, some of those discounts don't kick in for 24 hours, 36 hours, or don't kick in until you've used 200 CPU hours a month. There are just all sorts of weird discounts. So your results actually change over time as they get adjusted based on the discounting from your cloud provider. Exactly. So the cool tool, also open source, that I'll show you now is kubectl-cost. There will be a link to how to install that on your command line. What this is doing is creating its own port-forward to the OpenCost service in the OpenCost namespace. You can see this whole command line here, and I'll admit that's an awfully long command. It says, basically, that OpenCost is running on port 9003, it's in the cost-model namespace, and it's called the cost-model service. If you just follow the standard Kubecost installation method, kubectl-cost works out of the box; it just has those defaults. Because we're not using the same defaults as Kubecost here, we've got to override those things. So I like aliases. I even have a utility in my zsh shell so that when I type in kubectl get pods, it says "alias tip: kgp" for kubectl get pods. I love that zsh tool. I'll sell it up here while I'm at it.
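Those overrides can be baked into an alias so you don't retype the long command every time; the flag names here are from kubectl-cost's options for pointing at a non-default service, and the values are this demo's, so treat both as assumptions to verify against the plugin's help output:

```shell
# kubectl-cost defaults to a standard Kubecost install, so override the
# service name, port, and namespace to match this demo's OpenCost pod
alias kcost='kubectl cost \
  --service-name cost-model \
  --service-port 9003 \
  -N cost-model'

# The five-day namespace table from the demo then becomes:
kcost namespace --historical --window 5d \
  --show-cpu --show-memory --show-pv
```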
So this kubectl-cost utility basically installs another verb for kubectl, and the verb is cost. Then we're just passing some parameters to get the information we want. This first query, and you can see why I like it better than the JSON output we were showing earlier, is showing a nice table of the cost of these namespaces for the past five days. We're showing CPU, memory, and persistent volumes, just because I think those are the most common things people look at. But you can look at network and GPU if you're concerned about those as well; it just makes this window harder to read, so I ignored them. The other cool thing is that we can also do a query for what the estimated monthly cost would be if everything stayed like it was for the past two hours. So this window parameter here. You want to point at it? Oh, yeah, there you go. I don't know how bright it's going to be. You can change that window parameter to whatever you want to model what the monthly costs will be for that given namespace, that given cluster, whatever you're going to do your aggregation by, and then get the totals. The last example is using this alias that we just created and querying OpenCost for the label app. A very common label in Kubernetes is app. This is obviously a demo cluster, and you've only got a few namespaces here, but you can see the applications that do adhere to the app label. And let's just say you required cost-center as a label on your deployments; that would be a really good label to do a kubectl cost query for. The unallocated entries are things that are not adhering to the label strategy, and if that was a big number, we'll yell at people for not following the rules. It is our biggest number. I mean, labels are the most granular. The problem if you're using namespaces as your aggregation layer is that there may be multiple namespaces that go together into the service that you really want to report on.
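In command form, the monthly projection and the label query would look something like this, with the demo's override flags spelled out in full; the flag spellings are assumptions to verify against the plugin's help output:

```shell
# Projected monthly cost, extrapolated from only the last two hours
kubectl cost namespace --window 2h \
  --service-name cost-model --service-port 9003 -N cost-model

# Historical cost aggregated by the "app" label; pods without the
# label fall into the unallocated bucket
kubectl cost label --historical --window 5d -l app \
  --service-name cost-model --service-port 9003 -N cost-model
```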
And if you use the label on all of those services, all those containers that really go with that service, then you can get accurate cost information for that particular label. But it just takes some discipline for people to adhere to it, and if your company isn't very good at that, I can tell you there are very few that are. I guess there are policy agents, Open Policy Agent policies, where you can say we don't allow that on the cluster unless you have a label. That's the best way to mitigate it, but not everyone's big on policies like that. So that wraps up the talk. I'm happy to answer any questions you have, and I appreciate everyone coming. Any questions? Yeah, so we'll repeat the question. The question is whether OpenCost will give you recommendations for reducing costs and, in general, idle costs. Yeah. So OpenCost is the open source core of the Kubecost product, and the Kubecost commercial product does have those recommendations. There's definitely a lot of documentation out there that says, if you see this, you should do that. In the Kubecost application, we actually have recommendations that show things like, hey, you're only 9% efficient; maybe you should look into reducing the pod definitions for lower memory footprints or lower CPU. We also make recommendations for things like reserved instances and spot instances. It's got about a dozen or so recommendations that it makes as suggestions for you to save money. OpenCost doesn't do that; it's the raw numbers, available to you. But also, going into commercial mode, Kubecost is free for single clusters, so if you're not running a large deployment, check it out. Yeah. I will say it's free for a single cluster per organization, so put it on your largest cluster with the most nodes and see what Kubecost can do for you, if you're interested in the savings recommendations that it can give you.
That said, OpenCost is what we're using in order to find the information those recommendations are built on, so you can develop your own algorithms against the information that OpenCost is providing you. Thank you. Any other questions? Yeah? OpenCost is open source; you can deploy it as many times as you want to. Yeah, yeah. I mean, OpenCost is Apache licensed. Go crazy. Yeah, I mean, we want feedback; it's the open source community, you get it. But yeah, you can do whatever you want with OpenCost, including making enhancements. There is a reference UI within the GitHub repository as well. I'm doing some work on that to make it a little easier to get running. But we do have quite a few users who just want that cost data inside another business intelligence tool, so just getting that data out of the API and not forcing your users to use a new GUI, I think that's pretty valuable too. Yeah. I mean, you could even use this as the input to an autoscaler or something like that. Right. Yeah, you can go nuts. Any other questions? Well, thank you everyone for coming. Have a great rest of the conference, and happy Friday. Thanks a lot.