Alright, welcome everyone. Are you ready to unleash the power of cost-driven Kubernetes optimization? We're going to embark on an adventure to achieve cost optimization and efficiency, and we'll jump straight into it because we have a lot to talk about. So meet the Kube Cost Buster team. I'm Rachel Leakin, code name Razor, because I cut through those costs. I'll pass it over to my teammate, Antoinette. And I'm Antoinette. Code name Little GG, meaning "little go-getter." Some of you who know me know what that means; for those who don't, by the end of this presentation you will definitely know I earned that name. And meet our mascot, Little Cuddy. If you see Little Cuddy on a slide, that means we want you to pay special attention to that particular information.

Alright, let's jump straight in. We're here today to share a few ghosts that you might encounter as you begin your Kubernetes journey. But before we get to the hidden costs, Rachel and I thought it would be a good idea to review some standard costs that you may have already experienced. When we talk about infrastructure, in essence what we're really talking about is compute resources: you're billed on the number and the types of nodes you're utilizing. And if you're using managed Kubernetes, you also incur costs for the management and control plane services you consume from your service provider. Kubernetes applications often use persistent storage, and that cost depends on the size, the type, and the duration of the storage. On the networking side, we're looking at traffic ingress and egress, which can add significant costs as well.
Okay, so for monitoring and logging, here we're talking about the costs associated with tools like Prometheus, Grafana, or commercial monitoring solutions. There are also licensing fees associated with reinforcing your security posture in your Kubernetes environment. And I know everyone in here knows that building and maintaining a Kubernetes environment, and developing applications in it, is fairly complex. The talent needed to build, maintain, and develop in this environment is hard to find, so the costs of attracting and keeping that talent may be significant as well; you need to consider those costs when you begin this journey. Depending on the industry you're in, compliance and regulatory costs may be a factor too, especially once you start looking at things like data privacy and security. So we're going to stop there. While some of these are the usual suspects, Rachel has a few additional insights on four of the topics I just reviewed. With that, I'm going to hand it off to Rachel.

Thanks, Tony. All right. So before we get into the details: how do you banish these ghosts if you can't see them or can't find them? The first step is to identify where all your costs are coming from, and using a cost discovery or reporting tool is a great way to accelerate that start. But before we get into tools and services, keep in mind that you should have best practices around tagging and labeling your resources, because a lot of these tools and services rely on you having good hygiene around the tagging and labeling of those resources.
Another good thing to keep in mind: if you're struggling to enforce labels and resource tags, think about using policy as code, like OPA or Kyverno, to enforce those labels on your resources. But back to the tools. There are many tools we could select from, but I went with one as an example: OpenCost, an open source, vendor-neutral project that helps you measure and allocate cloud infrastructure and container costs in real time. Using a tool like this will help you identify the obvious costs, things like your namespaces and pods, but if you configure it correctly, you can also dive deep into those hidden costs and get more detail. First you need to gather all that data before you can take action. Another thing to consider is leveraging metrics to drill into those hidden costs and find those little pockets of ghosts you can't see. Projects like Prometheus can help you get those details with the appropriate metrics scraping. Prometheus, in case anyone's not familiar with it, is an open source monitoring and alerting toolkit, and I'm sure they have a booth on the floor at the conference today, so if you want more details, go check them out. However, please be careful about what you monitor. Monitor what matters to you and collect only what you need, because those metrics need to be stored somewhere; you can collect all this data thinking you're going to save on costs, but now you're incurring storage costs if you have too many metrics hanging around. So that's tools. Tony also mentioned scaling as one of our cost areas, and I'm sure most of you are familiar with Cluster Autoscaler.
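To make the policy-as-code idea concrete, a label-enforcement rule in Kyverno might look something like this; it's a minimal sketch, and the policy name, resource kinds, and `team` label key are illustrative choices, not from the talk:

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels        # illustrative name
spec:
  validationFailureAction: Enforce # reject resources missing the label
  rules:
    - name: check-team-label
      match:
        any:
          - resources:
              kinds: ["Pod", "Deployment"]
      validate:
        message: "A 'team' label is required for cost allocation."
        pattern:
          metadata:
            labels:
              team: "?*"           # any non-empty value
```

With something like this in place, every workload carries the label your cost tooling groups by, so the allocation reports stay trustworthy.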
Just a quick refresher: Cluster Autoscaler is a way to scale your cluster, increasing your worker nodes based on pending pods. By default, Cluster Autoscaler will cycle through your node pools, your node groups, to get the right instance type to join your cluster. But sometimes, depending on how you've configured those node pools and node groups, you don't necessarily get the right instance type for the particular situation that's occurring. One thing to keep in mind: Cluster Autoscaler has a priority expander, a configuration that gives you more control over this cycling through the node groups. Basically, as you can see in the example code here, you could say: I want to prioritize a t3.large instance type over a 4xlarge instance type. Why does this help you save on cost? Let's say you have two pending pods; you don't need a 4xlarge instance type, you just need the large one, but you want control over that cycle. So think about using the priority expander as well to help save on those costs. Another thing you can do is leverage Karpenter. I'm not sure everyone is familiar with Karpenter; it's an open source cluster autoscaler created by AWS. If you want to talk more about Karpenter, I'll happily stay afterwards and we can have a deep dive. Similar to Cluster Autoscaler, Karpenter acts on pending pods as well, but Karpenter has this cool feature called consolidation. It will look for underutilized worker nodes in your cluster and take action to consolidate them, basically meaning: let me remove this really large instance type, say a 4xlarge, move those pods onto a large, and decommission the 4xlarge.
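The priority expander example code mentioned above is a ConfigMap the autoscaler reads; a minimal sketch follows, where the node group name patterns are illustrative (a higher priority number wins, and the ConfigMap name and namespace are required by the expander):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander  # this exact name is expected
  namespace: kube-system
data:
  priorities: |-
    20:
      - .*t3-large.*     # prefer node groups whose names match t3.large
    10:
      - .*4xlarge.*      # fall back to 4xlarge groups only when needed
```

Note that Cluster Autoscaler also has to be started with `--expander=priority` for this ConfigMap to take effect.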
How does that help you save on cost? First, you're not having ten people manually trying to right-size your cluster; it takes away some of that manual labor for you. It's also getting you the right pricing as it does this. We could go into way more detail, and it has a lot more configuration you can do with it, but it's a great way to help save on cost while maintaining performance. So Karpenter does that as well.

All right, let's talk about storage. I know I always forget about storage when it comes to my clusters; it's not something I think about. I just spin stuff up and leave my storage behind. This section is all about cleaning up. How many of you have, like, 50 versions of images sitting in a repo? Yep. But how many of you are going to go back to version one or two? Right, so review those images regularly and free up that repo space. It seems quite small, but with large enterprise customers, especially the ones Tony and I work with, you get hundreds of developers and hundreds of platform teams, and it just piles up. So go through your image repos and clean them up on a regular cadence. Again, how many of you have logs from the beginning of the cluster's creation? Yep. Do you need all those logs? Do you need all those logs for your development environment? Probably not. So think about cleaning those up, but keep in mind: if you need them for compliance reasons, which is a valid reason, consider moving them to cost-effective storage. That could be something like S3, or local or on-prem storage that you have. And the last one is: clean up your PVs and PVCs. I know when I create StatefulSets and deployments that have PVs or PVCs, I delete my deployments and forget about those things.
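One way to keep forgotten volumes from lingering, not something the speakers showed, but a common sketch along the same lines, is to set the reclaim policy on the StorageClass so the PV and its backing disk go away with the claim; the class name is illustrative and the provisioner shown is the AWS EBS CSI driver as an example:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-cleanup            # illustrative name
provisioner: ebs.csi.aws.com   # example provisioner; use your cluster's own
reclaimPolicy: Delete          # remove the PV when its PVC is deleted
volumeBindingMode: WaitForFirstConsumer
```

For data you must keep, `Retain` is the safer policy, but then the regular cleanup cadence described here becomes essential, because retained volumes keep billing until someone deletes them.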
I never think about them, but they will come back to haunt you when you try to hunt them down and delete them. So keep that in mind as well, because they add up costs; they're just sitting there taking up storage space. A regular cadence of cleaning up your cluster will give it that bit of shine and save on your bill.

Another one we talked about was network optimization, another thing I don't think about as much. But as you probably all know, most cloud providers charge some kind of data transfer cost. Networking costs seem quite small when you have small clusters, but let's say you have a lot of images being pulled; many developers are pulling images all day, and those are costs you're incurring without even thinking about it. So think about caching your images, especially the latest ones you're going to be using very often. That will help you save on those costs. Similarly, the same thing with your pods. We know pods can talk across multiple zones, and multiple regions depending on your configuration, but do they really need to talk to each other across zones one, two, and three? Most likely not. So think about using something like topology aware routing, which is now available in Kubernetes 1.28. This way you can make sure your pods only talk to the pods in their own zone. Lastly, these are all quick, tactical ways you can start taking action on saving costs. They're pretty obvious, maybe to some, maybe not to all, but even though they're obvious, they often go forgotten. So I'll pass it over to my teammate Antoinette to talk about those hidden costs.

And some of you might have thought another person was added: I also go by Tony.
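The topology aware routing mentioned above is opt-in per Service via an annotation; a minimal sketch, with the service name, selector, and ports being illustrative (on clusters older than 1.27 the annotation was `service.kubernetes.io/topology-aware-hints` instead):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: checkout            # illustrative service name
  annotations:
    # ask the dataplane to prefer endpoints in the client's own zone
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: checkout
  ports:
    - port: 80
      targetPort: 8080
```

Traffic then stays zone-local when enough endpoints exist in each zone, which is exactly the cross-zone transfer cost the talk is warning about.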
So if you want to reach out to me as we get to know each other on this journey, feel free to call me Tony as well. And for those of you who just walked in: we've just reviewed the usual suspects associated with Kubernetes costs, and now we're going to transition into some lesser-known hidden ghosts of equal significance, along with a few mitigation strategies for you to implement. Let me just check, because we're holding these microphones: can you still hear me? Okay. Y'all are so awesome. I mean it, y'all are awesome.

So we're going to go ahead and start with orphan pods. Even if they're not actively serving any productive workloads, these types of pods still consume memory, which leads to increased costs. They may also generate network traffic, which increases data transfer costs and may negatively impact your network performance as well. Let me pause and make sure I get my thoughts right on this one: if the software running in these orphan pods has associated licensing costs, those costs will persist even if the pods are no longer necessary. This goes back to the whole cleanup and hygiene piece; you're going to hear that recurring theme as we talk through these hidden costs. Rachel and I already spoke about persistent storage volumes, but the ones associated with orphan pods will continue to accrue costs even when they're no longer being utilized. And finally, even though these pods are orphaned, they still need to be monitored, managed, and maintained, and you have to have a resource troubleshooting them as well. So again, this incurs additional costs that you may not have accounted for. So how do we mitigate?
Well, Little Cuddy suggests we do the following: mitigate through regular auditing; implement resource limits and quotas; monitor and leverage automation tools in these scenarios; be familiar with container orchestration best practices; and implement cost management tools.

Next, we're going to look at inefficient resources, and here we're talking about oversized and undersized pods. When pods are oversized, they tend to be allocated more CPU than they actually need. Remember all the way back when we were talking about managed Kubernetes: we brought up the fact that with managed Kubernetes services, you're charged based on the resources allocated to your pods. So we can make the correlation that oversized pods lead to increased infrastructure costs, and that's something you need to consider as well. Now let's transition over to undersized pods. When pods are undersized, they don't have enough resources to run your workloads efficiently, which leads to performance issues, crashes, and slow response times for your applications. Additionally, they may not be able to scale horizontally; remember we talked about scaling early on, and this is one of the places where it applies, because these pods are resource-constrained. Finally, while people might intuitively think that undersized pods are cost-effective, in fact they're just not efficient.
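The limits-and-quotas mitigation above is typically a namespace-level ResourceQuota; a minimal sketch, where the namespace and the numbers are illustrative:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota        # illustrative name
  namespace: team-a         # illustrative namespace
spec:
  hard:
    pods: "50"              # cap on pods in the namespace
    requests.cpu: "20"      # total CPU the namespace may request
    requests.memory: 40Gi   # total memory the namespace may request
    limits.cpu: "40"
    limits.memory: 80Gi
```

Because orphaned pods still count against the quota, a team that hits its cap has a built-in incentive to find and delete them.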
Basically, you might need to provision more nodes to accommodate them if they're not managed effectively. So how do you mitigate? Little Cuddy suggests you right-size the pods by analyzing the resource requirements of your application and allocating accordingly, and use tools like the Horizontal Pod Autoscaler to automatically adjust the number of pod replicas based on CPU or memory usage.

With that, we're going to slide over into the third one, which is idle resources. There are several costs associated with idle resources, but for the sake of time we're just going to hit a few of them today; we'll be here afterwards if you want to dive into more of these topics. Even when these resources are not actively used, your cluster administrators still have to monitor, patch, and ensure the health of these idle resources. Keeping your Kubernetes software, node operating systems, and container runtimes up to date is essential for your security and for the stability of your pods, and running those updates on idle nodes still incurs costs. So again, this goes back to that recurring theme of hygiene and maintenance; you're going to hear it a lot, along with the mitigation strategies. But there's one more thing I want to call your attention to. All of us know there are adversaries out there looking to exploit weaknesses in our systems, right? Idle resources are a prime target, because they can become a security risk if they're not properly maintained: there's underlying software that goes unnoticed, and adversaries will seek to exploit that if given the opportunity. So that's another reason you have to pay attention. Next, resource fragmentation.
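The Horizontal Pod Autoscaler just mentioned might be configured like this; a minimal sketch where the workload name, replica bounds, and target utilization are illustrative:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa               # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                 # illustrative workload
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add replicas above ~70% average CPU
```

This pairs with the right-sizing advice: utilization is computed against each pod's CPU request, so an accurate request is what makes the autoscaler's decisions meaningful.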
This was something that was kind of new to me when I first started investigating it. This is where clusters have sporadic pockets of available resources that can't be efficiently used by your active workloads, and it also has negative cost implications for your organization. And finally, for on-premises deployments, running idle hardware consumes electricity and generates a lot of heat, which leads to increased energy costs; on the back end of that, it can increase the need for additional cooling infrastructure as well. To mitigate, you should investigate implementing resource requests and limits, as well as enforcing those policies to prevent over-provisioning.

So we've got two more to go through. How are y'all doing with me so far? All right, cool. Thank you so much. Next, we're going to talk about automation. With automation, we see investment in tools and services to enhance CI/CD pipelines, configuration management, and backup solutions to reduce overhead, but each of the things I just mentioned also comes with its own operational costs. I want to call out the fact that data egress can result in additional charges, so make sure you understand the egress costs associated with your selected cloud provider. Also, as your needs evolve, you may need to customize your Kubernetes setup, and this may lead to increased development and maintenance costs, especially if you require extensive code changes and integration with other systems. Some of you I've spoken to outside this room are very familiar with this particular type of cost; we've discussed it often. So how do you mitigate? Simply put, you start by conducting a thorough cost analysis before, and I want to emphasize before, implementing your Kubernetes customization.
And finally, you want to keep up with best practices and cost strategies specific to your deployment environment.

With our time remaining, I want to finish off by talking about load balancers. There's a lot here, so we're going to hit the key points. If your apps require load balancing, you may be charged for the load balancing services offered by your cloud service provider. Most cloud providers charge for the use of load balancers, and the cost may depend on the number of services exposed, the amount of data transferred, or other factors, which we won't go into right now but can discuss after the presentation. This is especially true if you have many services or high traffic. The other thing I want to call out: if you have multiple environments, or you frequently create and destroy clusters, you may forget to deallocate load balancers you no longer need. Again, this goes back to hygiene, so it's something to be on the lookout for. This one was key for me: if you terminate Secure Sockets Layer or Transport Layer Security connections at the load balancer level, it can lead to cost increases, especially if you leverage premium certificates or require significant encryption and decryption resources. And if your load balancer provider offers protection against distributed denial-of-service attacks, that can be especially costly if you're experiencing frequent attacks. So how do you mitigate? Monitor your load balancer usage and costs regularly; consider alternatives like ingress controllers; review your SSL/TLS termination and Kubernetes Service strategies to reduce your costs; and delete load balancers that are no longer needed. With that, I'm going to turn it over to my partner Rachel to finish and take us home.
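The ingress controller alternative mentioned above usually means routing many services through one shared entry point instead of paying for one cloud load balancer per `LoadBalancer` Service; a minimal sketch, assuming an NGINX-style ingress controller is installed, with hostnames and backend service names purely illustrative:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: shared-entrypoint     # illustrative name
spec:
  ingressClassName: nginx     # assumes an NGINX ingress controller
  rules:
    - host: shop.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: shop    # illustrative backend service
                port:
                  number: 80
    - host: blog.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: blog    # illustrative backend service
                port:
                  number: 80
```

The controller itself sits behind a single cloud load balancer, so every service added to the Ingress shares that one bill instead of provisioning its own.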
So we learned a bunch of tactical stuff, some hidden costs, some obvious costs. Now how do you actually trap those ghosts? Again: right-size your resources, whether that's your pods or your cluster; use the appropriate tooling; and leverage automation as much as you can. Same with implementing scaling: using things like Cluster Autoscaler, Karpenter, or KEDA, there are tons of ways to scale your clusters effectively to help you save on costs and avoid extra waste. And how do you see those ghosts? You need to monitor continuously, and you need to optimize with the right tools. Then, once you've gathered all your data and your lessons learned, consider that data when you're creating new clusters or a new platform strategy; make sure you take all your findings and leverage them again. But once you trap your ghosts, it's not over. They can come back to haunt you, so you have to keep up with your ghost-hunting activities. So with that, any questions? And if you want to find us, you can scan our QR code and we can gladly chat more.

Okay, so you're saying you have developers and you don't want them focused on this kind of thing. There are a couple of ways you could tackle that. I've seen very large organizations have a whole team dedicated to FinOps, covering not just containers but all your other cloud resources as well. That's one approach: a dedicated team that goes through and says, hey, we checked this month and this pod needs to be addressed, or takes automatic action against it. If you have a smaller team, think about using some of these tools like OpenCost; you can get an enterprise version, similar to Kubecost, and there are tons of vendors you'll find on the floor here.
I'm sure they could sell you on what they do, but you can leverage automation: a lot of these tools do some automation on right-sizing your pods as well. And again, configuring your Cluster Autoscaler or Karpenter ensures that's handled and taken out of the developers' hands. Any other questions? We've got plenty of time, so if anyone wants to talk, any other questions? Otherwise, we can always talk afterwards as well. All right. Well, thank you all so much for coming and sharing this time with us. You could have chosen any place, any time, and I'm going to go ahead and say it, Rachel: thank you for hearing our voices on such a relevant topic that's very important to us. You could have gone anywhere, but you took the time to come and hear us, and you have no idea how meaningful that is to us. So thank you so much for coming to hear us out.