All right, let's go ahead and get this kicked off. Today I'll be talking to you about capacity planning using tools within Kubeflow and machine learning. Some of you may remember that I spoke, I think it was at Detroit, where I gave a similar talk about using statistical modeling for capacity planning. This is very similar to that. I'm going to brush over a little bit of the content I shared about how you can use things like Six Sigma and different types of statistical models, and then how that eventually leads into using machine learning models.

A little bit about myself: I work at Red Hat, and I've been there for about six years now. Originally I was an architect in our services organization, mainly focused on Fortune 50 US companies, so a lot of automotive and health care. I've recently moved into AI. My background is actually in machine learning, which is why I really enjoy metrics and observability and ways to use metrics to solve problems. This is my sixth KubeCon between North America and EU, so some of you may be familiar with me. If this is your first time hearing me speak, I'm excited to have you here.

I want to start off with a problem statement, and it really comes down to capacity planning for Kubernetes. Back in 2022, I think, there was a survey of different CTOs, and nearly 50% were struggling to keep their cloud costs within their original estimates. A new version of that survey was published recently, and I think that number has gone up closer to about 60%, especially as prices have been increasing. So there is definitely a need; I think even the CNCF recently published an article about the need for more capacity planning because of these rising costs. I always like this quote: "The major myth of cloud computing is that cloud computing is less expensive than on-premise." In the end, it depends.
That "it depends" factor really comes down to how you do capacity planning. When I was consulting for different Fortune 50 companies, there was this myth that cloud reduces cost. Very rarely did I see that, especially as you start scaling out horizontally. But there are ways to bring down costs through good capacity planning. I'm going to share a couple of those here, and then how machine learning fits into the mix.

These are my four examples of managing resources, the four do's and don'ts of capacity planning. One of the biggest issues I see is people just not understanding their applications. I'm not going to dive too much into this, but it does fit into capacity planning, because if you don't understand how your application works, you're going to run into issues. It's like throwing darts, like I have here: you're going to try to get lucky, and you may end up either spending way too much money, or you may under-provision your resources and have failures and boot loops and things you don't want to be notified about on a Saturday night at 11 PM. Auto scaling fits into this. I'm not talking about auto scaling today, but know that auto scaling is obviously very critical: when do you trigger auto scaling within your capacity planning? Then there's not running performance tests. And number four, the biggest one I'll be talking about today, is making sure to calculate your resource metrics, and then making sure that you're continually recalculating those metrics and understanding how your application works and how it consumes resources.

So I'm going to show some examples without machine learning. This is an example of an application I pulled over the course of, I think, 24 hours, showing the CPU average for a variety of pods running the same service. This is a load-balanced application, and what you're seeing is the accumulation of the two. I'll talk a little bit about what a Z-score does. Here we have a PromQL query, and it's calculating the magnitude of change relative to the metric's normal variation.
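The kind of Z-score calculation described here can be sketched as follows; the PromQL string is a hypothetical example (the metric and window choices are not from the talk), and the plain-Python function shows the same arithmetic:

```python
import statistics

# Hypothetical PromQL for a z-score over a one-day window: how far the
# current CPU rate sits from its recent mean, in standard deviations.
ZSCORE_QUERY = (
    "(avg(rate(container_cpu_usage_seconds_total[5m]))"
    " - avg_over_time(avg(rate(container_cpu_usage_seconds_total[5m]))[1d:5m]))"
    " / stddev_over_time(avg(rate(container_cpu_usage_seconds_total[5m]))[1d:5m])"
)

def z_score(samples, current):
    """Z-score of `current` against a window of historical samples."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    return (current - mean) / stdev

# A stable series gives a score near zero; a sudden jump gives a large one.
history = [0.040, 0.042, 0.041, 0.043, 0.039, 0.041]
print(z_score(history, 0.041))  # near zero: nothing unusual
print(z_score(history, 0.100))  # large: something is changing, capacity-wise
```

A score that keeps climbing is the signal the talk describes: time to revisit your capacity plan, whether you act on it via an alert or feed it into a model.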
For those of you who are not familiar with statistics, a Z-score tells you how far a value is from the average, measured in standard deviations, so it captures both that something has changed and how large that change is relative to normal variation. At the end I have a blog post, and all of this is available on my GitHub as well. I know some of you are taking pictures; just know that you can also access it directly.

Here's an example of what that Z-score looks like. A lot of times in observability we use Z-scores for alerting purposes: something has changed, something is happening, either for good or bad. It could be that an application is erroneously consuming resources, or you have more users than you were expecting and you just need to respond to it. So using a Z-score to understand when things are changing is critical, not just as a statistical method; you could even use it in combination with machine learning to know when you need to be changing your capacity planning. Anytime you see this score going up, it means something is happening from a capacity standpoint.

Now, the way that I've calculated requests: just for fun, when I originally gave this talk in Detroit, I was also talking about limits, and afterwards I had a mob of observability people waiting for me in the hall to educate me on why limits are bad. I learned, so I'm no longer talking about limits here. This is a way you could determine things like requests. The example I have here is two sigma. You might be familiar with Six Sigma in statistical process control; it's the same idea, we're just taking two of those sigmas on top of the average. What we've found in capacity planning, and there's some research along this line too, is that around two sigma is where you hit the sweet spot for a request. Anything more than that is overkill. The example I always give: I was working with a health care company, and I was brought in to do some capacity planning with them, and we had everything worked out.
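The two-sigma request calculation can be sketched in a few lines; the sample numbers here are invented for illustration:

```python
import statistics

def suggested_request(cpu_samples, sigmas=2.0):
    """Suggest a CPU request as mean + N standard deviations.

    Around two sigma covers most of the observed usage for roughly
    normal data, which is the sweet spot described in the talk:
    enough headroom without paying for the absolute worst case.
    """
    mean = statistics.mean(cpu_samples)
    stdev = statistics.stdev(cpu_samples)
    return mean + sigmas * stdev

# Invented 24h of CPU averages (in cores) for one service.
samples = [0.042, 0.051, 0.047, 0.055, 0.049, 0.060, 0.044, 0.052]
request = suggested_request(samples)
print(f"requests.cpu: {request:.3f}")  # well below sizing by max observed
```

Compare this with the max-observed approach from the health care story below: two sigma lands just above typical usage, while "take the maximum and request it forever" multiplies your bill.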
We had some estimates of how much they were going to be spending. They called me about a year later and said, "Chris, we're spending four times what we were expecting. Something's going on here. This isn't good. Can you come take a look?" So I come back, I take a look, and I realize that all of their teams were requesting four gigs of memory and four CPUs for all of their applications. I asked why, and they said, "Well, we just didn't know what to put in there, so we put in a safe value." And if they looked at usage at all, they just took the maximum amount and said, "That's our request; we always want to have that available." That's not how Kubernetes works. That's not a good plan. You will spend a massive amount of money. Instead, you can do something like I'm showing here, where I use a statistical method to determine a request. And here's an example of that. As I said, I'll show you the link at the end; this is my blog post on that specific part, and it has all the information on the statistical method I talked about in Detroit. Now, this is a great method, and machine learning should not replace it. What I'm showing you today is something that can complement it as you're doing resource planning.

So now let's move into the machine learning side. I like to use a model called long short-term memory, or LSTM, when doing capacity planning. If you are a data scientist, just like I had the observability people ready for me over limits, I have a feeling the data scientists might do the same. But I do like this particular model. Some of you may be familiar with the transformer models being used in generative AI; they do something very similar to this model. But I find that this model is easier to work with, and it actually works really, really well for the kind of time-series modeling and prediction that we're trying to do when we're doing capacity planning.
LSTM is built on recurrent neural networks. If you're looking at your screen up here, that's the part where it feeds back into itself. LLMs in generative AI do the same kind of thing, just with a different approach; this one is, like I said, a bit easier to work with. The things on the bottom are what we call gates. I'm not here to explain the depths of machine learning, but these gates are what keep the memory going: do we need to forget this particular piece of information, or is this information pertinent to the model? And it just keeps repeating itself over and over again with the data. Like I said, this is easier to implement than a transformer model, especially in this use case. And it's a very stable model compared to a plain RNN; a plain RNN often fails when you're dealing with a lot of different data. The LSTM normalizes that, and because it repeats itself and has those gates, that normalization helps make sure it stays stable.

The process here would look something like this: you would have your Prometheus data feeding into the model, and then you would use that model to calculate your prediction. When you're using time-series models, you're typically predicting pretty close in. It's not like I'm predicting six months out; I'm usually predicting a day or two out. The further out you get, the less reliable the model becomes, which is also why it's not as pertinent for things like gen AI; you're usually interested in the next value in the series. Then, through that information, you can modify your resource capacities: you could set the requests in your deployments, or you could use this information to update your alerts around the services you're monitoring.

I've talked a little bit about some of these challenges, so I'll just glance over them. There is a loss of accuracy over time; I did mention that.
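A minimal PyTorch sketch of the kind of LSTM forecaster described here; the class name, layer sizes, and window length are my own illustrative choices, not the speaker's notebook:

```python
import torch
import torch.nn as nn

class CPUForecaster(nn.Module):
    """One-step-ahead forecaster: a window of past CPU samples in,
    the predicted next value out."""

    def __init__(self, hidden_size=32):
        super().__init__()
        # An LSTM is an RNN with forget/input/output gates deciding what
        # to keep in memory at each step, which is what keeps it stable
        # where a plain RNN tends to fail.
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x: (batch, window_length, 1), values normalized to [-1, 1]
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])  # last time step -> next value

model = CPUForecaster()
window = torch.randn(1, 12, 1)       # e.g. 12 two-hour segments of CPU data
next_cpu_normalized = model(window)  # shape (1, 1): the predicted next step
```

This matches the "Prometheus data in, one prediction out" flow: the output is a single near-term value, not a six-month forecast.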
Like I said, you wouldn't predict what your resources are going to be in six months; it's going to be "what's my resource plan for tomorrow," or next week, more likely. This does take a lot of data, so there's a chicken-and-egg problem: if this is a greenfield application, you need to make sure you're using some type of testing and performance environment to start getting some early benchmarks of what your capacity planning is going to look like. It also doesn't predict seasonal changes. So if you're a retailer and things pick up around Christmas, or Black Friday if you're in the US, this isn't a good model for that. This is looking for consistent changes in your data over time. If you have more users who keep using your application, or maybe you have fewer users, maybe someone's using a new API and there's a decrease and you want to reduce that request, this is perfect for that type of workload. And it shouldn't replace traditional methods. Statistical modeling is still a very powerful tool, and it's probably what I would do first; this is just a way to add on to that and build more understanding of how to do capacity planning. This one's fun because it's very visual, and I'll show you a demo here in a moment. So if you were presenting to, let's say, stakeholders on why you need to change your requests, or your operations team is requesting more equipment, this type of thing would be good for showing a long-term increase.

And with that, I think it's time for the demo, so let me switch over to that. This is a Jupyter notebook. I'm not going to go into the nuts and bolts of it, but it's basically a way to interface with different machine learning libraries, typically in Python. So the code you'll see here today is in Python, and it's using a library called PyTorch.
PyTorch is usually used more in experimental, non-production environments, usually for research purposes, whereas TensorFlow is usually the one people use for production models. So PyTorch works well for me because I'm just showing a demo, but just know that all the big data science libraries will have an LSTM implementation. Here I'm bringing in the Prometheus package. And just so you know, this cluster is not up anymore, so don't try to hack me. Here I'm pulling in the data; in this case, I'm doing segments of two hours. The reason I did that is because I only had this cluster up for about three weeks, and the data worked better at two hours. You would probably do it in segments of an hour or maybe 30 minutes; that's where you'll probably get optimal results. Or you might do a day, or half a day. It really depends on how you want to quantize the data, but in this case I got the best results at two hours.

We do a transformation of the data. I won't go too much into this; it's just getting it into the necessary data structures. Here I'm loading the data. I actually saved the data, just in case we didn't have a good connection or weren't able to reach that cluster, so I already have it here the way I want it. All this is doing is normalizing the data. When we work with data in machine learning, we typically want values between either 0 and 1 or negative 1 and 1, so all this does is transform the information into values that work best for the model. This is just creating the data structure, and this is what's defining the model; I'm not going to go into the details. The nice thing about LSTM is that it trains fine on CPU. You might get slightly better results on a GPU, but it's not like an LLM, where some models actually require a GPU to do the training; you're not going to get an order-of-magnitude improvement.
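The normalization step can be sketched as a plain min-max rescale into negative 1 to 1; the real notebook may use a library scaler (for example scikit-learn's MinMaxScaler), so treat the helper names here as illustrative:

```python
def normalize(values):
    """Rescale raw CPU samples into [-1, 1] for the model."""
    lo, hi = min(values), max(values)
    return [2 * (v - lo) / (hi - lo) - 1 for v in values], (lo, hi)

def denormalize(scaled, bounds):
    """Map model outputs back into real CPU units."""
    lo, hi = bounds
    return [(s + 1) / 2 * (hi - lo) + lo for s in scaled]

# Invented CPU averages in cores; after scaling, min maps to -1 and max to 1.
cpu = [0.030, 0.049, 0.062, 0.045]
scaled, bounds = normalize(cpu)
restored = denormalize(scaled, bounds)  # round-trips back to the originals
```

Keeping the bounds around matters: it's what lets you turn a normalized prediction back into a real value like 0.049 CPU, as shown later in the demo.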
So just know that if you want to try this out, you don't have to have a GPU on your cluster. This section here starts the training. I have 100 iterations. I'm not going to go into the nuts and bolts of what these values mean, but if the loss stops improving, that tells me when I need to cut off the learning or modify the parameters.

Here I'll show you what the results look like. This is based off the training data, so we would expect it to be very, very close. We used this data to train the model, and now we're just plugging that information back in to see how it correlates; if this weren't accurate, I would know something was very off. Right now this is in the normalized form. I think it was 0.049 CPU that I was using on average; this is the normalized, negative-1-to-1 version of that data, and we'll change it back into the format that's applicable for our scenario. Then this is doing the same thing with the test data. The test data here is the last five days, which I removed and set aside after the training data; it was not used to train the model. And you can actually see we get some pretty accurate results. It may not get the values exactly, but it gets the trends, which is probably what's most impressive, and that's actually what's most valuable for us when we're doing capacity planning. We want to understand our trends, our valleys, our slopes, and have a good understanding of how our applications are consuming resources.

I didn't have enough time today to show you how you could tie this all in, but the way you could do that would be to have a pipeline. Kubeflow has Kubeflow Pipelines, and then there's Open Data Hub, which is the Red Hat upstream version.
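The training and hold-out flow the demo walks through might look roughly like this sketch; the toy sine series, window size, and layer sizes are invented stand-ins, while the 100 iterations and the held-out tail mirror the talk:

```python
import math

import torch
import torch.nn as nn

def make_windows(series, window=12):
    """Slice a normalized series into (past window, next value) pairs."""
    xs = [series[i:i + window] for i in range(len(series) - window)]
    ys = [series[i + window] for i in range(len(series) - window)]
    x = torch.tensor(xs, dtype=torch.float32).unsqueeze(-1)
    y = torch.tensor(ys, dtype=torch.float32).unsqueeze(-1)
    return x, y

# Toy normalized series standing in for weeks of two-hour CPU segments.
series = [math.sin(i / 5.0) for i in range(120)]
train, test = series[:-30], series[-30:]  # hold out the tail, like the
x_train, y_train = make_windows(train)    # last five days in the demo

lstm = nn.LSTM(input_size=1, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()),
                       lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(100):  # 100 iterations, as in the talk
    opt.zero_grad()
    out, _ = lstm(x_train)
    pred = head(out[:, -1, :])
    loss = loss_fn(pred, y_train)
    loss.backward()
    opt.step()
    # If the loss stops falling, cut training short or adjust parameters.

# Evaluate on the held-out tail: what matters is whether the predicted
# curve follows the actual trends, not exact point values.
x_test, y_test = make_windows(test)
with torch.no_grad():
    out, _ = lstm(x_test)
    test_pred = head(out[:, -1, :])
```

This trains in seconds on a CPU, which is consistent with the timing answer in the Q&A below.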
We have the same thing, but we use Tekton under the hood. So you could actually set up pipelines that keep this information updated: you could feed it into a new alert, or you could even have it updating your requests through that pipeline as well. I wouldn't recommend that for production, but it might be a good way to keep track of different resources in your development environment. So this is one way you could do this. You would have to rerun it pretty frequently as you get new data, revising the model as new trends form. But it is a good way, with about five days' worth of data in this case, to get an idea of what those trends look like. So with that, I am open for questions. Right here. And the microphone for questions is over here.

Hello. Hi, my name is Flaviano Christian Reyes, and I work at Bloomberg. Kind of a two-part question. One, is this deployed in some production setting for other teams? And two, what was your production SLA? How fast did you have to calculate the next time-series value in order to action on it?

Yes. So typically you would find this deployed in a performance test environment, or you would be pulling the data in experimentally. It's possible that the capacity planning team has their own cluster that they're using for testing, and they pull down that data from production. As for deployment, I have not deployed this in production yet, but the example I gave at the beginning, I have. The way I did that was through a custom alert monitor: Prometheus would have its own PromQL query, and when it triggered, I could tie that back to an alert monitor. I've also done it with KEDA, where KEDA was configured with a PromQL query for the Z-score I showed earlier. You could do something similar with this. Like I said, this is very experimental right now, more for information.
But yes, you could find ways to deploy this into production.

All right, can I just get a ballpark estimate for how long the model took?

I mean, we ran it with two weeks of data, and it took just a couple of seconds. If you're talking about maybe six months of data, you're not talking more than maybe one to two minutes. It's not like an LLM where you're training for days. And I trained that on a pretty moderate amount of CPU; I think it was capped at two to four CPUs.

All right, thank you.

Hello. I'm Salih, working for GetYourGuide. One of the questions I have is, how would this work with a GitOps-based approach? Since we're rightsizing the resources in place, would you recommend a feedback loop that opens a PR? How automated can we get this?

Yes. So once you get past the experimental phase, which I just showed with a Jupyter notebook, you could put that into, let's say, a Tekton pipeline, and that Tekton pipeline would just run the model, with everything defined in GitOps. That's actually the example I gave the gentleman a moment ago on the statistical side; that's exactly what I did. I had it all within Argo, managing everything through Tekton pipelines and different alert monitors and different jobs that were running to keep it going.

I think we're at time. We can take the questions offline. Thank you.
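As a sketch of the feedback loop discussed in this last answer, a pipeline task could push a recalculated request back into a Deployment with the official Kubernetes Python client; the deployment, namespace, and container names are hypothetical, and as noted in the talk, this fits a development environment better than production:

```python
def request_patch(container_name, cpu_request):
    """Build a strategic-merge patch that only touches requests.cpu."""
    return {
        "spec": {"template": {"spec": {"containers": [
            {"name": container_name,
             "resources": {"requests": {"cpu": f"{cpu_request:.3f}"}}},
        ]}}}
    }

def apply_request(deployment, namespace, container, cpu_request):
    """Patch a Deployment's CPU request in place (requires cluster access)."""
    from kubernetes import client, config  # official Kubernetes client
    config.load_kube_config()  # or load_incluster_config() in a pipeline pod
    apps = client.AppsV1Api()
    apps.patch_namespaced_deployment(
        name=deployment, namespace=namespace,
        body=request_patch(container, cpu_request))

# e.g. apply_request("my-service", "dev", "my-service", predicted_request)
patch = request_patch("my-service", 0.062)
```

In a GitOps setup, you would more likely have the pipeline write this value into the manifest in Git and open a PR, letting Argo sync it, rather than patching the live object directly.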