Welcome, everyone, to the second talk from our analysis, testing, and automation track in session one here on Saturday at DevConf.CZ. I hope you're having a great time. Let's welcome our speaker, Shrey Anand, whose talk is Data-Driven Resource Tiers for OpenShift Apps. Shrey, the floor is yours.

Thank you, Peter. Hi, everyone. I'm Shrey Anand, a data scientist with Red Hat, and today I'll be talking about data-driven resource tiers for OpenShift apps. This project is a team effort of the AIOps team in the AI Center of Excellence at Red Hat. Let's get started.

I'll first talk about the backstory of the project to give you a little context, along with the test environment that we work in. Then I'll formally define the problem that we worked on and the solution that we came up with. After that, there'll be a small demo where I walk you through the notebook that implements the solution, and finally there'll be a discussion session.

Before we define the problem, it's a good idea to know a bit about the open source environment that we work with. The first piece is the Operate First initiative. I'm not going to go into details, but the basic premise is that software is now readily available through open source; the difficult part is operating it. The Operate First initiative helps with developing and operating cloud applications in the open. As part of it, we have a cluster deployed with an Open Data Hub instance. Open Data Hub is a blueprint for building an AI-as-a-service platform on OpenShift, so if you want to build an AI application, you can do that using ODH services. In this case, we are using the JupyterHub service to run the project notebooks. In the Operate First spirit, this project leverages data generated from operating the JupyterHub application itself. If you want to know more about the Operate First initiative, you can go to operate-first.cloud and find all the information there.

Like I said, we are working with the JupyterHub application. For those of you who don't know it, it's a cloud application for data science development: it allows data scientists to train models, run inference, and visualize the results. Since we are trying to operate this application, it's a good idea to know a little bit about how it works. The front end of the application starts with a spawn page that has some configurations for us to tweak, and I'll quickly show that to you. This is what a JupyterHub home page looks like. The first configuration here is the notebook image; in this case, we have selected the Operate First JupyterHub Analysis notebook image, which gives us all the dependencies we need to run the experiments in this project. The other important configuration is the deployment size, and this is precisely what we are working on in this project. Here you can select from small, medium, and large, which essentially gives you a choice of how much CPU and how much memory you want. Traditionally, these values are arbitrarily selected by the cluster admin, and what we're trying to do is introduce some data-driven decisions here.

That's the front end of the application. On the backend, when we start the server with those configurations, OpenShift pods get spawned: for each user, there is one pod spawned with the memory and CPU that they selected.
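To make that mapping concrete, here is a minimal sketch of roughly what a deployment-size choice turns into on the backend, using the Kubernetes Python client. The tier values are illustrative placeholders, not our cluster's actual defaults:

```python
from kubernetes import client

# Hypothetical "medium" profile: the numbers are placeholders for
# illustration, not the real tier values from the cluster.
medium_resources = client.V1ResourceRequirements(
    requests={"cpu": "2", "memory": "4Gi"},  # guaranteed to the user's pod
    limits={"cpu": "4", "memory": "8Gi"},    # usable only if the node has spare capacity
)
```

Every user pod the spawner creates carries a resource specification of this shape, and this is exactly the knob the project tunes.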
This was a lot of information in a very short time, but the main point is that we are working on the Operate First cluster and the JupyterHub application, and we are trying to come up with these configuration tiers.

So why are we working on this project, and why do we care? We had an incident on our cluster where we had CPU pressure. What that means is we had a large CPU capacity on the cluster, but new users still couldn't be onboarded, because the existing users had requested all of the available CPU resources. When we took a deeper look, we found that the actual CPU usage by existing users was far less than what they were requesting. The choices presented to users were one, two, four, or eight cores, with four being the most frequent choice. Those choices make sense for physical systems that do many things at once, but if you're running a notebook, a lot of the time you don't need that many resources. So what we did was track the users' actual usage, and using that, we came up with some smart choices for the request and limit parameters.

Before we formally define the problem, we have to look at request and limit. These are two parameters that a pod specifies before it is spawned on OpenShift. The request is the resource that is guaranteed to be available to the pod: if it requests one CPU, it will have one CPU for its whole lifetime. The limit is the resource that the pod may use subject to availability: if the pod specifies four CPUs as its limit, then whenever the other three are available, they will be allocated to that pod.

So what we want to do is this: given patterns of resource usage, where a resource can be CPU, memory, GPU, or PVC, we want to recommend requests and limits for the configuration tiers — the small, medium, and large choices shown on the spawn page — such that the difference between the cluster resources requested and the resources actually used is minimized. What are the pros of doing this? One is less wastage of resources: if the gap between what we request and what we use is minimal, we obviously save resources, and if we save resources, we save money and energy. The other benefit covers the opposite case, where we request less than what our application actually needs; there we see performance degradation and application throttling, and we would want to recommend a higher configuration instead. This project can also find application in other services like Superset and Spark, and in managed services like ROSA.

Now, diving into what we actually did to come up with the requests and limits. The data we are looking at is telemetry data from the AIOps data science team. We tracked a group of data scientists for three months and recorded their resource usage every second; for more tractable analysis, we downsampled it to five-minute intervals. So we have three months of this data set, and we use that for the analysis.
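As a rough sketch of that preprocessing step — assuming the usage series for one pod lives in a pandas DataFrame with a timestamp index; the column name and values here are made up:

```python
import numpy as np
import pandas as pd

# Hypothetical per-second CPU usage for a single pod over one hour.
rng = np.random.default_rng(0)
idx = pd.date_range("2021-01-01", periods=3600, freq="s")
df = pd.DataFrame({"cpu_usage": rng.exponential(scale=0.2, size=3600)}, index=idx)

# Downsample the per-second samples to five-minute averages, which keeps
# three months of data computationally manageable.
downsampled = df.resample("5min").mean()
print(downsampled.head())
```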
The basic idea is that we want to group users based on their resource consumption. What that means is, let's say we have ten users on this application: three of them are power users with high CPU requirements, three have medium CPU requirements, and four are just reading notebooks and not really doing much computation. If we can automatically group these users, we can use those groups to come up with configuration tiers that minimize the difference between the resources requested and the resources used — which was our problem statement.

Next, I'm going to go through the notebook that implements the solution. This is the repository where all the code lives, under AICOE-AIOps: the Operate First JupyterHub analysis. I'll have all the links towards the end of the presentation if you're interested. I'll walk you through the important parts of this notebook; if you want to go deeper, you can have a detailed look at the EDA and also play with the code yourself in the JupyterHub environment that I showed. This notebook covers the CPU analysis, but we also replicated it for memory and then aggregated the results to come up with the resource allocation policy.

In this notebook, the first important step is to fetch the metrics — the data of the data science team that we tracked. We specify the Prometheus URL that lets us collect that data with the right authorization. Once we do that, there are three metrics we're interested in: pod CPU usage, pod CPU request, and pod CPU limit. At the end, we have a data frame that looks something like this — I'm just going to zoom in so it's more visible. This is a pod ID, and it corresponds to a user whose server was spawned. For each pod we have timestamps at five-minute intervals over three months, and for each timestamp we have the usage, request, and limit values for that pod.

Once we have this data frame, we can visualize it and arrive at this graph. It shows the time frame on the x-axis and the metric values on the y-axis, where the green line at the top is the limit, the orange line in the middle is the request, and the blue line is the usage. What this graph shows is that usage is around 0 to 1 CPUs on average, while the requested value is around 2 to 4 and the limit is around 4 to 8. So we definitely know there's scope for improvement here.

The next important piece is the clustering algorithm itself, but before we jump into that, we have to look at the features we used for clustering. If you look at this data frame, we have pods here, and each pod — each user — is represented by a vector of quantiles. Intuitively, what that means is: say this row is pod B. This value says that pod B requires less than 0.0025 CPUs 80% of the time, and this number means that pod B requires less than 1.8647 cores 100% of the time — that is, the maximum this pod ever used over the three months. So we represent all the users as vectors of these quantiles. Once we have that, we can produce this picture, which is a two-dimensional projection of the vectors we just formed. Each dot is a user, and the distance between dots tells us how the users differ from each other in terms of their consumption. We can then run a clustering algorithm on top of this — in this case, the K-means algorithm — and it automatically finds the groups and colors them in the plot.
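Here is a small sketch of that pipeline — quantile feature vectors fed into K-means — assuming scikit-learn; the pod names, the quantile grid, and the random usage samples are stand-ins for the real Prometheus data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for three months of 5-minute CPU usage samples per pod.
rng = np.random.default_rng(0)
usage_by_pod = {f"pod-{i}": rng.exponential(scale=0.2, size=500) for i in range(10)}

# Represent each pod/user as a vector of usage quantiles.
quantile_grid = [0.5, 0.8, 0.9, 0.95, 0.99, 1.0]
features = np.array(
    [np.quantile(samples, quantile_grid) for samples in usage_by_pod.values()]
)

# Group users by consumption pattern; three clusters for small/medium/large.
kmeans = KMeans(n_clusters=3, random_state=0, n_init=10).fit(features)
print(dict(zip(usage_by_pod, kmeans.labels_)))
```

In the actual notebook, the feature matrix comes from the Prometheus data frame shown earlier; the dimensionality reduction used for the two-dimensional plot is a separate visualization step.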
Looking at the colors, we see this orange group here, a big chunk here with blue and yellow colors, and another chunk with green and red colors. We can treat these as five different groups, or we can club some of them together; in this case, since these two groups sit close together, we club them. Then we look at the consumption patterns of each group to check that they actually differ. Once we do that, we can see that this is the small group, this is the medium group, and this is the large group, based on how much they consume.

The next piece is the resource allocation policy, and this is where the cluster admin comes in: here we define how to split between the request and the limit parameters. In this case, I'm saying that whatever a pod uses 99% of the time should be covered by the request, while whatever it uses 100% of the time should be covered by the limit. What we get is that for the small tier, the request is 0.003 and the limit is 0.07 CPUs; for the medium tier, the request is 0.12 and the limit is 3 CPUs; and for the large tier, the request is 0.5 and the limit is 9 CPUs.

If that wasn't clear, we can also look at this graph, which shows the density of the CPU usage. We see a big spike here, and the black line shows the request parameter that we set — meaning that 99% of the time, the workload's usage stays within the CPU it is requesting — while the limit sits towards the 100% mark. This is for the small tier, and we have similar graphs for the medium and large tiers as well.
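In code, that policy is just two quantiles over each group's pooled usage samples. A minimal sketch, with made-up data standing in for a tier's three months of observations:

```python
import numpy as np

# Stand-in for all 5-minute usage samples from the pods in one tier.
rng = np.random.default_rng(1)
group_usage = rng.exponential(scale=0.05, size=10_000)

# Request covers the workload 99% of the time; limit covers the maximum
# (100th percentile) ever observed for the group.
request = np.quantile(group_usage, 0.99)
limit = np.quantile(group_usage, 1.00)

print(f"request={request:.3f} CPUs, limit={limit:.3f} CPUs")
```

The 99%/100% split is the admin-chosen policy from the talk; a more conservative admin could pick different percentiles and reuse the same machinery.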
So what's the impact of coming up with these tiers? We looked at the percentage of resources saved if we use these configurations, and we found that we save about 91% in requests — essentially, we request a lot less, which correlates with the actual usage of the pods. And if we look at the overall utilization, which is the fraction of requested CPU that is actually used, we see a jump from 0.7% to 10.9%. That's the conclusion of this notebook: we have the configuration tiers we were looking for — small, medium, and large — and we see the resource reduction, the pressure reduction, that we were aiming for.

So what actually happened with this analysis? The recommended configuration tiers are now integrated with the JupyterHub instance that we use: the container sizes small, medium, and large that I showed in the beginning are specified based on this analysis. This project is an example of data-driven decision making, and also an example of AIOps, where we support the operation of applications using AI tools. The next step is to automate this approach so it works with any downstream cluster or service. Right now, the Operate First cluster acts as the upstream project where we use this tool to come up with the configuration tiers, but any downstream cluster or service that wants to use it should be able to do so right away. So we want to automate the approach and also test it with other applications like Spark and Superset.

That's most of it. There's also another concept that is similar: the vertical pod autoscaler. It is different from our project in the sense that it works on a single pod — it optimizes the resource usage for a single pod — whereas we are looking at the application level, at all the users that are using the application, and we also plan to look at the cluster level, across the different applications deployed on the same cluster. So I think that's my time. I hope you found this useful; if there are any questions, I'd be happy to take them. The project material is available here: we have the repository, the Jupyter experiments, the operate-first support repository, the data science Slack, and my email address as well.

Cool, Shrey, thank you very much. For the audience, feel free to ask questions either in the chat or on the Q&A channel. I don't see any questions for now; let's give people a little more time. I personally found this presentation very interesting, because we have something very similar ourselves in OpenShift CI, where people submit tests to us — basically, our CI test jobs come with some requests — and we built a similar system that watches their consumption over time. We actually built a mutating admission webhook: rather than suggesting to people what resources to request, we overwrite whatever they specified with the actual consumption. So one of my questions would be: how do you deal with something eating resources beyond your top tier — or is your top tier unbounded?

So the top tier that we define has a limit equal to the maximum any user ever used in the three-month period. But even then, if your workload is something not covered by the standard tiers, you can always request a custom resource: on the operate-first support repository, you can just create an issue, and the cluster administrator will create a custom resource for you, based on the availability of the CPUs or GPUs or whatever you're requesting. So that would be handled as an exception case. And on your point about just overwriting with the observed consumption — I think that's also what the vertical pod autoscaler does, and it's troublesome for applications like JupyterHub because it has user pods: if you override the consumption, kill the user pod, and restart it, you might just stop whatever computation the user was running. So AI can always recommend; it cannot make decisions for you, as of yet.

All right, I don't see any other questions. If you want to reach out to Shrey on the Work Adventure after the session, feel free to do so. And thank you very much for the presentation, Shrey. It was very interesting.