Hello everyone, and welcome to this session of DevConf 2022. I'm Roberto Carotela, a specialist solutions architect at Red Hat. And I am Herman Montalvo, an architect in portfolio field enablement, also at Red Hat. We want to talk about predictive autoscaling patterns in Kubernetes.

But first, we want to talk about a common problem that we are seeing across a lot of customer and developer teams. Determining the right size for your workload, and estimating how the resources your business application requires will evolve, is challenging. Even with the different observability tools that development teams have available, it can still be difficult to correctly estimate an application's resource needs. And determining the real resource consumption of an application is crucial for defining the right capacity planning for your Kubernetes clusters.

To talk about resource consumption, we first need to review two aspects of Kubernetes that are very important: requests and limits. At its core, the Kubernetes scheduler is built around the concept of managing CPU and memory resources at the container level. Every node is assigned an amount of schedulable memory and CPU, and each container declares what it needs. The request specifies the minimum value you will be granted, and the request value is also what the scheduler uses to assign pods to nodes: a node is considered a valid candidate if it can satisfy the requests of all of the containers in a pod. So, in a nutshell, requests are used to decide which node is the best fit for a given workload, and they are evaluated by the scheduler at scheduling time. The limit, on the other hand, specifies the maximum value you can consume. In a nutshell, limits cap the amount of resources a container can use at runtime.

When we talk about how to define requests and limits, we also need to talk about the importance of the quality of service classes that Kubernetes assigns to your pods. Why should you care about requests and limits? Because Kubernetes defines a number of quality of service (QoS) classes based on how requests and limits are specified. There are three: Guaranteed, Burstable, and BestEffort. Guaranteed is when requests and limits are equal; these containers never get killed based on the QoS ranking. Burstable is a container with a request value lower than its limit (which may or may not be defined); these containers are killed after the BestEffort containers when resources are limited and their usage exceeds the request value, and they can also be killed if a Guaranteed container needs to run on the node. Finally, BestEffort is when we set no requests and no limits at all; BestEffort containers are the first to get killed when resources are scarce. A minimal sketch of how requests, limits, and the resulting QoS class fit together follows below.
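This is not from the slides, just a minimal sketch of a pod spec showing how requests, limits, and the resulting QoS class relate; the image name is a placeholder.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-example
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest  # placeholder image
    resources:
      requests:
        cpu: 250m       # minimum granted; the scheduler places the pod based on this
        memory: 100Mi
      limits:
        cpu: 500m       # hard cap enforced at runtime
        memory: 200Mi
# QoS class derived from these fields:
#   requests == limits on every container -> Guaranteed
#   requests <  limits (as here)          -> Burstable
#   neither requests nor limits set       -> BestEffort
```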
And what happens if we set wrong or incorrect requests and limits? For example, if the resources the application requires change over time, but we set them statically and never adapt the deployment definition accordingly, we can face the following issues.

First, out-of-memory, or OOM, kills: containers are OOMKilled because they are not allowed to use more memory than their memory limit. If a container allocates more memory than its limit, the container becomes a candidate for termination, and if it continues to consume memory beyond its limit, it will be terminated and restarted by the kubelet, affecting our application. We can also face poor workload performance: containers that consume more memory and CPU than expected, combined with wrong requests and limits, can deliver degraded application performance to your end users. And wrong values also produce wrong resource allocation: without a proper estimation behind the requests and limits, the capacity planning of your clusters is affected too, leading to wasteful resource usage and breaking the best practices you should follow for resource management and allocation of your workloads. So, in a nutshell, your SREs are not happy at all in this situation, because it puts an extra burden on their backs, gives them more things to take care of, and it can impact the SLAs and SLOs of your applications, affecting the end user experience.

So what is a possible solution? Enter the Vertical Pod Autoscaler. The Vertical Pod Autoscaler, or VPA, frees users from the necessity of keeping up-to-date resource limits and requests for the containers in their pods. It automatically sets the requests based on usage metrics, defining proper values that allow proper scheduling onto your nodes. It also maintains the ratio between limits and requests that was specified in the initial container configuration, and it adapts the container definition by predicting the resource consumption over the lifecycle of the workload.

So when can we use the VPA, and in which cases can we benefit from it? From the developer perspective, you can use the VPA to help ensure that your pods stay up during periods of high demand. Imagine you are in a Black Friday period, or any period where your critical business applications are receiving more and more load: you can use the VPA to keep your pods up during those periods, scheduling pods onto nodes that have the appropriate resources for each pod, and to help estimate the resources your application will require over its lifecycle. On the other hand, as an administrator, you can use the VPA to better utilize cluster resources, for example by preventing pods from reserving more CPU than they need. The VPA monitors the resources that workloads are actually using and adjusts the resource requirements, so that capacity is available for your other workloads and applications.

And does the VPA always act in the same way? Does it only have an automatic mode? No: the VPA has three different modes: Auto, Initial, and Off. Auto mode automatically applies the recommended resources to the pods associated with the controller: the VPA terminates the existing pods and recreates them with the recommended requests and limits. Initial is more or less the same as Auto, but it only applies the recommendation to newly created pods; it never updates pods that are already running. And finally, there is the Off mode, which is useful for only generating resource recommendations for the pods, without ever updating running pods or applying the recommendations when new pods are created. These modes map to the updateMode field on the VPA object, sketched below.
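As a minimal sketch, assuming the upstream autoscaling.k8s.io/v1 API that the VPA operator installs, the mode lives in the updatePolicy stanza of the VerticalPodAutoscaler object (the target name anticipates the demo we are about to see):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: stress-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: stress
  updatePolicy:
    updateMode: "Auto"   # "Auto": evict pods and recreate them with recommended values
                         # "Initial": apply recommendations only to newly created pods
                         # "Off": only publish recommendations, never touch pods
```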
And now we can see an awesome demo presented by my colleague Herman.

Cool. Thank you so much for your presentation, Roberto, it was really awesome. So now, for the demo time, we're going to jump into two examples: first, deploying an application without VPA, and then how the same application behaves in an environment with the Vertical Pod Autoscaler. So let me share my terminal with you, and let's jump into the demo.

First of all, we're going to deploy the application in an environment without VPA. In this case, we create a namespace, no-vpa. We make sure that we don't have any LimitRange, which we don't, because it is a newly created namespace. And then we deploy an example application. It is a very simplified controller, a deployment with the required labels and the name stress, and it has two interesting blocks: the memory request is set to 100 megabytes and the memory limit is set to 200 megabytes, while the container is going to use up to 250 megabytes, which is above the threshold set by the memory limit. So if you take a look at the pod status, you'll see that the pod is being restarted, shut down, and moved into CrashLoopBackOff. If we look for the reason inside the pod, we see that the application is reporting an OOMKilled status back to the kubelet. That means the application is saying, hey, I'm not allowed to use more memory than is specified by the limit, which is actually the desired behavior.

Now let's try the same in an environment with VPA. In this case we do mostly the same: we create the testvpa-devconf22 namespace, and then we delete any existing LimitRange; in this case, oc delete limitrange shows that there is no such resource. Then we define a request of 100 megabytes and a limit of 200 megabytes, but this time the application will use 150 megabytes, which is between the request and the limit. So the application should stay up, because the memory usage is inside the accepted range. As you can see here, we have the request of 100 and the limit of 200 while the application is using 150. If we list the pods, we see that the pod is running, because there is no reason for it to get killed. We double-check that the request and the limit are as expected: 100 for the memory request and 200 for the memory limit.

In this environment, the VPA uses the metrics that come from the observability tools. Remember that the VPA needs to predict what is going to happen when an application tries to consume more resources than are specified in its memory or CPU limits. If an application tries to use more resources than it is assigned, by default it gets killed, because we don't want it to take more resources. With the Vertical Pod Autoscaler, we use the observability metrics to get access to the real consumption of that specific application. A sketch of the stress deployment used in both runs follows below.
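The talk only gives the request, limit, and consumption figures, so this is a sketch under those assumptions; the image and the stress arguments are placeholders, not necessarily what the original demo used.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stress
  template:
    metadata:
      labels:
        app: stress
    spec:
      containers:
      - name: stress
        image: polinux/stress                # placeholder stress image
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "150M", "--vm-hang", "0"]  # 250M in the no-VPA run
        resources:
          requests:
            memory: 100Mi                    # minimum granted, used for scheduling
          limits:
            memory: 200Mi                    # 150M fits under this; 250M gets OOMKilled
```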
In this case, we run oc adm top pod with the namespace testvpa-devconf22. Since we have a single pod, we see the metrics for that pod: here, 1200 millicores of CPU and 100 megabytes of memory in use.

So now we have to create a VPA, the VerticalPodAutoscaler object. First of all, a caveat: the VPA is only available if you have already installed the Vertical Pod Autoscaler operator on your cluster. The VPA is a custom resource, defined by a custom resource definition that is only shipped with the VPA operator, which you can find in OperatorHub. As you can see, it is very similar to, for example, the HorizontalPodAutoscaler, because we have a targetRef, which says which object is affected by this VPA; in this case, the Deployment named stress from the apps/v1 API group. And then we have an extra specific block, the containerPolicies: here I am affecting all the containers inside the controller targeted by this VPA. And then we have minAllowed and maxAllowed. That means we can use the VPA without telling the application "just take as many resources as you need": we are saying, hey, you can give your application flexibility using the VPA, but you are not losing control, because you set the margins, in the same way that you would with the cluster autoscaler. With the cluster autoscaler — with the Vertical Pod Autoscaler, sorry — you are setting the minimum and the maximum. In this case, we use a minimum of 100 millicores of CPU and a maximum of 1000, and the minimum memory is 50 megabytes while the maximum is one gigabyte. So the limits and the requests will move between these values, and you never lose control of the resources consumed by your application.

If we check the VPA we just created, we have one, stress-vpa, with the mode set to Auto. Going back to the slides, that means we are automatically applying the recommended values for the resources. The output has some columns: the CPU and memory currently recommended, and whether a recommendation is provided or not. All of that information can also be shown using a JSONPath query, which is actually the preferred way; this is where the VPA publishes its values. What is important to remark here is the target section: the target tells us how many resources the VPA thinks my application really needs at that specific time; in this case, certain values for the memory and for the CPU.

If we then get the memory limits for our application, we see that we have a memory limit of 200 megabytes. Now we are going to simulate what happens if the application starts consuming more than the limit. In this case, we patch the application to use 250 megabytes via the stress binary: we patch the deployment and raise the consumption above the 200 megabyte limit, to 250 megabytes. If this happens in a non-VPA environment, your pods get killed. But in an environment with the Vertical Pod Autoscaler, it takes the metrics and the recommended values and adapts the resources to the new values.
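Putting the pieces just described together, the VPA object probably looked close to this sketch; the field names follow the autoscaling.k8s.io/v1 API, and the exact manifest is in the GitHub repo mentioned at the end.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: stress-vpa
  namespace: testvpa-devconf22
spec:
  targetRef:                    # the object whose pods this VPA manages
    apiVersion: apps/v1
    kind: Deployment
    name: stress
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"        # all containers in the targeted pods
      minAllowed:
        cpu: 100m               # recommendations never drop below these values...
        memory: 50Mi
      maxAllowed:
        cpu: "1"                # ...or rise above these, so you keep control
        memory: 1Gi
```

The recommendation itself can then be read back with a JSONPath query such as oc get vpa stress-vpa -o jsonpath='{.status.recommendation.containerRecommendations}'.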
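The talk doesn't show exactly how the deployment was patched, so here is one hypothetical way to do it: a strategic-merge patch file that bumps the stress consumption to 250 megabytes (containers are merged by name, so only the changed args need to be restated).

```yaml
# patch-stress-250m.yaml -- apply with something like:
#   oc patch deployment stress -n testvpa-devconf22 --patch-file patch-stress-250m.yaml
spec:
  template:
    spec:
      containers:
      - name: stress
        args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "0"]
```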
So, looking back at the Vertical Pod Autoscaler, remember that we have set maximum values. If your application started using two gigabytes, the VPA would simply let that pod be terminated, because it would be using much more than the maximum we accept for this deployment. But in this case we only increased the consumption by 50 megabytes, so the VPA looks at the recommended values and notices that the pod's resources need to be increased. If you describe the pod, you will see the vpaObservedContainers annotation mentioning the container stress, and the vpaUpdates annotation saying that the pod resources were updated by stress-vpa, container 0, CPU and memory requests, in the namespace testvpa-devconf22. And if we describe the new pod, the one that is using the 250 megabytes, we see that the limit has been set to a new number of bytes, which is above, but not that much above, the real memory usage of our application at that specific moment. So this is how we can give our applications flexibility using the Vertical Pod Autoscaler.

Just to recap a few things: we can use the VPA to add flexibility to our applications. That doesn't mean we lose control over what the application is doing; it means we control the margins. We are not statically saying "my application has to have 500 megabytes at all times", because sometimes it may not need that much, and then you can make better use of the resources available in your cluster. From now on, we give the application a working margin: it will use at least this amount of resources and at most that amount. And that flexibility lets you make better use of your cluster. So that's all I have to tell you about this demo. I hope you enjoyed it, and that's all. Thank you so much.

Thanks for this awesome demo, Herman. You can also check out the demo and run it in your own Kubernetes cluster; everything is in this GitHub repo. We will also share the video of the session and the slides. And now we can move to the Q&A.