Hello and welcome to KubeCon CloudNativeCon EU 2021 Virtual. This is Resource Requests and Limits Under the Hood: The Journey of a Pod Spec. I'm Kaslin Fields and I'm a developer advocate at Google. I'm also a Cloud Native Computing Foundation ambassador and a member of the Kubernetes special interest group for contributor experience. I love to explain fun tech concepts with analogies and illustrations, so I do that on my website at kaslin.rocks, and you'll see a lot of that in today's presentation. You can always find me on Twitter at KaslinFields.

And I am Kohei Ota. I'm an architect at Hewlett Packard Enterprise. I'm also a CNCF ambassador, and in the Kubernetes community I'm the owner of the SIG Docs Japanese localization. So if you speak Japanese and want to contribute to the Kubernetes project, your pull requests are always welcome. I go by inductor on Twitter and GitHub. So that's us, and let's get going.

Today we're gonna use a fun analogy to help these concepts really stick and help you remember them. To start off our analogy for our talk today, I want you to think about your app being like a dog. In this case, your dog is going to be going into the doggy daycare. I've used this analogy before as a way to help explain container orchestration, and that will come into play here a little bit later. But the main idea I want you to get here is that your application, when you put it into Kubernetes for Kubernetes to run, is going to be running alongside lots of other applications being run in containers. Our goal is to make sure that all of our applications have everything that they need to be able to run successfully. However, apps don't always play nicely with each other. So resource requests and limits can help us make sure that they have everything they need without getting in each other's way. And since we're talking about Kubernetes, Kubernetes is container orchestration, which means containers and applications at scale.
So we're gonna be talking about our doggy daycare and all of the many people and pieces that make it up, and how Kubernetes takes care of our apps, essentially. So let's dive into resource requests and limits.

First, here's a look at an ordinary pod spec. You may have seen one of these before. Here is our app: our dog is gonna be within the image for the container that we're actually gonna be running within our pod. And you can see that below our container image, we have defined a resource request and a resource limit. We're gonna start off talking about the request. You can see here, we have a memory request defined of 64 megabytes and a CPU request of 250m there.

So let's look at what happens when we actually send this pod spec to Kubernetes to run. The first thing I want you to think about is that requests are used by Kubernetes for planning. In the doggy daycare analogy, this might be kind of like planning out how much space the doggy daycare is gonna need in order to take care of your precious app. So let's go through the whole journey that that request goes through when it goes into Kubernetes. You as the developer have your app that you love so very much, you know exactly what it needs, and you want to make sure that the doggy daycare is gonna do its very best job taking care of it. So you want to put your app in a pod in the doggy daycare. Kubernetes says, okay, we can help take care of your app. Our developer says that this app will need two CPU and four gigabytes of memory in order to run properly. Our doggy daycare, of course, knows how to take care of apps and can make sure that that happens. So it makes a note of the app's needs to make sure that they'll be met once the application is actually running in Kubernetes. And it does that by recording to etcd. etcd is the key-value store component used by Kubernetes. It functions as a single source of truth for the entire cluster.
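For reference, a pod spec like the one being described might look something like this. This is a minimal sketch: the pod and image names are placeholders, and the limit values are illustrative rather than taken from the talk.

```yaml
# Minimal pod spec sketch with a resource request and limit per container.
# "my-app" and the image tag are placeholder names, not from the talk.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: my-app
    image: my-app:1.0
    resources:
      requests:
        memory: "64Mi"   # the 64-megabyte memory request mentioned above
        cpu: "250m"      # 250 millicores, i.e. a quarter of a CPU core
      limits:
        memory: "128Mi"  # illustrative limit values
        cpu: "500m"
```

Requests and limits are set per container, and the pod's effective request is the sum across its containers.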
So any important notes on how to take care of your app need to be reflected in etcd. The next thing that needs to happen is figuring out which node to use. There is infrastructure underlying this Kubernetes cluster, so which actual compute node is this application going to run on? That goes to the Kubernetes scheduler. The scheduler says, oh, I see a new pod is coming in, and that pod requires two CPU and four gigabytes of memory. I'll take that into account and make sure that I put it on a node that can handle that. So the scheduler is a Kubernetes component that evaluates nodes to assign a pod to. The resource request is one of the parameters that the scheduler takes into account when it's ranking which nodes could possibly hold this pod.

So now we've gone through the scheduler and figured out which node we're gonna place that pod on. Now we need to talk to the kubelet, which is the Kubernetes control plane component running on the node. We kind of hand off our pod spec to that kubelet and say, hey kubelet, go make this pod real, make it a thing, run the application. So that's what our kubelet does. All in all, this is what the journey of our request looked like: a new request came into our API server on the control plane, which passed the request off to the scheduler, saying, hey, there's a new pod that needs to be assigned to a node. The scheduler figured out which node to assign it to. And then that got passed on to the kubelet, which actually took care of running our pod. Here's a little bit more information; Kohei is gonna tell you about requests.

Thank you, Kaslin. Okay, so let's talk about what kube-scheduler does internally. Kube-scheduler evaluates node resource usage, and it knows each node's capacity based on the node metrics. If a node does not have enough resources, the scheduler will not assign a pod with that request value to it.
You can see the node resource information by executing the kubectl describe node command if you ever encounter a problem with pod assignment. So here's the summary of pod requests. Request values are used when Kubernetes creates a pod. Kube-scheduler uses those parameters to evaluate which nodes match the pod's requirements. We will explain this more deeply later, but the CPU request is also an important value for dividing up CPU resources in case CPU usage hits 100%. If a pod uses more resources than its request value and the node's resources are being used up, there is a potential for pod eviction: your pods can be removed from the node. If you want to control that priority, you should understand QoS classes in Kubernetes.

So the QoS class is used when a pod is being created and evicted. Every pod has a QoS class based on the request and limit values you set, and there are three types. If you set both the limit and the request for all containers in the pod, and the limit and request values are equal, Kubernetes sets the QoS class to Guaranteed, which is the top priority. If you set requests and limits but they are not equal, or you only set some of them, the pod is classed as Burstable, which sits in the middle. And if you do not set any resource request or limit, Kubernetes labels the pod with the BestEffort QoS class automatically, which is the lowest priority. Based on that priority, Kubernetes decides which pod should be removed first when a node's resources are about to be exhausted.

So now we're gonna start talking more about limits. This is a look at our pod spec from earlier, and you can see that we're now talking about the limits section, where we have defined a limit on our memory and a limit on our CPU. The main thing I want you to know about limits is that limits are for enforcing the rules. We want to say that this pod cannot go over this limit of resources. So let's take a look at what happens when a limit comes into Kubernetes, the journey that it goes through to actually happen.
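The three QoS classes can be sketched as container resource stanzas like the following. These are illustrative fragments, not values from the talk; a created pod's class appears in its status.qosClass field.

```yaml
# Guaranteed: requests and limits set for every container, and equal.
resources:
  requests: {cpu: "500m", memory: "128Mi"}
  limits:   {cpu: "500m", memory: "128Mi"}
---
# Burstable: at least one request or limit set, but not meeting the
# Guaranteed criteria (here, requests lower than limits).
resources:
  requests: {cpu: "250m", memory: "64Mi"}
  limits:   {cpu: "500m", memory: "128Mi"}
---
# BestEffort: no requests or limits on any container in the pod.
resources: {}
```

Under node pressure, BestEffort pods are the first candidates for eviction and Guaranteed pods are the last.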
So once again, we've got our developer with their precious app, who says, I wanna make sure that my pod doesn't consume more than two CPU and four gigabytes of memory. Kubernetes says, okay, we can limit your pod's resource usage. That gets passed off to the scheduler; the pod still needs to be assigned to a compute node. So this new pod needs to be limited to two CPU and four gigabytes, and the scheduler is gonna make sure that the node knows about that limit so that it can enforce it. So the scheduler has picked a node to run the pod on, and that's passed on to the node. The node sees that there is a limit on the pod. And the kubelet, the control plane component of Kubernetes that's running on the node, actually needs help from another system component in order to do this. So the kubelet talks to the container runtime that's actually gonna run the containers for your pod, and it says, I have a pod coming in that needs its resources limited. Our container runtime has the tools to actually limit the resources available to that container. So it says, okay, I can use cgroups to make that happen. We'll dive into that a little more deeply in a second. At a high level, I want you to remember that requests are for planning and limits are for enforcing.

So now let's dive a little deeper into limits and the cgroups I was just talking about. Let's summarize what resource limits do. A resource limit is used to limit resources on a pod, so if a pod tries to use more resources than that value, you will encounter some issues. If a container process tries to use more than its assigned CPU limit, you will have a CPU throttling problem. This will not immediately delete the pod; however, the process gets stuck, and the program has to wait until CPU resources are ready. If a container tries to use more memory than the limit, it simply gets an out-of-memory error, and the container will be killed by the OOM killer.
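The two enforcement behaviors just described can be annotated directly on a limits stanza. This is a sketch with illustrative values matching the developer's numbers from the analogy:

```yaml
# What happens when a container exceeds each kind of limit:
resources:
  limits:
    cpu: "2"       # exceeding this -> CPU throttling: the pod keeps
                   # running, but the process waits for CPU time
    memory: "4Gi"  # exceeding this -> OOM kill: the container is
                   # terminated by the kernel's OOM killer
```

So a CPU limit degrades performance gradually, while a memory limit is a hard line that kills the container when crossed.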
Kubernetes and the container technologies use a Linux kernel feature called cgroups to implement container resource limitation. We will explain more about this later. Let's see how the resource limit value is used in Kubernetes. Once a pod spec is registered in Kubernetes, kube-scheduler fetches the new pod spec, then assigns a node to the pod you want to create. Kube-scheduler has the node information from etcd, but the limit value is not directly used at this moment yet. The kubelet on each node runs a sync process to fetch the latest information on the pods assigned to it. The kubelet sees the limit value in the pod spec, then converts the CPU cores value into a CFS period and quota in milliseconds. Then the kubelet calls the Container Runtime Interface to create the actual containers on the Linux host. Once the container runtime gets the request from the kubelet, it executes the container creation by calling cgroups.

Now, let's deep dive into the three parts here: CFS period and quota, container runtimes, and cgroups. When we talk about containers, they are actually implemented with Linux kernel features called cgroups and namespaces. Namespaces are used to isolate processes on a Linux host, and cgroups are used to limit resources. This time we're talking about resource management on Kubernetes, so we are not going to talk about namespaces. So just like in this cute diagram, dogs, as in your pods, will try to fight over resources like toys and food, and we can solve that by allocating toys and food to each dog. Let's see how Kubernetes does this in the technical aspect. Cgroups is a Linux kernel feature, and you can limit resources such as CPU, memory, and network bandwidth, or even combine those. The CPU request value that you set in Kubernetes will be converted to CPU shares in the cgroups world. The CPU limit will be converted from a CPU core count to a CPU time value for CFS, then stored in cgroups as the CFS period and quota.
You can also see those cgroup values for the pods running on each node by looking at the files under the /sys/fs/cgroup directory. The memory limit is also stored in cgroups. One difference between CPU and memory here is that only CPU has a request value that is stored in cgroups; memory does not have a request value in cgroups. I was really confused when I noticed this for the first time, and here is the reason: CPU shares is a relative value. If there are four existing containers with 1.5, 0.5, 1, and 1 CPU each, they will share the whole allocatable CPU on the node. When the CPU has enough headroom and there's no CPU limit, CPU resources can be overcommitted. But when the CPU is busy, the CPU request value will be used to make sure each application keeps enough CPU resources for itself. If another application with a two-CPU request is assigned to this node, Kubernetes recalculates the percentages and reassigns all the allocated resources for when it's busy.

Now let's learn about CFS. CFS, the Completely Fair Scheduler, is the default process scheduler in Linux. Container isolation is based on cgroups, but cgroups actually uses CFS to implement CPU resource limitation. CFS works in cycles, and its default cycle period is 100 milliseconds. That means an application can be assigned to one CPU core every 100 milliseconds, and CFS handles it. If you set 500m, which means half a core, as the CPU limit in your pod spec, the CFS quota becomes 50 milliseconds, which means the process can use 50 milliseconds out of every 100-millisecond period of CPU time. If you set 2000m, which is two cores, the CFS quota becomes 200 milliseconds, which means you can use two cycles' worth for the process. So what if you don't set any CPU limit? In that case, the CFS quota becomes -1, which means unlimited. And at last, let's look at container runtimes.
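The conversions just described can be written out for one container. This is a sketch of the cgroup v1 translation, assuming the default 100ms CFS period; the request/limit values are illustrative:

```yaml
# How Kubernetes translates CPU and memory values into cgroup v1 settings
# (files under /sys/fs/cgroup on the node):
resources:
  requests:
    cpu: "250m"      # -> cpu.shares = 250 * 1024 / 1000 = 256
                     #    (relative weight; 1024 shares per full core)
  limits:
    cpu: "500m"      # -> cpu.cfs_period_us = 100000  (100ms period)
                     #    cpu.cfs_quota_us  = 50000   (50ms per period)
    memory: "128Mi"  # -> memory.limit_in_bytes = 134217728
```

Note that only the CPU request appears in cgroups (as shares); the memory request is used for scheduling and eviction decisions, not enforced by the kernel.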
Kubernetes has two types of container runtimes, and they are called CRI and OCI runtimes. You may remember the Docker deprecation news: it was about CRI runtimes, not OCI runtimes. When Kubernetes tries to create a container, the kubelet executes a CreateContainer gRPC call towards the CRI runtime. When the CRI runtime gets the create request, it executes the OCI runtime binary with a JSON file that includes all the necessary information for creating a container, such as the cgroup limit parameters and so on. Then finally, the OCI runtime calls the cgroups API inside, and now you have a new container running. We are not going any deeper into runtimes here, but you can think of the CRI runtime as a runtime that works at the Kubernetes level and the OCI runtime as a runtime that works at the Linux kernel level. If you are curious about container runtimes under the hood, I highly recommend checking out the maintainer track sessions about CRI-O and containerd, which are also CNCF projects.

So to close this out, I want to leave you with a tool to answer a question you might have been asking yourself, which is: now I know what requests and limits are, why they're important, and how they work, but how do I set the right requests and limits for my applications? For this, I want to introduce you to pod autoscaling. Kubernetes has two types of pod autoscalers available. The first is horizontal pod autoscaling, which means creating more or fewer replicas of your pods to handle traffic. If you have a big spike in traffic, Kubernetes can use a horizontal pod autoscaler to spin up more replicas of your pod to handle it. If you're in a lull and you don't need so many pods, it can also scale down so that you don't have as many pods running. And then there's also the vertical pod autoscaler, which is about changing the size of your pods. When I first saw this, I was a little confused, because what does it really mean to change the size of a pod?
And in Kubernetes, what that means is giving you recommendations for how to set your requests and limits for that pod. There are three modes for a vertical pod autoscaler in Kubernetes, and those modes are Off, Initial, and Auto. In Off mode, the vertical pod autoscaler will monitor what's happening with your pod and how many resources it's using, and it'll make recommendations for how you should set your requests and limits for CPU and memory. But it won't implement those for you; you would have to go in, check what those recommendations are, and then implement them yourself. Initial mode means that the vertical pod autoscaler will watch what's going on with your pods for a while, come up with recommendations on what you should set those requests and limits to, and then implement those for you once, when the pods are created. But anytime after that, if you ever wanna change the requests and limits of your pod because it needs to change size, you'll have to go in, check what the vertical pod autoscaler is recommending, and implement those manually. In Auto mode, the vertical pod autoscaler will continuously monitor your pods, try to understand what they're doing, make its recommendations, and then implement those recommendations for you, which means that it will be continuously creating and deleting your pods. When it makes a new recommendation, it will spin down the existing pods and spin up new pods of the new size. That can be a bit disruptive, and you have to make sure that you're managing that disruption with pod disruption budgets and the like. But that is also an option for vertical pod autoscalers.

The way vertical pod autoscaler recommendations work is that they come in a few different types: target, lower bound, upper bound, and uncapped target. The target is the value that the vertical pod autoscaler will use when resizing the pod.
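A VerticalPodAutoscaler object in recommendation-only mode might be sketched like this. It assumes the VPA components are installed in the cluster, and the deployment name "my-app" is a placeholder:

```yaml
# Sketch of a VPA in "Off" mode: it records recommendations in its
# status but never modifies or recreates the pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # or "Initial" (apply once at pod creation)
                        # or "Auto" (apply continuously, recreating pods)
```

Once it has observed the workload, the recommendations (target, lower bound, upper bound, uncapped target) show up in the object's status, for example via kubectl describe vpa my-app-vpa.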
The lower bound is the number the vertical pod autoscaler looks at for triggering a resize downward: if your pod's utilization goes below it, the vertical pod autoscaler will delete the pod and scale it down. The upper bound is the number the vertical pod autoscaler looks at for triggering a resize in the upward direction: if your pod's utilization goes above it, the VPA will delete the pod and scale it up. This is assuming you're in Auto mode, of course. And then there's the uncapped target: if no minimum or maximum capacity is assigned to the vertical pod autoscaler, the uncapped target will be the target utilization for the vertical pod autoscaler.

So let's review. Congratulations, you've just been through the container request and limit bootcamp on Kubernetes. Let's see what we learned today. The pod spec is registered once you apply your pod manifest to Kubernetes through the kube-apiserver. Then kube-scheduler fetches newly registered pods from etcd and assigns a node to each pod, referring to the resource requests. The kubelet fetches its assigned pods in every sync period and calculates the diff between running containers and the pod spec. Then the kubelet calls the CreateContainer gRPC towards the CRI runtime after converting CPU cores into CFS periods and quotas. The CRI runtime then executes the OCI runtime binary to create a container with the OCI spec JSON. The OCI runtime manages the cgroups file system, so it can create and delete actual containers. And finally, we also learned about pod autoscaling, and especially that the vertical pod autoscaler is not simply a resource for auto-scaling your applications, but can also provide recommendations for your applications' resource requests and limits.

So thanks for joining us today to learn about pod requests and limits. I hope you learned a lot of fun new stuff. If you have any questions for us, please post them in the Q&A; hopefully you've been doing that this whole time.
And if you have any questions after this, do feel free to reach out to me on Twitter at KaslinFields. And inductor. Again, thank you for joining us today. Thank you so much. Bye bye. Have a great day.