This is Peter Pan from DaoCloud. Today we will together catch, unlock, and master the resource monster, Taotie. May I introduce our monster? In Chinese mythology he is the dragon's fifth son, and his name is Taotie. He is very greedy: he will eat whatever he can see and touch. He is just like a container that eats too much of a shared resource and exhausts the whole system. How could that happen? In our minds, containers are well protected and well isolated by namespaces, cgroups, and those mechanisms. But actually there are still cracks in the wall: pods share some kernel resources with the other containers on the same host and compete with each other. This is where the monster can break out.

All right, now let me take file descriptors as an example. Sometimes you see the error "Too many open files" in the system, maybe in the pod of a web server or a database. The slide tells the story: the small white dots represent the files that can be opened on the system, and the maximum is defined by a kernel parameter, fs.file-max. A pod, the yellow one, opens many files, and the open files reach the limit. So the other pod, the green one, has to fail, because it sees "too many open files". What a pity.

Besides file descriptors, there is other stuff shared among pods: PIDs, fsnotify watches, CPU time slices, and in the network area, conntrack entries, the ARP table, and DNS requests. Time is short, so please refer to my uploaded full slide deck for more detail on each of them, as well as the related source code.

All right, now come back: how to fix it? Could we just increase the max value of the kernel parameter? Not really; it is a double-edged sword. So here are some tips. Tip one: for the maximum value on the host, just leave it alone, unless some operating system ships it with a very small default value.
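As a minimal sketch (not from the talk) of the failure mode described above, the snippet below lowers this process's own open-file limit with the Linux `resource` module and opens temporary files until the kernel returns EMFILE, the errno behind "Too many open files":

```python
import errno
import resource
import tempfile

# Lower this process's soft open-file limit so the demo hits it quickly.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (64, hard))

opened = []
try:
    while True:
        # Each open file consumes one descriptor from the shared pool.
        opened.append(tempfile.TemporaryFile())
except OSError as e:
    # EMFILE is the per-process limit; a pod exhausting the host-wide
    # fs.file-max would cause ENFILE for its neighbors instead.
    assert e.errno == errno.EMFILE
    print(f"hit the limit after opening {len(opened)} files: {e}")
finally:
    for f in opened:
        f.close()
```

The same pattern plays out host-wide: one greedy pod consumes descriptors until its neighbors start failing.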
Tip two: for each pod, we can configure a default value via containerd or Kubernetes configuration. Tip three: remember that by default up to 110 pods can be packed onto the same node; if you take this into consideration, the per-pod value should be more conservative. Tip four: if a pod requires large kernel parameters, it should take the responsibility to set them itself, either via an init container or in its own YAML. And if you forget all of the above, you still have the last gatekeepers: Kubernetes can reserve resources at the system level, and it will evict pods when the node is under pressure.

Okay, next I will introduce some interesting things. We all know about overcommit and bursting. But if there is a bank run, meaning everyone hits their upper limit at the same time, what will happen? For memory, performance degrades and the node may even hang, so we recommend using VPA to find a reasonable request value. And on some old kernels the OOM killing is slow, so you should make it happen faster, before it is too late. For GPU, vGPU can enable overcommit, but be careful: in a bank run, GPU memory will be swapped to the slower host memory.

Okay, here comes another monster, the black cat hogging the whole space. It is just like I/O bandwidth contention. For disk, when the disk bandwidth runs out, the "best" things will happen, right? For example, the kubelet becomes NotReady. Thanks to Kubernetes 1.29, which introduced VolumeAttributesClass, we can now limit a PVC's IOPS and throughput. For network, AI training traffic bursts while exchanging gradients, and the latency of neighboring workloads suffers, so we can leverage CNI traffic shaping to relieve it. All right, DNS: if a pod issues too many DNS requests, it becomes a DDoS against CoreDNS, which may crash or get OOM-killed. The solution is either NodeLocal DNSCache as node-level caching, or an Istio sidecar as pod-level caching.
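To make the CNI traffic-shaping idea concrete, here is a hedged sketch of a pod manifest using the standard bandwidth annotations (the pod name, image, and the 100M values are placeholders; this requires the CNI `bandwidth` plugin to be enabled on the node):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: training-worker        # hypothetical name
  annotations:
    # Handled by the CNI "bandwidth" plugin (traffic shaping)
    kubernetes.io/ingress-bandwidth: 100M
    kubernetes.io/egress-bandwidth: 100M
spec:
  containers:
  - name: worker
    image: example.com/trainer:latest   # placeholder image
```

With this in place, a bursty training pod is rate-limited at the veth pair, so its gradient exchanges cannot starve the latency-sensitive neighbors on the same node.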
All right, here are some ways to tune kernel parameters at different levels. First, a pod can set sysctls in its own YAML using securityContext. Second, do host-level sysctl tuning via Kubespray or Kubean during provisioning. Or you can choose an operator to do it. All right, to fight the monster, let's wrap up. Kernel parameters define the system-level total, and cgroups provide container-level isolation and limits for those shared resources. So we can leverage the underlying power and configure the limits from the upper layer. Last, don't forget to monitor and alert before it is too late. All right, thanks to all of the above, the monster has been sealed and has become a good pet. So welcome to check my full slides and sample code for more detail. This is the monster hunter from DaoCloud. Thank you.
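The pod-level option mentioned above can be sketched as follows (the pod name and image are placeholders; note that by default only "safe" namespaced sysctls like this one are allowed, while unsafe ones must be enabled per node via the kubelet's `--allowed-unsafe-sysctls` flag):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-demo            # hypothetical name
spec:
  securityContext:
    sysctls:
    # Widen the ephemeral port range for this pod's network namespace only.
    - name: net.ipv4.ip_local_port_range
      value: "1024 65535"
  containers:
  - name: app
    image: example.com/app:latest   # placeholder image
```

Because the sysctl is namespaced, the change applies only inside this pod and does not leak to its neighbors on the same host.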