is Rubik, a new open source project. And the second is QoS guarantees for containers. Okay, let's continue. Before my sharing, I want to introduce openEuler and myself, because I think they are important background.

The first one, openEuler. It is an innovative open source OS that covers all scenarios; that's our vision. It has many innovations in the kernel and across the whole infrastructure. Currently it is incubated and operated by the OpenAtom Foundation. That's openEuler, the OS. And the second, myself, Jin Xiaoluo. I am currently the project leader of the Rubik project, and my focus is on container infrastructure and Linux. So we have a connection through the OS, and I will try to explain this topic, CPU utilization, from the perspective of the OS.

Okay, why does resource utilization in the data center need to be improved? As you know, our data centers are growing rapidly; many new data centers are built every year. The CPU allocation reserved for customers in the data center is very high, but if you check the monitoring system, you will find that the actual CPU utilization is low, maybe 10% or 20% on average. So in industry we are trying many ways to improve CPU utilization.

Currently, in the container world, we are trying co-located deployment to improve it. Before that, we were in a siloed architecture: many different types of workloads, such as AI applications and big data applications, were deployed in different resource pools with different software stacks. In recent years we have Kubernetes; most of our applications are already containerized and can be scheduled by Kubernetes, so we have a unified cloud infrastructure. That is the basis of today's topic.

As this slide shows, we define different stages for this evolution. The first one, level 0, is independent deployment; that's the siloed stage from the last page. Then level 1 to level 4 are the stages shown on this slide. So what is different between these levels? Level 1 we define as shared deployment, and level 2 as hybrid deployment. In both, workloads already run on the same software stack and share the same resource pools, but the service types in the resource pool may differ: at level 1 there is only one service type, while in hybrid deployment there are multiple service types. At level 2, we still carefully select which kinds of service types go into the same cluster, but at level 3 we hope all workloads can be deployed freely. From the OS perspective, this is the black-box stage, which means the OS side has to detect what kind of workload it is, treating it as a black box. At level 4, we hope for multi-cloud deployment: the technology to schedule all kinds of workloads across all clouds to improve overall CPU utilization. Those are our different stages.

And I have listed the functions needed from the OS; the parts in red are for the operating system. From level 1 to level 2, we need multi-level resource isolation. From level 2 to level 3, we need QoS-aware scheduling and interference quantification and control, and towards level 4, capabilities such as computing-power normalization built into the OS.
There are some key functions, and I will try to introduce them to you today. We have this architecture: in the upper part there is the Kubernetes cluster, and openEuler as the worker node is located in the lower part, built on the Linux kernel as I mentioned before. Then Rubik runs as a DaemonSet in this Kubernetes cluster and provides the different kinds of functions shown here. That's our architecture. You can see in this timeline that we have finished all of them, and we will try to build more next year.

Then, from level 1 to level 2, as I mentioned before, we are trying to build multi-level resource isolation. Why is it needed? Once we build hybrid deployment for containers, it must be based on the assumption that our workloads are deployed with multi-level priorities. To simplify our modeling today, we divide them into two: one is online service and the other is offline service. The online services are sensitive to QoS and latency, while the offline services are batch-style jobs and are not latency-sensitive. Because of these differences, we provide different kinds of resource isolation capabilities for those workloads. As you can see in this table, we have CPU QoS, memory QoS, cache QoS, network QoS, and I/O QoS. We have built such technologies and they are all already open source in openEuler. I won't share all of them today, but I will try to introduce three of them; they are our selected features.

The first one: as I mentioned before, we have online and offline workloads, and normally the online service needs more CPU resources. In the Linux kernel we use the CFS scheduling algorithm, and CFS is a fair scheduler, so it cannot favor online services over offline services. So we add multi-level priority preemption to CFS: once the online service wants CPU resources, we can suppress the offline tasks to give the CPU resources to the online service. That is the first feature.

The second one is multi-level load balancing. While the online services are running, we may still have many idle CPU cores. In that case, we can load-balance those offline workloads onto the other CPU cores to reduce the number of task context switches and preemptions. Those are the first and second features; they are both based on hybrid deployment of online services with offline services.

But what if we hybrid-deploy online services with other online services? Because of the QoS requirements of an online service, it will normally apply for ample resources, which means we will have low CPU utilization most of the time. So we built the dynamic core affinity feature to overcommit the CPU resources in the cgroup. Those are the features from level 1 to level 2.

And welcome to level 3. At level 2, as I said before, the customer selects the workloads to deploy in the same cluster carefully. Carefully. Our mission is that the customer can hybrid-deploy workloads freely, as a black box from our OS point of view. So we built a method of offline training to get a model of a specific online workload, and we can use these models for our prediction in the live network.
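Before going further, here is a minimal sketch in Go of how the multi-level CPU isolation above can be driven from user space. It assumes a cgroup QoS interface file like openEuler's cpu.qos_level, where a negative value marks a group as offline and suppressible; the cgroup path and helper name are my own illustrative choices, and a real agent such as Rubik does considerably more than this.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// setCPUQoSLevel marks a cgroup as online (0) or offline (-1) by writing a
// qos-level interface file (assumed here to be openEuler's cpu.qos_level).
// With a kernel that supports multi-level priority preemption, tasks in an
// offline group are suppressed whenever online tasks want the CPU.
func setCPUQoSLevel(cgroupPath string, level int) error {
	file := filepath.Join(cgroupPath, "cpu.qos_level")
	return os.WriteFile(file, []byte(fmt.Sprintf("%d", level)), 0644)
}

func main() {
	// Hypothetical cgroup created for a batch (offline) container.
	offline := "/sys/fs/cgroup/cpu/offline-batch"
	if err := setCPUQoSLevel(offline, -1); err != nil {
		fmt.Fprintln(os.Stderr, "failed to set qos level:", err)
		os.Exit(1)
	}
	fmt.Println("marked", offline, "as offline; it will yield CPU under pressure")
}
```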
For offline training, we run the online workload on a testbed, where we have different kinds of metric collection from the Linux kernel and the hardware. We run this online workload under high pressure, and at the same time we apply different kinds of interference pressure to it and capture all the interference it suffers. After the testing, we analyze this interference and the QoS of the online workload, and we can get a model. After many, many tests, we have different kinds of models for later online prediction.

After that, we have this model library, as shown here. Then, in a hybrid deployment with online services and offline services, Rubik runs as a daemon. It collects all kinds of metrics in the background, continuously quantifies the QoS, and analyzes the interference on the online services. Once any interference is detected, it tries to locate the source; normally it comes from an offline service. If we have located the interference from an offline service, we can control it, either by evicting those offline services to another cluster or another node, or by suppressing their CPU resources. That's online prediction.

And level 4, remember that? At level 3, we deploy in one cluster or one cloud, but at level 4 we hope to use multiple clouds or multiple data centers. The customer may face this scenario: he has a pod requesting a certain number of CPU cores, but he may get different computing capability on different clouds, which increases his maintenance cost, because he has to maintain different configurations for different clouds. Why does this happen? I have listed the possible reasons: different generations of processors, different processor architectures, or the same processor with different configurations, such as hyper-threading or frequency settings. So we propose this idea: can we unify the computing capability of the processor, or of the physical server? In our Linux kernel, in openEuler, we try to normalize the computing capability, and in Rubik we try to convert this capability into a unified NCU, a normalized computing unit, for the resource management and scheduling of the cluster.

Those are all the key functions we define for improving utilization. For easy use, we recommend Rubik. It is a bridge that connects the Kubernetes cluster and openEuler, and it runs as a DaemonSet in the Kubernetes cluster, on the worker node. You can have a quick experience with these steps: first, you fetch its YAML, and then you apply it on the node. Then you can enjoy it. For more details, please visit our code repo.

This is our first case. We already use these technologies on our internal cloud platform: in the sample cluster, we have already reached 70% CPU utilization. And case two: our community friend, Sina Weibo, already uses our features in their live network, in production, and they have already reached 60% or more CPU utilization. Those are our cases.

What's more, we recommend Volcano and Karmada for cluster-level features. As you know, for CPU utilization we are talking about the whole data center, or multiple clouds, but openEuler and Rubik work on a single node. If you have cluster-level or multi-cluster requirements, we recommend Volcano and Karmada. You are welcome to join us.
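To make the offline-training and online-prediction pipeline described above concrete, here is a minimal sketch in Go. Everything in it is an illustrative assumption: the single latency metric, the threshold standing in for a trained model, and the mitigation action are hypothetical stand-ins for what a real agent like Rubik does.

```go
package main

import (
	"fmt"
	"time"
)

// Metrics is a simplified snapshot for one online service. A real agent
// would collect many kernel and hardware metrics (CPI, cache misses,
// queueing latency, and so on).
type Metrics struct {
	P99LatencyMs float64
}

// collectMetrics is a placeholder; a real implementation would sample
// /proc, perf events, eBPF programs, etc.
func collectMetrics() Metrics {
	return Metrics{P99LatencyMs: 42.0}
}

// qosViolated stands in for a model produced by offline training: it maps
// observed metrics to a yes/no QoS-violation verdict.
func qosViolated(m Metrics) bool {
	const latencySLOMs = 50.0 // hypothetical threshold learned offline
	return m.P99LatencyMs > latencySLOMs
}

// mitigate stands in for the control actions from the talk: suppressing an
// offline cgroup's CPU share, or evicting offline pods to another node.
func mitigate() {
	fmt.Println("interference detected: suppress or evict offline pods")
}

func main() {
	for {
		if qosViolated(collectMetrics()) {
			mitigate()
		}
		time.Sleep(5 * time.Second)
	}
}
```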
We also welcome you to the cloud native technologies such as iSulad in openEuler; it is a lightweight container engine we develop in openEuler. Welcome to engage with us. Thank you. That's all for today.

[Audience question, only partially captured:] ...a bunch of latency also on the network, even if it's not really sleeping, right? But how does it work?

Yes, this is a good question. The first thing: we are still developing this feature, maybe for late this year or next year. And what we are trying to do, actually, is focus only on CPU resources for now. So first, we run a microbenchmark on the Linux kernel to test the capability of this CPU. We get a score for this CPU, and then we unify it in Rubik. After that we get a result, maybe from 0 to 100; it's a score for this CPU that is then provided to the custom scheduler. It may introduce some confusion for the customer, because normally, in daily use, you configure a number of CPUs for your apps; for a container pod, maybe 10 CPUs. With this feature there is a number, but it is not the CPU cores you configured. With this number, though, you can get the same computing capability on different clouds. That's our reasoning. Any more questions?

Yes, we are trying to define a small benchmark to test the capability of the physical CPU.

Sorry, I didn't catch that. One second. Do you mean this one? Yes. I didn't prepare slides for this feature today. We are trying to use EDT and eBPF in Linux: EDT and eBPF with a small configuration in user space to control the network bandwidth. I think it's hard to describe here; maybe I can show you something later.

Okay, this is also still from the OS point of view. There are different kinds of resources, such as CPU, cache, and memory, and different kinds of workloads. Maybe I can take Nginx as an example. Nginx, most of the time, only uses the CPU, so you have a specific model with high CPU usage that is not sensitive to the other resources. That is a CPU-sensitive, or CPU-intensive, workload; that's one kind of model. And we can define many different kinds of models in this way, maybe a network-intensive model, and so on. Thank you.

Thanks again for your talk.
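As a final illustration of the NCU idea from the Q&A: benchmark each node's CPU, turn the score into a normalized per-core capability, and express the node's capacity in NCUs instead of raw cores. The benchmark, baseline value, and formula below are my own illustrative assumptions, not openEuler's or Rubik's actual implementation.

```go
package main

import (
	"fmt"
	"time"
)

// microbenchmark returns a crude per-core performance score: iterations of
// a fixed integer workload completed in a fixed time window. A real
// implementation would use a carefully designed benchmark suite.
func microbenchmark() float64 {
	deadline := time.Now().Add(100 * time.Millisecond)
	var iters, x uint64 = 0, 1
	for time.Now().Before(deadline) {
		for i := 0; i < 1_000_000; i++ {
			x = x*6364136223846793005 + 1442695040888963407 // LCG step
		}
		iters++
	}
	_ = x
	return float64(iters)
}

// toNCU normalizes a node's capacity: core count scaled by the ratio of its
// per-core score to an agreed baseline core, so that one NCU means the same
// computing capability on every node, regardless of processor generation.
func toNCU(cores int, score, baselineScore float64) float64 {
	return float64(cores) * score / baselineScore
}

func main() {
	score := microbenchmark()
	baseline := 100.0 // hypothetical score of the reference core
	fmt.Printf("per-core score %.1f -> node capacity %.1f NCU (16 cores)\n",
		score, toNCU(16, score, baseline))
}
```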