Hello everyone. It's great to be here, and I hope you enjoy the conference. My name is Kevin Wan; a little bit about my background: I started contributing to Kubernetes back in 2015, and the first area I ever worked on was actually scheduling, so I'm really the scheduling guy to give this talk.

As you know, today there are a lot of workloads that need more GPUs, especially with the increasing usage of AI; there are more and more training workloads running on top of Kubernetes. And we know that GPU prices are very high, which results in very high training costs. Because GPUs are so expensive, we are always trying to improve efficiency and utilization, whether from the time perspective or from the allocation perspective. Also, because of process issues, procurement of the hardware can take a long time, so actually providing GPU resources for people to use is a problem in itself.

From my perspective, the challenge of GPU utilization comes mainly from three directions. The first is the location of the GPU resources: you might have some GPU resources in an on-prem data center, but you also need some from, for example, the public cloud when it comes to cloud-bursting scenarios. The second is ownership: in some organizations GPU resources are owned by different teams, whether it's one infra team serving multiple application and business teams, or resources managed directly by each team. When one application comes under very high pressure, how to borrow resources from another team is a real problem. And third, because the hardware and the driver software, such as CUDA, keep upgrading, you might have different types of GPUs with different corresponding CUDA versions in your environment.

All of this becomes a problem: how should we design a platform to deal with it? Especially when we are moving to a multi-cluster, multi-cloud architecture, how to efficiently manage all the resources, and how to let the application and business teams help themselves, is really very important.

So I summarized some of the capabilities that I think are very important for an ideal platform. They don't cover all of the problems, but I think they are very important. First of all, we need a unified abstraction for defining workloads on top of a multi-cluster, multi-cloud architecture. This is the basic way to describe the requirements of your workload: how many resources it needs, as well as any requirements about location and topology.

Also, when you are scheduling GPUs, there are more things to think about. We know that in single-cluster scheduling, gang scheduling has become a very basic feature, right? But moving to the multi-cluster level, achieving gang scheduling across clusters is a little more complicated. And from the time perspective, we know that even if you have allocated part of the resources, the whole job still needs to wait for all the pods, all the instances, to be ready before it can load the data and start training. So it's better to have a queue mechanism to make sure you don't waste GPU time.
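To make the queue idea concrete, here is a minimal sketch of a Volcano Queue, which is the mechanism Volcano uses for this; the queue name and the capacity numbers are illustrative assumptions, not values from the talk:

```yaml
# Sketch of a Volcano queue; all names and numbers are illustrative.
apiVersion: scheduling.volcano.sh/v1beta1
kind: Queue
metadata:
  name: training              # hypothetical queue name
spec:
  weight: 4                   # relative share for fair sharing across queues
  reclaimable: true           # idle resources can be reclaimed by other queues
  capability:                 # hard cap on what this queue can consume
    cpu: "64"
    nvidia.com/gpu: "8"
```

Jobs bound to this queue wait until the whole gang can be admitted, so GPUs are not held idle by a partially started job.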
Ensuring cluster failover, to support workload migration between clusters, is also very important. And once we move to a multi-cluster or multi-cloud architecture, one-level scheduling is not very easy to implement, because you end up caching too much data about the whole environment; but when you choose two-level scheduling, consistency and efficiency become big problems. And today a lot of people are exploring implementations of GPU sharing, right? This is also a very interesting topic.

So an overview of the architecture looks like this. We have multiple Kubernetes clusters, whether in a data center or in a public cloud, whether they have a floating IP address or not. With Karmada on top of them, it's quite easy to unify the management, and Karmada will capture the cluster status as well as the available resources automatically. With that, it's also quite easy to enable single-cluster-compatible APIs on top of the multi-cluster layer; this is one of the key features provided by Karmada. For in-cluster scheduling, Volcano today already does a very good job of scheduling AI, machine learning, and big data workloads together with long-running workloads. So it's actually quite straightforward to move from a single cluster, or from multiple individual clusters, to a unified multi-cluster architecture with Volcano and Karmada. We also rely on Prometheus to collect real-time usage data to help with scheduling decisions; this really reduces resource waste, especially when people don't set very accurate resource requests.

Let's go into a little more detail about how the whole thing works. For the job abstraction, we need to deal with several things. A typical AI workload always consists of different components or roles: TensorFlow has the parameter server and the worker, and PyTorch jobs likewise have a different pod definition for each component. On the resource allocation side, gang scheduling, as well as the other resource scheduling requirements, becomes very important. We also know that inside a training workload, for example, the communication between the different components is quite heavy, so we had better schedule them close together, from the network perspective or the physical location perspective, to make sure they don't waste too much time waiting on requests.

All of that calls for a unified abstraction, and a lot of people have already implemented part of this on top of a single-cluster architecture. So we think that if the migration can be very straightforward, it would be very helpful: we don't want to ask people to change anything when they migrate from single cluster to multi-cluster. Today, in the single-cluster environment, we already have the Volcano Job definition. It's a unified API abstraction that helps you define a TensorFlow training job, a PyTorch job, or other kinds of jobs. With Karmada it's quite easy to enable it at the multi-cluster level, because Karmada is resource-definition-agnostic and can propagate CRDs across the multi-cluster architecture. Gang scheduling within this architecture is also actually quite straightforward to implement, as shown below.
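As an illustration, here is a hedged sketch of such a Volcano Job for a small PyTorch-style training job: `minAvailable` gives the gang-scheduling behavior, and the queue ties back to the earlier sketch. The job name, image, and replica counts are made-up assumptions:

```yaml
# Sketch of a gang-scheduled PyTorch-style training job (illustrative values).
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  name: pytorch-train           # hypothetical job name
spec:
  schedulerName: volcano
  queue: training               # the queue from the earlier sketch
  minAvailable: 3               # gang scheduling: start only when all 3 pods fit
  plugins:
    env: []                     # inject task index environment variables
    svc: []                     # headless service so workers can reach the master
  tasks:
    - name: master
      replicas: 1
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: master
              image: pytorch-train:latest    # hypothetical image
              resources:
                limits:
                  nvidia.com/gpu: 1
    - name: worker
      replicas: 2
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: worker
              image: pytorch-train:latest    # hypothetical image
              resources:
                limits:
                  nvidia.com/gpu: 1
```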
The way we implement it is to make sure every workload is scheduled to exactly one cluster, because when you split a workload across multiple schedulers, the schedulers inside the clusters cannot easily collaborate with each other. If you can schedule everything through one scheduler, things become very easy. The way we do it at the Karmada layer is to count the resource requirements for the whole workload as well as for each part. Inside Karmada there is a component called the scheduler estimator; it calculates the actual number of replicas of the workload that each cluster is able to run. In this example, you can see that cluster one can run exactly the 10 replicas required, but cluster two and cluster three can only run part of them. So the decision is quite easy.

Once the workload lands in a certain cluster, some race conditions can still occur: Karmada made the decision at a certain point in time, but in the next cycle the HPA controller in that cluster may have scaled out some other workloads, and then there is a conflict; you may not have enough resources to run the job. So in-cluster gang scheduling is still guaranteed by Volcano.

A little more about the features of Karmada. Karmada automatically collects the resource usage of each cluster: not just a summary for the whole cluster, but also a resource profile, like how many nodes have how much available resource. It also automatically captures cluster health status, to make sure every decision places the workload somewhere it can actually run. At the multi-cluster layer, besides resource usage, we also support topology requirements, helping users achieve, for example, zone-level availability as well as other topology-level constraints.

In other cases, there are requirements about preferences between different clusters or different data centers. For example, some users always prefer to use the on-prem resources, and only go to the public cloud if there are not enough. For this, Karmada has a feature called cluster groups: you can schedule across different cluster groups in order, always trying the on-prem clusters first and then trying the second group. That lets users easily achieve on-prem-preferred scheduling across different clusters and cloud environments.

We also know that today there is a high chance of hitting a failover or disaster situation. For example, if one cluster has a problem, we need to detect the disaster efficiently and reschedule everything to the other clusters. Karmada provides different layers of cluster status management, so it's quite easy to detect whether it's, say, a DNS outage or a hardware failure, and make sure everything can be scheduled to another cluster. And in some cases, an application may be down for some specific reason and cannot recover inside that cluster; Karmada also provides the capability to define and detect that status and migrate just those applications.
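To make the cluster-group preference and the application-level failover concrete, here is a hedged sketch of a Karmada PropagationPolicy, assuming Karmada's ordered cluster affinities and application failover features are available and enabled; all cluster names are made up for the example:

```yaml
# Sketch of a Karmada PropagationPolicy: on-prem-preferred placement plus
# application-level failover. All names are illustrative.
apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: pytorch-train-policy
spec:
  resourceSelectors:
    - apiVersion: batch.volcano.sh/v1alpha1
      kind: Job
      name: pytorch-train          # the Volcano Job from the earlier sketch
  placement:
    clusterAffinities:             # ordered groups: tried one group at a time
      - affinityName: on-prem
        clusterNames: [dc-cluster-1, dc-cluster-2]
      - affinityName: public-cloud
        clusterNames: [cloud-cluster-1]
  failover:
    application:                   # migrate the app when it cannot recover in place
      decisionConditions:
        tolerationSeconds: 120     # how long to tolerate the unhealthy state
      purgeMode: Graciously        # clean up old replicas after new ones are up
```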
For the scheduling itself, in a multi-cluster architecture there is a lot of information in the cluster status, so in Karmada we implement this at different layers. First, Karmada provides the node summary: a very general summary of the whole set of resources inside one cluster. It's quite easy for the Karmada scheduler to use it to make a quick decision about which cluster is the best to go to, but the accuracy is not very high, because we lose the details about the resources; especially when resources are very fragmented in the cluster, this becomes a problem.

The resource model is a more detailed way to simulate the cluster's resource status. It lets users define different grades for the resources, and it can take different resource types into consideration; Karmada uses this resource modeling information to help make the decision. But it's still a summary, just a more detailed one, so it sits in the middle between accuracy and scheduling efficiency (a sketch appears at the end of this section). The most powerful mechanism is the scheduler estimator. It's essentially part of a single-cluster scheduler running at the multi-cluster layer: when a workload arrives, each estimator instance simulates in-cluster scheduling and returns a result indicating whether its cluster is able to run all the replicas of the workload. Naturally, it requires more resources in the control plane.

As I mentioned earlier, race conditions may still occur, so we have a rebalancing mechanism, enforced by the Karmada descheduler. The key challenge is that we still prefer the cluster to restart the workload by itself, so how do we detect the in-between status where the problem is not going to be fixed inside one cluster and we need to migrate workloads across clusters? The Karmada descheduler provides different input rules for users to define the trigger status; it then evicts the replicas from one of the clusters, and the Karmada scheduler retries scheduling.

On the GPU virtualization side, GPU sharing is a very hot topic today. It has actually been supported at the scheduling level, in the control plane, for quite a long time; the key challenge is the isolation of GPU memory as well as computing power. Currently there are multiple ways to achieve that: some prefer to implement it at the CUDA level, others at the driver level, and there are also implementations based on MIG, NVIDIA's Multi-Instance GPU. These three approaches have different pros and cons. In Volcano, we provide a solution at the CUDA level, which is much easier to implement and easier to maintain as a community implementation.
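For illustration, here is a hedged sketch of a pod requesting a GPU slice, assuming the Volcano vGPU device plugin is deployed; the resource names follow Volcano's GPU-sharing documentation, and the image and numbers are made up:

```yaml
# Sketch of a pod requesting a GPU slice via Volcano's vGPU resources
# (assumes the Volcano vGPU device plugin is installed; values illustrative).
apiVersion: v1
kind: Pod
metadata:
  name: shared-gpu-pod
spec:
  schedulerName: volcano            # let Volcano place the pod
  containers:
    - name: cuda-app
      image: cuda-sample:latest     # hypothetical image
      resources:
        limits:
          volcano.sh/vgpu-number: "1"     # one virtual GPU
          volcano.sh/vgpu-memory: "3000"  # ~3 GB of device memory for this pod
```

Two such pods can then share one physical GPU, with the CUDA-level layer enforcing the memory limit.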
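And going back to the resource modeling mentioned above, here is a minimal sketch of customized resource models on a Karmada Cluster object; the grade boundaries are illustrative assumptions:

```yaml
# Sketch of customized resource models on a Karmada Cluster object;
# grade boundaries are illustrative.
apiVersion: cluster.karmada.io/v1alpha1
kind: Cluster
metadata:
  name: dc-cluster-1                # hypothetical member cluster
spec:
  resourceModels:                   # nodes are bucketed by free resources
    - grade: 0
      ranges:
        - name: cpu
          min: "0"
          max: "1"
        - name: memory
          min: "0"
          max: 4Gi
    - grade: 1
      ranges:
        - name: cpu
          min: "1"
          max: "2"
        - name: memory
          min: 4Gi
          max: 16Gi
```

Karmada then reports how many nodes fall into each grade, which is more detailed than a flat summary but much cheaper than a full scheduling simulation.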
Because time is limited, that's all for my talk. If you'd like to dive into more details, there's another talk delivered by my colleague on Friday afternoon, where we will introduce more details about this whole architecture. Okay, thank you for listening. Any questions? Yes, there's a microphone.

This is Abhishek from IBM Research. I have a question. A workload consists of a CRD and the workload itself, so what does this whole mechanism do to transport the CRD to the target cluster?

So the question is about CRD support on top of this architecture, right?

Yeah, I mean, what if one of the clusters does not have the TensorFlow CRD installed?

Actually, that is a requirement: you need to install the CRD in the underlying member clusters. Karmada helps you propagate the custom resources to the underlying clusters, but you can also use Karmada to install the CRD itself. Yeah.

Hey, great talk, thanks so much. A question regarding Volcano: is there any other scheduler you considered? Why did you pick Volcano versus other schedulers?

Yeah, so you know that in the early days the Kubernetes scheduler mostly supported things like microservices. There is also an implicit mechanism where the smaller pods always succeed in scheduling when the cluster lacks resources, which is about fair sharing. And in the early days Kubernetes didn't have a gang scheduling mechanism and didn't have a queue mechanism for batch workloads. That's why we started the Volcano project. Volcano actually comes from a Kubernetes sub-project called kube-batch; kube-batch is just a batch scheduler on top of Kubernetes, but we also needed controller mechanisms to implement the queue. For example, you can define a queue and bind it to a certain set of resources inside the cluster. So that's why we have Volcano. Yeah.

Oh, sorry, so Volcano is a project from your team?

It actually started from my team, but today it's a CNCF incubation-level project, maintained by maintainers from different organizations.

Thank you. Last question.

Hi, hello. How would you deal with very big workloads? Say you have LLMs today that need multiple GPUs. Do you think you can divide them between two clusters, not gang scheduling them, but scheduling the job across two different clusters?

That is also something we are exploring. I think it depends on two things. The first is whether the workload is really able to be divided into different parts; that's the basic precondition, right? If you are not able to divide it, we are not able to resolve it. The second part is at the underlying hardware level: some hardware companies are exploring ways to combine, for example, multiple GPUs or other types of accelerators to work together and act like one large GPU. I think these two together can help resolve this problem.

Okay, thank you all for listening. If you have any questions, I will be sitting in the back.
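As a footnote to the first question: here is a minimal sketch of using Karmada to propagate a CRD itself. Since CRDs are cluster-scoped, this uses a ClusterPropagationPolicy; the cluster names are assumptions carried over from the earlier examples:

```yaml
# Sketch: propagating the Volcano Job CRD itself with Karmada.
# CRDs are cluster-scoped, so this uses a ClusterPropagationPolicy.
apiVersion: policy.karmada.io/v1alpha1
kind: ClusterPropagationPolicy
metadata:
  name: volcano-job-crd
spec:
  resourceSelectors:
    - apiVersion: apiextensions.k8s.io/v1
      kind: CustomResourceDefinition
      name: jobs.batch.volcano.sh    # the Volcano Job CRD
  placement:
    clusterAffinity:
      clusterNames:                  # hypothetical member clusters
        - dc-cluster-1
        - dc-cluster-2
        - cloud-cluster-1
```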