Hi and welcome, everybody, and thanks for joining. My name is Guy Salton, I'm a solution engineering lead at RunAI, and today we'll talk about Kubernetes for AI workloads: we'll see what works and what doesn't really work. We even have a demo, so we can run some jobs live, and I hope that will be cool. We created a public GitHub repository with the example that we're going to show in the demo, and we'll post it later in the chat. So let's begin with the agenda for today.

We'll start by talking about the world of AI and how it's adopting containers. I assume that most of you are familiar with containers, maybe mostly for deploying microservices as part of application deployment, but we'll talk about how the world of AI and deep learning is also adopting containers. Then we'll talk a bit about Kubernetes and how, and if, it can help us orchestrate containers for running AI workloads. We'll see what works, and then we'll also see the limitations of Kubernetes and what doesn't work.

To begin with, I think we'll just start with a quick poll, so I'll pop it up now. I'd love for you to rate your familiarity with AI and deep learning so we can see how deep we should go. And I see, okay, I see a lot of people are pretty familiar, maybe not very familiar; some of you have heard about it but maybe didn't dig too deep into it. This is great, thank you for participating. I'll end the poll now, so again, thank you for answering.

What we've seen in recent years is that the AI ecosystem is adopting containers; it's built around containers. For those of you who are familiar with AI, and maybe even played around with it a bit, you might have heard about some of the tools and frameworks that we show here on the right-hand side, like TensorFlow, PyTorch, Keras, and Jupyter, and maybe some other frameworks that are designed especially for Kubernetes, like Kubeflow and Argo. These are now the most popular tools in the world of AI, and they're all built on containers. Even NVIDIA — and we'll talk a bit more in this session about NVIDIA GPUs, which are very widely used for AI workloads — released their NGC, which is a container registry with pre-trained models, Docker images, Helm charts, and things like that, meant especially for AI experimentation on containers. So not only is the world of software development and microservices deployment using containers; AI is also adopting containers, very much so in recent years.

And as you probably know — and we'll launch another poll — the de facto standard for container orchestration is Kubernetes. So let's see how familiar you are with Kubernetes; I'm launching another poll now. And here we see that a lot of you are pretty familiar, and some of you are very familiar with Kubernetes. That's great, because we are going to show Kubernetes today in the demo as well. For those of you who haven't heard about it or don't know much, Kubernetes is an open source tool originally released by Google, and it's commonly used today in many, many companies for container orchestration. Companies that are developing software, developing applications using containers, usually do it with a microservices architecture.
Each microservice is deployed as a container, but they want to have things like availability and portability — being able to easily deploy these containers to different environments, maybe some on-premise, some in the cloud. They want to declaratively define the state of the application, which they can easily compose using YAML files, and they want to easily scale to support the load from users. Kubernetes helps with all of that. I recently read that CNCF reported that, of companies using containers in production, more than 78% are already using Kubernetes. So Kubernetes is the de facto standard for container orchestration — again, designed mainly for microservices deployment.

But today we're talking more about AI, and AI is a bit different, right? First of all, in terms of the accelerators that AI workloads run on, you'll see that it won't be only classic CPUs; there are new accelerators that are more suitable for AI workloads, mainly GPUs, and maybe also ASICs in the future. Beyond that, the whole development and execution of AI workloads is based on experimentation — we'll talk about that — and this is different from just deploying microservices.

So let's go and see a real example live. We'll go into a demo, we'll even run some workloads today, and we'll train a deep learning model on Kubernetes. This link is a public GitHub repository that contains a containerized deep learning model, and we'll go and show it to you. It's also available now in the chat, so you can go and clone it, or fork it if you want.

Let's see what we have in this repository. It contains a deep learning application, which is, again, containerized. What does that mean? It means that we have a Dockerfile, and we can see that the base image of this Dockerfile is NVIDIA CUDA 10. CUDA is the framework NVIDIA released for developing applications that run on GPUs, and it's used especially for AI models. It's based on Ubuntu 18.04. You can see that the data science framework used here is TensorFlow, so we're going to install TensorFlow in our container, as well as Keras, which is another framework, and we're going to use Python 3.6. We won't go over everything here, but we're going to install some additional packages and dependencies that our model actually needs, and then we would go and run our model. You can see that there is an entrypoint script here that will run once the container is up.

Let's also talk a bit about what it means to train a deep learning model. A deep learning model, at the end of the day, is developed with code — Python in this example. Once the model is developed, we want to train it on a dataset so it learns and can give us results, and eventually it will be deployed to production, where it will serve requests from real users and give us some interesting analytics based on some data. So here we're going to copy in a dataset called CIFAR-10, a very popular dataset for AI that contains a lot of different images. This will be the data that we're going to train our model on, and then we're going to copy our Python file and run it. We can go pretty quickly through that — it won't be the main focus of the demo — but here is the actual Python file.
You see that we're going to define here things like the image size, how many images we're going to use for this execution, and the batch size — and of course these can be overridden in the container as environment variables. This repository is public, so you can go and play around with it and have a deeper look. So this is what we have here: our Python code defining the model, a dataset that we're going to mount into it, and we're going to run it all from a Dockerfile. Prior to this webinar, I already built a Docker image from this Dockerfile and pushed it to a Docker registry that we have in Google Cloud. Once we go to run this model on Kubernetes, I'll show you the actual image that was already built and pushed.

Let's open the terminal, and I'll show you what I prepared here, and we'll again run a live example. I prepared a few files, and we're going to explain a bit about Kubernetes. I know that most of you are familiar in some way with Kubernetes, which is great, because we're going to talk about some basic concepts of Kubernetes.

First of all, let's say you are a DevOps engineer or MLOps engineer, and there's a research team in your organization that wants to run AI workloads. Maybe you already bought a few GPUs, whether on-premise GPUs or cloud-hosted GPUs, and you want to help manage this operation and let the researchers in your organization run and train their models on GPUs. What I have here is a Kubernetes cluster deployed on Google Cloud, and if I run kubectl get pods — sorry, get nodes — I'll see all of the nodes of my cluster. You know what, I'm actually looking at a different cluster, so let's change the context; we should now be looking at the relevant cluster. Oh, right, no, I was on the right cluster. So I have my Kubernetes master node, I have another CPU node, and I have two nodes that have GPUs. And just to show you, I'm going to use a different CLI than kubectl, something that we developed internally at RunAI, called runai top node. This will show us exactly how many GPUs I have on each of my nodes. On the master node I don't have any GPUs, and on this node I don't have any GPUs either, but I have two nodes that do have GPUs — worker-gpu-3 and worker-gpu-4 — and each of them has two GPUs. Okay, so I want to use these nodes to run my containers on them and train my model.

Now, let's say you have a team of researchers and you want to somehow let them fairly share the GPUs in the cluster. You don't want one researcher to use all the GPUs and cause starvation, where the other researchers won't be able to use any GPUs. The way you can do this in Kubernetes is with namespaces. I already created some namespaces, so I can go and run kubectl get ns — short for namespaces. I have a bunch of namespaces here, but let's just focus on the two at the bottom: one namespace is called team A, the other is called team B. Let's say I want the researchers from different teams to use different namespaces. So researchers that belong to team A, I would want them to run their jobs in the team A namespace, and the researchers that belong to team B will run their jobs in the team B namespace.
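For reference, creating those two namespaces is a one-time step; a minimal sketch of the manifest could look like this (the exact namespace names in the demo cluster are an assumption — Kubernetes object names are lowercase, so they're written here as team-a and team-b):

```yaml
# Sketch: one namespace per team (names assumed).
apiVersion: v1
kind: Namespace
metadata:
  name: team-a
---
apiVersion: v1
kind: Namespace
metadata:
  name: team-b
```

You could apply this with kubectl apply -f, or simply run kubectl create namespace team-a and kubectl create namespace team-b.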
Then, if you also want to make sure that researchers from team B don't run jobs in the team A namespace and use GPUs that are not assigned to them, you can create service accounts, roles, and role bindings in Kubernetes. I won't go into this in the webinar today, but this is how you define your users in Kubernetes — as something called a service account — and make sure that specific users only have access to a specific namespace. So researchers that belong to team A will only be able to run jobs in the team A namespace; they won't be able to do anything in other namespaces. You can define this in Kubernetes, and I can share a link on how to set it up, but it doesn't really relate to AI — it's a general thing in Kubernetes.

Now, even after we've created these namespaces, and let's say we gave some users access only to team A and some other users access only to team B, we can also use another thing in Kubernetes called a resource quota. A resource quota can be used to limit the amount of compute resources per namespace. When we say compute resources, we'll talk mainly about GPUs here, but this can also be memory or CPUs. I'll show you what we have here. If we look at the other YAML files in this folder, we have a GPU quota for team A and a GPU quota for team B. Let me show you these files. This is defining something called a ResourceQuota in Kubernetes. Like I said, I'm going to give it a name — this is just the name of this resource quota entity; I called it "GPU quota team A". It's going to limit the resources for the namespace team A. Then I chose which resources I would like to limit and what the limitation would be. So I said: okay, for the team A namespace, I only want to allow using one GPU — I don't want to allow more than that. And the same thing for team B: they will run in their team B namespace and they will also be limited to using one GPU. This way, there won't be any starvation across the teams, right? A researcher from team A won't be able to use all the GPUs in the cluster and cause starvation for the other users.

To apply these resource quotas, I can just use the kubectl apply command and give it the YAML file, and this will create the resource quota in the different namespaces. I'll apply both YAML files — maybe I'll just do it full screen, that'll be easier. I see that it says they are configured, so we should now be able to see them. If I run kubectl get resourcequota in the namespace team A, I will see that I now have a resource quota. Its name is "GPU quota team A", and it limits this namespace to using up to one GPU. I see that the limit is one GPU, and currently zero GPUs are being used. Same thing if I look at the resource quota entities in team B — I'll see a similar thing. So we created namespaces, and we then provided resource quotas to limit the amount of GPUs that can be used in each namespace.
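To make that concrete, here is a minimal sketch of what such a ResourceQuota manifest could look like — the object and namespace names are assumptions, not the demo's exact files. Note that for extended resources like GPUs, the quota key uses the requests. prefix:

```yaml
# Sketch of a per-team GPU quota (names assumed).
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota-team-a
  namespace: team-a
spec:
  hard:
    requests.nvidia.com/gpu: 1   # at most one GPU requested in this namespace at any given time
```

The team B file would be identical apart from the name and the namespace, and both are applied with kubectl apply -f, just like in the demo.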
Now let's go and run a job. We saw the GitHub repository, the Dockerfile, the Python code, and the data, and we then built an image from this Dockerfile. This image contains the code, all of the packages, all of the frameworks, all of the dependencies, the data — everything is inside, and it even has an entrypoint instruction, so it knows what to do. Whenever the container starts, it will just run a script that installs the dependencies and then runs the Python code, which will actually train my deep learning model. Now I have this image; let's see how I run it on Kubernetes.

I have another YAML file in this folder, which I called AI train job. Let's look at the content of this file and then we'll run a job. I don't know how familiar you guys are with a Job in Kubernetes. It's not a pod, it's not a deployment — it's a Kubernetes resource called a Job. For microservices development and deployment, you would usually deploy something called a Deployment, which behind the scenes creates pods and replica sets, because those pods are things that just run in the background and serve requests. But here we don't want to do that. We want to define an unattended job that will run to completion. Why? Because we have our deep learning model and we have the data; once we run our Python code, it should start training the model, and this training activity will eventually complete — it will finish training on all of the dataset — and then we want this job to terminate and free up the resources. For this type of workload, a Job is more suitable than a pod or a deployment.

Now let's see what else we have in this file. I give a name to this job — I call it job1 — and I choose which namespace this job needs to run in; I chose the team A namespace. Then, under spec, under containers, I choose the image I want to run. This quickstart image is the image that I built from the Dockerfile I showed you earlier and pushed to GCR, Google Container Registry. This, by the way, is also public; you can pull this image and try to run it yourselves, and I can also paste it in the chat. Once I define the image, I can ask for how many resources, and which resources, I want to allocate to this container. Here I said I want to use nvidia.com/gpu — this is how you refer to GPUs in Kubernetes. Now, unlike other compute resources — if some of you are familiar with allocating CPUs or memory to a job or a pod, with those resources you can set a request and a limit. A request is the amount I want to make sure my pod or job gets for sure; if I say eight CPUs, that's the minimum. And I can also set a limit of, let's say, 16 CPUs, meaning: don't run if you don't have eight CPUs available, but also make sure I don't go over 16 CPUs. With GPUs, Kubernetes doesn't currently have this support, so there's no point in setting both limits and requests — you can just set one of them, and it acts as both. The limit would mean: if I ask for one GPU, the job won't be able to run if there isn't a GPU available, and it will never use more than one GPU. So it's both the request and the limit.
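Here is a minimal sketch of what a Job manifest along those lines could look like — the image path, names, and values are assumptions, not the exact file from the repository:

```yaml
# Sketch of the training Job (names and image reference are placeholders).
apiVersion: batch/v1
kind: Job
metadata:
  name: job1
  namespace: team-a
spec:
  template:
    spec:
      restartPolicy: Never            # a Job's pod template must use Never or OnFailure
      containers:
      - name: train
        image: gcr.io/<your-project>/quickstart-train:latest   # placeholder image reference
        resources:
          limits:
            nvidia.com/gpu: 1         # for GPUs, the limit also acts as the request
```

Applying this with kubectl apply -f, as we do next, creates the Job, and the Job controller then creates the pod that actually runs the container.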
I see some questions here; let's see if I can answer some of them during the webinar. Regarding FPGA accelerators: we, and especially myself at RunAI, work with many enterprise customers and companies that are doing deep learning on Kubernetes, and I don't think I've ever seen FPGA accelerators being used. It's currently mostly GPUs by NVIDIA, but we're sure there will be other accelerators — Intel is working on their own AI chip, Google is working on their own, called TPUs, and there's also another company in the UK called Graphcore. We believe these will slowly enter the market, but currently it's mostly NVIDIA GPUs.

I see there is also a request here for the definition of service accounts, cluster roles, and roles. I'll check and send those materials later on.

Another question, from Asim: since we already have a quota for the namespace, why do we need to put a resource limit on the job? Think about it this way. Here I set the resource quota to one GPU, and in the job I'm asking for one GPU, so maybe you don't really see the point. But potentially I can say that the namespace has a quota of two GPUs, and then two users can each run their own job and ask for one GPU. In the job I define the number of GPUs for that job, and in the resource quota I limit the number of GPUs that are available at any given time for the namespace. So it can be the same number, or the resource quota on the namespace can be larger. But if the job's number is larger than the resource quota, then obviously you won't be able to run your job.

Another question: do we need to install NVIDIA drivers on the node before using GPUs? Correct. There are a few things you need to install beforehand — the NVIDIA container toolkit as well as the NVIDIA drivers — to make sure you can run jobs on GPUs. You can look it up; we even have a document about it in the RunAI documentation, and I can share it. This also answers the other question. I see a question here about NVIDIA Docker: this is also a great command-line tool to use if you don't want to run on Kubernetes and just want to run a container on GPUs. Instead of using the regular Docker CLI, you can use the NVIDIA Docker CLI, which is built for running on GPUs.

I also see an interesting question here about using schedulers like Slurm instead of resource quotas. This is interesting. For those of you who are not familiar with it, Slurm is an open source scheduler that was built a while ago for the HPC world, the high-performance computing world, before people actually started talking about AI and deep learning. Slurm is a scheduler with a lot of capabilities, but it wasn't really designed for cloud-native technologies, Kubernetes, and containers. It does support containers, but it does not support Kubernetes, so I would say it belongs a bit to the old world rather than the new one. We have a lot of questions, but maybe we can continue with the demo and I'll get to some of these at the end.

So we saw that we have our job defined. It's going to run a container from this image, allocate one GPU in this namespace, and then train my deep learning model. So how do I run my job? I'll clear the screen and run kubectl apply -f, giving it the YAML file. When I run this command, kubectl apply, it will create the Job resource in my — oh, sorry — in my team A namespace.
Then I can run kubectl get jobs -n (for namespace) team A, and I see that job1 was created 14 seconds ago and hasn't completed yet. This is expected, because this job can run for a couple of hours — so it started, but it didn't complete. Now, a Job in Kubernetes behind the scenes also runs a pod; a pod is the smallest entity in Kubernetes that actually runs a container. So if we run kubectl get pods in the team A namespace, we see that we have one pod running. This job, job1, created a pod for us, and this pod is in a Running state and has been running for 47 seconds.

I could now run kubectl logs — oh, you know what, let's first describe this pod so we see what's happening here. I can run describe pod and the pod name, and I can see that both the limit and the request are set to one GPU. I see that it successfully pulled the quickstart image, that it was scheduled on the worker-gpu-3 worker node, and that the container was created and has already started. Then we can go and look at the logs of this pod. Let's just get the name of the pod again, because it has a random suffix here. I can run kubectl logs with the pod name, then -n team A, and I can add -f for follow — this will follow the logs as they come in. This shows me that my model is indeed being trained now. These things that you see here, the losses, are what happens when you train a neural network, a deep learning model: it tries to guess the correct answer for each of these images and then corrects itself. So we should see the accuracy getting higher, and after a lot of steps and different epochs it will eventually just finish, once it has trained on all of the dataset. So this can stay running.

Now, a few cool things about jobs. We saw that we created a Job and the Job created a pod. Kubernetes jobs have an auto-recovery mechanism out of the box. If I go and delete this pod — it's in the team A namespace — this will delete the pod, but it will not delete the Job. The Job will stay up, it will see that the pod was deleted, and it will automatically create a new pod for me. This is good because sometimes, if you want to run on spot instances, or maybe there is a reboot on the node or something happened, Kubernetes will make sure to spin up a new pod instead. So it has this capability; you can look at the Job as the manager of its pods. And indeed, we were able to run a job, we saw that it started training and used one GPU, and if we now run get pods again, we have a different pod name, and this one has been running for 10 seconds. This is the auto-recovery I told you about.
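For context, this recreate-until-done behavior is governed by a couple of fields on the Job spec. This is just a fragment layered on the Job manifest sketched earlier, with assumed values (roughly the default behavior in Kubernetes):

```yaml
# Sketch: spec-level knobs behind the auto-recovery behavior (values assumed).
spec:
  completions: 1      # keep (re)creating pods until one run finishes successfully
  backoffLimit: 6     # tolerate up to 6 failed pods before marking the Job as Failed
```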
If we want to completely delete this whole activity, we should delete the Job. So we can run kubectl delete job — it's called job1 — and once I delete the Job, if I look at get pods, we'll see this pod is terminating, and then we won't see any new pods; it will be deleted. And if I look at get jobs, I won't see a job anymore. So we saw that it's possible: you can train a deep learning model on Kubernetes, you can use GPUs, and you can have a separation between teams. You can give different quotas to each namespace, assign different users to each namespace, and make sure you don't have starvation between the teams. So this is what works. Now let's see what doesn't really work — and somebody here mentioned Slurm earlier, so this can be relevant and interesting.

Going back to where we are: we looked at the demo, we saw what's working — now, what's missing? This is interesting. This is a graph that we got from a real customer of ours, an automotive company in the UK, and it shows the GPU usage of multiple data scientists in the organization across 24 days. What you can see here is very interesting: user one didn't use any GPUs for a few days, then suddenly wanted to use about 16 or 17 GPUs, then the day after used five or six, and then used about two for the rest of the period. User two started with zero, then used about 17 or 18, then again went down to one or two GPUs. The patterns are very different — this other user didn't run anything for about three weeks, and only then started running jobs, but using many more GPUs, like 22 GPUs.

The point here is that Kubernetes only provides a way to give static allocations of GPUs. You can set a static limit on the namespace of two GPUs or three GPUs or whatever, and then each user, when defining a job, asks for how many GPUs they want. But this is not good enough, because these static allocations don't really make the researchers productive. Let's say one day we have one user that wants more GPUs while the other user isn't currently using any — why not give that user all of the GPUs in the cluster? It's a shame for them to sit idle: first of all, they're very expensive, and second of all, he has a model he wants to train and he can't — he can't get more than what was assigned to him.

Just to show that this is indeed the case, let's edit this job YAML again and see what happens if we try to run on two GPUs. Currently, we've deleted the job, no jobs are running on the cluster, and we have two GPUs in each node. So I want this user from team A to run a job on two GPUs. I run kubectl apply on this AI train job, and the job was created. If I run kubectl get jobs, I see it here, and it hasn't completed yet, obviously. But now let's look at the pods in this namespace — and we see that there are no pods. Let's see why. Let's describe this job: I run describe job and then job1, and I see here in the events that the container didn't start. Why? Because we exceeded the quota. We asked for two GPUs, but we are limited to using only one in this namespace. And this is a shame, because there are free GPUs in the cluster, but the configuration in Kubernetes does not let me use more than what I was originally assigned. It won't look and see that there are currently idle GPUs in the cluster and just let me use more. So this is one of the main limitations.
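For reference, the only edit to the job manifest for that second attempt is the GPU count — roughly this fragment of the container's resources (a sketch, not the exact file):

```yaml
# Sketch: requesting two GPUs, which exceeds the team's one-GPU ResourceQuota,
# so the Job's pod is never created.
resources:
  limits:
    nvidia.com/gpu: 2
```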
But let's go back to — where was that? Okay, back to the slides. So we saw that the usage patterns are very different, and as you understand by now, Kubernetes is the de facto standard for container orchestration, but it lacks a lot of capabilities for AI and deep learning. For example: automatic queuing and dequeuing, managing multiple queues, fairness scheduling algorithms, setting priorities and policies, and preemption of jobs.

So what do these things mean? Ideally, what we would want to achieve is to set priorities for different users based on who we think is more important, and make sure that user gets more GPUs. But let's say the cluster is currently empty — even if somebody has a lower priority, why not just let him use all of the GPUs? And if suddenly somebody with a higher priority tries to allocate GPUs, we would see that this user has a higher priority, and we would like our scheduler to automatically preempt a job of the lower-priority user and let the new, higher-priority job run instead and use the GPUs. We want something more intelligent that will help us utilize these very expensive resources better, and these things do not come with the default Kubernetes scheduler.

To go a bit deeper into Kubernetes: Kubernetes comes with a built-in component called kube-scheduler, and this is the scheduler in charge of scheduling pods on nodes. Based on the definition of the pod or job, this scheduler sees that the job is requesting, let's say, one GPU, looks for a node that has a free GPU, and schedules it there. But it does not have automatic queuing and dequeuing, it cannot manage multiple queues, and it does not provide a fairness algorithm to make sure the researchers fairly share the GPUs in the cluster. To be fair to Kubernetes: again, it was not designed for this. It was not built for running deep learning training jobs; it was built for deploying microservices. It's just not what it was meant for. However, containers obviously provide a large advantage and are obviously very good for AI and deep learning too. We saw that all the popular AI and deep learning frameworks are adopting containers, and that's great — researchers like using containers, it helps them easily run their jobs and make sure they run exactly the same way in their local environment, on some cloud cluster, or on some on-premise cluster. So it has this portability, and containers are very easy and quick to spin up. So researchers want to use containers, and Kubernetes is the best container orchestration — but again, it was not designed for this.

So what does it mean that Kubernetes scales out but doesn't scale up? Scale-out means that Kubernetes was designed for deploying microservices, making sure these services stay up and alive, and letting them easily scale to support requests from users. But it does not scale up. Scale-up systems enable workloads that require a lot of resources, sometimes distributed resources, to coexist efficiently — some of them might want fewer resources, some might want a bigger group of resources — and Kubernetes was not built for this.

To go into a bit of detail about what's missing: first, something called bin-packing and consolidation. Kubernetes doesn't currently take care of that, but we would want it to, because we know that some researchers might run a very large job that asks for a lot of resources, and other researchers might run smaller jobs that ask for fewer resources.
We would want some bin-packing algorithm to make sure that — let's say I have two nodes with two GPUs each, like we had in our case, and somebody asks for one GPU and then another person asks for one GPU — it's better to schedule both of these jobs on the same node and keep the other node free. Then, if a larger job suddenly comes along and asks for two GPUs, it will be able to run. Whereas if you don't have any bin-packing, Kubernetes would just do this placement arbitrarily: it can schedule the first job on the first node and the other job on the other node, and then we don't have two GPUs available together — only one GPU available here and another one there. So that's bin-packing.

Another thing — and this also relates to backfill scheduling — is elasticity. Sometimes, as researchers or data scientists, if you have more resources, a lot more GPUs available, that's great: maybe your model can run faster and finish sooner. But you don't want your job to just wait in the queue until ten GPUs are freed up. You want to say: okay, if currently only five GPUs are available, no problem, start my job on five GPUs, and let it expand and shrink based on availability in the cluster. Kubernetes does not provide this elasticity.

Another thing is gang scheduling. We might have multiple nodes, each with a limited number of GPUs. Let's say we have two nodes with two GPUs each, and somebody wants to run a job that asks for four GPUs; then you can't just run one container and use four GPUs, because each node has only two. There is a concept called gang scheduling, used for distributed training, and again, Kubernetes does not really support it out of the box. Ideally, we would like Kubernetes to help us schedule multiple containers, one container per node, where each container utilizes the GPUs on the node it's running on, and we would want Kubernetes to make sure these containers start together and end together, and to aggregate the logs and results. These are things that are required for deep learning but are not available with Kubernetes out of the box.
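For a sense of how such a distributed job is usually expressed on Kubernetes today, one common approach is an add-on operator such as Kubeflow's training operator. A rough sketch of a two-worker TFJob follows — the names, image, and values are assumptions, and note that the operator creates the worker pods but true gang scheduling still requires a scheduler add-on:

```yaml
# Sketch: a two-worker distributed TensorFlow job via the Kubeflow training operator
# (names and image are placeholders; gang scheduling itself is not provided by default).
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: dist-train
  namespace: team-a
spec:
  tfReplicaSpecs:
    Worker:
      replicas: 2                      # one container per node, in the spirit described above
      restartPolicy: OnFailure
      template:
        spec:
          containers:
          - name: tensorflow           # the training operator expects this container name
            image: gcr.io/<your-project>/quickstart-train:latest   # placeholder
            resources:
              limits:
                nvidia.com/gpu: 2      # each worker uses the GPUs on its own node
```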
The second challenge is scheduling for batch jobs. Batch jobs are the concept we talked about before: we want a job that runs to completion, not a deployment or a pod that just runs forever in the background. Kubernetes does have the Job resource — we used it, and it's nice: it has the auto-recovery mechanism and it frees up the resources upon completion — but it does not let you queue jobs and launch them in an intelligent way when resources become available, or manage multiple queues. You can't set fair-share policies and priorities, or decide which jobs wait in the queue and which jobs start instead.

And lastly, there is another concept called topology awareness, and this can also affect the performance of our jobs. We talked about GPUs, but obviously these nodes have both GPUs and CPUs. For deep learning, in some workloads — even within the same workload — you might need to do some data processing at the beginning, something that does not require a GPU and will run on CPU, and then the training itself will run faster if you have a GPU. And there are internal topologies within a node, and also within a cluster, with the internal communication links. If your scheduler is aware of these topologies, it can schedule your jobs with the topology in mind and make sure the performance is the best it can be.

These are all concepts that came from the HPC world, high-performance computing. We also need them in AI and deep learning, and Kubernetes does not provide these things out of the box. What we did — and I haven't talked much about RunAI, but at RunAI we actually built a platform for managing AI workloads on GPUs on Kubernetes — is build our own Kubernetes scheduler. You have kube-scheduler, the default scheduler you get with every Kubernetes cluster when you install it; we created our own Kubernetes scheduler that can be installed next to the default scheduler, and it brings these HPC concepts to Kubernetes: managing multiple queues, managing priorities, topology awareness, preemption, fairness algorithms — all of these things, specifically for deep learning and AI, we provide on top of Kubernetes. If you're interested, of course, you can go and check this out and read a bit more about RunAI. But I hope this was helpful and that you now have a better understanding of what you can do with the default scheduler of Kubernetes and what you can't.
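As a side note on the mechanics: when an additional scheduler is installed alongside the default one, a workload typically opts into it through the pod spec's schedulerName field. A minimal sketch follows, where the scheduler name is just a placeholder (not necessarily what RunAI's scheduler is called):

```yaml
# Sketch: pointing a Job's pods at a non-default scheduler (scheduler name is a placeholder).
apiVersion: batch/v1
kind: Job
metadata:
  name: job1
  namespace: team-a
spec:
  template:
    spec:
      schedulerName: custom-gpu-scheduler   # when omitted, this defaults to "default-scheduler"
      restartPolicy: Never
      containers:
      - name: train
        image: gcr.io/<your-project>/quickstart-train:latest   # placeholder
        resources:
          limits:
            nvidia.com/gpu: 1
```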
Let's see, I have some other questions here that we didn't get to yet. There's a question about the difference between the RunAI community version and the enterprise version. We don't have a community version; RunAI is a commercial platform and does not have a free version. However, if this sounds interesting to you, Gian, then we can reach out and discuss and tailor a plan for your needs.

Another question, from an anonymous attendee: is the entire GPU assigned to a pod for the duration of the pod's runtime? What if your job does not need a whole GPU? This is actually a very good question, and it's something we didn't discuss: Kubernetes does not allow you to allocate a fraction of a GPU. With CPUs you can ask for less than one CPU, but with GPUs in Kubernetes you can only ask for whole GPUs. For people who are familiar with GPUs, GPUs are sometimes very, very powerful, and some workloads don't really need all of that compute power and all of the GPU memory. With default Kubernetes, you can only allocate the full GPU to a workload, and then your workload might utilize only 10% or 15% of this GPU, and that's it — which is really a shame. With RunAI, by the way, we do provide a way to fractionalize GPUs and run jobs on fractions of GPUs, and this can also help you get better utilization.

Another question here, from Mohamed: do you know any container for CUDA 11? Definitely, there are a lot of containers for CUDA 11. We have a few more minutes, so I'll just show you. I recommend either going to NVIDIA NGC or just going to Docker Hub, and then you can look for the image you want to use — let's say you want TensorFlow or PyTorch, you can just search for it. Let's say I look at TensorFlow: in the tags, you would see the version of TensorFlow, and this correlates with some version of CUDA. I don't know if it says it here, but it will say it on Docker Hub for sure. So if you go to Docker Hub and search for, say, TensorFlow, and look at the tags — where is that? Yeah, it should say it there — you see some of these images are built with support for GPUs, and some of them already have Python 3 on them. Anyway, you have different versions of CUDA available: you can just search for NVIDIA CUDA, and here in the tags you would see CUDA 11.4, CUDA 10.2, and so on, so you can use any version you want as a base image. If you remember, in the Dockerfile we looked at I'm using CUDA 10.0, but you could just use NVIDIA CUDA 11 or 11.3 or 11.4 as the base image and then go on and install any of your frameworks and dependencies.

How about using Kubernetes operators? This is also something we actually do at RunAI. Our installation uses an operator, and we created some CRDs — custom resource definitions — for Kubernetes. For example, in the demo today we saw the default Kubernetes Job, which is available in any installation of Kubernetes. We created a new resource, a custom resource definition called RunAI job, that provides additional capabilities — I don't know if people here are familiar with it — for something called HPO, hyperparameter optimization, as well as for distributed training. So this is also using operators and CRDs.

And I think we've answered the questions. If anybody has any additional questions — I see that the Linux Foundation's YouTube channel was also posted in the Q&A, so you should be able to go and grab it, and I pasted the GitHub repository so you can go ahead and clone it or fork it and play around with it. If there are any other questions, feel free — this is the time. If not, then I guess we can wrap up.

Okay, great. Thank you so much, Guy, for your time today, and thank you to everyone for joining us. As a reminder, as Guy just said, this recording will be on the Linux Foundation's YouTube page later today, so you can check back if you want to review it or send it out to other folks. We hope that you will join us for future webinars. Thank you so much again and have a wonderful day. Thank you guys for joining.