All right, thank you very much. Hello, everyone, and thank you for being here today at this session about KMM, your Swiss Army knife for kernel modules on Kubernetes. My name is Quentin. I work for Red Hat, and unfortunately my co-speaker Hirsh Pathak cannot be here today; he wasn't able to travel, so I will cover his section. Let's get into it.

Today we're going to talk about kernel modules, specifically on Kubernetes. We'll start with an introduction: what kernel modules are and why we need them. Then we'll talk about the pain points we might encounter and how we can solve them with the KMM operator. We'll deep dive into today's use case, which is enabling Intel GPUs in Kubernetes with KMM. And finally we'll have a demo, all live, so it will hopefully work, where we generate images from text using a GPU in Kubernetes that we enable with KMM. Pretty exciting. At the end, if I'm not too slow, we should hopefully have some time for Q&A.

Right, you might start by wondering what kernel modules are and why we need them in Kubernetes. We're all familiar with what Kubernetes is: if you have big clusters of machines and distributed workloads, then I guess you know what Kubernetes is. But what if we have devices like GPUs, or certain kinds of storage solutions, in Kubernetes? What if you're using special NICs, special AI accelerators, or maybe distributed file systems such as Lustre? The common point between all these devices and solutions is that they all need a driver, and in the Linux world, a driver is powered by a kernel module. So what is a kernel module, really? It's a piece of C code that extends the functionality of the kernel without having to reboot the system. The point of a kernel module is that it is modular: you can load it and unload it at will.
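To make that "load it and unload it at will" concrete, here is roughly what it looks like on a plain Linux box (a sketch; the module names are just examples, and these commands require root):

```shell
# List currently loaded kernel modules
lsmod

# Load a module by name (dependencies are resolved via depmod's database)
sudo modprobe i915

# Or load a specific .ko file directly (no dependency resolution)
sudo insmod ./my_driver.ko

# Unload it again -- no reboot required
sudo modprobe -r i915
```

KMM essentially automates this kind of workflow across every node of a cluster.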
The most common use cases for kernel modules are indeed hardware drivers and virtual file systems, and you can in fact also add additional system calls to your system with kernel modules. What makes them a bit tricky to use is that they have to be built against a very specific kernel version, technically an ABI; in most cases, you need to rebuild your kernel module each time your kernel changes. Another feature of kernel modules is that they need to be signed for Secure Boot systems. That means you have a list of public keys in your system, and if Secure Boot is enabled, the system will only accept kernel modules signed with a private key that matches the database of public keys in your BIOS, in your UEFI.

So, for some devices and some file systems, we do need drivers. Ideally, those drivers should be present in your kernel and should be contributed upstream. The problem is that it actually takes quite a long time for a hardware vendor or a file system author to contribute a kernel module upstream. You go back and forth on reviews, and there are certain standards to meet whenever you contribute code into the kernel, so it ends up being a lengthy process. And it's not only about contributing the code upstream: we sometimes also need to resort to out-of-tree kernel modules, to do A/B testing or to enable the latest and greatest hardware on our systems. So upstream is the best solution, but it takes time. Another issue we might run into when using kernel modules is that they're really built against a very specific kernel version. If any symbol in the kernel changes, if the ABI changes, then we have to rebuild the kernel modules.
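As a rough sketch of the signing requirement, here is what signing a module with the kernel's own sign-file tool can look like (the paths and key file names are hypothetical; the matching public certificate must be enrolled in the machine's UEFI db or MOK list for the kernel to accept the module):

```shell
# Sign the .ko with a private key; the corresponding public certificate
# must be enrolled in the UEFI db or MOK database
/usr/src/kernels/$(uname -r)/scripts/sign-file \
    sha256 ./signing_key.priv ./signing_key.x509 ./my_driver.ko

# Inspect the appended signature metadata
modinfo ./my_driver.ko | grep -i sig
```

KMM can perform this signing step for you in-cluster, as we will see later.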
And even though we are guaranteed a stable kernel within a release, at some point we may hit a CVE whose fix forces the ABI to change. The last item I have here: since those kernel modules are not part of your distribution or of the upstream kernel, there is the problem of actually deploying them onto the nodes and keeping them up to date with the kernels you're running. In Kubernetes, that is usually difficult, because it means you have to customize your nodes before they go into operation.

So the real question is: what if there was a cloud-native way of bringing those kernel modules onto the nodes that power our Kubernetes cluster? Instead of relying on out-of-band management for those kernel modules, we would like something that is well integrated into Kubernetes and that allows us to manage the lifecycle of those kmods. And for that, we actually developed KMM, kernel module management. So what is the kernel module management operator? It's something we wrote to bring a standard consumption model for kernel modules on Kubernetes. Instead of every hardware vendor and every CSI driver author out there reinventing the wheel over and over to deploy kernel modules, we thought we should have a common solution. Kernel module management is a kubernetes-sigs project; we work with a SIG in Kubernetes. It's an operator that builds, signs, and loads your kernel modules on your Kubernetes nodes. What the operator does is monitor all the kernels running in your cluster and load the right kernel module versions on the right nodes. It doesn't only load your kernel modules; it can also run your device plugin, which, if you're a hardware vendor, is in general very useful. So how do we make those kernel modules, which are really .ko files, simple binary files, cloud-native? How do we integrate them into the Kubernetes workflow?
Well, we actually wrap them inside container images. Within a standard container image, we put the .ko files in a very well-specified file system tree, and that is how we deploy the kmods to all the nodes. Those kmod images are really the vehicle we use to bring the kmods to the nodes. The good thing is that, because container images are file systems, one image can contain the kmods for several kernels. So with only one image, you can address several kernel versions at once; I will go into the details later.

In addition to kmod images, which either the user, the hardware vendor, or the CSI vendor packages, the kernel module management project brings the Module CRD. I don't know how many of you are familiar with Kubernetes, but a CRD is pretty much an object in the cluster that you use to configure certain aspects of the cluster. In our case, we map kernel versions running in the cluster to the name of a certain kmod image. That is how we say: with that kernel go those kernel modules. And this is how we load certain kmods on certain nodes. I will show you an example Module resource later, but you can specify the kernel versions you're compatible with either as literal strings or as regular expressions. Like I said, you can accommodate many distros at once. Within that Module CRD, you also specify whether you want to build your kmods for certain kernel versions or not. So this is really how you express the desired state of your kernel modules in the cluster. The reconciliation loop is as follows: KMM determines that a certain node needs some kmod. It checks whether the image we configured for that kmod exists. If it does not, and in-cluster builds are enabled, then we build the kmods and the image that contains them.
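To illustrate that "well-specified file system tree", a kmod image Dockerfile might look roughly like this (a sketch only; the exact layout KMM expects is described in the upstream documentation, and the kernel version, package names, and module name here are placeholders):

```dockerfile
# Stage 1: compile the module against one specific kernel version
FROM fedora:37 AS builder
ARG KERNEL_VERSION=6.0.7-301.fc37.x86_64
RUN dnf install -y gcc make kernel-devel-${KERNEL_VERSION}
COPY . /usr/src/my_driver
WORKDIR /usr/src/my_driver
RUN make -C /usr/src/kernels/${KERNEL_VERSION} M=$(pwd) modules

# Stage 2: ship only the .ko, under a per-kernel directory, so one
# image can hold modules for several kernel versions side by side
FROM registry.fedoraproject.org/fedora-minimal:37
ARG KERNEL_VERSION=6.0.7-301.fc37.x86_64
RUN microdnf install -y kmod
COPY --from=builder /usr/src/my_driver/my_driver.ko \
     /opt/lib/modules/${KERNEL_VERSION}/
RUN depmod -b /opt ${KERNEL_VERSION}
```

Adding more `/opt/lib/modules/<version>/` directories is how a single image addresses several kernels at once.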
If the Secure Boot image, which is the same image but with signed kmods instead, doesn't exist, then we reuse the image we built and sign the kmods. And finally, we load that kmod, along with any firmware files our kernel module might need. So in a sense, this is what the reconciliation loop looks like.

All right, so what I have here is really the simplest Module CR we could have. Like I said, I really stripped it down to keep the interesting bits, which in my opinion are the kernel mappings. You can see that within just one Module resource, you can address many kernels. Let's have a look at them. The first item, this literal field, is pretty much saying that for this Fedora 37 kernel we want to download that image, extract it on the node, and load the appropriate kmods stored in that image. If we also have any other Fedora 37 kernel, then we use another container image, containerImage being the name of a kmod image that contains all the kernel modules and drivers we need. Finally, we have a selector that allows us to target only a subset of nodes, if we don't want to load the kernel module on all nodes of our cluster. In the previous example, I omitted the build section of the Module; this is what it would look like. The build is something you can configure for some kernels only, or for all kernels in your Module resource. What it basically does is reference a Dockerfile that the user has to provide. Then, like I said, it works as follows: the operator looks at all the kernels available in the cluster.
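For readers following along without the slides, a minimal Module resource along these lines might look like this (a sketch; the image names, kernel strings, and node label are placeholders):

```yaml
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: my-driver
spec:
  moduleLoader:
    container:
      modprobe:
        moduleName: my_driver        # the .ko to load on matching nodes
      kernelMappings:
        # Exact match: this one Fedora 37 kernel uses this image
        - literal: 6.0.7-301.fc37.x86_64
          containerImage: quay.io/example/my-driver:fedora37-6.0.7
        # Regexp: any other Fedora 37 kernel falls back to this image
        - regexp: '^.*\.fc37\.x86_64$'
          containerImage: quay.io/example/my-driver:fedora37-generic
  selector:                          # only target nodes with the hardware
    feature.example.com/my-device: "true"
```

The mappings are evaluated against each node's running kernel, which is how one resource covers many kernels and even many distros.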
For each kernel in the cluster, if the corresponding image doesn't exist, the operator uses that Dockerfile and creates a pod that builds the kmod image, with all the kmods inside. The image is then deployed to all the nodes that need it. I also have the same for signing. Signing means, like I said, signing the .ko files to be compatible with Secure Boot. It's also an optional section within the CRD, and here you specify all the .ko files that KMM needs to sign. I will not go into the details; what's important to remember is that the user provides the keys as Kubernetes secrets.

Finally, I have a diagram here that shows how the whole thing works. Let's consider a cluster with six nodes, and actually three different kernel versions. We have kernel 1.2.3 running on nodes zero, one, and three; kernel 4.5.6 running on nodes two and five; and finally kernel 7.8.9. And let's consider an example where, in our Module resource, we configure only two mappings: one for kernel 1.2.3 and one for kernel 4.5.6. The remaining node will be excluded, because let's say we don't know how to build the kmod for it. Okay, at the very center of this diagram, we have the operator. The operator, like I said, watches various resources in the cluster. One of those resources is the Module; another is the nodes. So it builds an internal map of which kernels are running in the cluster and which modules are configured. We are considering a Module that has integrated build and sign sections, so we have to read the Dockerfile to determine how we need to build the kmods.
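A kernel mapping with in-cluster build and signing enabled might be sketched like this (hypothetical names; as described above, the Dockerfile comes from a user-provided ConfigMap and the signing keys from Kubernetes secrets):

```yaml
kernelMappings:
  - regexp: '^.*\.fc37\.x86_64$'
    containerImage: quay.io/example/my-driver:${KERNEL_FULL_VERSION}
    build:
      dockerfileConfigMap:          # user-provided build instructions
        name: my-driver-dockerfile
    sign:
      keySecret:                    # private signing key (a Secret)
        name: my-signing-key
      certSecret:                   # matching public certificate
        name: my-signing-cert
      filesToSign:
        - /opt/lib/modules/${KERNEL_FULL_VERSION}/my_driver.ko
```

If the target image already exists in the registry, both steps are skipped; otherwise KMM builds, signs, and pushes it before loading.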
And once we have done that, we create the build or the signing pods, depending on what we have to do. All right, and once those are done, we have built all the kernel modules we need; we have built the kmod image, which is the vehicle for those kmods. Oh, I believe I need to speed up a bit. Then we deploy the pods that actually load our kernel module. So this is how it works: we deploy those pods for kernel 1.2.3, and the same pods for kernel 4.5.6. For node four here, we do nothing. And finally, once the kernel module is loaded on all the nodes, we can run the device plugin, which communicates with the kernel module inside the node and exposes the kernel module, or whatever special resource it powers, to the API server, which makes the whole thing consumable by applications. One thing that is new in version 2.0 of KMM, which was just released yesterday actually, is that once the kmod is loaded, the pods that loaded it disappear. On the nodes we still have the kmod loaded, but the loading pods are not there anymore, which saves a lot of resources.

I don't have time to go into all the features that KMM provides, but something really helpful is that we label the nodes wherever we have loaded the kernel module, which I believe is quite useful. We also copy the firmware files from the kmod image, the same vehicle as for the kmods, onto the node, so that when you load your kernel module, your kmod is able to load the firmware files it needs. And then, like I said, we have a lot of features, and I can go into details in the Q&A if needed. Right, I think that's it for the features. Now let me talk about the use case from Intel.
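That node labeling can be consumed directly with a selector. Roughly (the label shown follows the upstream convention of combining the Module's namespace and name, and the names here are placeholders):

```shell
# Nodes where the Module "my-driver" in namespace "default" is loaded
# carry a ready label that workloads can select on
kubectl get nodes -l kmm.node.kubernetes.io/default.my-driver.ready

# A workload that needs the driver can then pin itself to those nodes:
#   spec:
#     nodeSelector:
#       kmm.node.kubernetes.io/default.my-driver.ready: ""
```

This is what makes "only schedule my GPU workload where the driver is loaded" a one-line selector rather than custom tooling.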
Intel has been one of the early adopters of KMM for their dedicated GPU products in the data center, and I will explain their problem statement and how they solved it with KMM. It pretty much boils down to the pain points I was explaining earlier. So far, there was no scalable, standard approach to loading kmods on Kubernetes. So whenever they released that great GPU, the Flex series that they have, there was no standard way of loading the appropriate drivers on Kubernetes. And it's not really only about GPUs, but pretty much any XPU, any accelerator out there. Like I said, the journey to upstream is very long, there is a lag downstream, and I'm not even talking about the kmod being available in OS distributions, because each distribution ships whatever version it wants to include. So it's quite hard to operate as an end user. The goal for them is to shift left: to have things available as early as possible, to be ready on day one when they actually release the product. And I believe the key point here is that KMM makes it easier for hardware vendors such as Intel to accelerate time to market. As soon as you release a device, you can make the driver available to users and load it dynamically on Kubernetes with KMM. Another use case they have, and this is due to the very architecture of KMM, is that because the operator watches all the kernels in the cluster, it's very easy to rebuild the drivers and the kmod image whenever a new kernel appears in the cluster, and it's also very easy to deploy a new Module resource in the cluster to try a new version of the driver.
And thanks to that very feature, they were able to deploy an in-house pipeline where they create their Module resources and add new nodes with new kernels, so that for each new version of their driver they pre-build the corresponding kmod image. That way, they don't have to hand out a Module resource with the build instructions enabled as YAML; they can just give out a Module resource with static kernel mappings pointing to images that already exist. That makes downloading those images faster, and you don't need to build locally, in a sense.

Yeah, I think that's it, so let's have a look at the demo now. Hopefully it works; like I said, it's all live, so I'm crossing my fingers a little bit. For this demo, we have a single-node Minikube cluster, so it's effectively a Kubernetes cluster with only one node. It has the latest and greatest CPU from Intel, but what's interesting for us is the GPU Flex 170, on their dev cloud. It's really the latest GPU, for which there is no in-tree driver. So what we are going to do is create a Jupyter pod with all the runtime libraries we need. That's OpenVINO, which is an AI toolkit from Intel, plus the runtime libraries you need to be able to use the GPU. That pod has 20 cores and 64 gigabytes of RAM, which is already plenty. In this demo, we are going to build the driver and load it with KMM, and finally I will show you a Jupyter notebook that runs Stable Diffusion to generate images. The demo code is available on GitHub; if you download the slides, you can click the link.

Okay, let's have a look. I'm connected to the machine here. Is that visible to everybody? Not too small? Okay. We do not have any pod running, so let me create the Jupyter pod. Okay, and don't pay attention to all the other resources.
I'm fixing things with Minikube. What I'm going to do now is connect to the Jupyter notebook. The password is OSS, that's really creative. Okay, it's loading, and I'm going to open the OSS notebook here. Yes, that's the one. Okay, let's run it step by step. This is the height and the width of the image we're going to generate. We are going to download a pre-trained model from Hugging Face; this is a model that's already compatible with Intel OpenVINO. All right, we're done here. Now we are going to list all the devices that are available to us for inference. I have not loaded anything; I have not loaded my Module resource, so it should show nothing, and indeed it shows nothing. Let me put the presentation mode on; it makes things a little bigger. i915 is the name of the Intel kernel module, the Intel driver for their GPU. We use lsmod to list all the kernel modules loaded on the node. It returns empty: there is no Intel driver available. And OpenVINO, the Intel toolkit for AI, returns only CPU as an available device. This is what we expect. Like I said, this dropdown has only one option. We are going to compile for CPU then, because that's the only thing we can use, and we are going to try and generate some images anyway, to see how long it takes.

Okay, so we are now finally generating the image. You can see that it's very slow, actually. We have 20 CPU cores, the latest CPU available, and it's still pretty long: it's going to take one and a half minutes to generate the image. I think that's really too long, so I'll actually kill it and reduce the number of steps, maybe to 20. The image will be ugly, but at least we'll get something. It's still going to take some 25 to 30 seconds.
And we all know that GPUs are made for AI; if you try to do it on a CPU, even the fastest CPU, it takes a long time. I still wanted to show you that it works, but it takes a very long time indeed. So right now we are generating an image of the Tokyo Tower at sunset with Stable Diffusion. Yeah, pretty good, but quite long to generate, right? And with some artifacts: it's all blurry here, and that is actually due to the fact that we reduced the number of inference steps from 50 to 20. Okay, so this is pretty disappointing. This is quite slow, so let's actually load the kernel module.

What I'm doing here is creating the Module resource. I wanted to show you all the pods that are running. Okay, so this is the Jupyter pod I was connecting to, and we created exactly that resource, which should be familiar: it pretty much looks like the simplest Module I was showing you earlier. It has only one kernel mapping. This is an Ubuntu machine, so we are matching this very kernel here, and we are telling KMM to build using a certain ConfigMap that is available in the cluster and contains all the build instructions for our kernel module. So as soon as this Module was created in the cluster, KMM read it and tried to fetch the image, which didn't exist, so what it's doing now is actually building it. Let's have a look at the build log. If you've already compiled kernel modules, this should look familiar: we compile all the components of the kernel module, and we are now adding the firmware into the image. This is a fast machine, so it's only taking a couple of minutes to download everything and prepare the kmod image. What we are doing now is running depmod, which is an essential step, and then we're basically done preparing the kmod image.
We are going to push it to Quay.io, and once we are done, the image will be available to KMM. Then what KMM does is create a worker pod, a pod that downloads that image, extracts it into its file system, and then uses modprobe to load the modules it contains into the kernel. This step takes maybe some 10 to 20 seconds. Right, so you can see that we pushed the image to Quay.io. As soon as the image was available, this worker pod started running. It downloads the image, extracts it, and terminates already. Okay, so we built the image, pushed it, downloaded it in the worker pod, and hopefully loaded the kernel module.

Let's go back to our Jupyter notebook and run all the steps again. Actually, I believe I need to restart the kernel of the Jupyter notebook and run until that very step here. Okay, and you can see that the output changed: instead of empty, we have quite a number of kernel modules, and these are the kernel modules that our worker pod loaded after downloading the image. So now OpenVINO reports that the GPU is available, which is great. I'm going to go step by step again. Okay, GPU here. We need to recompile the pipeline for GPU, which should be pretty fast. Like I said, this is pretty fast, and we can now generate the images. We don't want an image of that quality, so let's switch back to 50 steps. You remember that it took maybe some 30 seconds on the CPU; now we are using the GPU, and it's actually much faster. It generates an image of much better, much higher quality in just 12 seconds. And you will be able to judge for yourselves: it is much less blurry.
I mean, okay, the Tokyo Tower is maybe a bit oversized compared to all the other buildings in Tokyo, but I think the next one will be a lot more realistic. For this one, we are generating an image of a temple in Japan in the fall. And again, it's really much faster compared to the CPU: in only 12 seconds you have an image like this, which is pretty believable if you ask me. I kind of like it. And finally, I mean, we have to have Mount Fuji. So right now we are generating an image of a lake and Mount Fuji, again with the same settings, the same image size, and 50 inference steps. And here it is. I think it's pretty good. I'm not sure what this mountain here is, but don't ask me; the rest is pretty good, I think. I don't really have minutes left, but I would show you that if we delete the resource, we can actually unload the kernel module from the node, which is useful if you want to unload a certain version and load a newer version of the driver. You'll have to trust me on this: it works.

All right, going back to the slides. Is this working? I wanted to say a few words about kernel module management on OpenShift. We actually have a dedicated version of KMM on OpenShift. It has all the good features of the upstream version, plus much better security: it supports the SecurityContextConstraints objects of Red Hat OpenShift, which really makes it an enterprise product. We also solved a fairly difficult problem: how do you get all the libraries, headers, and compilers required to build a kmod for a certain kernel version? We have that with the Driver Toolkit, which is a component of OpenShift. It's a base image for the kernel version that ships with your OpenShift release, and it contains everything you need to build a driver. It's really great.
Finally, OpenShift ships with an in-cluster registry, an image registry in your cluster, which is really useful if you want to run in-cluster builds, which you often do. And it's compatible with all the other OpenShift enterprise features: CA management, proxy and registry settings, it's all integrated in the downstream edition of KMM.

Wrapping up: KMM is indeed a Kubernetes operator that loads kernel modules on nodes. We are actually trying to make it the standard consumption model for kmods on Kubernetes. It can build your kmods, and it can sign them if you want to use Secure Boot. The flexible API we have in the Module CRD makes it easy for you to target multiple kernels and even multiple distros at once. So if you're a hardware vendor, you could just craft a Module resource and provide it to your customers, and whatever distro they're running, you could make it work. Because we label the nodes whenever certain kmods are loaded, it's very easy to consume nodes that have the kernel module, the driver, loaded. KMM is also available on OpenShift in a dedicated version with all those nice enterprise features. And like I said, we released version 2.0.0 yesterday, actually, so it's all ready to use. You can reach us on the Kubernetes Slack on that channel. This link will take you to the nice documentation website that we have, and the repo is here. Then there are also a few links about OpenShift; it's all in the OpenShift docs, obviously, and we have a dedicated repo for it. The two links at the bottom are from Intel and the solution they built on top of KMM to enable their GPUs. I think that's it for me. Thank you. Any questions?

So, in the session you talked about, let's say, the open-source version, not OpenShift.
The kernel module, if its image is not available, is expected to be built in the build phase with the provided ConfigMap that has the Dockerfile. But how do you actually manage the Dockerfile itself? How do you detect that a new version of the kernel module is available? I'm not sure what's actually inside the Dockerfile. Is this on? Yes, it's on, so I can answer. Okay, so you're asking: let's say I have a new version of the kernel module that I would like to build, is that what you're asking? Yeah, you were talking about the Linux kernel version and also the device plugin version, which I understood, but the kernel module also has a different versioning system. So if you want to release or adopt, say, a Git pull from the main branch, something like that, it has a different lifecycle. Right. So my question is: the build phase itself has a different dependency, and I was asking about that dependency. Okay, so obviously that slide right there shows the simplest kernel module; the CRD has many more fields for you to configure. Most notably, there is a version field that you can add, which describes the version of your kernel module, and you can increment it. It's all available in our documentation online, but what it does is look for nodes that have a specific version label, because you might want some of your nodes to run a certain version of the kmod and other nodes to run the newer version, for example. And that version field is also made available at the build step, so in your build script or your build Dockerfile you could have a conditional based on it: if version three, if version four. You know what I mean. Yeah, exactly. Thank you. Also, how do you handle the build cache? You adopted Kaniko, right? The upstream version uses just Kaniko, plain Kaniko; we mount the Dockerfile into it.
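To sketch the version mechanism discussed in this answer (field placement per the upstream CRD as I understand it; the version value and module name are placeholders):

```yaml
spec:
  moduleLoader:
    container:
      version: "2.1.0"     # bump this to roll out a new kmod build
      modprobe:
        moduleName: my_driver
```

The idea, as described above, is that nodes opt in to a given version via a per-node label, so the new kmod can be rolled out node by node, and the version value is also exposed to the build so the Dockerfile can branch on it.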
Kaniko can integrate with, like, Amazon S3 or other object storage for its image build cache. There are others too, like BuildKit. It can use any type of backend: object storage or a plain file system. There are many possibilities, because building a kernel module is not a short job; it can take some time, depending on the device. Depending on the device, right: for the Intel GPU driver it took a couple of minutes, but it can be much longer, I agree. And I'm pretty sure that in realistic use cases, people want to make use of the intermediate cached layers. I think that makes a lot of sense. There are actually many caching options in Kaniko; I don't think we leverage any of them at the moment, because, to be honest, you don't rebuild the kernel module that often, or at least not in the use cases we've seen. Okay. But let's talk about it: we have a monthly community meeting, and if you want to join and this is really a concern for you, we would be really happy to discuss it. Okay, absolutely, thank you.

Any other question? Thank you for the presentation. If my memory is right, two pods were created after you applied the Module YAML: one is the build pod, and the other one is the pod which loads the kernel module. And if my memory is right, the pod which loads the kernel module was terminated. Yes. My question is: what happens if the node is rebooted? Will the kernel module be loaded automatically, or do we need to do something manually? Thank you for the question. This is actually very different in the V2 that just got released. In V1, we created a DaemonSet per kernel version running in the cluster, which meant we would load the kernel module and the pod would keep running until it terminated.
And whenever the node reboots, the DaemonSet recreates the pod on the node and reloads the kernel module, and so on, you get it. In V2, we actually store the state in another resource that I've hidden from you in the demo. But the point is that we monitor the Ready condition of the node. There is a timestamp associated with that condition, and we record the last time we loaded the kernel module; if the node booted more recently than that last load, then we create a new worker pod that reloads the kernel module. So the short answer is: whenever you reboot the node, yes, we will reload the kernel module. Thank you. I think that's it; we are really past time. Thank you very much for attending this presentation. I will be available at the Red Hat booth if you have more questions about KMM, upstream or downstream. Thank you all very much.