Ready to go? Yeah? All right. Hi, everyone. My name is Arsh, I'm from Intel, excited to be here. And my name is Quentin, I'm from Red Hat, and today we're going to talk about KMM, which is indeed your Swiss Army knife for kernel modules on Kubernetes.

This is what we're going to cover today. We'll start with an introduction to kernel modules: what they are, what purpose they serve, and the pain points in using them in Kubernetes environments. Then we'll talk about the KMM operator, what it is and how it solves those problems. Then over to you, Arsh. Yeah, so then we'll talk about a real-world use case, enabling Intel GPUs within Kubernetes, and we'll follow that up by taking KMM on a test drive: we're going to run a Stable Diffusion demo, text to image, with a KMM-enabled Intel GPU. Back to you. Yeah, so stay tuned for more explanations about that image here. Then a few words about KMM 2.0, which will be coming later this month, and finally some Q&A if we still have time.

All right, so let's start with an introduction to kernel modules: what they are and why you potentially need them. A kernel module is C code, really, that extends the functionality of the Linux kernel. You're going to need those functionalities for anything that's a driver, for example hardware drivers or virtual file systems, and modules can also add calls to your kernel. This is where you're going to encounter those modules. The thing is, kernel modules are built for a specific kernel version. Technically it's an ABI, but really you need to build them against a specific set of headers for one kernel version. In most distributions today, kernel modules can be loaded into and unloaded from the running system, although that's not always the case. And finally, something important: kernel modules need to be signed on Secure Boot enabled systems. In short, if you've been using any of these things, a GPU, an accelerator, file systems, then you're likely to need kernel modules in your setup.

The problem is that using kernel modules on Kubernetes setups has proven quite difficult over the years. Ideally, all kernel modules should be contributed upstream, but that's very often quite difficult: it's a long way to getting code into the kernel, and it often takes, you know, review time and everything. So usually people resort to out-of-tree kernel modules. To enable your latest hardware, for testing, these kinds of things, you will very often need out-of-tree kmods and have to load them onto your systems. Another item is that using kernel modules in production can be risky, because you build them against a very specific kernel, for a specific ABI. If there is, for example, a kernel upgrade due to a CVE, the kernel symbols might change, and if they do, your kernel module will not load or will not work properly. So pretty much every time you have a kernel upgrade, you have to rebuild your kernel modules. And that leads us to the last bullet on that slide: how do we deploy and load those kernel modules on the nodes?
So we usually resort to node customization. In some cases you will build your node image, and you'll have to do that each time you have a new kmod or a new kernel version. Or you may want to use something like Ansible to deploy those kmods. But all of that is usually costly and difficult to maintain. So what if we had a better, more cloud-native way of handling things for Kubernetes setups?

That is what we're trying to do with the Kernel Module Management operator, which is a Kubernetes SIGs project: we are trying to bring a standard consumption model for kmods on Kubernetes. KMM can build your kernel modules, it can sign them if you're using Secure Boot, and it will ultimately load them on your nodes. KMM does that by monitoring all the kernel versions that you have running in your cluster; it is then able to load the right kernel module versions on the right nodes, and that's the ultimate goal, really. We also have a feature that allows you to run your device plugins: whenever you have loaded your kernel module and made whatever hardware or file system available on your node, we can run the device plugin as well.

An essential piece of KMM and of that whole setup is the kmod image. To deploy the kernel modules on the nodes, we actually wrap them into a standard container image, an OCI image, and we add a couple more constraints: the kernel modules need to be in a specific location, and the image needs to contain the modprobe binary. The cool thing is that you can store kmods for multiple kernel versions in one image, and we will then use that one image to load the kmods on the right nodes. That's it for the kmod image, really.

Then, at the center of KMM, obviously, is its CRD, the Module CRD. The nice thing about it is the kernel mappings list: that's how we specify that for a certain kernel version, we map a certain kmod image name, and that is how we express that those kmods should be loaded on those nodes. I will show an example in just a moment. Because we have this kernel mappings list, we can accommodate many distros at once: in a single CR you can express that for AWS or Ubuntu or Red Hat kernels, or whatever distro kernels you have, you map them to specific images. And finally, because KMM can build your kernel modules, it can also use pre-built images, and you can mix and match as you please: you can build for some kernels and use pre-built images for others. That's totally fine.
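To make that concrete, a minimal Module CR looks roughly like this. This is a sketch based on the upstream kmm.sigs.x-k8s.io/v1beta1 API; the kernel versions, image names, and selector label are illustrative.

```yaml
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: my-kmod
spec:
  moduleLoader:
    container:
      modprobe:
        moduleName: my_kmod        # what gets modprobe'd on the node
      kernelMappings:
        # exact match for a single kernel version
        - literal: 6.0.15-300.fc37.x86_64
          containerImage: quay.io/example/my-kmod:6.0.15-300.fc37.x86_64
        # regexp entry covering a whole family of kernels at once;
        # ${KERNEL_FULL_VERSION} is substituted for each matching kernel
        - regexp: '^.+\.fc37\.x86_64$'
          containerImage: quay.io/example/my-kmod:${KERNEL_FULL_VERSION}
  selector:                        # deploy only to a subset of nodes
    node-role.kubernetes.io/worker: ""
```

KMM resolves each node's running kernel against that list and picks the matching image, which is exactly the behavior described above.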
And at a very high level, this is how the reconciliation works for the operator. At the beginning, we have to determine that a certain node needs a certain driver, a certain kmod, because it's equipped with some hardware or it needs to have some file system available; that's on the very left of that slide. Whenever KMM determines that a node needs a kmod, and if we have builds configured in the CRD, KMM checks whether the image exists. If it does not exist, we create a pod to build that image; more on that in a subsequent slide. Once that image exists, we look at whether we need a signed image, to be compatible with Secure Boot. If we do need such an image and it doesn't exist yet, we create a pod to build it. Finally, once we have that image prepared, we create the actual pod on the node. That pod copies any firmware files if needed, and ultimately it loads the kmod.

This is really the simplest module we could come up with; still, it shows the value of KMM. The important bit here is the kernel mappings list, much like the sketch above. You can see that we have one entry for a Fedora 37 kernel, and the CRD says that for that very kernel we are going to use one very specific image. But because we can configure either literals or regexps, the next entry in that list addresses all the Fedora 37 kernels at once. And because we do variable substitution in the container image field, the CRD ends up being very flexible. Finally, at the bottom, there is a selector, so you could deploy your kmod on just a subset of nodes as needed. Right, that's it for that slide.

I'll go faster on those. This is an example of how you configure builds in your Module CRD. You need to provide a ConfigMap with the Dockerfile, and we use that as a recipe to build your kmods for that kernel version. We inject the kernel version, so that in your Dockerfile you can fetch the right headers for the right kernel version. We run all of that with kaniko: we create a pod with kaniko, and that produces your final kmod image. We support build arguments, we support secrets, and a couple of registry-related settings.

This is about signing; same story, really, very similar. You obviously have to provide the keys yourself, and you have to provide the list of .ko files that you want to sign in your image. What we then do, also powered by kaniko, is download that image, extract the .ko files, sign them, and build a new image. That's how we make those kmods available for Secure Boot.
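Putting the two together, a kernel mapping that both builds and signs could look roughly like this. Again a sketch: the field names follow the upstream v1beta1 API as I understand it, and the ConfigMap name, secret names, and paths are illustrative.

```yaml
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: my-kmod-signed
spec:
  moduleLoader:
    container:
      modprobe:
        moduleName: my_kmod
      kernelMappings:
        - regexp: '^.+\.fc37\.x86_64$'
          containerImage: quay.io/example/my-kmod:${KERNEL_FULL_VERSION}
          build:
            dockerfileConfigMap:
              name: my-kmod-dockerfile   # ConfigMap holding the Dockerfile recipe
            buildArgs:                   # optional extra build arguments
              - name: EXTRA_CFLAGS
                value: "-DDEBUG"
          sign:
            keySecret:
              name: sign-key             # your private signing key
            certSecret:
              name: sign-cert            # the matching public certificate
            filesToSign:
              - /opt/lib/modules/${KERNEL_FULL_VERSION}/my_kmod.ko
  selector:
    node-role.kubernetes.io/worker: ""
```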
Now I have a diagram that explains how it all works, so let's consider a six-node cluster. Three of those nodes are running kernel 1.2.3, in pink here. Two of those nodes are running kernel 4.5.6. And node four is running yet another kernel, one that we will deliberately not configure in our kernel mappings list. At the very top, we have KMM watching the Module and the nodes.

Let's say we do not have a kmod image yet for kernel 1.2.3, so we may have to build it. What we're going to do is read the Dockerfile and create the build pod with the Dockerfile mounted, and that's how we produce the kmod image containing all the .ko files. Once we have everything nice and ready, we create a DaemonSet that only targets the nodes with the first kernel, 1.2.3, and that DaemonSet in turn creates pods on each of those nodes, and those pods load the kernel modules. Same story, really, with the second kernel that we have. And you'll note that on node four we're not adding anything, because it's not configured in our CR. Then, once the kmod is loaded on all the nodes that actually need it, we run the device plugin. KMM manages labels and annotations automatically, so it can determine where it should be running the device plugin. In a nutshell, that's really how it works.

And we have way more features. We copy binary firmware files from the kmod image onto the nodes, which is really useful for hardware. We also label the nodes whenever a kmod is loaded, and that makes scheduling the applications that use the kmod very easy, roughly like the sketch below. I'm not going to go into detail on all of those.
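For example, once the kmod is loaded, KMM labels the node, so a workload that needs the module can simply select on that label. A sketch; the exact label key depends on the KMM version and on the module's namespace and name.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-using-my-kmod
spec:
  nodeSelector:
    # set by KMM once the module is loaded; the key shape is roughly
    # kmm.node.kubernetes.io/<module-namespace>.<module-name>.ready
    kmm.node.kubernetes.io/default.my-kmod.ready: ""
  containers:
    - name: app
      image: quay.io/example/app:latest   # illustrative
```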
Now, about the use case. Thanks, Quentin, that was a great overview of all the features. I'll give you a well-deserved quick break. Thank you.

So let's shift gears a bit and talk about a real-world use case, taking KMM on a test drive. A little background first, on the kind of environment we're dealing with today. It's a fast-changing, AI-driven landscape, and what's coming out of it is a demand for the latest hardware, enabled to run some of the most industry-leading workloads. If you step into that a bit, one of the fundamental components that enable these devices is the kernel drivers, and those latest drivers really need to be used to unlock optimal workload performance. The other thing I want to touch on briefly is the chicken-and-egg problem: there is always a delay between the hardware and the software being ready, and in this scenario we're talking about drivers that are essentially out of tree. That prevents developers, customers, and partners from leveraging the latest and greatest hardware in their workloads, and the problem today is that there's no facilitated approach that scales. This is why we're talking about KMM; that's what it brings to the table.

A little bit about the production environment and what the realities are. A lot of the latest hardware is not truly enabled on day one; there are non-trivial operations that are quite complex, usually managed through configuration. The second point is all the complex node and kernel customization required to enable these kinds of use cases and this hardware. So let's go to the next slide.

This is delving into why we're talking about it today, why solve it today. A lot of the leading-edge XPU devices, think GPUs, TPUs, IPUs, networking products, require out-of-tree drivers to consume them, and a lot of these kernel drivers are unavailable in OS distributions. There's a two-part story to that. One challenge is the initial journey of an out-of-tree driver to upstream: that cadence is a relatively long period, and it's a sequential process. The second part, which follows it, is getting into the downstream distributions. The goal here is to really shift left and make sure the hardware and software are ready on day one, so that everyone can consume all the latest and greatest features and technologies built into the hardware. The big impact, the thing to leave with today, is that KMM is enabling and accelerating XPU enablement and time to market, thereby unlocking optimized workloads, and it's facilitating use-case-driven development by allowing customers to shift left and use these features in their workloads.

Another item to talk about is KMM in a driver CI pipeline. We recently started an open-source project that leverages KMM to power a driver CI pipeline. What do I mean by a driver CI pipeline? Essentially, it's the ability to produce a pre-built kmod image, or driver container, ahead of time. The pipeline addresses two key scenarios; the problem is much more multi-dimensional than that, but these are the fundamental ones. The first is a new driver version becoming available, a very common scenario. The second is a new kernel version, which can happen for a multitude of reasons; I think Quentin alluded to some, from a CVE perspective. As an extension to that, some of our future work has been on facilitating seamless driver upgrades. The value of this pipeline for our customers is that it allows them to enable GPUs in seconds with pre-built images, as opposed to something that's built on premise: it goes from minutes to seconds from an enablement perspective.
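In practice, consuming one of those CI-built images from KMM just means a kernel mapping with no build section at all: KMM pulls the pre-built image and loads the kmod. A sketch, with an illustrative registry and selector label.

```yaml
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: intel-i915-prebuilt
spec:
  moduleLoader:
    container:
      modprobe:
        moduleName: i915
      kernelMappings:
        # no build: section; the driver CI pipeline has already built and
        # pushed one image tag per supported kernel, so enabling a node is
        # just an image pull plus a modprobe
        - regexp: '^.+$'
          containerImage: registry.example.com/drivers/intel-i915:${KERNEL_FULL_VERSION}
  selector:
    intel.com/gpu: "true"   # illustrative node label
```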
And the flow would look something like this, a two-step flow, at the bottom here. The first step is that you deploy the pre-built image with KMM, and that enables the device. In the second step, we build on that and leverage the GPU device plugin to extend the GPU resource to the kubelet, essentially for workload consumption in user space.

So this is the exciting part: the demo, taking KMM on a test drive. We're going to talk about enabling Intel GPUs with KMM on Kubernetes. To set the context before we get into the live demo: this demo is running on a single-node Kubernetes cluster with an Intel Flex 170 GPU. We're running one Jupyter pod with OpenVINO and the associated runtime libraries, and you can see some of the resource limits listed here: 20 CPU cores and 64 GB of RAM. The demo is fundamentally the idea of first building and loading the GPU driver with KMM, essentially enabling the device, and then following that up by running a real-world text-to-image notebook with Stable Diffusion and OpenVINO, on both the CPU and the GPU, to see the advantage. So let's go ahead and take a look at that. I'll hand it over to Quentin.

Absolutely, bring on the logos. We do have the logos here today. We're going to use a variety of technologies: minikube, Jupyter, Hugging Face Transformers, and finally OpenVINO. The resources are important here because, like Arsh said, we're first going to run the workload on just the CPU and see how long it takes, then enable the GPU and run the same workload to see how long it takes on the GPU.

All right, let's switch to the terminal. I'm logged in on the Intel Dev Cloud VM, which does have the GPU. Let's list the pods here; we don't have anything yet, okay, that's fine. So let's now create the Jupyter pod. All right, we're creating a couple of objects. We also need a storage class and a persistent volume claim to persist the data, but let's not worry about that. Okay, the pod is running; I think we can now log in. Here we are, let's log in. Awesome, and we should have the Jupyter interface. Wonderful.

I'll just open the notebook here; I'm not going to go into the details of what the notebook does. It's taking a bit longer than usual, isn't it? Had to happen. Okay, let's reload that. It is taking a bit longer, but patience. I will just clear the output of all cells; I think I did that already, but okay. So what we're going to do first is fetch the Stable Diffusion pipeline. It's actually cached, for the demo, so it runs faster. Then that PyTorch model, we are going to convert it to something that can be run with OpenVINO. Yeah, exactly. And then we're going to compile the models, first for the CPU and then for the GPU.
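By the way, for those following along, the Jupyter pod we created a minute ago looks roughly like this. It's a sketch: the image and volume names are illustrative, and only the resource limits match what we described.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: jupyter
spec:
  containers:
    - name: jupyter
      image: quay.io/example/jupyter-openvino:latest   # illustrative
      resources:
        limits:
          cpu: "20"      # the 20 CPU cores mentioned above
          memory: 64Gi   # the 64 GB of RAM mentioned above
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: jupyter-data   # backed by the storage class we created
```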
But we do not have anything loaded on the GPU side yet, and I can actually show you that. I think I need to run all the cells above the selected one, and then I will stop here. Yeah, exactly. Okay, so we're going to run everything. So we do an lsmod piped into grep i915; i915 is the name of the kernel module, the Intel GPU driver. Because we grep here and find nothing, the output is empty: the module is not loaded. And that's what OpenVINO reports too: the only device available for inference is the CPU right now. Okay, that's fine. Still, we want to generate one image with it. All these steps, you know, I'm not going into the details; I'm just going to jump to that one. I don't know if that's going to run everything... no, probably not. Okay, so we are compiling the models now; this should be really short. And already we are generating the first image, which should be a valley in the Alps, and that's only using the 20 CPU cores that we made available to this pod. You can see that this is going to take a bit more than 30 seconds. I think we're already halfway. This takes a lot of processing power, as you know. Obviously. Yeah, this is taking some time; in the meantime, any guesses for how long the GPU will take? You can make your own guess. Okay, I think the Wi-Fi is maybe a bit slow in this room, because it should show the image by now. Here we are, here it is, indeed a bit slow. But a nice image, right? And this was generated with just the CPU.

But we don't want to stop here, obviously. We want to enable the GPU, load its driver, and make it available to the notebook. Back to the terminal to do that. We are going to create a ConfigMap that contains the Dockerfile to build the Intel driver. We are also going to add a secret, a pull secret for my Quay repo, because we want to build that image and upload it to quay.io, so that it's available for KMM to load on the nodes. I'll do just that now. And the Module, exactly; I can show you quickly what the Module consists of. It's very simple, pretty similar to what you saw on the slides. I don't know if that's really visible, and maybe I'm standing in front of it, but really: we are mapping that image to just one kernel, the kernel that's currently running on the VM, and we are using the build option here, which means that because the image is not already available on Quay, we will have to build it and upload it.
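For reference, the two objects we just applied look roughly like this. A sketch only: the real Dockerfile for the Intel GPU driver is more involved, and the kernel version, image, and secret names are illustrative. The KERNEL_FULL_VERSION build argument is, as far as I recall the upstream docs, injected by KMM at build time.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: intel-gpu-dockerfile
data:
  dockerfile: |
    FROM ubuntu:22.04 AS builder
    ARG KERNEL_FULL_VERSION
    RUN apt-get update && \
        apt-get install -y build-essential kmod "linux-headers-${KERNEL_FULL_VERSION}"
    # fetch and compile the out-of-tree i915 sources here (omitted)

    FROM ubuntu:22.04
    ARG KERNEL_FULL_VERSION
    RUN apt-get update && apt-get install -y kmod   # provides modprobe
    COPY --from=builder /usr/src/i915/i915.ko \
         /opt/lib/modules/${KERNEL_FULL_VERSION}/i915.ko
    RUN depmod -b /opt ${KERNEL_FULL_VERSION}
---
apiVersion: kmm.sigs.x-k8s.io/v1beta1
kind: Module
metadata:
  name: intel-gpu-i915
spec:
  moduleLoader:
    container:
      modprobe:
        moduleName: i915
      kernelMappings:
        # just one mapping: the kernel currently running on the VM (illustrative)
        - literal: 5.15.0-76-generic
          containerImage: quay.io/example/intel-i915:5.15.0-76-generic
          build:
            dockerfileConfigMap:
              name: intel-gpu-dockerfile
  imageRepoSecret:
    name: quay-pull-secret             # the push/pull secret we just created
  selector:
    kubernetes.io/hostname: gpu-node   # illustrative
```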
Okay, let's take a look at the logs, the build logs; they're on the right. Let's see what's going on there. Yeah, so we're downloading all the packages now... there, we've downloaded all the headers and the compilers, everything that's needed to produce that kmod image, and right now we're actually compiling the object files, to produce the .ko file, the final kernel module. This machine is really fast; I think this completes within two minutes, something like that. One good aspect of the build system embedded in KMM is that if many nodes share the same kernel and we need to build an image for that kernel, we will only build it once, which is quite handy. In this case, well, we're done compiling everything, and we're actually pushing to Quay. So within seconds, what you're going to see on the right is the module loader pod, the pod that's actually created by the DaemonSet that KMM creates. That pod goes onto the node to run modprobe and load the kmod, and this should happen as soon as the push is complete. It does take a few seconds; I'm not sure why, because the image is quite small, it's only modprobe and a couple of .ko files. Yeah, I wanted to re-highlight that: building once and deploying anywhere, given that the nodes share the same kernel version, I think that's a really valuable takeaway. It is. And as you can see, we pushed. KMM keeps reconciling, obviously, and you can see that that very pod has actually loaded the kernel module.

Okay, so let's go back to the Jupyter notebook. So now you're telling me the GPU should be enabled? The GPU should be enabled indeed. Okay, I think we need to restart the kernel. Yes, we do need to restart the kernel and run all cells above the selected one; that's exactly what I want. Okay, let's have a look at this cell right here, the lsmod grep i915 one. Oh, there's some output now! The Intel GPU driver is actually loaded into the node's kernel, it's available to the Jupyter notebook in the pod, and OpenVINO now reports: available devices, CPU and GPU. That's really wonderful. And the good thing is that we can now select GPU here and get much greater performance.

Okay, let's run that. Now we need to compile the models for the GPU. I think it's still quite short. It's relatively fast, yes. Okay, so this will again take a couple of seconds; I think I will advance manually now. Exactly, so it's completing. Good.

Okay, so let's advance, and we are going to regenerate exactly the same image, the nice valley in the Alps. You can see that it's actually much faster, about three times faster, because we're using the GPU: it completes in only 12 seconds, instead of 35 with the 20 CPU cores that we had in the pod. Almost 66 percent faster. I can see you've done the math. I have not, I just know it's faster. Okay. The image is a bit slow to load in this location, I'm not sure why, but here it is.

Okay, I think we have a couple more that we wanted to show. We're in Chicago, so we wanted to show the Chicago Cloud Gate. You may have seen this image before, if you paid attention at the very beginning of the talk. It's actually faster to generate the image than to download it from the pod over this Wi-Fi, which is interesting. Anyway, again 13 seconds for a completely different image: the seed is different, the prompt is different. And you know what, I should probably stop that one because it takes so long. Here we are. Nice, it's not bad actually. And the final one we want: Chicago at sunset. And the GPU is only using 16 GB of memory, compared to the 64 GB we gave the CPU run, so you're really seeing how well it can crunch it. That's right; this doesn't use any CPU to generate the images now.

And since we have a bit more time, I think I'm going to show you guys something. Okay, so here we are again. I think it's not that bad. But that's it: we went from zero GPUs enabled in the pod to having the GPU fully provisioned and available for inference in, probably, something like five or six minutes. We just needed to compile the image and push it, and that's it, it's available. One thing I wanted to show the audience, because we have some time... okay, do we want to get some audience input? Do we want to do that?
Doesn't anyone want to suggest something? We can do that. Someone wants to; can you get to the microphone? 'Landscapes are super nice, but they are so boring, so I would like to see Santa Claus and the Easter Bunny at a family dinner.' I'm sorry, what's that? Hold on, hold on. I think we'll stick to Santa Claus, because I've been trying some human shapes, and not always with great results. Do you have a favorite number? 2023? Let's see what happens. Okay, generating. I hope it's not too disturbing. All right, let's see what happens. Wow, here we are. And I think he has a full hand, not only four fingers. Something's going on near the beard, and I'm not sure what's happening with the hat either, but I think this is still pretty good. Okay, anyway.

Regardless, I just want to show you quickly that not only can you load the kernel modules and the driver, you can also unload them. I will clear this. So we have the Jupyter pod here and the module loader pod here. If we do an lsmod grep i915, we have all the Intel stuff loaded right now. Let's kill the workload first, so I will delete that pod here. Okay, now we have nothing using the kmod, which would otherwise prevent us from unloading the kernel module. So right now I will actually delete the KMM Module resource. Okay, it is deleted. When we delete it, it kills the module loader pod, which in turn runs modprobe -r to unload the kernel module. And now if I lsmod grep i915 again, the module is not there: we have successfully unprovisioned the kernel module. Do we want to take it one step further and deploy the pre-built image? No, I don't think so; I think we have a couple more slides.

Just a few words about KMM 2.0, which we're still finalizing; it will be out at the end of this month. The improvement here is that we're not using DaemonSets anymore to load the kernel modules on each node. Instead, we create short-lived worker pods that run modprobe once and then exit. This should provide better reliability and a much lower footprint, because nothing keeps running once everything is loaded. It will also improve binary firmware support, and the worker pod will be able to set kernel module parameters. I won't go into details there, but it will be available later this month.

Wrapping up: KMM is really a Kubernetes operator that was designed to load the right kernel modules on the right nodes. It can also build those kernel modules, and sign them if you want. We really intend it to be a standard consumption model. It has a flexible API: you can embed the Module resource in your own operator, have your operator deploy it to load the kernel modules, and abstract all of that away; that is really the goal here. The latest available version is v1.1, we're working on v2.0, and it should be available later this month. And if you want more really great demos of inference using an Intel GPU provisioned by KMM, but on OpenShift, because we have an OpenShift edition of KMM, then please stop by the Intel booth. Yeah, those are awesome, definitely check it out. That's it. Thank you.