All right, I think it's time we start. So hello, everyone. Welcome to our session. I know it's quite late, right after the party, but thank you for coming. Let me introduce myself. I'm Alexander Kanevskiy, I work for Intel, and less formally people call me Sasha. Why am I here talking about this subject? Because in my work at Intel, for many years, I was involved in enabling different kinds of accelerator devices. On top of that, I'm also co-chair of the Container Orchestrated Devices Working Group under TAG Runtime in the CNCF. We were supposed to have one more speaker, Patrick. Unfortunately, at the last moment he was not able to travel, so a reminder to everyone: keep safe, keep healthy. I hope I will cover his part of the work myself. And let me introduce my co-speaker, Kate.

Hi, everyone. My name is Kate Goldenring. I'm a software engineer at Fermyon, and I come from more of an IoT background. I'm co-chair of the CNCF IoT Edge Working Group and a maintainer of a project called Akri, which answers this question of how we bring small IoT devices into our Kubernetes clusters, and which we'll talk a little bit about today.

I want to start by talking about the evolution of Kubernetes. Kubernetes started as a tool to orchestrate workloads across servers in the cloud, and that environment, as you can see, is fairly homogeneous: all these servers have similar, static system hardware. You can compare that to a neighborhood where all the homes have the same set of features. They all have bathrooms, bedrooms, and kitchens, and your choice of where to live boils down to a decision of how many of each of those three features you need. This is comparable to Kubernetes, where your workload, your pod, is trying to choose which node it is best suited to live on. In Kubernetes, out of the box, those resources are compute resources, namely CPU, memory, and huge pages. Behind the scenes, once you request an amount of resources in your pod spec (in this example, the streaming app needs 100 megabytes of memory and 10 CPU units), the kubelet on that node says, OK, that's reserved for you, and it also ensures that you do not exceed the limit you specified, so you can't use more than a certain amount of resources.

However, the use cases for Kubernetes continue to expand, and because people want to do more with Kubernetes, we're finding that hardware needs to become a little more specialized. So now you have more of a choice: it's not just how many kitchens, bathrooms, and living rooms I need, but also whether I want hardwood flooring or carpeting. Kubernetes had to evolve to meet the expectation of having that same declarative experience around what a workload needs, for more varied hardware. That's where the Kubernetes device plugin framework came in, which arrived in Kubernetes 1.10, around 2017. What it does is let you create extended resources, that is, new node-level resources in your cluster, and the way that works is you create a device plugin. In this example, we have that same streaming application, but it wants to do some inferencing on its streams, so it needs a GPU, and we want to make sure it's scheduled to the node that has that GPU.
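Before going on with the device plugin flow, here is a minimal sketch of the built-in compute-resource experience just described, the plain requests and limits that the kubelet reserves and enforces; the image name and exact numbers are illustrative, not from the talk's slides.

```yaml
# Minimal sketch: the streaming app asks for memory and CPU; the kubelet
# reserves the requests and enforces the limits. Values are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: streaming-app
spec:
  containers:
    - name: streaming-app
      image: registry.example.com/streaming-app:latest  # hypothetical image
      resources:
        requests:
          memory: "100Mi"   # reserved for this pod on the chosen node
          cpu: "10"
        limits:
          memory: "200Mi"   # the kubelet ensures usage never exceeds this
          cpu: "10"
```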
So in your pod spec, which is where this user experience for Kubernetes and devices lives, you want to be able to request a GPU. On your cluster, you create a device plugin on the node that has the GPU, and what it does is say: hey, kubelet, I have a GPU here, tell the API server that. Now you can request it in your pod spec, and that ensures your streaming application is deployed to node A, not node B. Behind the scenes, the device plugin also makes sure any additional information the application needs in order to use that GPU is provided. But the question becomes: is there more we can do with the device plugin? And is there more we need to do in general for devices in Kubernetes?

So, as Kate described, we nowadays have a user experience in the pod spec to say, I want a GPU. But the modern world changes what hardware looks like and how it's going to be utilized, and I would like to use two examples from our fellow travelers in the community, from Google and NVIDIA, something that was presented at KubeCon Valencia earlier this year. A very simple example: you have a cloud provider, you have multi-instance NVIDIA GPU cards which can be split into multiple compute units with different sizes of memory internal to the accelerator, and the user wants to request a specific number of compute units or a specific amount of memory. With current device plugins that's not really possible, so people come up with different kinds of workarounds. One example is that a cluster administrator statically provisions the node and labels it with the particular type of device, and then the user needs to know that cloud-specific label to use in a node selector: I want a GPU, but I also want it to be of this kind. Doable, but maybe not so user-friendly.

A similar second example: you have a device, it's expensive, and if you run a long-running job it justifies the cost. But if you have multiple data scientists who are just doing something in notebooks and from time to time running short tasks which require computation power, it becomes really expensive. So people are thinking about how to share one physical device between multiple containers and multiple pods. Again, people come up with workarounds: instead of presenting the physical GPU, the device plugin says, I now have, say, 10 virtual GPUs. And again you have labels: a label which says this is a time-shared device, and how many slices of the device can be used. Again, doable, but maybe not so good.

Next thing: all of us know service meshes and what a service mesh does in the background, crypto. It can do crypto on the CPU, which is how the majority are running, but it can also use an accelerator. Imagine you want to say: give me an accelerator and I'll use it, and if you don't have one, fine, I fall back to the CPU. With the current device plugin API, if you try to say my request is zero for the crypto accelerator but my limit is one, it's not going to happen; the admission controller will fail it.

The next use case is for more complex devices, for example an FPGA, a field-programmable gate array. That's something where you say, I want this particular type of accelerator, but please also load a function: this hour I want to use compression, the next hour I want some crypto acceleration, and something else later on. So the user needs to say two things: what kind of card, plus what function.
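Before getting to how that FPGA case is worked around, here is a hedged sketch of the two patterns just mentioned: requesting a device as an extended resource advertised by a device plugin, plus the node-label workaround for picking a particular flavour of GPU. The resource and label names are made up for illustration and are not any specific vendor's.

```yaml
# Illustrative only: extended-resource request plus the node-label workaround.
apiVersion: v1
kind: Pod
metadata:
  name: streaming-app
spec:
  nodeSelector:
    example.com/gpu.product: gpu-20gb-slice   # label set statically by the admin
  containers:
    - name: inference
      image: registry.example.com/inference:latest   # hypothetical image
      resources:
        limits:
          example.com/gpu: 1   # extended resource counted by the device plugin;
                               # request must equal limit, which is why the
                               # "request 0, limit 1" fallback idea is rejected
```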
We come up, again, with all kinds of workarounds. So we created a mapping which describes: these are the devices, this is the function name, and we can map them together. We can have an admission webhook which mutates your pod spec from a user-visible resource request string to something rather cryptic, as you can see. It works, but again, it's not the best experience for the end user if they need to debug something. So we need something better. And that's all just for local devices. Kate, you know something about IoT devices and edge devices.

Yeah, so as Alexander mentioned, there are ways people are trying to get parameters, optionality, and sharing with local devices through the device plugin interface, and workarounds for that. Another space where people are trying to expand the idea of devices in Kubernetes is IoT devices. If we go back to our neighborhood, think about resources like the satellite dish attached to your home, or the pool that's external to your home that you know you have access to. How can we figure out that we have access to those resources? In Kubernetes, an ideal user experience, or pod spec, for this would be: I want to access that thermometer, or that robot arm, or that IP camera, and I want to be able to request it as a resource, just as I would request any other resource in Kubernetes.

At Microsoft, we were looking at this problem. We were trying to build a demo that used IP cameras and did some inferencing on them from a Kubernetes cluster, so we wanted to be able to dynamically say, I want this IP camera. We found we can't put a kubelet on the camera, so how else could we bring that camera into the cluster? In general, when you're trying to expand the functionality of Kubernetes, you have two options. You can build a Kubernetes operator, which is something that's deployed separately from Kubernetes: basically a custom resource definition (the word resource is very overloaded in Kubernetes, but essentially a declaration of state) and a controller, something that reconciles toward the state you want. The other option is to change Kubernetes itself. There's a process for that, the Kubernetes Enhancement Proposal, where you work with the community, and it requires a clear understanding that this is something everyone needs and an important issue to bring into Kubernetes.

So we went with the operator model. We wanted to get this out there and make it something you can add to Kubernetes when you're running Kubernetes on the edge and want to access the IoT devices in the local environment. And that came to be Project Akri, which is a CNCF Sandbox project. It stands for A Kubernetes Resource Interface, because what it aims to do is abstract away the details of device discovery and use in Kubernetes. Device is, once again, an overloaded term, but in this sense it means those really small, constrained IoT devices. What Akri does is discover the devices, whether across the network, locally on a node, or attached, and it creates device plugins on your behalf. So instead of you having to create a device plugin for each type of device you have, it creates them on discovery, and you immediately have these new node-level resources. Then it can also optionally deploy workloads for you immediately afterwards. And if we look, once again, at the YAML user experience of this, we have a second piece of YAML here.
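As a rough idea of what that second piece of YAML can look like, here is a hedged sketch of an Akri Configuration for the ONVIF case, explained in more detail next; the field names are approximate and should be checked against the Akri documentation.

```yaml
# Sketch of an Akri Configuration; fields approximate, values illustrative.
apiVersion: akri.sh/v0
kind: Configuration
metadata:
  name: onvif-camera
spec:
  discoveryHandler:
    name: onvif                # built-in handlers include onvif, udev, opcua
    discoveryDetails: |        # optional filters, e.g. only specific cameras
      ipAddresses:
        action: Include
        items: ["10.1.2.3"]
  capacity: 3                  # how many workloads may use one camera at once
  brokerSpec:                  # optional: workload Akri deploys on discovery
    brokerPodSpec:
      containers:
        - name: streaming-app
          image: registry.example.com/streaming-app:latest  # hypothetical
```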
We have our custom resource definition, and in Akri that's an Akri Configuration, a very simple way to configure what you want to find. In that Configuration, you say what you want to find by specifying which protocol you want to use to discover the device. Out of the box, Akri supports three: ONVIF for IP cameras, udev for local devices in the Linux device file system (that could be a USB device, for example), and OPC UA for industrial machinery. In that Configuration you can also specify some filters to narrow down the number of devices you find, or to tailor it to the ones you want. For our IP camera scenario, that could be MAC addresses, IP addresses, or scopes for specific camera names, for example. Once you've applied this to your cluster, Akri will find the devices, tell the kubelet about them, and all of a sudden you have these other resources that you can then add to your pod spec. So now, as you can see here, we're requesting an IP camera in our pod spec.

If we look at the flow of this, the first thing you do is apply Akri to your cluster. This is an operator, so you have Kubernetes first, and then you add the Helm chart, which is Akri. Once you've applied that to your cluster, you tell Akri what you want to find, so we're back at that Configuration creation stage, and here we're saying ONVIF, because we see that IP camera and we want to find it. Then Akri finds it and represents that device as a Kubernetes resource: it tells the kubelet about it, which tells the API server about it. It also creates our second custom resource, an Akri Instance. That Instance helps Akri understand the device and regulate its use across nodes, because this is a shared device: it's not physically attached to one node, it's visible to multiple nodes. Then, once again, you can go and create your pod spec, apply it to your cluster, and now you have your streaming application using that camera.

One thing to note about the device plugin interface, and I hinted at this earlier: when a pod is allocated to a node, the kubelet basically calls into the device plugin. So it's calling into Akri and saying, hey, I want to run this pod here, and it's using this device that you're controlling, and is there anything else I should give it? And here we say, yes, here's some connectivity information, this is how you connect to the device, and we pass that as environment variables. So when the streaming application starts up, it can just read those environment variables, and in this case it knows the RTSP URL for that IP camera. It immediately knows which camera to connect to and can start doing the work it was intending to do.

If you don't want to create that pod spec yourself to point to the exact camera, you can also have Akri optionally deploy the workload on your behalf. That's done in the extra section of the Configuration called the broker spec, and here we're just saying: every time you discover the device, every node that can see it gets this workload. And if we look at the Instance in more detail, this is Akri's representation of the device, and you can see how it handles that resource sharing. When we created the Configuration in this example, we said only three workloads are allowed to use this device at once, so we set the device usage to three.
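To make the sharing mechanics concrete, here is a hedged sketch of what the resulting Akri Instance might look like; the field names and the generated resource name are approximate, so treat it as an illustration rather than the exact CRD.

```yaml
# Sketch of an Akri Instance; fields approximate, values illustrative.
apiVersion: akri.sh/v0
kind: Instance
metadata:
  name: onvif-camera-8120fe        # hash suffix generated by Akri, illustrative
spec:
  configurationName: onvif-camera  # the Configuration that discovered it
  shared: true                     # reachable from more than one node
  nodes:
    - node-a
    - node-b
  deviceUsage:                     # three usage "slots", so at most three
    onvif-camera-8120fe-0: ""      # workloads can claim this camera at once
    onvif-camera-8120fe-1: ""
    onvif-camera-8120fe-2: ""
  brokerProperties:                # injected as environment variables into any
    CAMERA_RTSP_URL: rtsp://10.1.2.3:554/stream  # requesting pod (variable name
                                                 # illustrative)
```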
So that means if you try to deploy four pods that use this device, one of them will not run, because the resource is only available to three. That helps with not overloading these shared IoT devices. You can also see there, under broker properties, what gets injected as environment variables into any workload that requests the device.

So, back to what could be better: the user experience. Akri is pretty imperative if you're choosing to create your own pod spec. You can see here on the right that you have to specify exactly which camera, the one Akri discovered and created the Instance for, you want to use. Ideally, you could be a little more descriptive in your pod spec: say, I want the robot arm that's moving fast, or I want the precise robot arm, and being able to do all of that in the pod spec would be ideal. With Akri, we're getting closer to a more generic pod spec experience; we have some proposals out there for that, and that would be through changing the way our controller works. If you use that controller to deploy the applications on your behalf, you get more of a generic experience, but there's still a way to go. And one of the exciting things about the way Kubernetes is evolving is that, as Alexander will explain, there's now a KEP that changes some of the ways we're using these devices, and we could even see Akri being put on top of that: dynamic resource allocation.

So you saw that in the current device usage pattern inside Kubernetes, we have multiple problems with how we want to use devices, and multiple workarounds for the limitations of the current design. So we tried to step back and think about what an ideal, or at least more extensible, way to specify devices in Kubernetes would be. And just so you understand what kinds of scenarios we are trying to address: first of all, separation of the claim for a device, describing "I want this, and here's my list of parameters". The claim has a name and a unique ID, and when you have multiple containers or multiple pods, they can reference it and say: for this allocated device, now I'm going to use it. This separation allows us to cover pretty much everything we can imagine right now: two pods using two devices on the same host, maybe network devices; two pods sharing one device; two containers within one pod sharing one device; or any other combination, like multiple pods sharing multiple devices split dynamically instead of statically, as I showed in the previous example. Devices might be interconnected, and those pods might be running on different machines if there is a fabric between the devices, outside of Kubernetes, and so on.

So this is what our working group is actually trying to do, and the process for that is not fast. All of these discussions started between us at Intel and NVIDIA almost four years ago. The key point in time was KubeCon North America in San Diego, where we had a round table with people from different projects: Mrunal from CRI-O, myself, Mike Brown from IBM, from containerd, Renaud from NVIDIA. We sat together and said, OK, let's think about how devices should look in the future. The result was the creation of the Container Orchestrated Devices Working Group in 2020. It's part of CNCF TAG Runtime, and there are two activities we are working on. The first activity is the low-level part, at the container runtime level: the so-called CDI, the Container Device Interface.
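As a rough preview of what such a description can look like, here is a hedged sketch of a CDI spec file; CDI specs live as JSON or YAML files on the node, and the vendor name, paths, and hook below are all illustrative.

```yaml
# Hedged sketch of a CDI spec: a device name resolves to concrete container
# edits, so the kubelet never needs to know these details.
cdiVersion: "0.5.0"
kind: vendor.example.com/gpu       # illustrative vendor/class
devices:
  - name: gpu0                     # referenced as vendor.example.com/gpu=gpu0
    containerEdits:
      deviceNodes:
        - path: /dev/vendor-gpu0   # illustrative device node
      mounts:
        - hostPath: /usr/lib/vendor
          containerPath: /usr/lib/vendor    # driver libraries
      env:
        - VENDOR_VISIBLE_DEVICES=gpu0
      hooks:
        - hookName: createContainer
          path: /usr/bin/vendor-cdi-hook    # illustrative hook binary
```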
This is the description, at the container runtime level, of what a device actually means. You get an ID, like NVIDIA GPU zero, and you have a file, a JSON schema, to describe what this NVIDIA GPU actually is: a device node or a set of device nodes, volume mounts, libraries, a set of hooks, and so on. All the implementation details the kubelet should not have to know about. And while this works at the low level, so you can use it in Podman, you can use it in current runtimes, we somehow need to expose it to the end users, to the pod spec.

The second part is dynamic resource allocation. It's the KEP my colleague Patrick is primarily driving, together with other community members: the NVIDIA folks I already mentioned on the previous slide, like Kevin, and on our side Christian and many others. We are trying to get this KEP done in a way that gives you more possibilities and more flexibility in how to use a device.

Details on that KEP: if you know the storage world, you probably know the pattern we are trying to use. You have a claim, you have parameters. The claim can be a separate object, or the claim can be templated inside your pod spec. The actual allocation might be immediate, or it might happen once the pod arrives, so delayed allocation. But the point is: you get a claim, you specify it, and then you specify which of your containers is going to use that claim. There are small, or maybe not so small, differences in how it's actually implemented. We don't have a separation like in storage, where you have a PVC and a PV; here it's one object. It has all the status information, and all the communication between the scheduler and the driver which implements this claim is handled through the API server, just by updating and patching this object. That simplified life and simplified the implementation, I think, a lot. Another big difference is the set of parameters. In PVs you have the parameters embedded, so the core object inside the Kubernetes API enforces a certain set of fields which need to be present, like size and so on. Here we try to do it more generically: the parameters are stored in a separate object, and you have the possibility to reference core objects like ConfigMaps, but you can also reference your own CRDs. Your CRD schema can then specify which parameters are mandatory, what types those parameters need to be, and so on. So again, for your driver, we are trying to give you as much flexibility as possible.

How does it look in reality, in the current implementation? This is an example of a pod spec which has an embedded claim. You see two containers, and one of the containers actually references it, saying, I want to use my resource. Down below you see the template describing my resource: it says resource class something, it says parameters, and it has a pointer with the type of object, here a ConfigMap (it could be your CRD), and the name of where these parameters are located. The two other objects in this example are the class, which maps the name of the resource class you have in your cluster to the actual driver implementation (so, for example, vendor foo can have the driver foo GPU, and you can have gold GPU, silver GPU, bronze GPU and so on, all handled by the same driver), and the parameters, where, as I mentioned, we are using the simple core object ConfigMap here, so you can specify free-text parameters which will be handled by the driver.
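For orientation, here is a hedged sketch of the objects just described, roughly in the shape of the early alpha API (resource.k8s.io/v1alpha1); the exact field names have changed across releases, and the driver and image names are made up, so treat it as illustrative only. The embedded-claim variant is expressed here through a named claim template.

```yaml
# Maps a class name to the driver that implements it.
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClass
metadata:
  name: gpu.example.com
driverName: gpu.resource.example.com
---
# Free-form parameters interpreted by the driver; could also be a vendor CRD.
apiVersion: v1
kind: ConfigMap
metadata:
  name: gpu-claim-parameters
data:
  count: "1"
  memory: "16Gi"
---
# Template for the claim created for each pod that references it.
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClaimTemplate
metadata:
  name: gpu-claim-template
spec:
  spec:
    resourceClassName: gpu.example.com
    parametersRef:
      kind: ConfigMap
      name: gpu-claim-parameters
---
# Pod with an embedded claim reference; only the first container uses it.
apiVersion: v1
kind: Pod
metadata:
  name: inference-pod
spec:
  resourceClaims:
    - name: gpu
      source:
        resourceClaimTemplateName: gpu-claim-template
  containers:
    - name: inference
      image: registry.example.com/inference:latest   # hypothetical image
      resources:
        claims:
          - name: gpu       # this container uses the allocated device
    - name: sidecar
      image: registry.example.com/sidecar:latest     # does not use the claim
```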
The second variant of usage is, again, similar to the storage way: you can create a standalone, separate claim object by yourself. For example, I can allocate this GPU for my research team and say I want it to be shared into 10 units. Again, it's the same thing: you have a class, which is a GPU from a particular vendor, and you have parameters saying 10 slices, 10 slots, and so on. So the allocation happens and, for example, the cloud service provider starts billing for it. But the actual usage of that resource happens in multiple pods, where, again in your pod spec, you say that this pod is going to use that particular, already allocated resource, and which container, or set of containers, is going to use it inside the pod.

So this is how it looks and how we are working with it. The KEP was initially merged for 1.25 and then slightly updated in 1.26 to fix some of the API shortcomings. The pull request is open for review right now. Hopefully it will be merged soon, but who knows. The pull request is quite big, because we need to touch many of the components in the system: the API, obviously; the controller manager, because it handles the lifetime of the embedded claim objects; the scheduler, to teach it how to use the API server as a backend to talk to the drivers; and obviously the kubelet, to actually prepare the resource and then pass it down to the runtime. But all of this is doable. You can look at how it's done in the pull request; it's not as complex or as scary as it might sound.

So how does this affect vendors? Vendors will obviously need to re-implement support for their devices. It's not a device plugin; it uses a completely separate API. You implement something similar to what Kate described in Akri: you need to implement your controller which knows the logic for your device, your allocation logic, your discovery, your resource tracking. The node component is a simple piece of code, and there is documentation on how to use it as well. Again, it's not as scary as it sounds. From our side, what we are doing is providing helper libraries, which are part of the pull request I mentioned, and we provide a test driver which is going to be used in the end-to-end tests. We also have an example of a more complex driver from our fellow travelers at NVIDIA. Have a look; it's a really powerful mechanism. It really allows vendors to expose all the features of their hardware without spending too much time or complicating the core Kubernetes code too much.

With that, I would like to say: if you are interested in enabling devices or just using devices, please join our communities. We have a nice Akri community for IoT devices, and we have the generic TAG Runtime and the Container Orchestrated Devices Working Group. Please reach out to us, myself, Kate, Patrick. We will be happy to talk about your use cases and your problems and see if we can help. With that, we have a few minutes for questions, and we have a QR code which you can scan later to provide feedback.

Yeah, happy to answer any questions that anyone has about any of it.

Can you continue to use the device plugin framework in addition to using dynamic resource allocation? Can you use them together, and how would that look?

Yes, absolutely. It's not something that will immediately replace device plugins. Device plugins still stay; this is just a new API which allows a somewhat different usage of devices.
So if you have a device that you are happy to use with very simple bin counting (I have one device, another device, another device) and you don't care about more detailed properties, the device plugin is perfect for that; use it.

Yeah, it's just another interface that's being added to the kubelet. So the device plugin will continue to exist, and DRA adds a different way of using devices, like Alexander was saying, kind of the way storage volumes are handled, but doing that for resources or devices as claims.

The question is more about where you, as a device vendor, would like to go. What we know from the feedback from our NVIDIA colleagues is that this is the way they want to go, because they have already collected a certain amount of technical debt over the years, and they have collected user stories which they cannot solve with the existing device plugins, or which are more expensive for them to solve that way. They would migrate to this as soon as it becomes available.

Awesome, well. If there's time for one more question, may I ask my question to the audience: how many of you are actually using any kind of accelerators? Okay. Non-GPU accelerators? Okay. All right. Interesting statistics. Any other questions? We'll be around, so if you have any questions you want to ask separately, we'll be down here. Thank you, everyone. Thank you.