Hello everyone, and welcome to this panel introducing the Container Orchestrated Devices, or COD, working group. My name is Renaud Gaubert. I'm a software engineer at NVIDIA, and I've contributed to Kubernetes as well as the NVIDIA container toolkit.

I'm Mike Brown. I'm with IBM, where I work on open source development in a group that gets a chance to work on all the latest, greatest stuff. It's a lot of fun. I'm a maintainer on containerd and OCI projects, I work with the SIG Node team in Kubernetes, and I contribute to a bunch of other CNCF projects.

Hello, I'm Alexander Kanevskiy. I work for Intel as a cloud software architect. My main focus, and the area where I'm mostly involved, is the enablement of different accelerator devices, and specifically the complex problems around node resource management.

Hi, I'm Mrunal Patel. I work for Red Hat. I've been working on containers for a long time; I'm a maintainer of runc, the OCI runtime spec, and CRI-O, and I participate in SIG Node upstream.

Hello everyone, I'm Urvashi Mohnani. I'm an engineer at Red Hat working in the container runtime space; I'm a maintainer of CRI-O and I participate in SIG Node as well.

So let's start this panel with a brief introduction to COD, the Container Orchestrated Devices working group. We are a small group of device vendors, runtime maintainers and contributors, as well as Kubernetes SIG members. This working group falls under the CNCF SIG Runtime umbrella, because we interact with many projects, such as Kubernetes, Nomad, containerd, CRI-O, and other projects like Kata, or, in the HPC space, Singularity or Sarus. The charter of this group really is to improve the support for devices across the cloud-native space. What that really means is that we're trying to enable new workloads, we're trying to smooth out the experience that you have as a user or as a cluster administrator, and we're also discovering and trying out new ideas, such as defining the security boundaries for devices. So what does it really mean to try and enable new workloads? That's a question I'm going to ask while looking very intently at Alexander from Intel.

Thanks for that introduction to the working group. Indeed, the usage of devices in the cloud has been growing over the past five years or so, and a wide variety of workloads are utilising devices nowadays: machine learning, data plane acceleration, encryption, compression, anything that can be offloaded from the CPU. The devices we are talking about, just to name a few: NVIDIA GPUs, some of the most commonly used in the cloud; on the Intel side, FPGAs, QuickAssist, GPUs and some other devices; and from other vendors, SmartNICs. Obviously, all of those vendors want to enable their devices in the cloud. And the way they are enabled is evolving. Previously it was just "docker run --device /dev/mydevice"; nowadays users expect to use devices in more complex setups, for example distributed, orchestrated workloads in Kubernetes. And by the way, kudos to Renaud and NVIDIA, who drove the current implementation of device plugins in Kubernetes, which really simplified the usage of NVIDIA GPUs for deep learning model training. The main thing it helped with was changing the mindset of cloud users: devices are not something rare, they are a commodity that we can utilise inside our workloads.
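To make that concrete, here is a minimal sketch, in Go, of what the device plugin model gives users: a workload simply requests a GPU as an extended resource in its pod spec, and the kubelet plus the vendor's device plugin take care of wiring the device into the container. The nvidia.com/gpu resource name is the one advertised by the NVIDIA device plugin; the pod name and image tag below are placeholders.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// A pod that asks for one GPU as an extended resource. The workload
	// never touches /dev paths or driver mounts directly; the device
	// plugin advertises "nvidia.com/gpu" to the kubelet, and the kubelet
	// hands the allocated device to the container runtime.
	pod := corev1.Pod{
		TypeMeta:   metav1.TypeMeta{APIVersion: "v1", Kind: "Pod"},
		ObjectMeta: metav1.ObjectMeta{Name: "gpu-example"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "training",
				Image: "example.com/deep-learning:latest", // placeholder image
				Resources: corev1.ResourceRequirements{
					Limits: corev1.ResourceList{
						"nvidia.com/gpu": resource.MustParse("1"),
					},
				},
			}},
		},
	}

	// Print the equivalent YAML manifest a user would apply with kubectl.
	out, err := yaml.Marshal(&pod)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```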
And changing that mindset actually brought a whole new set of problems, both from a use-case perspective and from a user-experience perspective. Previously the user just said, "I want a device." Nowadays, we are talking about much more complex scenarios. One scenario is VM-based runtimes that want to use devices. Another is devices that have different properties: for example, a user says, "I want an FPGA with a particular bitstream loaded." Or we have connected devices, where for a SmartNIC the user says, "I want it connected to a specific network and RDMA pipeline." Then there is an even more complex scenario, where multiple devices within a node are connected as a pipeline: an FPGA, a NIC, special ASICs and so on. And the hardest scenario, something we are trying to solve right now, is multi-node deep learning, where you have devices connected between nodes, and the device becomes a resource not of a single node but of the cluster. To utilise all of that in workloads, the existing extension points turn out not to be enough. We need to do a lot more in the way we describe devices, the way we prepare devices, the way we prepare the workload and how we run it. And based on all of that, and on all the discussions we have in the working group, we realised that the different APIs between the CNCF components need to be improved, and that the user experience needs to be improved, to enable all of this.

Those are some pretty cool use cases. So special devices are now a commodity; are they even special devices any more? What do you think, Urvashi? I've heard about this cool idea called CDI, the Container Device Interface. Could you talk to us about it? How does it help solve these different use cases that Alex from Intel just talked about?

That's a really good question. The Container Device Interface, or CDI for short, describes a mechanism for container runtimes to create containers that are able to interact with third-party devices. Currently there is no standard for device support in runtimes and orchestration engines. For example, Kubernetes and Nomad both have a concept of device plugins, but they have very different frameworks for these device plugins. Docker has its own plugin mechanism, while Podman has a concept of hooks. And this holds true for other container runtimes and orchestration engines as well. This lack of a standard results in vendors having to write and maintain multiple plugins for the different runtimes. With CDI, the plan is to have a standard way of supporting third-party devices, so that the user experience is consistent regardless of which runtime or orchestrator you use. Portability between runtimes will be easier, and there won't be a need to resort to various hacks for the different runtimes, which in turn means that maintaining all of this will not be a nightmare, or at least much less of a nightmare.

On the node, it is quite straightforward to expose a device node into a container for a simple device: all you need to do is pass a flag and the device path to your runtime. But complex devices like GPUs and FPGAs require much more involved operations, as Alexander mentioned earlier. These operations can range from things as simple as compatibility checks, can my container run on this device, to device-specific operations like reconfiguring an entire FPGA, or memory management on GPUs.
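As a rough illustration of that gap, here is a sketch using the OCI runtime-spec Go types of what ends up in a container's config.json. A simple device only needs a device node entry (plus a matching cgroup device rule, omitted here for brevity), while a GPU additionally needs things like driver library bind mounts and a hook such as ldconfig. The specific paths and device numbers are examples only, not what any particular vendor ships.

```go
package main

import (
	"encoding/json"
	"fmt"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
	// Pieces of an OCI runtime spec that device setup has to wire up today.
	spec := specs.Spec{
		Version: specs.Version,
		Linux: &specs.Linux{
			// The "simple" case: expose a device node inside the container.
			Devices: []specs.LinuxDevice{
				{Path: "/dev/nvidia0", Type: "c", Major: 195, Minor: 0},
			},
		},
		// The "complex" case needs more: userspace driver libraries
		// bind-mounted from the host...
		Mounts: []specs.Mount{
			{
				Destination: "/usr/lib/x86_64-linux-gnu/libcuda.so.1",
				Source:      "/usr/lib/x86_64-linux-gnu/libcuda.so.1", // host path varies by distro
				Type:        "bind",
				Options:     []string{"ro", "rbind"},
			},
		},
		// ...and hooks, for example refreshing the linker cache so the
		// container's dynamic linker can find those libraries.
		Hooks: &specs.Hooks{
			CreateContainer: []specs.Hook{
				{Path: "/sbin/ldconfig"},
			},
		},
	}

	out, _ := json.MarshalIndent(&spec, "", "  ")
	fmt.Println(string(out))
}
```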
Now, CDI is only concerned with giving containers the ability to be aware of devices. A task like resource management is not part of the scope of CDI. This narrow scope greatly simplifies the implementation of the CDI spec and provides good flexibility for runtimes and orchestration engines. CDI follows the model of the Container Network Interface: a JSON file is written to a well-defined path on the machine. The file contains vendor-defined information about the device as well as the operations that the container runtime needs to perform, and this information is used to transform the OCI spec accordingly, which in turn gives admins a seamless experience when setting up dedicated devices. So basically, the Container Orchestrated Devices working group is aiming to improve the support for devices in the cloud-native space, and CDI is our first effort in that field.

That's a really cool introduction to CDI, thank you very much, Urvashi. You just said that the space is currently really fragmented and that different orchestrators have different mechanisms. Mrunal, you are a contributor to the OCI runtime spec, which is a standard. Could you tell us a bit more about how you think this use case is addressed today?

Sure. If you think about the OCI runtime spec, the way it works is that you write a config.json that describes how a container is created and run. Think of the properties of a container: its root filesystem, which could be based on Ubuntu, everything you see inside the container, all the files, that's the root filesystem; the process that you run, which could be a bash shell; the Linux namespaces that are used to define the container; and its security aspects, such as SELinux, capabilities and so on. When the OCI runtime spec was created, networking was one aspect that was considered out of scope, the reason being that there are tons of ways to set up networking for a container, and different vendors would have different solutions at layer two or layer three. The runtime spec didn't want to get in the way by trying to define every possible way to set up networking. So the idea was to introduce hooks, which can then be used to join the network namespace of a container and set up the networking as you see fit. That was one use case.

Another use case I can think of is enabling systemd. If a hook sees that someone is trying to start the systemd process in a container, it can set up the right mounts and the right cgroups for systemd, so systemd can start up seamlessly without any additional setup by the user. And the final, most interesting use case is enabling GPUs. GPUs require a lot of setup beyond just adding the device node in the container, like setting up additional mounts, running ldconfig and so on. Initially this started with a custom hook that performed all these steps. But writing these hooks is not easy: you have to know the internals of how containers run, how runc works with mount namespaces and so on, and then you have to join all these namespaces and perform these operations. It's a very imperative, low-level, bug-prone approach, if I may say so, and there are also some security risks. If you think about it, runc, or any container runtime, is better suited to perform these operations. Writing all these hooks, you can see a pattern emerge: most of what these hooks are doing can be done by runc.
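Pulling these two points together: the CDI file is a vendor-written JSON document at a well-known path that declares exactly those kinds of edits. Here is a rough sketch of the shape such a file can take, generated from illustrative Go types; the field names approximate the CDI specification rather than being copied from its Go packages, and the vendor kind, device name and paths are invented for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Illustrative types only: they mirror the ideas described in the talk
// (a vendor kind, named devices, and edits to apply to the OCI spec),
// not the real CDI Go packages.
type cdiSpec struct {
	Version string      `json:"cdiVersion"`
	Kind    string      `json:"kind"`
	Devices []cdiDevice `json:"devices"`
}

type cdiDevice struct {
	Name           string         `json:"name"`
	ContainerEdits containerEdits `json:"containerEdits"`
}

type containerEdits struct {
	Env         []string     `json:"env,omitempty"`
	DeviceNodes []deviceNode `json:"deviceNodes,omitempty"`
	Mounts      []mount      `json:"mounts,omitempty"`
}

type deviceNode struct {
	Path string `json:"path"`
}

type mount struct {
	HostPath      string   `json:"hostPath"`
	ContainerPath string   `json:"containerPath"`
	Options       []string `json:"options,omitempty"`
}

func main() {
	// A vendor ships a file like this under a well-known path (for example
	// /etc/cdi/vendor.json); a CDI-aware runtime reads it and applies the
	// edits to the container's OCI spec when a user asks for this device.
	spec := cdiSpec{
		Version: "0.3.0",
		Kind:    "vendor.com/device",
		Devices: []cdiDevice{{
			Name: "card0",
			ContainerEdits: containerEdits{
				Env:         []string{"VENDOR_VISIBLE_DEVICES=card0"},
				DeviceNodes: []deviceNode{{Path: "/dev/vendor-card0"}},
				Mounts: []mount{{
					HostPath:      "/usr/lib/vendor",
					ContainerPath: "/usr/lib/vendor",
					Options:       []string{"ro", "bind"},
				}},
			},
		}},
	}

	out, _ := json.MarshalIndent(spec, "", "  ")
	fmt.Println(string(out))
}
```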
So that's why the idea behind CDI is to be declarative. CDI allows you to declare what additional changes you want made to your container: you want to add a mount, you want to run a hook, you want to do some additional configuration. Instead of your hook performing all these operations itself, CDI lets you make changes to the spec file so that runc can perform those operations for you. That's how CDI helps and simplifies the status quo.

Thank you for this example and the explanation of what happens at the low level. Talking about low level, I know that Mike has contributed to containerd, and I've been following that space a bit and heard about this new idea called NRI. Could you talk to us about NRI? Is it the same thing as CDI? Is it something different? What do you think?

I think it is somewhat different, but there's a huge opportunity right now to integrate the CDI and NRI efforts, as well as to work with the SIG Node team in Kubernetes on additional integration. We haven't mentioned much about the Container Runtime Interface. The CRI was put together by the Kubernetes team to specify how container runtimes should manage pods and containers. And over time, they needed more: they needed to manage the resources on these nodes directly in some cases, they needed to monitor their health with probes, and they went around the container runtime, which became a problem in some areas. While we've all agreed to use CNI for networking integration at the container runtime level, and we think that's been a good model, it hasn't been the way we've handled the other resources. I think the issue, Mrunal, is that some teams see this from a pod spec or container specification level, and other teams see it more from a low-level hardware implementation level on the node: how do I gain access, inside the container, to those devices I can use outside of a container? Mrunal certainly had a really good way to solve that problem with the hooks, but it is very low level. And Michael Crosby, who created runc originally, came up with an idea after looking at all this and asking a lot of questions: a new node resource interface that would provide the capability to add plugins that container runtimes implement or run, plugins which use these hooks in such a way that the container runtime knows what is happening. And hopefully, with KEPs that we'll do for SIG Node, Kubernetes will also know what's going on and will be able to manage those resources from a higher-level perspective by talking to the plugins through the CRI. There's a lot of work to do here and a lot of opportunity for all the groups that are hopefully listening to this. We need help. If you go to containerd/nri, you'll see a brand new repository that Michael created, with a bunch of code and some samples, and we're looking for ideas. It doesn't support pods yet, but another PR will bring that in, so that you can actually create plugins to manage the resources for your pods and containers, using, under the covers, the hooks that Mrunal and team put together in runc.
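To sketch the shape of that idea, and only as a sketch, here is a toy NRI-style plugin in Go. The interface and types below are hypothetical, invented for illustration; they do not reproduce the actual containerd/nri API. The point is the architecture: the runtime itself invokes plugins at container lifecycle points, and the plugins hand back adjustments that the runtime applies, so the runtime always knows what was changed.

```go
package main

import "fmt"

// Adjustment describes changes a plugin asks the runtime to apply before a
// container starts. Hypothetical type for illustration only.
type Adjustment struct {
	Devices []string          // device nodes to expose, e.g. "/dev/fpga0"
	Mounts  map[string]string // hostPath -> containerPath bind mounts
	Env     []string          // extra environment variables
}

// Plugin is a hypothetical node resource plugin: the runtime calls it at
// lifecycle points, so, unlike an opaque OCI hook, the runtime sees every
// change that is made.
type Plugin interface {
	Name() string
	CreateContainer(containerID string) (*Adjustment, error)
	RemoveContainer(containerID string) error
}

// fpgaPlugin is a toy plugin that attaches an FPGA to every container.
type fpgaPlugin struct{}

func (p fpgaPlugin) Name() string { return "toy-fpga" }

func (p fpgaPlugin) CreateContainer(containerID string) (*Adjustment, error) {
	return &Adjustment{
		Devices: []string{"/dev/fpga0"},
		Env:     []string{"FPGA_VISIBLE=0"},
	}, nil
}

func (p fpgaPlugin) RemoveContainer(containerID string) error { return nil }

func main() {
	// The runtime side: run each registered plugin and apply (here: print)
	// the adjustments it returns before handing the spec to runc.
	plugins := []Plugin{fpgaPlugin{}}
	for _, p := range plugins {
		adj, err := p.CreateContainer("demo-container")
		if err != nil {
			panic(err)
		}
		fmt.Printf("plugin %s adjusted container: %+v\n", p.Name(), *adj)
	}
}
```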
And hopefully NRI will be implemented in the other runtimes as well. We'll have to work with the runtime teams, like Kata Containers, to make sure that at the shim layer, where we host these instances, whether VMs or containers, we've got the ability to manage these hooks and to manage access to these resources in the containers. So when a container author just wants to use a device, it's just there and it's easy to use; users get a better experience when they create their pod specs and things just work.

So what I'm hearing from you is that there's a lot of work, but it's exciting. All right, let me go back to the roadmap. I think it's really important that we talk about this group's roadmap to finish the presentation. Because we know these problems intimately, we've decided to tackle them through a layered approach. Right now, we're solving the core problem, which is: how do we expose a device to a container? The solution we've come up with for that is CDI, and, as Mrunal mentioned, it's a very declarative approach, and it's something that we are really excited to show. As we continue through this process of solving these problems, the next step is the node level: how do we select which device gets assigned to which container? The problem we're trying to solve here is that when you have workloads that are very sensitive to performance, or workloads with multiple devices talking to each other, you want to be very conscious about which CPU, which memory, which NIC, which GPU, which FPGA, which ASIC you're selecting. If you have a CPU that is very far away from your NIC, or a NIC that is very far away from your device, you might degrade the performance to the point where it might not even be worth talking to that NIC. So selecting which device gets assigned is a very difficult problem. The step after that is when you start looking at it from a cluster level: when you have workloads that need to run on multiple nodes and communicate with each other, or when you have devices that are reached over the fabric. Typically, you want the devices that are talking to each other to be very close, for example on the same rack. The problems we're trying to solve here require figuring out what the right knobs to expose to users are, and what the right plugins and plugin systems to create are. They're really exciting problems, but also very challenging ones, and there's still a lot of space to be explored here.

And the conclusion of this panel is that we are still a very new group, and there are lots of really exciting and challenging ideas. We'd like people to contribute and help us figure out some of the solutions that could work in this area. We would also like feedback: some of these ideas are going to interact very strongly with other ideas out there, and we need to understand how they intersect. So if you think that CDI might help you, if you think that CDI is something you could use, or if you want to integrate with some of the ideas we're talking about, let us know; we'd be very happy to hear about your use case. The way to let us know is to come to the COD working group meetings. They happen every other week on the SIG Runtime Zoom, and we're really looking forward to hearing from you.
And with that said, I'm going to open the floor to questions from the audience.