So while we are sorting out technical difficulties with our display, let me just get started and introduce the purpose of this talk. My name is Patrick Ohly, I'm with Intel; this is Kevin Klues from NVIDIA. We are here to give you all an update on what dynamic resource allocation (DRA) is. The goal is that when you walk away from this talk, you know why it's important, why we are discussing how to do certain things, and what the alternatives are going forward. There are intense discussions right now about how to do things right for the project in the long term, and not everything is perfectly clear yet. So we need to have more discussions, and we hope this talk enables everyone here to participate in those discussions when they come up: with vendors, with hardware, with users who want to do something with AI in Kubernetes.

I'll just get started without slides; that's fine for a while. Dynamic resource allocation is a Kubernetes enhancement. It started about two or three years ago, when we discussed how to do a new API that addresses the shortcomings we were seeing at the time with the traditional device plugin interface. The device plugin interface is what's being used for accelerators today, but it's very limited: only a single container can access an accelerator, there's no way of sharing one intelligently in a workload, and additional parameters are problematic because they just don't fit into the existing API. So what we started designing is a new API that explicitly models a resource, something that you can ask for in the cluster, and you reference resource claims from different pods to describe your workload requirements. The initial plan was less ambitious than what we are doing today: the original plan was to enable hardware vendors to write drivers that influence scheduling and do everything necessary for their hardware in their own driver. (And we're still sorting out the display issues. Should we try my laptop? Maybe I have all the demo stuff on it. Yeah, otherwise it would be a pity.) So now is the time where some slides would be nice. We have two KEPs pending currently. One is the original KEP, with what we now call opaque parameters, and we have some material on how that works and how it influences scheduling. We also have a newer one that we call structured parameters, which is the new thing we started in 1.30, and that does the scheduling differently; the difference is also what we will have on the screen at some point. We are now in discussions with John and other folks about what exactly we expect structured parameters to handle, because long term it's probably the future of this whole feature. We just need to figure out what we need at which point, and that's where we also need feedback from end users. (That works now, so I'll just use this one here and have a peek at the slides while I'm talking.) So now we can really get started. I already did the introductions, so we can skip over that. Let's talk about the opaque parameters approach; that's a few slides forward. I talked about vendor drivers: we are now at the point where we need to reach out to more people to write DRA drivers, because Kubernetes itself, DRA in Kubernetes, doesn't manage any hardware. It's all delegated to drivers, and without a driver it doesn't do anything.
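To make the contrast with the device plugin concrete, here is a minimal sketch; the resource name, class name, and image are illustrative placeholders, not from any real driver:

```yaml
# Device plugin style: an opaque count on the container; no parameters,
# no sharing between containers or pods.
apiVersion: v1
kind: Pod
metadata:
  name: device-plugin-style
spec:
  containers:
  - name: ctr
    image: registry.k8s.io/pause:3.9
    resources:
      limits:
        vendor.example.com/gpu: 1
---
# DRA style: a ResourceClaim is a first-class API object that pods
# reference; it can carry vendor parameters and be shared.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: shared-gpu
spec:
  resourceClassName: gpu.example.com
```

Because the claim is its own API object, it can describe what is wanted in detail and be referenced by several containers or pods, which an opaque counted resource cannot.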
The usage model that we had envisioned is that the vendor also defines the parameters that go with the hardware. Next slide. The user creates parameters for their resource claim as a standalone object using a vendor CRD. That CRD describes what the user wants, which influences the scheduling, and it also holds configuration or setup parameters that need to be passed down to the actual driver on a node to set up the hardware. All of that goes into these claim parameters. With the opaque parameters, on the next slide, the handling of those parameters is done entirely by the control plane component of a DRA driver. The driver also gathers information about its resources from the nodes, using whatever mechanism it finds useful; usually that's done with CRDs again. So the vendor driver knows about the resources: where they are available, where they could be allocated. But at the same time, the scheduler knows other things about the pod, like where memory, RAM, and CPUs are available. So we need to coordinate that. This is mostly for what we call delayed allocation, where we really wait for a pod to start being scheduled, and then we look at the situation in the cluster and figure out where a suitable node is, as far as the scheduler knows and as far as the DRA driver knows. The communication happens through the PodSchedulingContext that you see in the middle. This was one of our ideas at the beginning: enable DRA drivers to do their thing without having to modify Kubernetes too much. That was the original motivation behind this whole design. The problem with this approach, as we figured out later, is that we really can't do cluster autoscaling with it, for example. But before we get to that, let's try the demo, because we can show all of this working in reality with an example driver that you can try out on your own local Linux laptop. Kevin and Alexey, who's also here in the audience somewhere, have worked on this example driver; we've been talking about it for a few KubeCons already. As I said, it only needs a kind cluster, running on a Mac in this case, or on a Linux machine. We'll show how this works in practice and what it looks like for the user. And it's very similar to real GPUs; that's what it's simulating.

So if you go to this URL, under the kubernetes-sigs organization, we have an example driver for DRA that you can play around with. It runs on both Mac and Linux. What I'm planning to do right now is walk through this so you can get a feel for what it looks like to actually play around with one of these drivers. Starting from the top: the first thing you want to do, obviously, is clone the repo, and then you can cd into the dra-example-driver demo folder. Once you're there, you can run a script to build the driver and then create the cluster. The cluster we create runs in kind, and this brings up a kind cluster with all of the necessary pieces in place to deploy this DRA driver and then use it to allocate resources. I've already gone ahead and done these two steps because they take a few minutes, but basically they build the driver inside a Docker container and bring up a kind cluster with all the configuration we need. And you can see over here where I've already done all that.
Next I'll quickly show you: if you do kubectl get pod -A, you can see that there's currently nothing running other than the standard set of services you'd expect right after bringing up a kind cluster. The next thing you can do is grab this helm command and deploy the driver itself. This is the first thing I'm actually going to do live here. You can see that it deployed successfully. And then if I grab this command, we should be able to see the two different components of the DRA driver actually running. This is set up in a single-node cluster configuration, so I've got the control plane and the worker running on the exact same node. If I paste this in, we should see both of them running: the DRA driver controller, which runs on the control plane, and a single kubelet plugin running on the worker node itself. If you had a multi-node setup, you'd have multiple kubelet plugins, running as a DaemonSet, one per worker node. We didn't go into the details of how these two components communicate with each other, but at least in the opaque parameters model, the communication between them, used to advertise resources and eventually allow the controller piece to allocate those resources, is completely custom to the driver you've implemented. In this example driver, what we've built is called the NodeAllocationState, or NAS for short. If I paste this in, I'm basically going to print out the current node allocation state of that node. The thing to note here is that there is a section for allocatable devices. Our example driver just operates on a set of quote-unquote generic GPUs whose only attributes are a product name and a UUID. As the kubelet plugin comes up, it advertises these in this custom CRD that we call the NodeAllocationState, and the controller eventually picks that up so it can make its allocation decisions later on.

The next thing I want to do is deploy some actual workloads against this. If I paste these in, we see them being applied. Before we look at their output, I want to jump over to the actual YAML for them, where there are a couple of different tests that I ran. You can see gpu-test 1, 2, 3, and 4, and each of them matches the pictures you have down here. It looks like I currently have gpu-test4 open, and that's the one in the configuration where I have a single pod with a single container with a single resource claim, and that claim is asking for access to four separate GPUs. Normally I would build this up from left to right, but let's just start with the most complicated one, because that's what I have up on the screen right now. The basic idea is that the end user produces the YAML file you see here, where there's a vendor-specific claim parameters object, which we've called GpuClaimParameters. This is defined by the driver itself: it's a CRD that gets installed so that users can create instances of it and use them to both select and configure the resources they eventually want allocated to them.
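Condensed, the set of objects behind gpu-test4 looks roughly like this. Treat it as a sketch: the API groups, versions, and exact field values are reconstructed from this walkthrough and may differ in detail from the current repo.

```yaml
# Vendor-specific claim parameters (a CRD installed by the driver).
apiVersion: gpu.resource.example.com/v1alpha1
kind: GpuClaimParameters
metadata:
  name: multiple-gpus
spec:
  count: 4            # "give me four GPUs"
---
# ResourceClass, installed with the driver; links a class name to the driver.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClass
metadata:
  name: gpu.example.com
driverName: gpu.resource.example.com
---
# ResourceClaimTemplate ties the class and the vendor parameters together.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: multiple-gpus
spec:
  spec:
    resourceClassName: gpu.example.com
    parametersRef:
      apiGroup: gpu.resource.example.com
      kind: GpuClaimParameters
      name: multiple-gpus
---
# The pod references the template; containers opt in via resources.claims.
apiVersion: v1
kind: Pod
metadata:
  name: pod0
spec:
  resourceClaims:
  - name: gpus
    source:
      resourceClaimTemplateName: multiple-gpus
  containers:
  - name: ctr0
    image: ubuntu:22.04
    command: ["bash", "-c", "env | grep GPU; sleep infinity"]
    resources:
      claims:
      - name: gpus
```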
So in this case, the end user creates a GpuClaimParameters object and sets count to four, saying "I want four GPUs allocated to me." That's a custom CRD, and the in-tree type that we call ResourceClaimTemplate is where you link these things together. When I instantiate a ResourceClaimTemplate called multiple-gpus, I tell it which resource class it's associated with. The resource class is also something that gets installed as part of the driver, which you can see back in this other tab over here: as part of my helm install, I installed this ResourceClass, which gives it the name gpu.example.com and links it to the name of my driver, so that when my controller and my kubelet plugins come online, they register with the scheduler under that driver name and the scheduler can use it to make scheduling decisions. With this resource class in place, you basically link the class name to the driver, so when I create this ResourceClaimTemplate, I just say which resource class it belongs to and which vendor-specific CRD I'm using to make my selection and configuration of these devices. And then down inside the pod itself is where I actually reference this ResourceClaimTemplate. There's a new section called resourceClaims inside the pod spec: you give it a name and tell it what its source is, in this case the multiple-gpus ResourceClaimTemplate created above. Then any containers that should get access to the resources allocated to this claim reference it in a new claims section of their resources spec. In this case we're using the local name gpus that we created and referencing it there. So you can see how this matches the picture here, where I've got a single pod with a single container requesting access to these four GPUs, through these two levels of indirection.

So if I jump back to my first tab where I actually ran these, I'll quickly run kubectl get pod -A to show that they're actually running. The output here is a little bit convoluted, but at the very top you can see all of the gpu-tests, and all of them are currently in the Running state. If I then run this long, somewhat convoluted command, it prints some output from each of them to show that resources were actually allocated properly to the different containers in the configuration I showed here. In gpu-test1 we expect two containers, each accessing a unique GPU. This is something you can already do with the device plugin today, but for parity we wanted to show that it's also possible with DRA; you can see that each container has access to a unique GPU. gpu-test2 has two containers in the same pod requesting shared access to one GPU. You can see a single pod, pod0, with container0 and container1, but the GPUs they have access to are identical, which you can tell by the UUID that's printed there. gpu-test3 is two separate pods with a container each, wanting shared access to a GPU across the pods, which you can also do with this. And that's what we see here.
So pod0 versus pod1: they each have a single container, but they're accessing the same underlying GPU, by its UUID. And gpu-test4, the spec I walked through: you can see four separate GPUs allocated to it. If I move on, there's the output we would have printed there. The other interesting thing to look at at this point is the NAS, that NodeAllocationState object used for the custom communication between my controller and my kubelet plugin. If I grab that and print its output, we can see that in addition to the allocatable devices, which was all it contained previously, there are now also sections for allocated claims. That's information the controller writes back once it has made an allocation decision about which GPUs it actually wants to hand out. Then, once they actually land on the node, the node prepares them and fills in the prepared section. This doesn't have to be the communication protocol a driver follows; we did it this way, inside a single CRD, mostly for demonstration purposes, so we can print out one thing and see it all in one place. But in reality... what's that? It's changing too. Yeah, we'll jump to the future. So again, this is how things look today with the opaque parameters model that Patrick was just talking about. To complete the demo, and I'll leave this as an exercise for you to do later if you're interested: you can delete these workloads and you'll see the NAS object get cleaned up, where all of this extra information about allocated and prepared claims eventually disappears. So let me jump back to the slides and we'll move on from there.

We have a few slides explaining how this works, but I think we'll just skip through them quickly. It is partly where the name comes from. Next slide. The "dynamic" part is what I was thinking of as the back and forth between the scheduler and the drivers trying to figure out a suitable node. There are back-and-forth updates on the pod; we could just keep going. The PodSchedulingContext gets updated with new information from both sides until eventually the scheduler decides to try to allocate for a specific node. It tells the driver to allocate; if that works, eventually we are ready to run the pod on a node and it gets scheduled. But that's the old approach. We are now reconsidering this whole approach, because the ideas I had about supporting cluster autoscaling with it just didn't fly that well. They centered around rebuilding autoscaler binaries with custom vendor logic compiled in, which would have been one option, or having a gRPC mechanism between the autoscaler and DRA drivers, because even though the PodSchedulingContext object lives in the cluster, the autoscaler only simulates changes to the cluster and can't use that object as a communication mechanism the way the scheduler can. So we had to come up with something else, and the new approach is what we now call structured parameters. It's currently a separate KEP, but depending on how we decide to move forward, it may become the dynamic resource allocation proposal, and the other stuff might get delegated to some secondary thing that we keep alive while it's still needed. But we'll come to that.
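To make the classic flow concrete before we leave it behind: the negotiation object looks roughly like this (a sketch against the resource.k8s.io/v1alpha2 API; the node and claim names are illustrative):

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: PodSchedulingContext
metadata:
  name: pod0                # same name and namespace as the pod it belongs to
spec:
  selectedNode: worker-1    # the scheduler's current proposal
  potentialNodes:           # candidates the scheduler is considering
  - worker-1
  - worker-2
status:
  resourceClaims:
  - name: gpus              # claim referenced by the pod
    unsuitableNodes:        # the driver's feedback to the scheduler
    - worker-2
```

The scheduler and the DRA driver controller each update their side of this object until they converge on a node, which is exactly the round trip the structured parameters approach eliminates.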
So the key difference now, next slide (that was one slide too many, but anyway), is that we now have built-in types for resource claim parameters and resource slices. ResourceClaimParameters is the built-in type that mirrors what was done with custom CRDs before, but it's defined and understood by Kubernetes. Same with ResourceSlice: that's something the kubelet publishes on behalf of the driver's DaemonSet pod on the node, to advertise which resources are available. All of that is done in a format, what we call a model, that is understood by Kubernetes. What exactly that model is is what we're discussing right now. But because it's defined by Kubernetes, we can write code for it and compile it into the scheduler: the scheduler plugin that handles resource claims now knows how to interpret these parameters and how to match the requirements of a claim against the available resources. It makes the decision itself, based on its own knowledge, and goes ahead. There's no communication back and forth anymore between the scheduler and the driver during pod scheduling. For the user, it looks almost the same. They still create a vendor CRD with their parameters, and the job of the vendor driver is now to translate those CRD parameters into the in-tree format. It doesn't need to be that way; we could also allow the user to create the in-tree object directly, but we see some value in doing it this way. So that's the current approach. I think I've talked about that already: this translation step basically makes it completely transparent to the user. There's no user-visible difference; it's almost like an implementation detail, and ideally it should be as capable as what was possible before. Next slide. So this is what we had before, the PodSchedulingContext being the object through which we communicated scheduling decisions. Now it's a lot simpler, in the sense that the scheduler just gets informed about what the claim needs and what the cluster has, and then it makes the decision itself. It can also do that rapidly: it can assign some resources in memory, schedule one pod, move on to the next pod, and do the same thing it does today for built-in resources also for resource claims. So it's conceptually a lot more like how Kubernetes currently works, except that we now have to define a model that actually captures real-world use cases around accelerators.

So the very first resource model that we started working with, what I'm calling here the reference implementation for structured parameters in 1.30, looks kind of like this. This is an object called a ResourceSlice. It's what the kubelet eventually posts back to the API server for the scheduler to look at and figure out what resources are available for allocation. The information I'm showing here is just an example of what we've implemented with structured parameters in our NVIDIA GPU driver. The basic idea, and the reason it's called the named resources model, is that it's basically a list of resource instances that each have a name and a set of attributes. Each attribute has a certain type: an int, a string, a quantity; there are various types you can associate with the attributes. So when you go to select the GPU you actually care about, you can use these attributes to narrow down the set of resources you want.
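A named-resources ResourceSlice looks roughly like this. This is a sketch: the shape follows the 1.30 alpha API as described here, and the attribute names and values are illustrative rather than copied from a real driver.

```yaml
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceSlice
metadata:
  name: worker-1-gpu.nvidia.com
nodeName: worker-1            # node whose resources this slice describes
driverName: gpu.nvidia.com    # DRA driver that owns these resources
namedResources:               # the "named resources" model: a flat list
  instances:
  - name: gpu-0
    attributes:
    - name: uuid
      string: GPU-4cf8db2d-06c0-7d70-1a51-e59b25b2c16c   # illustrative
    - name: product-name
      string: NVIDIA A100-SXM4-40GB
    - name: memory
      quantity: 40Gi
```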
So yes, as I have written in red here, this becomes basically a flat list of named resources, each with a list of attributes, and this is what your kubelet plugin passes back to the kubelet so that it can advertise it to the scheduler. If you're familiar with the existing device plugin, this is basically what you have to add to a kubelet plugin to enable this: you fill in a structure that represents your model, you open up a ListAndWatch connection back to your kubelet, and you send these resources over that interface, exactly like you would have done with the existing device plugin API, except that now you're not just passing back a list of opaque strings, you're passing back a list of named resources with attributes attached to them. And instead of the kubelet taking the list of strings you pass back and converting it into a count that it writes into the node object, it creates one of these ResourceSlice objects in the API server and dumps all of your information into it: the names and all of the attributes associated with them. Then, after you send the initial state, you continue to stream updates if anything changes, for example resources going unhealthy, which again is exactly what you do in the existing device plugin API, just with less information. So from the picture Patrick showed a minute ago, this is the bottom half of what we're now doing with the structured parameters model: our kubelet plugin advertises these resources over that streaming interface to the kubelet, and the kubelet advertises them to the scheduler using this ResourceSlice object in the API server.

The second half of what we do is continuing to support a vendor-specific API for selecting and configuring your resources. In the example I'm showing here, the controller of our DRA driver, instead of consuming the vendor-specific claim parameters object directly, generates an in-tree object that has been added as part of the structured parameters model: ResourceClaimParameters. So your vendor-specific claim parameters get converted by your driver controller into in-tree ResourceClaimParameters, a standard format that the scheduler can actually use to make its decisions. Starting with what we have here on the left: my vendor-specific object is called GpuClaimParameters, it has a specific API version, and it's asking for an A100 in this case. When we generate the ResourceClaimParameters object you see on the right, we add a section called generatedFrom, which refers back to the vendor object. You can imagine how the controller is built to do this: it has an informer watching for these vendor claim parameters objects, and any time one shows up, it generates the corresponding in-tree ResourceClaimParameters object. The scheduler, in turn, watches the in-tree objects to figure out when it can pick a claim up and make a scheduling decision for those resources. And associated with this, in the vendor-specific API, we have a way of doing selection on our resources.
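Side by side, that translation looks roughly like this. This is a sketch, not the exact NVIDIA driver API: the vendor group, the attribute keys, and especially the CEL helper syntax in the selector are assumptions for illustration.

```yaml
# Vendor-specific object, created by the user; validated by the CRD schema.
apiVersion: gpu.nvidia.com/v1alpha1
kind: GpuClaimParameters
metadata:
  name: a100-claim
spec:
  count: 1
  selector:
    productName: "*A100*"      # select by product name
    memory: "<=40Gi"           # select by GPU memory
  sharing:
    strategy: TimeSlicing
    timeSlicing:
      interval: Long
---
# In-tree object, generated by the driver controller from the one above.
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimParameters
metadata:
  name: a100-claim-generated
generatedFrom:                 # back-reference to the vendor CRD instance
  apiGroup: gpu.nvidia.com
  kind: GpuClaimParameters
  name: a100-claim
shareable: true                # the claim may be shared (assumed here)
driverRequests:
- driverName: gpu.nvidia.com
  vendorParameters:            # opaque config, passed through to the node
    sharing:
      strategy: TimeSlicing
      timeSlicing:
        interval: Long
  requests:
  - namedResources:
      # CEL expression the scheduler evaluates against each advertised
      # instance's attributes; the helper syntax here is illustrative.
      selector: >-
        attributes.string["product-name"].matches("A100") &&
        attributes.quantity["memory"].compareTo(quantity("40Gi")) <= 0
```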
In this case, I'm asking for some product that has the string "A100" inside of it, and I'm asking for a GPU that has less than or equal to 40 gigabytes of memory. When we translate this into the generic ResourceClaimParameters object, we turn it into a CEL expression that the scheduler is able to interpret and use to select the proper GPU. The parameters that appear in it are the same ones we listed as the attributes attached to the resources being advertised by the underlying node through the ResourceSlice. So the CEL expression here is basically encoding a boolean expression over the attributes that need to be attached to a GPU, in our case, for it to be a candidate for selection when the resource allocation actually happens. That's the selection piece of a ResourceClaimParameters object. On the flip side, there's also a configuration piece, which only the node level needs to know about. Once this GPU has been requested and allocated by the scheduler, it gets passed on to the node, and the node needs to know how you want the thing configured. So we pass this configuration information along via the in-tree ResourceClaimParameters object so that the kubelet knows what to do with it once it lands there. In this case we're basically saying: I want an A100 GPU with at most 40 gigabytes of memory, and once it lands on the node and is shared amongst multiple containers, make sure it uses time slicing with the long time-slice configuration rather than the short or medium one. And then, from that picture from before, this is the upper half of what Patrick showed: we've got our resource driver with its controller piece, and it now has this mechanism to turn vendor-specific claim parameters objects into generic ResourceClaimParameters objects that the scheduler knows how to read and interpret. With both of those in place, putting this back together, the scheduler now has all the information it needs to select a node and eventually schedule the pod to it. When it does, it follows the same procedure it would have for opaque parameters: the pod gets scheduled there, the pod lands there, and the kubelet goes through the back and forth with the kubelet plugin to have it actually prepare the resources, then passes that on to the container runtime and launches the container. As I mentioned earlier, we have this example driver, and I went through the demo of it. Currently its code base only supports the opaque parameters model. I've started updating it to also support the structured parameters model; I just haven't quite finished, and we plan to have that ready by the time kind releases an image that supports Kubernetes 1.30. At the same time, I've already gone through the process of building structured parameters support and have a branch for it in our DRA driver for NVIDIA GPUs, which you can play around with if you're interested in seeing this work in that use case.
Speaking of which: for this current model, the named resources model, some of you may have read the document I put together called "NVIDIA GPU use cases for dynamic resource allocation", where I outlined 12 use cases that we want to be able to support with DRA going forward. With this simple named resources model we actually already support half of those use cases, which is pretty good for how simple it is. These are the same use cases I outlined in my talk from Chicago last November, "Unlocking the full potential of GPUs for AI workloads on Kubernetes", which walks through all the things you can't do with the existing device plugin API, what we'd like to be able to do with DRA, and demonstrates how that's done with the opaque parameters model. In the named resources model that we have in place today, these are the six use cases that are supported, and those are the six that are currently unsupported. I'm not going to go through the details of all of these; you can look at the slides later and click through them if you want. The main point is that by 1.31, the next release, we hope to be able to support 10 of those 12. So there are two left over that we still probably won't be able to support by the next release. The main thing we're missing that will enable most of these is the ability to partition resources, to be able to support something like MIG, if you're familiar with that, a mode of operation for GPUs that allows you to partition one up into smaller pieces. We basically have no way to partition these resources at the moment; we can only name the top-level resource and the attributes associated with it. Once we come up with a scheme for how to do the partitioning, and we have ideas around how to do it, I'm fairly confident we can get this into 1.31 and push this forward. The last one is this notion of what I always call management pods, which get access to all of the resources without actually allocating them. The idea is that the device plugin itself, or say some monitoring pod that you want to run, needs access to these resources so that it can perform monitoring on them, but you don't want to allocate them to that pod, because you want them to remain available for other pods to consume exclusively later on. These are the two things we need to figure out the right way to support in 1.31 to enable 10 of these 12 use cases. For the last two: we have ideas around how to support the first one, the notion of custom policies to align multiple resources, such as GPUs and NICs. John Belamaric has been working on a proposal for how we can do this in a more generic way. I'm not confident enough to say we'll have it figured out by 1.31, but we have some ideas, and I see a path towards eventually supporting it, even in the structured parameters model. The last one, though, I don't really see how we could ever support with structured parameters: the notion of fully application-specific policies for how you allocate GPUs to your application.
This falls into the category of very custom policies, like: if I have two pods whose aggregate GPU memory requirements fit on one GPU, then give me one shared GPU; if they don't, then give me two GPUs instead, right? There are some very custom policies that might need to be in place to allow that to happen, especially if you also want to map in an RDMA interface when there isn't enough on a single node, and so on. Maybe there are ways to do this; I just don't have good ideas around it now, so I don't know when or if that will happen. One thing I want to highlight at the end of all this: at least the way we've built this out for our GPU driver, from a user's perspective, no changes need to be made to migrate from DRA with opaque parameters to DRA with structured parameters. Everything is handled behind the scenes. So if you've actually started playing around with this, at least for the six use cases I listed at the top, with opaque parameters, and you decide you want to switch to structured parameters, you shouldn't have to migrate any code. And if you do, let me know, because that's against the design. Yep, and now I'll hand things back over to Patrick.

So we talked a lot about the things that we have implemented already but might not need in the future. Let's look at some of the options for what we can cut. And French style, we are in Paris: this would be one way of doing it. I hope we can do it in a way that doesn't hurt users, or ourselves, too much, but let's see what options we have. I put together a few things that we could remove going forward. The first one is clearly this whole PodSchedulingContext concept and object. It's currently needed for delayed allocation in classic DRA. Without it, we can still do immediate allocation, but that's not as useful. And custom policies, the thing Kevin just talked about, doing something very specific for exactly one use case, are currently possible with a controller; they will not be possible anymore if we take out the PodSchedulingContext. But perhaps that's acceptable: at some point we have to make a trade-off and lose some functionality to move forward. For node-local resources, structured parameters are clearly the alternative we're focusing on; that's the intention. Network-attached resources are a bit more tricky. We haven't thought that much about them in the context of structured parameters, but at least with classic DRA, immediate allocation would still work: if you have something that is available to all pods regardless of node, then immediate allocation doesn't need the PodSchedulingContext object, and it would still work with a central controller. Next slide. The notion of cutting goes even further. If we cut the PodSchedulingContext object, we would still have classic DRA with, potentially, a control plane controller provided by a vendor. If we also remove that, then we really drop every extension mechanism for third-party vendors, which we currently have because it was the core concept of the original design. With structured parameters, all vendors have to work with what we support in terms of logic for choosing resources; there's no extension mechanism built into it at the moment.
Network-attached resources are currently, as I see it, the main remaining use case for classic DRA, but perhaps we can find a way to do that with structured parameters too. The other alternative is that the parameters we allow in the in-tree parameter object might again become something a vendor can provide, and if we then find a way to plug the vendor logic into a custom scheduler, we might still be able to do extensions: a custom scheduler, a custom autoscaler. We have a long history of supporting people building their own schedulers for such purposes, so we might still be able to support that going forward, even if it's not in the pre-built scheduler. But this is future work; it's not being done at the moment. The other thing, just for the sake of completeness, is the vendor CRD support that we have with structured parameters. There are some divided opinions on whether it's a good or a bad thing. I think it's important, because the vendor CRD gives us validation of the fields, which we don't get with the in-tree parameter object. The CEL expression itself is currently type-safe; it's known in advance that it evaluates to a boolean. If we drop the CRD support to make things simpler, eventually only the syntax gets checked, and there will be runtime errors that are very hard for a user to debug: they set up everything, the scheduler tries to schedule a pod, and a CEL expression error shows up while filtering nodes. That error gets exposed, but it's not very user-friendly. The validation, I think, is a very important argument for supporting the vendor CRDs.

Okay, so my conclusion from all of this; I think that's the next slide. Yes, the alternative models. That's another implementation detail right now. I was particularly scared about having to define one model that works for all the future hardware Kubernetes will ever need to support. That seemed like a very daunting task, and I know people have tried before; I'm aware of prior art in this area where people tried to model hardware and just couldn't come to a conclusion. Right now, built into the structured parameters approach is the notion that there can be multiple different models, and drivers basically pick one for their resources. The parameters must then match that model, of course, but we have an escape hatch: we do named resources right now, and if we find we need something completely different, we can add an alternative model alongside it. The Go types are defined such that the scheduler, or any other component, will see a one-of structure where the field it knows about is empty, so it knows it can't handle those resources. At least it fails gracefully in that case. So the plan: I think we clearly need to build on what we have in 1.30 right now. There's a long list of things that probably need to be added to make it more useful, but I think it's doable, and we've started on a few things already. We are discussing how to extend the model; we are discussing management access, the use case Kevin mentioned that is currently missing; partitioning needs to go into the current model; and then it's fairly complete. There are some other implementation aspects: version skew between the kubelet and the control plane is currently an issue, because the kubelet needs to pass information in exactly the right version.
So whenever we change anything in the structured model, we would need to update the kubelet, which is kind of a pain for users. There are some ideas for how to address that; it's mostly an implementation problem. Scoring might also be needed. If we focus on these things, we think that for 1.31, that's the next slide, we can have something that's fairly complete and useful. And our proposal is that we focus on that and not cut anything yet, because the opaque parameters still have their justification right now: they are how vendors can explore the feature. They can prototype with opaque parameters, come up with ideas about what they expect from Kubernetes and how it should work, then come to us and tell us whether the structured model is sufficient and what changes might be needed there. If we follow this plan, it basically gives us a timeline: 1.31, where we do the major work that remains, then freeze everything and just clean up after that. Then in 1.32, this year, it's an aggressive timeline, but I think it's doable, we can declare structured parameters beta, keep everything else alpha, and eventually, when we are happy enough with what we have, we can deprecate the alpha features and remove them in a later release, and hopefully everyone is happy. Well, that's it. We lost some time at the beginning, sorry for that. We have two minutes for questions; that wasn't the plan.

Okay. Just quickly: this afternoon, probably four o'clock, we're hoping there'll be an unconference session, and it's largely going to be about exactly this, the structured model and how we evolve it from here. I guess that's my way of saying: talk to this guy about all the tricky questions about the structured model, he'll answer them. Microphone in the back? No, Tim, go first please.

My question is around the kinds of limits and how this interacts with hard limits, like "in this zone, I have 42 accelerators and there aren't going to be any more, it's bare metal", and also soft limits, like "in this namespace, you can have 10 accelerators", things you can already do a bit without dynamic resource allocation. How do those interact with dynamic resource allocation?

So there is one pending PR about quotas, which only counts claims. It's not detailed enough to actually count GPUs, for example. That is something we could add now; we couldn't do it with the opaque model. With the opaque model, we could only count the user-visible part, the number of claims per class, for example, but how big each claim was remained completely opaque. Again, another drawback of making it opaque. With the structured model, we could define a quota mechanism that looks at all the allocations in a namespace and decides whether that exceeds the quota, and we could evaluate that to enforce quotas. That's not currently part of the plan, but it would be possible; I think the architecture supports it. One point in favor of the old model: the controller could have blocked further allocations using some custom policy, so that would have been possible on the vendor side.

You in the front, from NVIDIA, Kevin's colleague. Thanks for the update. You mentioned the support of NICs; I'm wondering how to support networking resources in general. Any plan? Do you think this mechanism can be generalized to support, say, an InfiniBand link or other networking resources? Thank you.
I think the parameter approach that we have for resources could fit the information you need to provide about the networking requirements of a pod, so I think that's feasible. And if we have a limited set of NICs, or SR-IOV virtual interfaces that are a limited resource on a node, then you could use this mechanism to allocate those. I think it's feasible; it's really just a matter of figuring out how to put the pieces together. One thing for networking resources is topology awareness, right? We have to capture topology information. Thank you. The networking stuff is an active discussion right now, right, Kevin? That's what I was going to say: literally, in Slack, I have a message from Kevin that I haven't responded to because I was on a plane. So, an active discussion. I think there's an opportunity for this model to capture the RDMA sort of devices, the GPU-specific NICs, but also possibly the more extended network model; there's a lot of talk about multi-network and how to model that in Kubernetes. I think there's a possibility these things could come together, which I think is great.

I wanted to ask a sort of loaded question. You suggested taking part of the API to beta, but not all of it. The round-trip rule says you have to be able to go back and forth, which means you either drag baggage into the beta or you drop it from the alpha.

So if we keep some things in alpha, they would still be in the API; the fields would still be there, but they would be disabled without a second feature gate. My plan would be to have two feature gates: one for core DRA, and the other for, say, let's call it PodSchedulingContext, or classic DRA; well, we need a name. This additional feature gate remains alpha and disables all the API surface that we don't want to promote to beta, which would be the fields that we don't need with structured parameters. The resource class currently has a field that says "this class uses structured parameters, yes or no". This field would be alpha. With beta, we wouldn't expose it: structured parameters would always be on, and the alpha field is what users can flip to go back to the old model. It is baggage, I know, but it would be behind the feature gate.

I see where you're going. I'm not sure there's a lot of precedent for doing it that way, and I'm not sure the machinery will be in your favor, which you're not a stranger to.

I see it the other way around: we do have beta-quality or GA APIs that contain alpha fields; we just usually happen to have the alpha fields already at the time we go beta. I don't see such a big problem with that.

We'll think about it when we get to review. Thanks. I know that I need to make it work, of course. Sorry, we're actually over time now. Is there a good place for people to follow up with you with more questions later? The unconference this afternoon, four o'clock. Okay. Thank you.