Alright, welcome everybody. I think let's give it a few more minutes and let other people join. How's it going, Eric? I see you on the call. Yeah, I finally figured out how to make it into this particular call. I'm good, how are you? Good, good. How's everybody doing? Everybody else? I hope you guys are doing great. Still at home, nothing changing. I'm trying to stay sane. Yeah. I'm pretty sure I'm going to volunteer myself as soon as possible, just to work from anywhere but my house. But safety first, we'll see. Yeah, California has actually gone into lockdown phase two, so it's just kind of crazy. I think most companies will say that you can work from home until the end of the year. Hey, Michael. Hey, how are you doing? Good. Where do you live? I'm in Indiana. How's the lockdown there? It's not too bad because I'm not in a heavily populated area, but still staying safe and staying away from things as much as possible. Are you close to a metro area? Not really, about an hour away from Indianapolis. All right, cool. I think we have enough people. So yeah, welcome everybody. We have you, Michael, today, so thank you for deciding to present. I guess you're going to be talking about the Node Resource Interface, something you've been working on over the last few months. We're happy to hear about it and want to learn more, and hopefully we can make this interactive too, so everybody can just jump in and ask questions. Yeah, sounds good. I'll go ahead and share my screen here. I have to rejoin to share my screen, sorry. No worries. I haven't set up the default Mac screen-sharing permissions, I think. Yeah, sure. Oh, we're waiting for Mike. Zoom released a dedicated standalone screen thing for Zoom calls recently. The Facebook one seems to work really well; the only problem is that at the moment it doesn't do Zoom, which is a pity. But the Zoom one clearly does. I think we should be good now. Are you able to see the slide deck? Yep.
So this is the first time I've presented this to anyone, so you all are the first stop in showing what I've been working on. Right now it's called the Node Resource Interface, kind of a sensible way to manage resources for containers. The entire problem is resource management, and we're specifically talking about cgroups and topology on the system. We have different workload requirements. You have batch workloads and latency-sensitive workloads. Customers have their own SLAs and SLOs. Or you have different classes of workloads: P1, critical, it always needs to run, all the way down to where batch would probably be classified as P3, P4, things like that. When you think about resource management with containers, CPUs are a big one today, especially when you want to run multi-tenant workloads on the same system with batch and latency-sensitive web services, things like that. You want to be able to schedule containers on specific cores, whether they get entire cores or run on hyperthreads. Then we also have NUMA allocations, where you need to have your workload running on a particular socket because that's where other devices are attached, and so on. We can also dig down even lower than NUMA and CPU, to L3 cache and huge pages. And then, like I said before, proximity: what socket is my GPU connected to, what socket are my network cards on? In large deployments, these are things you have to think about at the end of the day. So this creates a large matrix; there are a lot of things to consider. And there are some current solutions. The kubelet today has the CPU manager, and there are a few KEPs already outstanding with the community proposing how to improve this and how to start adding NUMA support to it. But when I was researching this, there's a lot of weird UX.
For example, if you have a Guaranteed pod and your requests equal your limits, then you get CPU sets and you're scheduled on dedicated cores. I don't think that's very friendly UX. It's kind of hidden away, you have to know the right knobs to turn, and it's off by default. Then there's the topology manager, which basically only the CPU manager and device manager take advantage of, and that's provided by hint providers. There's another solution from Intel: they have the Intel CPU Manager for Kubernetes. There's a CLI tool called CMK, which does the low-level work of allocating pools, placing workloads within those pools depending on different criteria, and picking what CPUs go into what pool. And then they have the CRI Resource Manager that builds on top of CMK for use within Kubernetes. One thing with that is they have to hijack the entire CRI socket, and they also have some API extensions for this. The CRI is a really big interface when all you want to deal with is scheduling containers on specific cores. So overall, QoS is hard. There are lots of users at lots of different scales, and it's hard to solve this for everyone. My proposal from the beginning is that, with the kubelet having the CPU manager and these implementations hard-coded in the kubelet core, we need to focus on APIs and not implementations, because it's very hard to solve a problem like this, which is very dependent on scale and on what devices are on the machine, for everyone. It's better to focus on the APIs to enable people to build their own solutions. If we look at existing interfaces, we have CNI, the Container Network Interface, and this is something that really stands out to me within the container community. I think it's simple and elegant. It's extensible. You can compose various plugins together.
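To make that "hidden knobs" point concrete: with today's kubelet, exclusive cores are only granted when the kubelet runs with the static CPU manager policy (the default policy is "none", i.e. off), and the pod lands in the Guaranteed QoS class with an integer CPU count, meaning requests must equal limits for every resource. A minimal sketch of such a pod spec, with placeholder names:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: latency-sensitive-app     # placeholder name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest  # placeholder image
    resources:
      requests:
        cpu: "2"      # must be an integer and equal to the limit;
        memory: 1Gi   # requests == limits puts the pod in Guaranteed QoS
      limits:
        cpu: "2"
        memory: 1Gi
```

Change the CPU request to "1.5", or let requests and limits differ, and the pod silently loses its dedicated cores, which is exactly the UX problem being described.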
And there's no controversy that I've seen within the design of CNI; we've basically all accepted it and use it within Kubernetes and other container projects. So my proposal is: let's make CNI for resources. I like this API. Everyone that worked on CNI should be proud of what they built, and we need to start extending APIs like this into other areas for containers. I came up with NRI because CRI was already taken. I'd rather it be named the container resource interface, but we already have the Container Runtime Interface, so NRI is the best I could come up with right now. In designing this, I don't think the kubelet is the right abstraction. We have the kubelet, and then we have the CRI implementations, which are very low level. They know how to interact with the system, whether you're on Linux or Windows. They deal with this, but the lines between the kubelet and the CRI implementations are starting to get very blurry. There's cgroups code in the kubelet; it's hard to tell who's responsible for resource management right now, with the CPU manager and topology manager sitting there. At the CRI level, we have very robust ways to hook into the actual lifecycle of a container on a system. It goes way beyond OCI runtime hooks. One of the recent developments is the create/start split within OCI, where create will create the namespaces and set up cgroups, and then we have this pause in between. This is where CNI comes in, and where I'm proposing NRI comes in: it can take the existing setup, you can modify the resources, add additional things, and then we start the container, which is the user's process. In designing this, taking a lot of inspiration from CNI, we have a global system config, and you have a list of these plugins where you can compose them together and chain them, and plugins can have specific configuration.
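Borrowing CNI's configuration style, a global config might look something like the sketch below, with plugins invoked in order and each carrying its own conf block. The version string, field names, and plugin names here are placeholder assumptions for illustration, not a finalized NRI spec:

```json
{
  "version": "0.1",
  "plugins": [
    {
      "type": "confine",
      "conf": {
        "systemReservedCores": [0, 1]
      }
    },
    {
      "type": "clearquota"
    }
  ]
}
```

The appeal of this shape, as with CNI's conflist, is that chaining and per-plugin configuration fall out of the file format itself rather than needing any coordination logic in the runtime.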
So for this confine plugin we have system reserved cores, where we say, when you're dealing with topology and scheduling these workloads on cores, I need zero and one to be reserved for the system, so don't touch those and you have the rest. And to enable a good ecosystem around this, like CNI has done, you need skeleton code that makes it easy for people to build these plugins without getting in their way. So I worked on packages where, as a plugin developer, I want to develop a plugin that does X: here's all the skeleton and boilerplate so you can quickly get up and running. If we look at the integration at the CRI level, it's extremely simple, because I believe CRI is the right place to add things like this. You would go into the different lifecycle stages of the container: in the create step, you would invoke the NRI plugins; on deletion, you would invoke the deletion handlers; and then we start the container. So it's very robust. We have explicit places to inject these into the lifecycle of the container, and it's not bleeding over into other people's functionality in the stack. During this work I built this confine plugin, and what it does is dynamic topology management based on QoS. As pods and workloads are scheduled on the system, they're labeled with their QoS class, and depending on whether they're latency sensitive, they get placed on entire cores. The way this works is you have a default pool, which is where batch workloads go. Batch containers keep their CFS quotas as well, so they can't use an entire core, but if a latency-sensitive service comes in,
we go ahead and clear the CFS quotas on it, because they're saying this workload needs two cores; they get allocated the entire cores and are able to use all of them. And it does this dynamically: as latency-sensitive and batch workloads hit the system, we keep utilization high, because if a latency-sensitive application stops, those cores can be returned to the batch workloads. So we build a dynamic node topology. It dynamically places workloads on the system based on the QoS class. We have NUMA support, so if your latency-sensitive service says, I need to be on a specific NUMA node, or I need to reserve the entire NUMA node, it can do that, and it will steal that node away from the batch workloads, or return it whenever that workload is done. With these plugins, there's no need to wait for the longer Kubernetes release cycles for updates to the CPU manager or topology manager. You have a community being built up of all these plugins within NRI; you just update them as you need to, and you're not tied to the Kubernetes release cycle as you would be with the CPU manager. You can chain all of these together; you can make plugins that do one thing and do it well, and it keeps your code simpler and more robust, things we care about at the infrastructure layer. And if a specific plugin doesn't work for you, then fork it and change it, make your own, or build more plugins for your own needs. So on this journey, SIG Runtime is the first stop to present this and start getting some feedback. I'll have a formal spec up, hopefully today, within the containerd project, because that's where I have my default implementation and the hooks into ctr and the CRI for Kubernetes, and then I'll expand out to different SIGs and things like that. So that's the high-level overview. If you have any questions and stuff, we can do that now, and I can go back to any slide if you need.
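The dynamic quota behavior described above can be sketched end to end as a small Go program, modeled on the CNI-style exec protocol the talk proposes: JSON state in on stdin, a resource decision out on stdout. Everything here, the type names, fields, and the qos.class label, is an illustrative assumption rather than the actual NRI API; the quota arithmetic itself follows cgroup v1 semantics, where writing -1 to cpu.cfs_quota_us removes the quota and cores times cpu.cfs_period_us caps a container at that many cores.

```go
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"os"
)

const cfsPeriodUs = 100000 // default CFS period in cgroup v1, in microseconds

// Request is a hypothetical shape for the state the runtime hands a plugin.
type Request struct {
	ID     string            `json:"id"`
	Labels map[string]string `json:"labels"` // e.g. the pod's QoS class
	Cores  int64             `json:"cores"`  // CPUs requested by the container
}

// Result carries the plugin's resource decision back to the runtime.
type Result struct {
	ID         string `json:"id"`
	CfsQuotaUs int64  `json:"cfsQuotaUs"` // value destined for cpu.cfs_quota_us
}

// invoke is the plugin's logic: latency-sensitive containers get their CFS
// quota cleared (-1, meaning unlimited) so they can use whole cores, while
// batch containers keep a quota of cores*period so they can't eat entire cores.
func invoke(req Request) Result {
	res := Result{ID: req.ID, CfsQuotaUs: req.Cores * cfsPeriodUs}
	if req.Labels["qos.class"] == "latency-sensitive" { // hypothetical label
		res.CfsQuotaUs = -1
	}
	return res
}

// run implements the CNI-style exec protocol: read JSON state, write the
// modified state. A real plugin would call run(os.Stdin, os.Stdout) and be
// invoked by the runtime at the create step of the container lifecycle.
func run(in io.Reader, out io.Writer) error {
	data, err := io.ReadAll(in)
	if err != nil {
		return err
	}
	var req Request
	if err := json.Unmarshal(data, &req); err != nil {
		return err
	}
	return json.NewEncoder(out).Encode(invoke(req))
}

func main() {
	// Demonstrate with a sample request instead of real stdin.
	sample := `{"id":"c1","labels":{"qos.class":"latency-sensitive"},"cores":2}`
	_ = run(bytes.NewReader([]byte(sample)), os.Stdout)
	// prints {"id":"c1","cfsQuotaUs":-1}
}
```

In a chained configuration, the next plugin would receive this output as its input, which is one way the chaining described here could pass context between plugins.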
So with this, it would essentially replace the topology manager and the CPU manager? Or will it actually work side by side, or will it be something that eventually becomes just one? Yeah, they don't work side by side, because they would start to conflict with each other, at least at this stage. It would be able to override what the kubelet does, so it's best to have the CPU manager off, which it is by default. Yeah. So it sounds like the idea is to eventually make this the standard later on, and maybe the CPU manager and topology manager can potentially go away, to improve the experience for the user? Or they could be broken out into plugins themselves, with those specific implementations, but the overall goal was to have an API for developing this stuff and not try to code everything into the kubelet. Yeah, I've seen that the CPU manager UX is kind of weird too, because you have to specify the requests and the limits to be the same. That's not really clear when you specify the limits in the pod spec, right? So yeah, I can see that, just the UX. Well, it's worse than that: you need that, and you need it to be an integer, and only then do you get the exclusive cores. Yeah, yeah. And when I was looking at the Intel CPU manager work, it seems like they could easily plug the CMK tool into this NRI interface and do their existing work, and then they wouldn't have to implement the surface area of CRI to hijack into this; they'd have a very specific API for doing it. So I think it aligns well with the work that's being done there. I told Sasha about this; he had a hard conflict this morning, so I'll make sure he gets the recording. I think he'd probably be very intrigued by this and maybe have some good feedback. We keep Slacking; I showed him some of the pictures from it, and he'll reach out. Cool.
Yeah, is there a mechanism where you also get feedback from the systems? So basically some monitoring that says, okay, I'm kind of full, or I'm kind of overloaded, can you move this stuff away from me to some other node, or something like that? Yeah, so my general idea was that the Kubernetes scheduler will still handle placing workloads on the node, and whether it decides to oversubscribe a node or not is a high-level scheduling decision, whereas at this level it's hard to provide feedback back up the stack. We could always kill a container, but then the kubelet wouldn't know why we killed it. So I think with different scheduling strategies you may want to oversubscribe, so I don't think it's too big a deal that we don't have a current method for the CRI to talk back to the kubelet about rejecting workloads, but it's something I've thought about and need to look into more. Yeah, and I guess, essentially, the scheduler will make those decisions. I mean, the scheduler will have to get that information and, based on that, decide where to move the pods. Yeah, ideally the scheduler; we're thinking more about topology and placement on specific cores, and the scheduler still knows we have 24 cores to schedule on, so hopefully it wouldn't push so much workload that it becomes an issue. Yeah. So, sorry, I missed the beginning part, this is Diane. How would this work with special devices like FPGAs? Would this handle all that because there would be a plugin for each specialized hardware type? Is that the idea?
Yeah, I am working on a way to chain plugins together so they can get feedback within the chain, where they know who executed before them and what changes they made. But yeah, vendor-specific implementations, like plugins, can be made for devices where you need to do something more powerful. And maybe an example use case for that would be placement based on which device it is? Because we're still looking mostly at topology, they would still have a device plugin and everything else; you're just looking at where you would place the actual container or the pod processes, with the context of where your devices are in the topology. So, sorry, it wouldn't be a device plugin, but based on using a device, you would get a wiser placement of the workload, is that the idea or expectation? Yeah, the expectation is we're giving you a powerful interface where you can make those decisions. I had a quick question. First of all, thanks for a really interesting presentation; you've raised some, I think, very fundamental and very interesting questions. Well, let me prefix this by saying I'm not super familiar with the kubelet and the various interfaces down at that level, CNI, CRI, and the stuff you guys are working on, but it seems clear to me that a lot of that stuff developed organically, and, if I read between the lines of what you're saying, not all of it is ideal. If we were to start with a blank sheet of paper, we might architect things differently, and I don't mean this to be offensive to anybody who's been involved in the current architecture, but it seems to me like it might be useful to actually sketch up what we think a reference architecture for these kinds of things might be, and look at where we came from and where we'd like to go. And this doesn't necessarily need to be Kubernetes specific.
This can be from the point of view of: if you're going to do container orchestration in a cloud-native way, these are the kinds of things you run into, and this is what we think a good reference architecture looks like, so the various different implementations of these things at least have a kind of guiding light as to what we think a good approach is. I know it's a little hand-wavy and airy-fairy, but is that something anybody's working on at the moment that we're aware of? And if not, is it something that we as a group want to take on as a project? Yeah, I don't know of anything from my research. I think it's a good idea, and within that architecture, my whole point of let's focus on APIs instead of implementations applies. You can tell Kubernetes grew organically, just like Docker, which is one project I worked on in the past. You make compromises and you have growing pains, things like that. I think interfaces stand the test of time, whereas implementations get refactored, and you can always break things out: the CPU manager could stay in the kubelet for now, and we could say, actually, this needs to be factored out behind an interface. That makes sense.
Is anyone else on the call intimately involved in, for example, SIG Node? Does anybody know where the status of coming up with a grand vision for the node interfaces stands? Because the other problem, and this is completely anecdotal from looking from the outside, is that it looks like a lot of these interfaces have developed somewhat in isolation from each other. The networking people did CNI and the container people did CRI, and it's not clear that anybody with a holistic vision across all of these things necessarily weighed in. That may be wrong, but that's the impression I get, and I was in the Google Kubernetes team that started all this stuff way back, so these are very competent engineers, but things happen the way they do organically, and maybe it's a good time to look at that stuff. Yeah, it sounds good to me. The next question that I have is: is a group of people interested in starting something like a working group to address some of these issues? And Quinton, you brought up the issue of maybe bringing all the interfaces together, so that people can have that communication across different teams and agree on certain standards. What I've seen in the past is that a lot of the teams actually work on their own, like you mentioned, and a lot of these interfaces come up and end up different in different ways. In this case, you're following the CNI approach, which is great, but there are other cases where teams are working independently, and it may not be the best experience for some of the end users.
So yeah, the idea here is just to make the user experience more cohesive, so that people make better decisions on how they want to use the tools, and the tools become more useful in the end. Yeah, I think before we start any group to do brainstorming in that regard, I would definitely want to catch up with, well, I think the de facto place where these conversations happen, or did in the past anyway, was SIG Node in Kubernetes, so we should definitely go and chat to whoever runs the show there and get a state of play from them. I don't want us to create some other working group that is now solving problems that have already been thought about deeply somewhere else; I'd be very cautious of that. You could have a subgroup for... I'm sorry, I thought we already created the working group for the Container Device Interface, or is it still in process? That's already created, but I don't know if this is exactly within the scope of that. Yeah, so there's a resource management workgroup as well, because there's a lot of discussion around things like the CRI Resource Manager and the topology manager, because I don't think anybody's happy with them as is. In SIG Node, because that was consuming a lot of time, they created a separate workgroup specifically focused on resource management. I haven't been attending that, just because it's at a very EU-friendly time, a little too early on my side, and I haven't had a reason to, but I'd be curious. You could talk in SIG Node, but it might be a more focused discussion, with more focused opinions, in the resource management one as well. And if we are to create a working group in the CNCF, maybe we can expand the Container Device Interface working group into something more generic that would incorporate more interfaces, not just a specific one. Yeah.
Yeah, that's another feasible option, I think, but what Quinton brings up also makes sense, which is, you know, talk to some of the other teams first; we don't want to say that we're going to be working on this when some people might already be talking about doing something similar. Yeah. So I think some work needs to be done reaching out to those groups, and if somebody on the call today is in one of those groups, like Quinton mentioned, they can reach out to some of those other working groups. Yeah, I think it needs to be someone with a good, sound technical understanding of the issues at that level. I don't have that understanding, and I would definitely not want to be the person leading this technical coordination role. But if anyone on the call, or anyone you know of, would be interested in doing that kind of coordination... you have some tech leads on this group, Diane and others. But yeah, I think it would be very important to get the right person there. There's a lot of technical detail, but there's also a lot of coordination of different groups. This sort of area has a history of becoming politically charged. We have networking vendors and, you know, container vendors and all sorts of people involved, so we have to be very careful not to create a seventeenth committee to try and standardize these things unless the people involved actually want to standardize them the way we do. So, yeah, that's my two cents. So, would anybody on the call like to take this on? I think rather than force people to make decisions now, let's put finding the right person for that role on our to-do list. It's going to take a while to get all of this stuff together; this is kind of a multi-quarter project, I think. And it's going to be very important to start with the right people.
We've got enough examples of attempts in this direction that haven't gone anywhere, so let's put that on our agenda for the next few months: figure out the details and get the right people involved, and they may not be in this meeting. They could be working for the companies of other people involved here, for example. Yeah, makes sense. Any other comments, anything you'd like to talk about related to this? Any other topics that you want to bring up? Sounds like that's it. Thanks for a very thought-provoking talk, Michael, and thanks for getting all the interesting presenters together, Ricardo. Yeah, no problem. Thanks for having me. Thank you, guys. Yeah. Thanks, so