Greetings. Hi everyone. Hello, good morning. Hello, Victor. I've dropped a link to the meeting notes. If you can, add your name, and if you have a topic you'd like to go over, add it to the meeting notes. Or if you're having trouble accessing the Zoom doc, you can either say it out loud or drop it in the Zoom chat. All right, we can get started. Google Docs is running... actually, everything's running a little slow. If you didn't see it, there's a link to the meeting notes in the Zoom chat. Does anyone have any items to add to the agenda for today?

I'm going to review the... that should be the least privilege white paper from last week. Are you referring to the TAG Security white paper? No, the Google Doc white paper. Oh, the principle of least privilege one. Yeah, that's mine. All right, we can take a look at it. I think that's maybe a follow-up. We still have some pull requests, and I'm going to add a related paper from TAG Security, the CNCF TAG Security. Clear formatting... there we go. Anything else? I'm bringing that one up now.

So this is the session notes, and I guess the start of what could be some papers, and/or use cases, around the principle of least privilege and security in general. So that's the link. It goes into a lot of different practices and supplemental papers that we could get into, and that's linked there. You should have access to the working group, so this link is from the working group notes. Ian went through this last week, and some comments and things were added. So this whole document can be used as reference material by anyone in the community that wants to go and use it. We don't have a specific license, but I think if it was going to be dropped in, it would be tied to the same one on the repo, a Creative Commons license, to cover it. But essentially this is a large chunk of reference material around security. Least privilege was one of the comments, or one of the areas.
And then that got down into more specific things, like no root in the container. So, why don't we want to do that? But it covers a lot of other things. And the top part is to try to get to something closer to a write-up, whether that's a white paper or, you know, a blog, whatever we want, or just supplemental material like use cases. Let's see. So, some of these sections... I don't know who all was here last week, so this may be review for you. But the first part is: why are we doing this? What is the problem?

I'm going to go ahead and bring up the... let's see, I've lost the paper. Here it is. This is a good reference that we didn't go over before. Ian and I were working on all of this content, but there's a lot of overlap with this. So, the TAG Security white paper, which is written by folks that are helping and involved in running the security group, including people from Falco and Sysdig — that's what they do, their expertise is in security. There are some people I've seen in the group from OPA, the Open Policy Agent, and other groups. They're sharing a lot of their experience with this. And they do have a section on least privilege. Let me find it again... least privilege, there we go. I think I've already linked this section. They're building on a lot of other areas here — zero trust architecture, which Frederick has talked about in this group and in other places. So, zero trust architecture, but building on all of the other security principles and practices that they're talking about. And they go into authentication and authorization. So one of the areas in Kubernetes that we haven't talked about, but could, is RBAC — role-based access control. And there's a lot of material out there on that: there are books about it, there are videos you can watch. But it's about using roles to limit access, for both machines as well as
real people that are maybe manually interacting. So that's part of this. From the account layer down, they go into that, and they're saying every layer of the stack, you should be doing it. They talk about rootless services and containers — so this was one of our focuses. We're trying to stop access between containers and the host, and between containers and other things. And then there's the roles and namespaces aspect of this, as far as privileged access goes — you can limit that type of problem. And then there are ways that you could implement this, which you can see if you're doing custom implementations, or if you're using a cloud or a service-based platform. OpenShift, for example, limits access and privileges by default; they enable that. There's a lot built in there, if you're familiar with all of that history of building OpenShift up to be secure. And you're starting to see that in other places by default — there's a lot of work going into Azure around this.

I'm not going to go through this whole white paper right now; it's pretty huge. But if folks want to point something out in it, we could jump in and focus on a particular area. I'm going to switch back to the working session notes, because we had some other comments. So one of the things we're trying to cover in content — not to say that it's going to be in a specific use case that we may write up first, but we'll want to have it available somewhere, maybe as supplemental material — is: where do we start having problems with privilege, the need for privileges? Why do we want privilege? So we have some material here, and you can see similar things if we go and look at other reference material: performance, networking, and some other areas where there may be a need to have privilege, or where it might be used. And then the question is, is there something where
there's already something in the works to make it a native part of, I guess, the ecosystem, to be able to access those resources. So, NUMA access — that's about trying to be specific with memory allocation, and there is work in progress. If you look at, I think it was 1.18 forward, Kubernetes is trying to continue to move forward on being able to request access to those resources. In the networking domain, fine-grained access is usually desired and necessary. In the CNF Testbed itself, we've done a lot of work where we've gone in with various tests to expand beyond what Kubernetes is capable of: CPU pinning, NUMA zone alignment, and other things. But those capabilities seem to be growing. And then you've got a bunch of other areas — specialized hardware resources. What we're talking about is: if you have resources and you want access to something, then how do you ask for that? And if there's not a way to ask for it — let's say Kubernetes-native, potentially, but in any case without asking for privileges — then this might be an exception. And what we don't get into here, but what you can find in the TAG white paper and in other places, is: when you're giving something privilege, try to isolate that one component, even within your own application.

Victor, did you have any specific areas that you wanted to focus on out of this session doc? I know you flagged this area here that you thought was important. Well, yeah, it's one of my comments. You know, "we're all isolated, it's fine" — I think that comment just expresses the ordering of the doc, but it doesn't change too much. I want to add another small point, or maybe a big point, in response to what Taylor just said. It's really interesting that things like NUMA, for example, are getting protected ways to request access — in Kubernetes, they already have it. It would be interesting, right?
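As a sketch of the "Kubernetes-native" direction being described here (the names and image are illustrative, and exactly which features are enabled depends on your cluster version and kubelet configuration), a pod can request dedicated, NUMA-alignable resources without the privileged flag. With the kubelet's static CPU Manager policy and the Topology Manager (available around Kubernetes 1.18 onward), a Guaranteed-QoS pod with integer CPU requests can get exclusive, topology-aligned cores:

```yaml
# Hypothetical pod spec: requesting dedicated CPUs and hugepages natively,
# instead of running a privileged container that pins CPUs itself.
apiVersion: v1
kind: Pod
metadata:
  name: dataplane-example            # illustrative name
spec:
  containers:
  - name: packet-worker
    image: example.com/cnf/packet-worker:latest   # placeholder image
    resources:
      requests:
        cpu: "4"                     # integer CPUs -> eligible for exclusive cores
        memory: "2Gi"
        hugepages-2Mi: "1Gi"         # hugepage-backed memory, no privilege needed
      limits:
        cpu: "4"                     # limits == requests -> Guaranteed QoS class
        memory: "2Gi"
        hugepages-2Mi: "1Gi"
```

The point is that the request is declarative and mediated by the platform, rather than the workload reaching into the host with elevated privileges.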
A lot of our issue here with privileges is exactly that paragraph that Victor highlighted. We need to do things, and the only way, apparently, to do those things is with privilege. So we're suggesting: well, don't do those things, do other things. But if you look at the history of container technologies, the very fact that we can access, for example, network interfaces in a safe way is because the container technology took safeguarding steps, right? Using Linux namespaces, and then network namespaces, it allows you to access an interface in a safe way without accessing anything else. There could be technologies in the future that would allow the same type of namespacing, or something similar, for all of the other technologies in the list that I put there. For example, access to a GPU could potentially be done in a safe way without requiring privileges, right? Or an FPGA, or similar. So my point here is that, to an extent, all of this has been black and white. We're saying either you need privileges and you don't have a choice — so then you have two options: either find an alternative or, you know, you're stuck, you're going to need privileges. But the platform itself can improve. The containerization technologies can improve in such a way that you will have safe ways to access these technologies.

Yeah, something we can work towards. Yeah. I mean, that was kind of the point I was trying to make: it's not the principle of no privilege, it's the principle of no privileged flag, because the privileged flag is not fine-grained. And so it gives you a lot more privilege than you need to get your job done, and the rest of that privilege is really quite dangerous. So yeah, exactly the point you're making here: the reason we run into problems with privileges is, in fact, because the level of privilege we can request is not precisely what we need. It's not the least privilege that you need to get the job done.
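To make the "no privileged flag" point concrete, here is a minimal sketch (the pod name, container name, and image are placeholders): rather than `privileged: true`, which grants far more than most workloads need, drop everything and add back only the capability the job actually requires:

```yaml
# Instead of this coarse-grained grant...
#   securityContext:
#     privileged: true       # roughly "all capabilities, plus host device access"
#
# ...request only what the workload needs:
apiVersion: v1
kind: Pod
metadata:
  name: least-privilege-example      # illustrative
spec:
  containers:
  - name: net-tool
    image: example.com/net-tool:latest   # placeholder
    securityContext:
      privileged: false
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]            # start from zero...
        add: ["NET_ADMIN"]       # ...and add back only the capability needed
```

Whether `NET_ADMIN` (or some other capability) is the right one depends entirely on the workload; the pattern is drop-all-then-add, not the blanket flag.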
It's actually much greater than that, at this point in time. But yeah, I absolutely agree. The point was not to say you should never use expanded privileges — that you should never have the right to ask for memory on your NUMA node, or never have the right to change networking in a certain way. It's that if that's the thing you want to do, you should be delegated exactly the rights you're looking for, and no more.

Another way of saying it: I think that having privilege could be the least privilege. You're trying to get something done, and we're saying: within the environment that we're in, you're granted the least privilege needed to get your job done. If there are no other options except giving full privileges, then you get full privileges. And then we have other practices that say how we safeguard against anything — including something that's partially privileged, but definitely for full privilege. But, to go with what Ian said, we are not explicitly saying you can never break a best practice. Anyone can always say, "that practice doesn't work for me." Our job is to try to help people see the practices that are communicated by the community as good practices to try to follow, so that you can know them — and then, how do you use them?

I think we had an idea before that if you have to use privileges, then don't let the entire application use high privileges — only the part that needs to create the raw socket, or whatever. So only that pod may use privileges, but the rest of your application shouldn't. I mean, just because one practice cannot be met, that's not a reason to throw the entire best practice out the window. We can mitigate for that and contain the exception. Right — Ronnie, you actually expressed a best practice there, which is: isolate privilege. That is the best practice. One thing we should also add as well,
if we haven't already, is to also state what those privileges are. Because it's one thing to say, oh, I have privileges — okay, we understand that your application is going to need some additional privileges. But if I don't know what those privileges are, or my application by default installs with those additional privileges without letting me know as part of the installer, then that puts me at risk. So we want to make sure that we're forthcoming that we're using additional privileges, and that helps the operator determine how they want to defend the system. For instance, if you need full privileges on a cluster, and I know that it needs full privileges and that the installer is going to do that, then I may make a decision to mitigate it by installing it on an isolated cluster, where it can't affect other applications — just as an example.

I think, Taylor, something to add to the practices too: the concept of relinquishing privileges. I mean, we keep having the same discussions, even though we keep calling out that it's not a zero-sum game, right? It's a scale. So the least amount of privilege to get what you need done could be that you have maximum privilege — I mean, you are root, right? It's a scale, and maybe we haven't articulated that enough in the intro paragraphs and so on. But the best practice is: you give the least amount of privilege you need to get the job done. That's different from isolating privileges; that's different from relinquishing privileges you don't need after you've used them. So we probably need more best practices. But it's not a zero-sum game: if the only way you can get what you want done is through maximum privilege, that's where some of the other practices now being listed also help us, right? You have maximum privilege because that's what's required, but it's isolated to where you need it, right?
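The "isolate privilege" practice described above can be sketched in a pod spec (all names and images here are placeholders): confine the elevated capability to the one container that needs it, rather than granting it to the whole application:

```yaml
# Sketch: only the helper container that must open raw sockets gets
# CAP_NET_RAW; the main application container runs with nothing extra.
apiVersion: v1
kind: Pod
metadata:
  name: isolated-privilege-example   # illustrative
spec:
  containers:
  - name: app                        # main application: fully unprivileged
    image: example.com/app:latest    # placeholder
    securityContext:
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
  - name: raw-socket-helper          # only this container can open raw sockets
    image: example.com/helper:latest # placeholder
    securityContext:
      capabilities:
        drop: ["ALL"]
        add: ["NET_RAW"]
```

The same idea scales up: the privileged piece can be split into its own pod, or even its own node pool or cluster, so a compromise of the unprivileged parts doesn't hand over the elevated rights.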
I mean, it's typical CISSP-type material: understanding the domains where you're operating, protecting against lateral attacks, insider threats, that kind of thing. So, you know, this is just one practice in what we could call a suite of security best practices. So then obviously we want to look at that white paper from the TAG, but we always get into these debates of: well, that's fine, that's still fine — that's not even violating the least privilege practice. That is just saying that the least amount of privilege I need is maximum privilege. It's not a zero-sum game in this regard.

Yeah. I think, at least for some of these privileges, the problem of having the privilege is not that it isn't the least privilege you need — it's that it breaks the boundary between platform and application, which is a technically different thing than having the least privilege necessary. I mean, even if every privilege was completely secure, and didn't threaten the stability of the platform or other applications, it would still be a logical thing to ask for. Rootless containers, again, is a matter of least privilege because it makes your application better — but having root in a container absolutely does not mean that you can step outside of your container. It's a perfectly safe privilege to have, from that regard. The problem with something like CAP_SYS_ADMIN — which is so necessary for a lot of the things we'd want to do in networking that it's really the only option we've got today — is that it can potentially lead to the platform being broken by an application. And I think it's fair to say that in a cloud environment, the platform should never be endangered by any application running on it.

One thing about security, though, is that everything is about risk avoidance, risk acceptance, risk analysis.
To say that you're going to be in a completely risk-free world is never going to be the reality. So there are always going to be compromises somewhere, right? Well, kind of. I mean, if you look at a standard Unix process, for instance, that isn't running as root — 99% of standard Unix processes not running as root can do their job. And that even includes ones that use certain levels of privilege, because you use a separate process where you ask precisely for what you want, and it goes and asks the kernel to do the thing that you need doing. So there are examples of managing privileges, even when the privileges themselves are not terribly fine-grained.

Right. I'll add here another practice that we can consider: SELinux. I don't know exactly how to do this — SELinux is pretty complex — but I wonder how it can support limiting specific containers. Because if you do have a privileged container running under CRI-O in Kubernetes, potentially you can use SELinux on the host to make sure that you limit what it can do. So even though you requested privilege, you're still limited because of that SELinux context. I'm not quite sure how to do that in SELinux, but I'm pretty sure you could, and it's probably something worth looking into.

I do want to add just another point to my earlier point about things changing in Kubernetes. We need a qualifier for all of these, with the version, saying: this is our best advice, these are the best practices considering Kubernetes 1.20 and CRI-O with this version — because future versions might indeed improve the fine-grained control that we have. So right now we're writing best practices, but they're frozen at a moment in time in which these are the capabilities the platform has. I think that one's even bigger than that, because it's going to be Kubernetes with these add-ons and, per the previous conversation about SCTP, potentially also with certain kernel modules loaded.
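For the SELinux idea being raised here, Kubernetes does expose a hook in the pod security context (this is only a sketch — the label values are illustrative, and an actual confinement policy would have to exist on the host for them to mean anything):

```yaml
# Sketch: assign a specific SELinux context to a container via its
# securityContext, so host policy can constrain it even if it has
# elevated capabilities. Values below are hypothetical MCS labels.
apiVersion: v1
kind: Pod
metadata:
  name: selinux-example              # illustrative
spec:
  containers:
  - name: confined
    image: example.com/cnf:latest    # placeholder
    securityContext:
      seLinuxOptions:
        level: "s0:c123,c456"        # MCS categories isolating this container
```

How much this actually limits a privileged container depends on the host's SELinux policy and the container runtime, which is exactly the open question in the discussion above.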
That's exactly true. So I'm just putting a note to us that we need to make sure we're qualifying all our best practices with versions, or by date, if that's easier. I don't think they're all going to need them, but where needed we can specify that. The other thing to remember is all of them can be updated — so when we come back and look, if things change, then we'll update. "Run processes as non-root" is older than Kubernetes, that one. When we're saying run with privileged equals false by default, that's a Kubernetes-specific flag — the privileged flag equals false. We should probably say that; I wish it had a different name. The thing about relinquishing privileges as soon as you've done the work, which Jeffrey was putting forward — that's generic. That should be applicable outside of Kubernetes. It's a security practice that's older than Kubernetes, probably older than OpenStack.

True, but the best practice would presumably detail how you do that within Kubernetes, in a way that someone can verify you've done it as well. I agree. Any time we reference specific things, we can do that. So the SELinux thing you were talking about before, Tal — there's a whole page about that in the Kubernetes docs on using a security context, and you can tie it in with different systems that handle that type of escalation and access control. So if we're going to actually talk about implementation specifics — whether that's an example in the best practice, or supplemental material where we say "here's how to apply this practice using SELinux" — then yes, we can give versions.

I want to really quickly piggyback off of something Ian said, though. It's not just the context of Kubernetes — this is the CNF working group, so we should be coming in with the bias of how each best practice applies to the container networking world and running CNFs.
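The two flags being contrasted above — the generic "run as non-root" practice and the Kubernetes-specific privileged flag — can both be expressed and verified in a pod spec (names and the UID are illustrative):

```yaml
# Sketch: "run as non-root" enforced at the pod level, alongside the
# Kubernetes-specific privileged flag set to false explicitly.
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-example              # illustrative
spec:
  securityContext:
    runAsNonRoot: true               # kubelet refuses to start a UID-0 container
    runAsUser: 10001                 # hypothetical non-root UID
  containers:
  - name: app
    image: example.com/app:latest    # placeholder
    securityContext:
      privileged: false              # the Kubernetes-specific flag discussed above
      allowPrivilegeEscalation: false
```

Because these are declarative fields, a reviewer or an admission policy can check them, which speaks to the verifiability point raised here.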
What we don't want to do is just rehash a bunch of material that security experts and other groups have already written — there are plenty of examples. When you start getting into specific things like ONAP, or other cases where a container really does need to maintain and hold privileges — I just want to throw out the caveat that we make sure it's relevant specifically to what this working group set out to achieve, and that we're not just writing best practices that are generic to Kubernetes, because those are probably listed somewhere already.

Right. We talked about this a bit last week. I think what we need to do is at least create a list of what we think is important, but we don't have to detail it all. If there are other documents that do a better job, we should just reference them — but we can at least collate all those documents. We can do that in our own best practice. So the list of practices that we created here is great: some of them could be detailed, some of them can just refer you to something else, or we might be able to summarize them in a way that's convenient. And I do think, looking at that white paper — and I apologize, because I've been out for medical reasons and I'm catching up — there's a lot of content that is specific to CNFs. This is also with us trying to get some best practices out, because we were all sick of admin stuff. But this is also where tying best practices to the use cases will alleviate a lot of this. Because if we have a use case that's specific to the genre we care about, and the best practices are drafted from the standpoint of how we satisfy those use cases, I think some of this will just happen organically. But just throwing it out there. Yeah, I think that's a good point as well.
I think many of the CNFs use esoteric protocols, and it would be interesting, in time — like, we talk about these privileges, but what does that mean in the context of SCTP, or if I have to run one of the 5G user plane tunnel protocols? How do the best practices that we have apply to those particular environments as well? And this is also a two-way street. If we generate information that ideally is useful in the creation of CNFs, we also generate information that could eventually go back to SIG Network or SIG Security or other organizations within Kubernetes, where we could say: these are the things that work for us, these are the things that don't work for us, and here are the things that we had to relax — while we'd love to have something more fine-grained, if possible.

Yeah, I mean, the primary example of that might be service meshes, which generally have privileged sidecars, because they're doing what we're doing, right? They're trying to do things with networking that were never originally conceived of, and the only way they can do that is to overstep boundaries that have been set in stone, with very blunt instruments to get around them. And yeah, that would be one place, I think, where if we came up with a best practice that suited some of our needs, we'd probably find that best practice was useful to others.

And there are a lot of other topics as well that we should take a look at — like, what are we doing in terms of placement, as an example, because some of the workloads are latency sensitive. So how do we make sure that the CNFs are designed to make use of that placement, and what can they use in order to ensure that? Or what are the limitations — because there are some limitations in how that placement works. That one's interesting.
I think there's a whole category of things here where the knobs we've been given, or that we're creating — or recreating, in some of the cases where we're basically borrowing out-of-step technologies — are largely knobs for "this will make things run faster," not "this will make things run fast enough." Latency kind of fits into that as well, because if we're talking network latency, then we can say: well, if you put these two things on the same host, it will run faster. But in actual fact, we don't want it to run faster — we want it to run fast enough, which is a guarantee, not a best-effort thing. So how far we can take that in the future is an interesting question, but we have to bear in mind that, again, tweaking placement to put two things on the same host, which is the easy thing to do, is not actually what we want. It's just a step in the right direction — one step.

Yeah, and also: at what cost? I may place two workloads on the same host and lose something else where there's contention, or one of the processes — like, it's very common for data planes to burn a core. So what does placement look like at that point? So it's not just "what can you do," but also what are the limitations, so that when people are designing and architecting these systems up front, they're designing not just to what we want to do, but to the boundaries that we cannot cross. And bringing it back to the context of these privileges: the privileges that are there are not very fine-grained. They're better than they used to be. And when we start bringing in things like the capabilities, we're definitely going to give too many capabilities with some of the flags that are there. So one of the things that would be interesting would be to say: this is where we are.
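The placement knob being discussed — "put these two things on the same host" — is expressible today with pod affinity (a sketch; the labels and names are placeholders). As the discussion notes, this is a faster-not-fast-enough knob: it's a placement hint, not a latency guarantee:

```yaml
# Sketch: co-locate workload B with workload A on the same node
# using required pod affinity. This helps latency but guarantees nothing.
apiVersion: v1
kind: Pod
metadata:
  name: latency-sensitive-b          # illustrative
  labels:
    app: dataplane-b
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: dataplane-a                   # schedule next to workload A...
        topologyKey: kubernetes.io/hostname    # ...on the same node
  containers:
  - name: app
    image: example.com/dataplane-b:latest      # placeholder
```

And, per the "at what cost" point, required affinity can also make scheduling fail outright or concentrate contention on one node, which is exactly the kind of limitation architects need to design around.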
And if we want to do a deep dive into one of these topics, we could say: how do we apply something like eBPF, through Falco or something else, in order to further constrain this thing, so that you can work around the limitations of Kubernetes — or, to be more precise, the limitations of the Linux kernel itself that the features of Kubernetes make use of. So I think there are some interesting things there that we could drive in a deep dive, but we need to start off with: this is the stake in the ground, this is least privilege, so do a good job with that.

Well, let's talk about what best practices we can have in the current situation. So I think it would be fair to say that things should not be running with privileged; they should be running with specific capabilities, as an example. By privileged, you mean — are you saying the flag, the container flag? We're talking about the privileged flag: never do that; always list the specific capabilities that you're looking for, and then always constrain that to the minimum set you need to get your job done. I mean, as a best practice, that has, as far as I can see, no exceptions, right? There is no reason not to do it. You can always get your job done; you can do it that way.

Yeah, and taking it back to what Tal mentioned as well, about what additional things you can add on: like, I install this particular CNF — I create a CNF and it takes the least number of privileges, and it's still insecure. Maybe it's using CAP_NET_RAW, and if someone breaks into the system, or into that specific service, then there's a whole set of things they can do there. So what can I do to further constrain it? I may be able to do something on the SELinux side, or maybe there's something I can do on the eBPF side, or in the data plane that it connects to — based on the interface it has access to, the data plane can constrain it in certain ways.
And so there are things that we need to be able to raise, to say: these privileges are a great start, but they're not enough. You can further constrain and fine-tune using these additional mechanisms, in order to further mitigate.

Yeah — so CAP_NET_RAW is probably dangerous on an interface that your average system-grade CNI provides you, on the grounds that it's not going to expect you to be able to do most of the things that CAP_NET_RAW lets you do; it hasn't been designed for you to do those things. On the other hand, it's probably perfectly healthy on an interface that you're getting from some secondary CNI through Multus. But it's very, very difficult to narrow down when it is and is not acceptable. It's not that it's clearly a good or a bad practice to use CAP_NET_RAW — it's that it's really not possible to express a best practice when CAP_NET_RAW is laid on the table the way it is today.

Yeah — it is default-on, for those that are unfamiliar. And the reason it's on is because, in order to respond — not even to make a ping outwards, but literally to respond to a ping — your container needs CAP_NET_RAW. So the default is to enable it, and this has been the source of multiple types of attacks, such as ARP poisoning, DNS poisoning, and similar, which lead to more complex attacks over time. So we want to be able to say: if you're building a CNF — and we expect these CNFs to live in places that are potentially sensitive — then you're going to have CAP_NET_RAW, almost certainly, by default. So we're still closer to the principle of least privilege, because we're not giving you CAP_NET_ADMIN, but there are still things you can do to further protect yourself, and here are some examples of how you can do that. I think something like that, on the concrete side, would go a long way.

So one answer to that is using a product like Cilium. Cilium takes over all your networking and implements its own security layer there.
I don't know specifically whether it could protect against those attacks, but it definitely can protect against a lot, so that's another potential item — maybe something we could add to the list: consider Cilium or similar products. Also, another practice we forgot to add to the list is virtualization. That's another potential solution to these problems: you could use a full-blown virtual machine with KubeVirt, or Kata Containers, or something else that provides better protection than containerization.

I think both of those solutions slightly miss the point, because the issue we run into with using something like CAP_NET_RAW is not that you can do dangerous things with raw packets — it's that there is an unwritten contract between the workload that you are and the network that you're attached to, which says: you will use the network in these ways. So for an unspecified CNI, that contract might include "you won't have CAP_NET_RAW," so it's not a thing the network needs to defend against. For Cilium, that may not be true — and for what it's worth, Cilium does what any other CNI could do as well; there's nothing particularly special about Cilium here. But it amounts to: if the workload starts sending random ARP packets because it feels like it, and you, the network, start ending up with a poisoned ARP cache because you weren't expecting that, then that's where the problem comes in. It isn't "use technology X" — it is "have expectations that are agreed on both sides."

Or run it in promiscuous mode and capture packets that you weren't expecting to have access to as well — but that's beside the point. In terms of: we identified a particular problem we have with CAP_NET_RAW, there are multiple mitigations. Mitigations include virtualization; you can run something in eBPF that further filters it down, which is what Cilium does; or you could run this with user-space networking, where you're not using the kernel mechanism at all. And if you're using something like SR-IOV in direct mode, then none of this applies — it's all up to your
data plane at that point. Yeah, exactly — and that's my point. It doesn't matter what you're attached to, because I could be attached to the kernel and make this work; I could be attached to the top-of-rack switch with a VF and make this work. It matters that expectations are the same on both sides. I think Cilium comes down to — sorry, what were you saying, Taylor?

I was just going to point out that Cilium actually has CAP_NET_RAW turned on for some of its tests that are doing ICMP and other things like that. I'm guessing that's just going to be a limitation there. Is there anything in the works to make ICMP work without CAP_NET_RAW? Well, I'd find it a little weird if ICMP doesn't work without CAP_NET_RAW, because — I can't send an ICMP packet, certainly, but, for instance, responding to ICMP works just fine without it. Are you really saying I can't ping from inside a container? Well, you couldn't respond to a ping. So if someone is pinging your container, that's happening in the network stack — the container doesn't have to act for that to happen. I don't run a ping daemon on every server that responds to a ping; the kernel responds to a ping, because it's a low-level part of ICMP and IP in general. That may be accurate. I recall there being issues with the response, but that might not be an issue, or it was just weirdness in the system that I was being shown.

On a more general question: this sounds like it's gone the OpenStack way, of nobody's really actually said "these are the responsibilities of whatever network stack we're doing." So is there such a thing — is there a list of functionality that a CNI that considers itself a good actor is expected to stand up and implement? It's very basic — it's incredibly basic. The requirements are essentially: nodes can communicate with other nodes, where a node is defined as something running a kubelet; nodes can talk to pods; and pods can talk to pods. It was purposely left open to interpretation outside of that. What we should probably consider is: are we in the realm of undefined behavior
when we want to do a certain thing, because if we are in the realm of undefined behavior, then you can't do that thing in that way.

Well, it's definitely undefined behavior, but de facto standards have appeared. I don't think it's reasonable for us to go off and define what all those things are, but we know that within that set of de facto standards, things like CAP_NET_RAW are set. There's nothing in Kubernetes that says it should or should not be set, so it's left to the implementer of the CNI to work out, using their own reasoning, whether it should happen.

So what I'm saying here is: fine, I set CAP_NET_RAW in my container, which emits the MPLS packet. If there is no documented responsibility of any given CNI to pass MPLS packets, which I think you can say there isn't, then I shouldn't expect that setting CAP_NET_RAW and sending an MPLS packet will do anything.

I see what you're saying there. The reality is we actually don't know what will happen on a per-CNI basis, and so that's something we can go out and state: these things are not well defined, so you need to say which CNIs you tested with.

Yeah. I think the point I'm trying to get to is: if you're asking for a specific CNI by name rather than by features or functionality, then we're doing something wrong, honestly. Either CNIs are defined to do what they do, you shouldn't expect anything that isn't defined, and we should not use anything that hasn't been defined, which, fine, paints us into a corner; or, alternatively, there's something that hasn't been written down here, namely that there is a subset of CNIs implementing a certain behaviour that we should require.

Right, and not by name, not by saying "I will only work with Cilium," ideally, but by saying "I need a CNI with this extension, this extra capability."

SR-IOV CNIs, for instance: I hate the fact that there are SR-IOV CNIs, for a simple
reason: they don't implement basic CNI functionality beyond the API; they don't behave the way CNIs are stated to behave. My point is that if what I'm looking for is something that doesn't behave the way CNIs usually behave, or are guaranteed to behave, then I need a way of expressing that, and the contract for CNI is very simple.

The problem we're going to run into, and that we already have run into, is the same as in OpenStack. In OpenStack the definition of what Neutron could or could not do was very specific, and similar to how CNI is: "we are IP-based networking." In OpenStack we ran into a problem where OpenStack basically said "we're layer 2, flat," and you needed to be very careful about that; it defined absolutely nothing about how packets moved around.

Well, but in the control plane you still had to specify things like what MAC address was being assigned, and so on. It did not specify the actual movement of packets, but part of what people did was say, "well, we're not using MAC addresses for this particular thing, so we'll reuse that field for something like MPLS." Eventually they added a plugin system for it, but it definitely caused pain. And then people worked out, "oh, we don't even have to go through Neutron; we can just gain access to the RabbitMQ in the background and completely bypass the APIs," which ended up causing a lot of issues around portability.

The point I was making was rather more basic than that, because from an addressing perspective it always looked like what you had was a bridge. Therefore you might reasonably expect that things like broadcast would work, and so on, but that was not promised. In fact the issue with the promise was that it was different in everybody's head; it was never actually written down.

Sounds to me like for CNI the promise is written down, and we should read it, and then we should refer to it, but we should make absolutely no assumptions about
anything that hasn't been written into that promise.

Right. We certainly tend to step over those boundaries; we expect more than we're being offered, and I think that's understandable, it's just the way it is. But if what we're saying is that this CNF will work because it is emitting MPLS packets and the CNI will obviously do a certain thing with them, then we're lying to ourselves. That definitely can't be a best practice.

And CNI was designed specifically for one primary problem, which is: how do I get an interface set up in a container? Interestingly, if you look at Kubernetes, there's no concept of a pod subnet in Kubernetes. There is a service subnet, but there is not a pod subnet. So even the fact that there are pod subnets is a construct of the CNI implementation. It's literally: what is the interface, and what is the IP address?

Actually, I don't think there's likely to be anything in CNI that says there's an interface. I think there's likely to be something in CNI that ensures every pod has an address of its own, and that the CNI will make sure it can reach other things. But, you know, I could set up a pod with ECMP in it, and I don't think I would be breaking any rules.

I think it is bound to the kernel interface itself, and it's responsible for the interface name, but I could be wrong; it's been a long time since I looked at that specific portion of CNI. This is easy to check; the spec is trivial.

Yeah. The reason I'm being pedantic here is because, you know, there's very little I can do, especially without privilege, coming back to where we started, there's very little I can do with a container interface name. So knowing there is an interface, or what it's called, or expecting to find an interface, any of these things are generally not relevant to the applications Kubernetes was intended to run. I do not need to know what the interface name is, or even that there's only one interface, if I'm trying to run a web server. I have no
interest in that. I want to know I've got one IP address, ideally, but that's probably all I'm asking for.

I think one thing that's really confusing, and also comical to me, is how we seem to want to extend functionality and add operators and CRDs for everything all over the place, except when it comes to the CNI, where everything has to be done in that context. I mean, assuming we had best practices that showed us how to do things intelligently and keep us out of trouble: lots of other people have found ways to put interfaces into pods in parallel to the CNI. If it's storage, if it's something else, everybody says, "oh, I've got fifteen operators and all these different extensions you can use, and we're just going to change the functionality of everything." But in networking we're still going to shove everything into the CNI and try to recreate Neutron by adding an infinite number of plugins. I don't quite understand why we're so unwilling to add extensions ("extensionality" is definitely not a word) to the networking space, but we are everywhere else.

Yeah. My concern here is one of pragmatism, not necessarily perfection. What you could do is broaden the CNI definition so that any CNI implemented to that standard will definitely offer every bit of functionality we need. And we all know that's unrealistic, because it means that Calico, Multus, and the one somebody's writing in their garage that we haven't seen yet all of a sudden have to do, you know, what 99% of people want, and then 90% of their code will be serving what this 1% of people do. That's not going to happen.

How the remainder of the functionality gets expressed, if you stick to a CNI definition that everybody is willing to implement properly, is an open question. Clearly today it's typically Multus plus a bunch of other CNIs; that's how we happen to have done it, and we
should live with the fact that that really is the best practice we have. But it doesn't necessarily mean it's the only way, or indeed the best way it could be done, right? NSM demonstrates an alternative, and there are probably others out there.

I think we should be careful with this. I don't think we're going to get it scoped down; the definition is still too broad. In terms of being pragmatic, that means you have to pick a CNI, make sure that CNI supports the features you need, and test for it. You need to pick a CNI, make sure it's a version that supports the features you need, and that the developers have signed up to supporting those features for some extended period as well. So we've got to be quite careful even if we do that.

Yeah. In terms of the CNI, we also need to be careful about interface versus capabilities, and not say that the two are the same. This was something that came up early in the Multus days, when they were trying to work out how to position it and how to position the community. When they were trying to get multiple interfaces directly into the Kubernetes API itself, the people from Calico stepped forward and said, "well, we don't want to be forced to implement multiple interfaces when we're just going to stack ours on the back of the current interface." And so that pushed multiple interfaces out to the CNIs: CNI doesn't prevent you from standing up multiple interfaces, but you don't want to be forced to do so.

So that's what I was saying: it's important to take a look at the feature set that you want, and that feature set could, honestly, ignore the CNI part. The CNI is just about the interface; CNI is not even running after the pod has started. So it comes down to the data plane the CNI has attached you to: will that data plane do the things you want it to do, and will it respect some contract that you've organized? And
there's no interface there in Kubernetes beyond "you can speak to other pods, you can speak to other things," plus the limited set of network policies they've added to constrain those capabilities. So it comes down to choosing that data plane and ensuring that you've tested against that data plane.

Well, hey Ian, we're at the top of the hour. If we could focus on what some best practices are, whether that's using the CNI or using other things, the capabilities we want there, and the best practices for using those, I think that would be a good focus for us. Not that you can't use a CNI, but what are the best practices, and the use cases, that we can write up? We've got a lot of other content. We may have some other people next week, or in the future, from Red Hat on the testing they've been doing in OpenShift. And if you have some other topics, add them to the upcoming meeting notes.

Thanks everyone, see you next week.
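Editor's note on the CAP_NET_RAW discussion: the capability in question is controlled per container through the standard Kubernetes `securityContext.capabilities` field. The manifest below is a minimal sketch, not something from the meeting; the pod name and image are hypothetical placeholders.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-raw-sockets          # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # hypothetical image
    securityContext:
      capabilities:
        drop: ["NET_RAW"]   # container cannot open raw sockets,
                            # so it cannot craft raw ICMP (or MPLS) packets;
                            # kernel-level ICMP echo replies still work
```

Whether a CNF that drops (or requires) NET_RAW behaves as expected still depends on the data plane behind the chosen CNI, which is the point made above.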
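Editor's note on "the spec is trivial": the CNI specification's ADD result is a small JSON document (a `cniVersion`, optional `interfaces`, and `ips`), and nothing in it promises any data-plane behaviour beyond basic IP addressing. The sketch below is an illustrative, unofficial check of that minimal shape in Python; the example values are made up.

```python
import json
import ipaddress

def check_cni_result(raw: str) -> list[str]:
    """Return a list of problems found in a CNI ADD result document.

    This checks only the minimal contract the spec writes down:
    a cniVersion, and IPs that parse as CIDR addresses. It says
    nothing about MPLS, broadcast, or any other data-plane behaviour.
    """
    problems = []
    result = json.loads(raw)
    if "cniVersion" not in result:
        problems.append("missing cniVersion")
    for i, ip in enumerate(result.get("ips", [])):
        try:
            ipaddress.ip_interface(ip["address"])  # e.g. "10.1.0.5/24"
        except (KeyError, ValueError):
            problems.append(f"ips[{i}] has no valid address")
    return problems

# Example result, shaped like what a plugin returns on ADD
# (all values are hypothetical):
example = json.dumps({
    "cniVersion": "1.0.0",
    "interfaces": [{"name": "eth0", "sandbox": "/var/run/netns/example"}],
    "ips": [{"address": "10.1.0.5/24", "interface": 0}],
})

print(check_cni_result(example))  # -> [] (no problems found)
```

The brevity of what can be checked here illustrates the meeting's conclusion: anything beyond "the pod got an address" is a property of the specific CNI's data plane, and has to be tested against that data plane.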