Hello there and welcome to the CNI maintainers session here for KubeCon 2021. Whether you're watching this live and joining me afterwards for Q&A, or watching later on YouTube at one and a half times speed, I'd like to thank you for taking some time out of your day to join me. So let's get started.

The maintainers track sessions at KubeCon are where CNCF projects can give updates and talk about what's next for those projects. In this session, I'll be doing that for CNI, and what we'll be doing here today is starting the conversation about what CNI 2.0 might look like.

First, a brief introduction. My name is Casey Callendrello. I am an engineer at Red Hat working on OpenShift and upstream Kubernetes, as well as maintaining other upstream projects such as CNI, OVN-Kubernetes, and go-iptables, which you may have used. I've given similar talks before, but those were mostly about CNI 1.0; today is a new talk about what we might want to do for CNI 2.0.

What are we going to talk about today? We'll talk about an update on what the project has done so far, namely releasing CNI 1.0. We'll talk about some pain points and considerations that we are thinking about as we look into the future. And we will look at some possible new directions that the project might take.

So the first agenda item is CNI 1.0. But this slide is kind of boring; I think we need some more word art. There, that's better. By the time you're watching this talk, CNI 1.0 will have been cut, and this is a pretty cool achievement, right? This is a standard that started from the community about five years ago, and it's finally time for us to declare 1.0, to declare that we've reached a stable specification. CNI 1.0 does not contain any particular surprises. It looks pretty much like the previous versions before it; it's just a formalization and a rewriting of the existing spec that everybody knows and loves. Honestly, it's been almost five years now, so it's appropriate for a project as mature as CNI to declare a stable release. As an aside, I'd like to thank the CNCF for donating time and resources to help us set up a website, which we didn't have the resources to do ourselves. So now you can find everything you need to know about CNI at our shiny new website, cni.dev.

Just to get everybody on the same page, here's a quick overview of how CNI currently fits into the overall Kubernetes and containerization ecosystem. CNI is responsible for configuring a network interface, or more precisely an attachment, inside a container. That is to say, it mediates the interaction between a network plugin and a container runtime. CNI the protocol is an execution protocol, and CNI is additionally a configuration format. There is a reference implementation for consuming the configuration and executing the protocol, known as libcni, that is used by many plugins and many container runtimes. Libcni is maintained by the CNI project itself. The CNI project also supplies some commonly used network plugins for really common use cases, such as a generic bridge implementation. But by no means are the CNI plugins released by the CNI project an exhaustive list or the exclusive set of CNI plugins that people tend to use. So if you look at this diagram on the screen, CNI is everything in this orange box. That is to say, it is the configuration, and it is the protocol by which a runtime on the right talks to a plugin on the left. Libcni is the reference implementation in between.
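To make the two halves concrete, here is a minimal sketch of what that looks like from a runtime's point of view: a network configuration in the JSON format, fed through libcni, which in turn executes the plugins. The libcni calls are from the containernetworking/cni Go module as I understand its API; the paths, network name, and subnet are just illustrative.

```go
package main

import (
	"context"
	"log"

	"github.com/containernetworking/cni/libcni"
)

// A minimal CNI network configuration: one network ("mynet"),
// realized by the bridge plugin with host-local IPAM.
const conf = `{
  "cniVersion": "1.0.0",
  "name": "mynet",
  "plugins": [{
    "type": "bridge",
    "bridge": "cni0",
    "ipam": { "type": "host-local", "subnet": "10.22.0.0/16" }
  }]
}`

func main() {
	// libcni locates plugin binaries ("bridge", "host-local") on this path.
	cni := libcni.NewCNIConfig([]string{"/opt/cni/bin"}, nil)

	list, err := libcni.ConfListFromBytes([]byte(conf))
	if err != nil {
		log.Fatal(err)
	}

	// Describe the attachment: which container, which netns, which ifname.
	rt := &libcni.RuntimeConf{
		ContainerID: "example-container",
		NetNS:       "/var/run/netns/example", // assumed to already exist
		IfName:      "eth0",
	}

	// ADD: attach the container to the network by exec'ing the plugins.
	result, err := cni.AddNetworkList(context.TODO(), list, rt)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("attachment result: %v", result)
}
```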
Let's talk briefly about the abstract, or logical, components of the CNI model. On the left we have a container, which is managed outside of CNI. CNI is only one aspect of bringing a container up, and CNI makes no commentary on how a container itself is managed; that is supposed to be handled by a container runtime engine. Then there is a network, which in the CNI world is represented by a single CNI configuration file, and then you have an attachment of a container to a network. This is CNI's picture of the world. To be a little more accurate, the CNI model allows for multiple attachments in a single container, it allows for multiple networks, and it even allows for multiple attachments of the same container to the same network. As an aside, the fact that Kubernetes only understands a single attachment is a limitation, or a decision, within Kubernetes itself. Other CNI runtimes, such as Podman, natively support multiple interfaces, and CNI works with them in this regard.

The execution protocol has only three methods. It is a simple sort of RPC world. It has three methods: ADD, DEL, and CHECK. That's it. ADD creates an attachment; it says, please attach this container to this network. DEL is the inverse of that. CHECK reports whether a particular attachment is still functional; that is to say, it asks a plugin to please validate that everything is still configured appropriately. We did a much deeper dive into the specifics of this at KubeCon 2020; if you're interested, you can watch that talk. CNI 1.0 really doesn't change anything from that talk, so it's still accurate. An important distinction about CNI, though, is that plugins are executable binaries. When we say RPC, many people think gRPC or JSON-RPC or REST. No, that's not the way CNI works. Plugins are executable binaries, and each RPC call is a new execution of that binary. It's a little different from what you may be used to. But that's it. That's the whole protocol. It's not particularly complicated.

So, now that we understand the basics of the CNI protocol, let's look at some problems and, let's just call them sub-optimalities, that users of CNI and developers of CNI plugins experience with real-world use of the network. The first wart, or what we'll call a wart, is executing binaries. The bad thing about this, well, there are a couple of things. First of all, it's a security risk. And what do we mean by this? Well, we are deploying binaries executed as root in the host context, which is obviously an extremely privileged position to find oneself in. So that's a bit of a security risk. It's also annoying in containerized deployments: you are installing a binary that's executed on the host, but it's built in a container. It's a good thing that Go and Rust make it very easy to build statically linked binaries; otherwise this might be a quite difficult problem to solve. The fact that it is a single-RPC, exec-style protocol completely precludes the notion of any sort of events or push-style APIs where a plugin could conceivably push state back up into the runtime. It's not possible with today's CNI. And one sort of funny little thing is that many plugins today are, in fact, thin shims to daemons.
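Concretely, "each RPC call is a new execution" means something like the following sketch, which mimics how libcni drives a plugin under the hood: the verb and attachment identity travel in environment variables (the CNI_* names are from the spec), the network configuration arrives on stdin, and the result comes back on stdout. The helper function itself is just illustrative.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
)

// cniAdd invokes a CNI plugin binary directly: one process execution per call.
func cniAdd(pluginPath, netConf, containerID, netns, ifname string) ([]byte, error) {
	cmd := exec.Command(pluginPath)
	cmd.Env = append(os.Environ(),
		"CNI_COMMAND=ADD", // the verb: ADD, DEL, or CHECK
		"CNI_CONTAINERID="+containerID,
		"CNI_NETNS="+netns,      // path to the network namespace
		"CNI_IFNAME="+ifname,    // interface name to create inside it
		"CNI_PATH=/opt/cni/bin", // where the plugin finds e.g. IPAM plugins
	)
	cmd.Stdin = bytes.NewBufferString(netConf) // JSON configuration on stdin

	var stdout, stderr bytes.Buffer
	cmd.Stdout, cmd.Stderr = &stdout, &stderr
	if err := cmd.Run(); err != nil {
		return nil, fmt.Errorf("plugin failed: %v: %s", err, stderr.String())
	}
	return stdout.Bytes(), nil // JSON result describing interfaces and IPs
}

func main() {
	// Example invocation; paths and config are illustrative.
	out, err := cniAdd("/opt/cni/bin/bridge",
		`{"cniVersion":"1.0.0","name":"mynet","type":"bridge"}`,
		"example-container", "/var/run/netns/example", "eth0")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	fmt.Printf("%s\n", out)
}
```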
So we find ourselves in the somewhat comical situation of a daemonized container runtime, such as the kubelet or containerd, talking to a daemonized network plugin, such as OVN-Kubernetes, by executing a very small binary, which then talks back to that daemon. So it's a bit strange that we have to put this sort of adapter in. But that said, we chose binary execution as part of the protocol for a couple of very real reasons. The first is that it solves a real problem with Go and namespaces. If you don't know about this, then consider yourself fortunate: Go and network namespaces don't necessarily get along so well, and executing binaries is one way to mitigate the damage that can be done by that. It's also extremely useful for daemonless runtimes. Not all runtimes have a running daemon, for example Podman, or rkt, which CNI came out of. Additionally, executing binaries ensures that plugins don't cheat and skip checkpointing state to disk. If your plugin is executed fresh every single time, you are forced to manage your state in a correct, checkpointed manner. That's the first wart with CNI 1.0.

The second wart is one that some people have probably discovered themselves as well, which is network status. How do you tell Kubernetes that a node is configured and ready for pods to be scheduled to it? The answer is: you write a configuration file to disk. You write your own CNI configuration file, which is a bit strange because it's the same configuration file you're supposed to use to configure yourself, right? So this is a strange catch-22, and CNI needs a better way for network status to be reported. Right now, we only have attachment status; network status doesn't exist in the CNI model. So that's wart number two.

Wart number three concerns itself with configuration management. Writing files to disk is a bit troublesome. It's inconvenient in containerized deployments because you need to bind-mount things in from the host, which is also an interesting privilege concern. It's not easily discoverable either: anybody who wants to know anything about network configuration needs to have the same directory bind-mounted in containerized deployments. That's pretty awkward, right? It is also a bit too dynamic. If you have the same network configuration, and you like the same network configuration, across all of your nodes in some cluster or fleet, why do you have to deploy a DaemonSet to copy a file to disk? It's kind of silly that you need to write a file that's identical across your cluster, and you can't use any of the existing methods you may have for distributing configuration. Simultaneously, configuration files are also not dynamic enough for some use cases. If you have a configuration file that is otherwise entirely constant except for, say, IP pools allocated to a node or network, why do you have to do string manipulation to template in some IP pools? Your configuration, as far as you are concerned, is entirely constant except for addressing pools. So these files don't really support that use case super well, since you need to do some sort of templating or meta-CNI configuration management. That's a bit awkward. That said, what's good about files? The good thing about files is that they're simple: tooling around them is very easy, they're obviously easy to script, and, let's just say, they haven't been a problem so far, right?
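To see how thin these shims really are, here is a rough sketch of the adapter pattern many daemonized plugins use today: the exec'ed CNI binary does nothing but forward its stdin and CNI_* environment over a Unix socket to the long-running daemon. The socket path and JSON envelope below are invented for illustration; each real plugin defines its own private protocol for this hop.

```go
package main

import (
	"encoding/json"
	"io"
	"net"
	"os"
)

// A hypothetical CNI "shim": the whole binary just relays the request
// to a daemon and copies the reply back to the runtime.
func main() {
	stdin, err := io.ReadAll(os.Stdin) // the network configuration
	if err != nil {
		os.Exit(1)
	}

	conn, err := net.Dial("unix", "/run/example-plugin/daemon.sock")
	if err != nil {
		os.Exit(1)
	}
	defer conn.Close()

	// Forward the CNI verb, attachment identity, and config to the daemon.
	json.NewEncoder(conn).Encode(map[string]string{
		"command":     os.Getenv("CNI_COMMAND"),
		"containerID": os.Getenv("CNI_CONTAINERID"),
		"netns":       os.Getenv("CNI_NETNS"),
		"ifname":      os.Getenv("CNI_IFNAME"),
		"config":      string(stdin),
	})

	// Relay the daemon's reply back to the runtime verbatim on stdout.
	io.Copy(os.Stdout, conn)
}
```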
Everybody can figure out how to write a file to disk, and then you move on with your life and go solve much bigger problems. But in any case, that's wart number three: configuration management.

The fourth wart I'd like to mention is a bit more abstract, and also a bit more unclear. It's around devices and hardware. The CNI protocol is designed with simple veths and bridges in mind, these sorts of virtual interfaces that are completely limitless and have no underlying basis that would need to be accounted for or scheduled or managed in any way like hardware, and the protocol absolutely reflects this notion of utter limitlessness. Multus and DANM and similar meta-CNI runtimes contain an absolutely absurd amount of code to make working with hardware and device plugins even possible, and that's not to say that it's easy or an easy model for people to understand, right? You can actually watch a talk at this KubeCon EU 2021 by my colleagues Billy McFall and Adrian Moreno; they talk about their effort on something called the device information spec, which is an attempt to bring some order to the madness around hardware initialization, networking, and Kubernetes. But there are two takeaways from this. First of all, hardware is complicated, really, really, really complicated, and we don't necessarily want to specify every last little bit of that. The other takeaway is that CNI is doing them absolutely no favors: the specification doesn't support their use case at all, and if there's something we can do to make this simpler, we should consider adding it to the specification. That's the fourth wart, we'll call it.

The last wart I'd like to talk about is around lifecycles, specifically setup and teardown. CNI makes it difficult for a single plugin on a single network to share a given resource between multiple attachments or multiple containers. What do I mean by shared resources? It doesn't need to be that abstract: a shared resource can be as simple as a bridge, the same bridge that every container is attached to. And it's difficult to share these resources for a couple of reasons. The first is that there's not necessarily good information about addressing; it's hard to aggregate things, such as when you want to aggregate based on IP addresses. There are also no timing guarantees. CNI explicitly makes no guarantees other than that you will get the DEL after an ADD. So you need to do locking between multiple instances of your plugin if you want to share something between containers. It's also difficult, if not impossible, to safely tear down shared resources such as a bridge when the last container leaves it. And you may not necessarily even want to do this, right? The effect of this is that most CNI plugins leave their shared resources around forever, even if the network is done, even if you're never going to use that bridge again. Because, generally speaking, not tearing down the bridge is better than potentially tearing down a bridge and interrupting an in-flight operation or affecting something else. So there is no notion of teardown in CNI other than tearing down an attachment, and that sometimes makes things a bit awkward for users and developers of CNI.

Okay, so those were some of the problems, or let's just say sub-optimalities, with CNI as it's been adopted and used in the real world.
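To make the locking point concrete, here is a rough sketch of the bookkeeping a plugin has to invent today if it wants to tear down a shared bridge when the last attachment leaves. Nothing here is specified by CNI: the state directory, the lock file, and the teardown helper are all hypothetical, and in practice most plugins simply skip this and leave the bridge behind.

```go
package main

import (
	"os"
	"path/filepath"
	"syscall"
)

// Hypothetical per-network state: one marker file per live attachment,
// guarded by flock(), because concurrent plugin executions get no
// ordering guarantees from CNI itself. The directory is assumed to exist.
const stateDir = "/var/lib/example-plugin/mynet"

func withNetworkLock(fn func() error) error {
	lock, err := os.OpenFile(filepath.Join(stateDir, ".lock"), os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return err
	}
	defer lock.Close()
	// Serialize all plugin instances touching this network's shared bridge.
	if err := syscall.Flock(int(lock.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	defer syscall.Flock(int(lock.Fd()), syscall.LOCK_UN)
	return fn()
}

// On DEL: drop this attachment's marker and, if it was the last one,
// tear down the shared bridge.
func releaseAttachment(containerID string) error {
	return withNetworkLock(func() error {
		os.Remove(filepath.Join(stateDir, containerID))
		entries, err := os.ReadDir(stateDir)
		if err != nil {
			return err
		}
		if len(entries) <= 1 { // only the lock file remains
			return deleteBridge("cni0") // last one out turns off the lights
		}
		return nil
	})
}

func deleteBridge(name string) error { return nil } // stub for illustration

func main() {}
```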
I want to briefly talk about some of the considerations we need to keep in mind, so that we can then move forward and think about what we want to do next. So, a quick dive into some considerations. The most important consideration is that CNI is not Kubernetes. We need to support multiple runtimes, and we need to support multiple deployment paradigms. Not all consumers of CNI want to create a Kubernetes-style single logical network across multiple nodes; for Podman, that doesn't even necessarily make sense. And there are lots of CNI runtimes; we need to design a specification that doesn't preclude them from doing what they need to do. CNI is vendor-neutral, and that's a good thing.

A second consideration is similar to the first, which is that some runtimes are daemonless. Whatever we do, we should not make their jobs, or the jobs of administrators who choose daemonless runtimes, any harder. If an administrator chooses a daemonless runtime, they probably expect their network infrastructure and network plugins to be daemonless as well. And if we require end users to manage running daemons just to bring up a simple bridge and port forwarding, we've made real users' lives a lot more complicated for no real benefit to them. We need to keep this in mind as we design things going forward.

And lastly, it's always useful to be wary, extremely wary, of the so-called second-system effect. This is the unfortunate tendency of version two of a system to try to solve all problems perfectly, and thereby solve no problems well, winding up bloated and unusable. This is not a new problem in software engineering; it's probably the first problem in software engineering. It was even discussed by Fred Brooks in 1975. We really need to keep this in mind as we look forward. By the way, a brief aside: I think Kubernetes deserves a lot of praise for avoiding some of the temptations that cause the second-system effect. I encourage all of you to watch the talk at this KubeCon 2021 about reimagining the Ingress API. That team deserves a lot of praise for avoiding those same temptations; they've worked very hard to design something that is not bloated and is also not over-engineered or over-specified.

So with all that in mind, how can CNI 2.0, whatever form it winds up taking, be a worthy and successful successor to CNI 1.0? The answer, in some sense, is very simple: we need to solve real problems for real people without making life appreciably harder for anybody. That means we need to keep things simple, composable, and understandable, and keep true to what enabled CNI 1.0's success. Another thing that's critical is that we don't want to over-specify every interaction. If we write a protocol that's really rigid and over-specified, then you don't leave room for unanticipated uses, and you just make a protocol that's difficult for anybody to use in even a slightly divergent manner.

So with all that in mind, and about 10 minutes left in my talk, I'd like to move on and think about what CNI 2.0 might look like. The first thing I'd like to think about for CNI 2.0 are some potential lifecycle improvements. I've shown here the logical diagram from before and how the three CNI methods fit in. You can add, delete, and check an attachment. That's all you can do.
So let's imagine: what if you could use the same verbs for all three of the logical components within CNI? What might that look like? What if you could manage networks the same way you manage attachments? Well, just for discussion's sake, I think we can probably come up with better verbs, so let's do that right here. You can see here we have a similar lifecycle for networks and containers as we have for attachments; we've just renamed things a little so it makes a bit more sense.

Let's think about what a network's and a container's lifecycle would look like within the context of CNI as we have it. You can imagine a network having some sort of an add, which in this case we're calling init. You can imagine network plugins creating shared resources, such as bridges and firewall rules, when the network itself is created. Likewise, what does it mean to check a network? Well, you could say checking a network is checking whether this network is configured and ready to accept adds. That's a real problem we know we have right now: there's no way to check a network's status. And additionally, destroy, or delete, for a network would be a way to say this network is no longer needed; please tear down any attachments and please delete any shared resources. So that solves a couple of the things we discussed. I think this is a pretty clear gain for any future direction in CNI. That's network lifecycle, somewhat akin to attachment lifecycle.

So what about container lifecycle, knowing that CNI itself is not actually involved in the creation or deletion of containers or network namespaces or any other sort of isolation domain? What would it mean to add a container? Well, one thing that comes to mind is some sort of add that is akin to a finalize. Passing a container to a container plugin, as opposed to an attachment plugin, is saying: hey, this container is fully attached, here are all the interfaces configured in it; please orchestrate or configure some higher-level aspect of the container after all attachments are done. The use case for this might be something like tweaking routing tables, or adjusting sysctls, or, an interesting one, adjusting some internal firewalling. Right now you can have an Istio network plugin which hooks into CNI, but it is a bit of a cheat because it doesn't actually create any interfaces. This would be perfect for that particular use case, which is to say: I don't want to touch any interfaces, I don't even particularly care how many interfaces there are, I just need the container's networking state to look something like this after everything else is configured. So that would be an interesting thing to add to CNI 2.0. And then check and delete would match this: check would say please verify that your changes are correctly applied, and delete would be something along the lines of undo what you did. So that's the first exploration, looking in the direction of lifecycle enhancements.
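As a thought experiment only, that expanded lifecycle could be written down as an interface like the one below. To be clear, none of this exists in any CNI specification or proposal yet; the verb names and types are invented here purely to mirror the discussion above.

```go
package cni2

import "context"

// Hypothetical CNI 2.0 lifecycle verbs, sketched as a Go interface for
// discussion. Today's CNI only has the three Attachment methods.
type Plugin interface {
	// Attachment lifecycle: CNI 1.0's ADD / CHECK / DEL.
	Add(ctx context.Context, att Attachment) (Result, error)
	Check(ctx context.Context, att Attachment) error
	Del(ctx context.Context, att Attachment) error

	// Network lifecycle (speculative): create shared resources such as
	// bridges and firewall rules up front, report readiness, tear down.
	InitNetwork(ctx context.Context, net Network) error    // "init"
	CheckNetwork(ctx context.Context, net Network) error   // ready for adds?
	DestroyNetwork(ctx context.Context, net Network) error // delete shared state

	// Container lifecycle (speculative): a "finalize" that runs after all
	// attachments are done, e.g. to tweak routes, sysctls, or firewalling.
	FinalizeContainer(ctx context.Context, c Container, ifaces []Interface) error
	CheckContainer(ctx context.Context, c Container) error
	UndoContainer(ctx context.Context, c Container) error
}

// Placeholder types, also invented for the sketch.
type (
	Network   struct{ Name, Config string }
	Container struct{ ID, NetNS string }

	Attachment struct {
		Network   Network
		Container Container
		IfName    string
	}

	Interface struct {
		Name, MAC string
		IPs       []string
	}

	Result struct{ Interfaces []Interface }
)
```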
What else might we want to consider for CNI enhancements? Well, something a bit more specific: daemonization. Should we offer, and switch to, gRPC for CNI 2.0? The answer to that is almost certainly yes. There's significant demand for this, especially if you look at all the work that is done today to re-daemonize the exec'ed CNI binary. In other words, should we offer gRPC? Absolutely. gRPC is the standard choice. It is the expected solution for this particular corner of software engineering, and there's really no compelling reason for CNI in the future to avoid gRPC. However, can we require that all plugins and all administrators run daemons? Definitely not. The administrative overhead for simple plugins is way too great; it would be asking far too much of administrators in that context. So the first solution we've come up with as maintainers is to think about offering CNI 2.0 to both daemonized and non-daemonized plugins. In other words, we should support interactive RPC over a socket file as well as direct execution à la CNI 1.0. We could define some relatively simple fallback rules, and it should be pretty seamless. Additionally, we can implement most of this in libcni, so that plugin authors and administrators don't necessarily need to see the complexity, and they can pick and choose whichever works better for them. I think that's a pretty clear new direction for CNI 2.0: we need to offer daemonization, and we need to not make it the only choice.
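Those fallback rules might look something like the sketch below, as seen from inside a runtime or libcni: prefer a plugin's socket if one is present, otherwise exec the binary as today. The socket location convention and the RPC transport are entirely hypothetical; no such mechanism has been specified.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
)

// Hypothetical fallback: if the plugin ships a daemon listening on a
// well-known socket, speak RPC to it; otherwise exec the binary per CNI 1.0.
// The socket path convention is invented for this sketch.
func invokePlugin(ctx context.Context, plugin, netConf string) ([]byte, error) {
	sock := "/run/cni/" + plugin + ".sock"
	if _, err := os.Stat(sock); err == nil {
		conn, err := net.Dial("unix", sock)
		if err == nil {
			defer conn.Close()
			return callOverSocket(ctx, conn, netConf) // e.g. gRPC, if specified
		}
		// Socket exists but is unresponsive: fall through to direct exec.
	}
	return execBinary(ctx, "/opt/cni/bin/"+plugin, netConf) // CNI 1.0 path
}

// Stubs standing in for the two transports.
func callOverSocket(ctx context.Context, conn net.Conn, conf string) ([]byte, error) {
	return nil, fmt.Errorf("socket transport not implemented in this sketch")
}

func execBinary(ctx context.Context, path, conf string) ([]byte, error) {
	return nil, fmt.Errorf("exec transport not implemented in this sketch")
}

func main() {}
```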
So that's it for the ideas we've had so far that we'd like to present for CNI 2.0. But I want to touch on another area where there's room for serious potential improvement, and that is the interaction between CNI and Kubernetes. We're here at KubeCon, and Kubernetes is obviously an extremely important consumer of CNI. So, to take a step back: CNI configuration is a file written to disk by some unknown process. Everybody does it differently. All they need to do is write a file to disk, and this has resulted in thousands of... that's an exaggeration. This has resulted in many different solutions, and everybody does it differently. And this isn't really the Kubernetes way. The Kubernetes way is to have discoverable, authoritative, validated, declarative configuration that is managed by some sort of central service. Which is to say, this sure seems a lot like every other API object. What is Kubernetes but a really excellent CRUD API over etcd? I'm sort of cheating, I'm sort of joking when I say that. But network configuration doesn't have a compelling reason why it should be different from any other type of configuration within Kubernetes.

It's not immediately obvious how this all fits together, however. The container runtime engine in a Kubernetes cluster, which is to say containerd or CRI-O or anything like that, talks to the kubelet via something called the CRI, the Container Runtime Interface. And it is by design, and really important, that the CRI is not only for Kubernetes; it is a standard and an abstraction boundary. containerd and CRI-O and all of the CRI runtimes don't talk to the API server. They don't talk Kubernetes; they talk CRI. And that's a very, very good thing for vendor neutrality and for making things pluggable. So if we wanted to add logic to retrieve network configuration from an API server, that would pretty fundamentally violate the CRI/Kubernetes boundary, and we don't think that's a particularly good idea. So our first sort of straw-man proposal as CNI maintainers is that network configuration management should be first class within the CRI itself. There's no need for this to be CNI-specific, in sort of the same way that you can have GCP and vSphere volumes.

It would be really cool if you could have CNI or non-CNI network configuration managed, configured, and lifecycle-managed by the kubelet over the CRI. So the CRI asks that runtimes create and reconcile network configuration, and enable networks to exist for a particular runtime, for a particular sandbox, in Kubernetes. Putting network configuration in the CRI has a couple of advantages. It means you can have simple cluster-wide network deployment, and, really importantly, it ends this sort of bizarre Kubernetes status catch-22 where the presence of your configuration file means that you're already configured. It's always been a bit strange, right? The problem is that it complicates per-node configuration, or more specifically, it complicates cases where network configuration is not uniform across a cluster. That would require some careful thought, but Kubernetes solves these problems pretty well: it has the notion of label selectors, and a node selector would be a pretty interesting thing to add to that object. And also, by making network configuration a first-class concept within the CRI, it opens the door for future improvements like multiple-interface support and maybe, potentially, dynamic attachment.

Okay, I'm running a little short on time, so I just want to wrap this up. Just a few parting words as the presentation comes to a close. These are obviously very early days in the saga that will be CNI 2.0, and this is nothing if not a community effort. We, the CNI maintainers, want to hear from you and want to make sure that this is a worthy exercise for all of us in the community. We really welcome your involvement. CNI is not a project that is supported by any one company; it really is intended to be an encapsulation of community consensus. So I'd encourage all of you, if you have opinions about this, please meet us on the CNCF Slack, in the CNI and CNI-dev rooms. And if you'd like to start talking about this, we have a label on our GitHub; we are starting to open issues to discuss and look at ways forward.

So thank you very much. This needs more word art. I'd like to thank you very much for watching. Thanks for taking the time. Thanks for watching me at 1.5 times speed. I hope I didn't speak too quickly, and I believe it is now time for live Q&A. Cheers.