Hey, thanks everybody for coming to our talk today about debugging CNI. It's something that touches all of us as Kubernetes users, and we wanted to share our experience with you. My name is Doug Smith, and I'm the technical lead for the OpenShift Network Plumbing team. We deal with kind of all things CNI, as well as multi-networking with Multus CNI, and I'm a member of the Network Plumbing Working Group. And I'm joined by Daniel here today.

Hi all. So, I'm also part of the OpenShift networking team, and I've basically been the team lead for one of the CNI plugins, which is called Kuryr-Kubernetes. So basically I deal with a lot of networking and related stuff as well.

All right. So what we're going to talk about today is first a brief introduction to CNI, to let you know where things live, what the touch points are, and the flow of what CNI does. We're going to look at some of the basics of CNI debugging, and then Daniel's going to give us an overview of cnitool and go into a demo to show you exactly how he tackles debugging. We're also going to feed this information back into what we think needs to happen for CNI 2.0, and how we as a community can make this better as we come together.

So CNI — it's the Container Network Interface. What CNI is is an API, and each of the CNI plugins is a discrete binary that lives on disk, in what you'll see here as the CNI binary directory. That's something you've got to know the location of. You also have CNI configurations that feed into those plugins. Those are executed by your container runtime implementation. So depending on your distribution of Kubernetes, you'll have either containerd or CRI-O firing off CNI. And there's a commonality between both of those, which is libcni, provided by the CNI maintainers.
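To make "where things live" concrete, here's a sketch of the typical on-disk layout. The default paths below are common but distribution-dependent, so treat them as assumptions and check your own setup; the snippet just recreates the layout in a scratch directory to illustrate:

```shell
# Typical defaults (assumptions -- both directories are configurable):
#   /opt/cni/bin     -> CNI plugin binaries (bridge, loopback, flannel, ...)
#   /etc/cni/net.d   -> JSON configurations read by the container runtime
root=$(mktemp -d)
mkdir -p "$root/opt/cni/bin" "$root/etc/cni/net.d"
printf '{ "cniVersion": "0.4.0", "name": "mynet", "type": "bridge" }\n' \
  > "$root/etc/cni/net.d/10-mynet.conf"
ls "$root/etc/cni/net.d"   # prints: 10-mynet.conf
```

Configurations are conventionally prefixed with a number (like `10-`) because the alphabetically first file in the conf dir is the one that matters, which comes up again later in the talk.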
So between your container runtime implementation, the kubelet, and your CNI plugins themselves, those plugins will go and manipulate your pod sandbox. That's where you've got a network namespace, and that's where your CNI plugin does its primary work of creating the network interfaces that you see in your pods.

When you've got a CNI plugin call, it's really a pretty simple flow. Your runtime implementation, using libcni, calls a binary that's on the host. It calls it with some environment variables that give the plugin information about the call being made, and then it feeds in a configuration through standard input. Your plugin then speaks via standard output, returning a CNI result as specified by the CNI spec. And additionally, it has an exit code, which counts: if you don't exit zero, that pod's going to go off into a crash loop and try again.

Yeah, so as Doug was mentioning, the CNI specification defines basically four different calls: ADD, DEL, CHECK, and VERSION. Even though this is more or less mandatory in the specification, most plugins don't implement CHECK. So, as the name implies, ADD basically adds a container to a network, or applies your plugin's modifications; DEL removes the container from the network, or undoes whatever ADD did. The DEL operation should return zero, or you may get into a terrible CrashLoopBackOff that we'll speak about later. Also, the DEL operation may not run successfully if something is happening on the node, such as a reboot — then you'd need to delete the leftover things on your own. Then you've got CHECK. Check — as I said, it's not really implemented. I'm going to show an example with the CNI plugin we developed specifically for KubeCon, which is called the KubeCon CNI, and which does absolutely nothing.
But I think it's a good example of how a CNI plugin can be bootstrapped. And VERSION just returns the versions of the CNI spec that the plugin supports.

So, CNI plugins get their configuration via a JSON file. Even though there are a lot of fields in this JSON file, what you should keep in mind is that you don't really need all of them. The mandatory fields are just cniVersion, name, and type. Besides that, you can add whatever fields you'd like for your own implementation, and then there are the IPAM and DNS ones. IPAM is a little special; we'll speak about that in a while.

Let's take a quick look at the KubeCon CNI plugin. I hope this is big enough — let's make it a little bit bigger just in case. So how do I really make a CNI plugin? As I was saying, this one does nothing. But as you can see, we're implementing the skeleton of the plugin, and we've got the methods there: we've got add, del, and version. And as I was saying, we're not really implementing check at all in this case. But if you check the add method, you'll see it's basically loading the netconf data from stdin, returning an error should that fail, and then just returning a JSON result. So nothing really fancy. I'm mentioning this because developing a CNI plugin is not straightforward, even though this looks so easy, because there's little to no documentation on how to do it. Most people just read the spec, but there's no real example. This is something that someone should improve — you're welcome to file a pull request.

Just something to mention quickly about this plugin that Daniel is showing: we've got this code available on GitHub, and there's a link later in the presentation. This is what I would call a dummy CNI plugin.
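Since a CNI plugin is just an executable speaking env vars, stdin, and stdout, a comparable dummy plugin can even be sketched as a plain shell script. This is our own illustration of the calling convention, not the actual KubeCon CNI code; all names here are made up:

```shell
workdir=$(mktemp -d)
cat > "$workdir/dummy-cni" <<'EOF'
#!/bin/sh
# A do-nothing CNI plugin: handles ADD/DEL/VERSION and logs every call.
config=$(cat)                         # the CNI config arrives on stdin
printf '%s CNI_COMMAND=%s config=%s\n' "$(date)" "$CNI_COMMAND" "$config" \
  >> "${CNI_PATH:-/tmp}/dummy-cni.log"   # logging your admins will thank you for
case "$CNI_COMMAND" in
  ADD)     printf '{"cniVersion":"0.4.0","interfaces":[],"ips":[]}' ;;
  DEL)     ;;                          # DEL must exit 0, quietly
  VERSION) printf '{"cniVersion":"0.4.0","supportedVersions":["0.3.0","0.3.1","0.4.0"]}' ;;
  *)       echo "unknown CNI_COMMAND: $CNI_COMMAND" >&2; exit 1 ;;
esac
EOF
chmod +x "$workdir/dummy-cni"

# Invoke it roughly the way libcni would: env vars plus config on stdin.
echo '{"cniVersion":"0.4.0","name":"dummy","type":"dummy-cni"}' |
  CNI_COMMAND=ADD CNI_CONTAINERID=abc123 CNI_NETNS=/var/run/netns/testns \
  CNI_IFNAME=eth0 CNI_PATH="$workdir" "$workdir/dummy-cni"
```

A real plugin would actually create interfaces inside the namespace named by CNI_NETNS; the point here is just the contract — environment variables in, config on stdin, a CNI result JSON on stdout, and exit code zero on success.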
And it's something that you could take just as a skeletal structure, put in some logging, and use to substitute for a currently running CNI plugin if you wanted to get some debug data as well. Carry on.

One other CNI configuration format, just to go over briefly, is what we call a conflist, otherwise known as chained CNI plugins. Here, instead of having the type field at the top level, you've got a plugins array. And I wanted to show this specifically because in the CNI 1.0 spec, everything will be configuration lists — conflists. So in the future you're always going to see this list, even if it's a list of one. Just to give you a quick example of that.

All right, so let's get into some of the basics. One of the first things you'll want to do when you dig in and start debugging CNI is figure out where your conf dir and your bin dir are. The bin dir is where your CNI plugins themselves exist — those are the binaries on disk — and your conf dir is going to hold your primary CNI configuration. There are defaults, and most of the time you're going to find these in the default directories. However, they're configurable. And since the kubelet is in charge of kicking off this process, that's where you're going to find the configuration for them. So there are two places you want to look. First, look at your kubelet's command-line arguments — just do a ps, look for it, see how it's actually running. If it's not specified there, don't assume it's the defaults: what you want to do is actually pull up the kubelet configuration file. If your distribution happens to use a non-default location, you can really burn some cycles looking in just the wrong place.
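Those two checks can be sketched like this. The command line below is simulated so the snippet runs anywhere; the flag names match the older (dockershim-era) kubelet, and with containerd or CRI-O the directories may instead live in the runtime's own config — so treat the flags and paths as assumptions to verify on your node:

```shell
# On a real node you'd start from the live process:
#   ps -o args= -C kubelet
# Simulated output, so the grep below has something to chew on:
kubelet_args='kubelet --config=/var/lib/kubelet/config.yaml --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin'

# 1. Check the command-line arguments for the CNI directories:
echo "$kubelet_args" | tr ' ' '\n' | grep -E -- '--cni-(conf|bin)-dir'

# 2. If they're not there, don't assume defaults -- read the kubelet
#    config file referenced by --config as well, e.g.:
#   grep -i cni /var/lib/kubelet/config.yaml
```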
Something you've got to realize about your bin dir is that the type field itself — whether it's flannel, calico, cilium, whatever it happens to be — is actually referring to a binary in your bin dir path. That's what actually gets executed. So be mindful of the fact that there has to be an actual file on disk whose name matches the value that's there. Something I've seen happen many a time is that you're having a failure because you've specified this configuration, yet the CNI plugin itself hasn't installed properly or isn't installed on disk at all. So that's one thing to look for.

And related to your configuration directory: the presence of the CNI configuration file that's alphabetically first there is what determines whether your node is marked Ready. So if you do a kubectl get nodes, you'll see a status field, Ready or NotReady. And that CNI configuration is a semaphore for the kubelet to know that this node is ready to handle network traffic. So as a CNI developer, you want to be mindful of how you lay down your CNI configuration on disk. And as an administrator, you want to be mindful of how this process works, so that you can know whether your nodes are tainted properly in order to accept pods. You can of course set up pods that you want to run very early in your cluster lifecycle to tolerate the NotReady state. However, if you're seeing that you're in a NotReady state and your pods aren't getting scheduled, go ahead and take a look in your conf dir to make sure that CNI configuration is there. And if it's not there, the problem is probably specific to your primary CNI, the one you use for pod-to-pod traffic. So for example, if it's flannel, that's installed with a DaemonSet — check out what's going on with that DaemonSet, and then get onto your node to check out what's up. A few more just quick tips for debugging.
The number one thing that will bite me is JSON. It's great for parsing; it's not so great for humans. So whenever you're mucking around with your CNI configs in JSON, just send them through a linter all the time. jq is the one we use constantly — or, if you don't have it on your system, just cat the file out and grab a JSON linter on the web.

In terms of logging, this can be tricky and very CNI-plugin specific. Since these plugins speak standard in, standard out, the standard output from your plugin, if it exits non-zero, gets captured by the kubelet itself. So that's a primary place to look for any CNI-specific errors. And then, depending on the specific CNI plugin, the developers may have implemented logging in their own way — so check the docs to see if there are any parameters that could enhance logging. It's also possible that in some cases you'll have CNI plugins with a daemon component running in a pod. So as I mentioned, we've got these binaries that run on the host and are on disk, and then we have some that could also be running as a daemon in a pod. So check both on the host and in the pods. And if you're a developer, please add some additional logging for your administrators — these things can be particularly tricky to debug.

Last but not least: when you're debugging CNI, you're often doing it node by node, and you'll be on one particular node. If you're making manipulations to that node, you'll find yourself having to muck around with labels, node selectors, all that kind of stuff. You may want to just use static pods, which allow you to put a YAML file specifying your pod on that particular node, and then it'll spin up on that node. So it's a handy way to get around all of that.

So I also wanted to speak a little bit about cnitool, which is a tool that comes within the CNI repo itself. So, recapping.
So how do we get CNI plugins to work? A CNI plugin is a binary, which may get installed onto all the nodes of the cluster using a DaemonSet or whatever. Then, when you're developing the binary, you need to see what happens: you may need to copy it over, you need to check the logs, and since it's executed by the kubelet, you may need to go to the kubelet or CRI-O. So this may be a little bit too excessive for a developer, who might not even have access to a Kubernetes cluster — maybe not even kind. So the CNI repo comes with a tool, which is a Swiss Army knife: cnitool. It basically takes your binary and the configuration I spoke about before, which is that JSON file, with all the tooling from there. It will create a network namespace, and then you'll be able to just test the plugin against it. We've got a demo of that, so I'll show it to you later. And in fact, if you just want to create a CNI plugin, check out cnitool and make your plugin work against a network namespace first. Because you'll definitely get an awful lot more information than just, you know, a crash loop on the pod, or "the sandbox couldn't get created" and so forth. So again, as we were saying before, one of the things that these CNI plugins don't usually have, especially the reference ones, is logging. A log just makes your life so much easier, for both the user and the administrator, because they'd know how to debug the thing and what's going on.

So let's get you started with the demo. Is the font big enough? Let's make it bigger. We'll start by creating just a kind cluster — no rocket science, just a quick Kubernetes-in-Docker demo. So let's get it kicked off. By the way, we had to make some modifications because of the Wi-Fi here at the venue. We wanted this to be live — sorry in advance, guys, we couldn't make it.
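For reference, a local cnitool session like the one just described looks roughly like this. It's a sketch under assumptions: cnitool built from the containernetworking/cni repo, a plugin binary (here the talk's illustrative "kubecon-cni") already present in your bin dir, and root for the namespace commands, which is why those lines are commented out:

```shell
export NETCONFPATH=$(mktemp -d)    # cnitool reads network configs from here
export CNI_PATH=/opt/cni/bin       # ...and looks up plugin binaries here
cat > "$NETCONFPATH/10-kubecon.conf" <<'EOF'
{ "cniVersion": "0.4.0", "name": "kubecon", "type": "kubecon-cni" }
EOF
# The privileged part, on a machine that has cnitool and iproute2:
#   ip netns add testns
#   cnitool add kubecon /var/run/netns/testns   # exercises CNI ADD directly
#   cnitool del kubecon /var/run/netns/testns   # exercises CNI DEL
#   ip netns del testns
cat "$NETCONFPATH/10-kubecon.conf"
```

Because cnitool executes the plugin directly, any error or log output lands straight in your terminal instead of being buried in the kubelet's logs.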
Now we're getting started. By the way, this kind cluster has no CNI plugins, so at the beginning the nodes will be NotReady. So let's get a primary CNI plugin, which is going to be Cilium. Let's build the image here — we've got it. And then let's load the image into kind. Okay, there we go. We're also going to be using Multus as a way to have several CNI plugins working within the same pod, and have several interfaces on the same pod. So there we go — running.

So, okay. How does Multus work? How do I add a secondary network interface, or a secondary network? You need to use a CRD called a NetworkAttachmentDefinition, which looks like this. If you see, I'm also injecting the config for the CNI plugin. We're going to be using the plugin we spoke about before, the one we developed for this conference, the KubeCon CNI. And again, this is a binary, which is going to be executed later on. Now let's create a pod. How do I hook this secondary network up in the pod? You use an annotation. And if you see, it names the same configuration we used before. So that will create the secondary network interface. So far, this is okay. Now we've got a pod with two NICs running.

But what happens if you mess up? Say you create a secondary network with an invalid configuration — how would you debug that? Okay, here we're going to create yet another NetworkAttachmentDefinition, which is going to be invalid, because the plugin it references is just not there. And if we get the pods — okay, let's see. There we go. Those are the annotations from the first pod. That one is running; it's okay. You can see it has two different interfaces: one from Cilium and the other one from the KubeCon CNI, and that's okay. But what happens with the other pod? It seems to be okay, but it's stuck in ContainerCreating, and it's going to sit there forever until it goes into a crash loop.
Because, as we were saying in the slides, one of the most common issues when creating one of these is that the CNI binary is just not there in the bin dir. We're using Cilium, which is there — of course it works. And then we're using the KubeCon CNI, which exists, and this other one, which doesn't. So how do we debug this? If we go to the kubelet logs, you'll see that it's trying to create a pod sandbox, and it's trying to invoke the binary, but the binary is just not there. So going back to the CNI plugin path — and let's assume we're using the default path — you can see that most of the plugins are there. We've got the Cilium CNI, we've got the KubeCon CNI. We don't have the other one we're referencing, so of course it will always fail.

I also wanted to talk a little bit about, okay, how would I test this? So, the KubeCon CNI: if we're in the CNI plugin path, you can always just go ahead and call the plugin. These are all binaries, and if you're on a node, you can just execute them. You can see this KubeCon CNI is just printing out something like today's date — it does nothing fancy, but I think it's a great example of how to handle this. Feel free to check it out on GitHub.

And I also spoke about cnitool. So how would I check my CNI plugin and its configuration locally, without having to deal with any cluster? For instance, here's a sample configuration I took from the egress-router CNI, which is also available on GitHub. Basically, it expects a few IP addresses and a few ports. And if you look there, the address and the destination are just bogus strings — so this won't ever work. So what happens? Let's give it a try and test this with cnitool. I've got this script, which basically first creates a network namespace and then calls the CNI plugin. If you check there, I'm just asking it to run a CNI ADD command using the egress-router binary.
The egress-router binary — that's the name and the type of the CNI plugin we're testing. So let's run it. Here — this is one of the plugins that has logging. Sorry, one where logging is enabled. And this is an awesome thing, because you'll be able to see what's going on: we're calling ADD, we're trying to set a gateway, which of course doesn't work, but you can actually see some output. This doesn't happen with most of the reference plugins. And as Doug was saying before, as of now — and we hope this is going to be improved for CNI 2.0 — every CNI plugin needs to go and implement its own logging. Some plugins do logging — Multus does, for instance — but most of them don't.

Let's try the same with the KubeCon CNI that we've got here. Again, as I was saying before, you just need to put in the type, the name, and the CNI version — nothing much more. So that would be the skeleton of the first CNI plugin you'd play with; afterwards you can go and add any fields you'd like. This is just the same script, adapted to the other CNI plugin. And what's going on? There you go — it gives you valid output. But should it have failed, we wouldn't have had any way of knowing why, because there is no logging. Okay, so thank you for attending the demo.

Cool, thanks Daniel — really appreciate it. All right, so to briefly wrap this all together, let's talk about how we'd like to see CNI 2.0 improve, based on our experience with debugging CNI. One of the main things I'm excited about in CNI 2.0 is the prospect of having the plugins themselves run in pods. Since we have these plugins running as binaries on disk, it's kind of a disconnect from how you normally interact with your cluster as an administrator. So that's something I'd really like to see. We're also not a huge fan of having the configurations in JSON.
It's also not exactly the same as the YAML-type specifications that we're used to working with. Additionally, devices are a difficulty, so we're trying to figure out whether there's a better, more consolidated way to work with devices. And lastly, we'd like something that better handles network lifecycle during a container's runtime. When we mention the commands — CNI ADD, DEL, et cetera — those happen at discrete moments: when your pod is created, and when your pod is deleted. But what if something happens between those two points? We'd like to see something that's a little bit richer in that sense.

And just briefly, something that's kind of an idea in terms of design paradigm is the notion of a thin plugin versus a thick plugin. A thin CNI plugin is just the binary on disk that runs as a one-shot: it starts, it does its work, and it exits. You could have a richer design, which we call a thick plugin, where you replace that thin plugin on disk with what we call a CNI shim. That shim takes the standard input and the environment variables, packages them up, and sends them to a process that's usually running in a pod as a long-running process — that's why we call it a thick plugin. It takes that information, does the work it needs to do, then sends the result back to the shim, and the shim exits. As you can see from this diagram, there are a number of moving parts happening here. And in CNI 2.0 we're hopeful that we can standardize all of that communication and have your plugins running natively as thick plugins. Then, instead of all that work Daniel had to do to figure out why his pod wasn't running — digging the logs out of the kubelet — you could just do something more like a kubectl logs and get your logs out of the pod. So yeah, thank you.
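To make the shim idea described above concrete, here's a toy sketch of the shim's job: bundle the CNI environment variables plus the stdin config into one payload for a long-running daemon. The payload shape and the socket path are purely hypothetical, not taken from any real thick plugin:

```shell
# Fake CNI invocation run through a toy shim. The shim gathers CNI_* env
# vars and the config from stdin into a single JSON payload.
payload=$(
  CNI_COMMAND=ADD CNI_CONTAINERID=abc123 CNI_NETNS=/var/run/netns/x CNI_IFNAME=eth0 \
  sh -c '
    config=$(cat)   # the network configuration the runtime piped in
    printf "{\"command\":\"%s\",\"containerID\":\"%s\",\"netns\":\"%s\",\"ifname\":\"%s\",\"config\":%s}\n" \
           "$CNI_COMMAND" "$CNI_CONTAINERID" "$CNI_NETNS" "$CNI_IFNAME" "$config"
    # A real shim would now forward the payload and wait for the CNI result:
    #   curl -s --unix-socket /run/thick-cni.sock -d @- http://localhost/cni
  ' <<'EOF'
{"cniVersion":"0.4.0","name":"dummy","type":"dummy-shim"}
EOF
)
echo "$payload"
```

The daemon end does the actual interface plumbing and can log like any other pod, which is the whole appeal of the pattern.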
If you have interest in any of this, and in CNI 2.0 as well, there's a working group that I work with that would love to have you join, with any questions, comments, ideas, contributions, et cetera. And the CNI maintainers, I'm sure, would be happy to hear from you as well. There are a number of issues on GitHub tagged CNI 2.0 if you're interested in that, and we have the content available — that's the content — and if you want to talk with us, feel free to ping us on the Kubernetes Slack. So yeah, thank you. Any questions, anyone?

I have a question about this idea of a CNI plugin in a pod. Do you have an idea how to resolve the chicken-and-egg issue — how do you create the pod without a CNI plugin? What would come first?

That's a great question, and I think you'd probably have to have a kind of special class of pod. It might look more like a static pod itself — that would be my guess — but that's a great question. Another thing I'm thinking about in a similar sense: in CNI 2.0, if you have these plugins running in pods, they're going to need host access as well, and I think exposing that host access to the pods is going to be tricky. It's also going to have security implications. So I think there are a number of interesting challenges there. Thank you.

I think it's going to be a little bit like, you know, with the Kubernetes API: you'd put in the static pod, and it's also probably going to get either capabilities for accessing networking and so forth, or, yes, privileged access.

One question: I always thought that in order to have multiple CNIs, I have to deploy Multus or whatever. Is any work in progress to support multiple CNIs without having to resort to Multus — or something like embedding Multus in-tree?

I'll let Doug reply, because he's a Multus maintainer, so go ahead.
Yeah, Francisco, that's a great question. This is something that we, as the Network Plumbing Working Group, have been looking at over a number of years. I think SIG Network is becoming increasingly interested in the idea of having this natively, in-tree, in Kubernetes. When we first started the effort — it's a long story, but the short version is that it was going to create sweeping changes across the Kubernetes code base, and the maintainers weren't super excited. But I think over time more people have seen the value in being able to have multi-homed pods, so we're continuing to drive in that direction, and I think everyone who's a member of the Network Plumbing Working Group would like to see that happen. So stay tuned, because it's certainly part of the discussion. Thank you.

Also, one of the things that has been discussed lately: so far you need to add a network attachment definition, but there's been work in progress — for less than a year — to have a network primitive within Kubernetes directly, so you can do that natively and let something like Multus do the plumbing under the hood.

Thank you for the talk. A slightly off-topic question: what software did you use to make the demo?

It's called demo-magic — basically you can script everything with it. I think it's a cool tool.

One follow-up question related to Multus: can you comment on Intel DPDK support — the Data Plane Development Kit — where we would need additional network interfaces?

Sure. The best context that I have for the use of DPDK with secondary interfaces has usually been relative to the use of SR-IOV devices. Additionally, one of the considerations of the specification we have for secondary networks was to allow you to have kind of a blank slate, if you will, because if you're using DPDK, you're going to bring along your own IP stack, et cetera. So it's definitely possible.
And I believe that with the SR-IOV network operator, which is one of the projects that's part of the Network Plumbing Working Group, there is DPDK support for SR-IOV devices as secondary networks. Does that help?

There's also another option. Let's say you've got your own plugin: you could have a plugin that, for instance, attaches your secondary NIC to OVS-DPDK, and then you get that for free. Or you could have a secondary plugin that attaches a macvtap interface, and then you've got SR-IOV passthrough and so forth. So, I mean, it kind of depends on what you want.

I think we've run out of time, but we're happy to answer your questions, so we'll be around for a while. So thank you. Thank you all for attending today, and see you around.