They've told me that my time has begun. So while we're all still getting settled, let me get an idea of who's out there, and then I'll give you an idea of who's up here. Who here is at their first KubeCon? Oh, yeah. Okay. Good. That's actually really good. I was aiming this very much at a beginner audience, and I like seeing lots of beginners out there. Who here has been to at least three KubeCons? Okay, we've got a few. Got a few veterans. Who here has ever gone through and successfully completed Kubernetes the Hard Way? Okay, I see a few hands there. Y'all maybe don't need to be here, but you're welcome to stay, and if I say something wrong, feel free to jump in and say no.

So we are here to grab our can openers and apply them to Kubernetes networking. This is not so much a dive into the Kubernetes networking API constructs, although they will get mentioned, of course. It's more about what you actually see outside of Kubernetes, on your host and going across your network, when those constructs are in play. There are a lot of pointers in here; I'm not going to dive terribly deep, just because we don't have a huge amount of time, but there should be plenty of pointers for you to dig into things as you desire after the talk. I will of course be around to answer questions even after the Q&A; I'll be hanging around in the hall and such.

So with that, let me tell you who I am. I've been in IT, depending on how you count, for going on 30 years now. I have worked for some large companies, some small companies, some companies people recognize, and probably some that nobody but a few people have ever heard of. I started working with Kubernetes in 2015. Okay, that's somebody in the back; I started working with Kubernetes in 2015. I went to work at CoreOS shortly thereafter, and my first KubeCon was Austin in 2017, the time that it snowed in Austin, Texas. That was so much fun. My pronouns are on the slide; the "how to get in touch" item is a link to the contact info page on my website, and there are a couple of contact items right there as well. Currently I'm a staff solutions engineer at HashiCorp. I don't run a Kubernetes cluster day to day, but Kubernetes is actually a big part of my job, because I work in our customer design lab, implementing reference architectures into live demo and workshop environments to show our products off to our customers and prospects.

So why are we all here? First and foremost, we're here to learn, hopefully. I hope we're all here for that. But specifically we're here to learn about the foundations of Kubernetes networking and how we can apply the knowledge we already have to those foundations. I'm sure everybody here, even if you're a Kubernetes newbie, already knows a lot about networking on various platforms, and that knowledge doesn't just go away or become useless because you're now trying to apply it to Kubernetes. The same tools you've always been using are old and well-worn and well-maintained and hopefully well kept up, but just because they're old doesn't mean we throw them away.

So what is the general situation we find ourselves in? Kubernetes lives on a set of hosts, and it enables some very advanced networking models. I know everybody here has probably run across service mesh at this point, even if you're a beginner.
There are various kinds of service mesh; we have Consul, of course. But those models, those service meshes, those advanced networking utilities are all built on the same basic host operating system and the same basic host networking capabilities that have been around for a really, really long time. I'll actually note that some of the things people think of as new are over 20 years old at this point, and some of the other tools go back way further than that, way further than I do.

So some people would ask: aren't there already Kubernetes tools to debug all this stuff in Kubernetes? Well, yeah, sure there are. They're great, and you should definitely learn them and use them. The ephemeral debug containers that went GA fairly recently are highly useful, and there are all kinds of nifty add-ons like Weave Scope. But sometimes you have trouble getting those things installed and running. Or they were installed but they're failing now and you have to figure out why, or they're not telling you something you need to know. You might even be in a situation, and this is basically my demo later on, where somebody comes to you and says: we've got something broken in our cluster, I can't give you access to the cluster API, but I still need you to fix it. How are you going to do that? Well, you actually can, some of the time, so we'll go through some scenarios like that.

So, Kubernetes networking in a fairly large nutshell. We're going to rewind history here for a little bit. Back in the 80s, some of you may go back that far, there was this thing called the OSI seven-layer model. Everybody talked about the OSI seven-layer model, and we still kind of reference it today: when you hear people talk about a Layer 4 proxy or a Layer 7 load balancer, that's OSI seven-layer model terminology. Nobody ever really implemented it strictly and compliantly, but it's still kind of with us. The layers went from the physical layer, your actual physical Ethernet cables and network interfaces, all the way up to the application layer, the actual software you're running on the host. Kubernetes networking is layered as well, and some of its layers do in fact correspond fairly well to the OSI layers, so we'll see how that plays out.

At the bottom, as with any orchestration system, you have the host network itself. You've got your regular host addresses that represent things like physical Ethernet cards or virtual machines' virtual network interfaces. But when you add containers into the mix, at least containers the way they're done these days, you also have interface pairs: one interface inside the container, one outside the container on the host, and that's how your containers actually get networking to the host. All of that we could collectively call the physical layer, because it all looks like physical interfaces as far as the host is concerned, and you can treat them as physical interfaces when you're looking at them with various tools.

Above the host networking layer, the relatively new thing is the container networking layer. This is your Docker bridge, your containerd interfaces, et cetera. This is where your container runtime starts things up, handles them, and attaches your containers to them.
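To make that concrete: on a typical node you can see both the host's real interfaces and the host-side halves of those container interface pairs, which usually show up as veth devices, with plain iproute2 commands. A minimal sketch, with illustrative device names:

```
# List every interface on the host, briefly; veth* devices are the
# host-side ends of container interface pairs
ip -br link show

# A line like this (illustrative) is one such pair:
#   vethb3a1c9f@if3  UP  ...   <- "@if3" is the peer's interface index
#                                 inside the container's namespace

# If the runtime uses a bridge (docker0, cni0, ...), you can see which
# veth ends are attached to it
bridge link show
```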
In Kubernetes terms, a lot of this gets configured by CNI plugins; that's the Container Network Interface that Kubernetes has standardized on. And you often install those... oops, I see I have a typo there... you often install those with kubectl. How does that work? It turns out that what you're actually doing is installing them as privileged containers running on the host network. That's how you bootstrap the container and pod network layer from the host layer. The CNI plugins also have to set up some kind of way for traffic to get from host to host. Every plugin seems to do this differently, and some of them support multiple options. I remember back in the day when Flannel was pretty much it; I think it supported either three or four different ways to get traffic from host to host. It would set up the host to route anything that was intending to leave the host from the container network to itself, and then it would take care of getting that traffic across in whatever way you had configured.

Next up, on top of the container network, we have the service network. Now, this one doesn't really show up as interfaces, and so it took me a long time to figure out how the service network even works. It doesn't have interfaces, it doesn't have routing rules, so what is going on? It turns out it's running on the host, and we'll talk about how in just a minute, but it's an abstraction. The service network is kind of a fictitious network, and even inside of Kubernetes, it's kind of a fictitious overlay on the real-ish pod network. If you want to know more about the Service API, Tim Hockin did a really good lightning talk just the other day. I have a slide at the end with a lot of reading material: presentations that have already gone by that you can watch later, and some that are still upcoming. His lightning talk on Monday night was fantastic; the gist was right there in the title, that Services are the worst API in Kubernetes, because the API is so overloaded and it's causing so many problems for the project at this point. Services in turn have their own set of layers: you can have a ClusterIP service, a NodePort service, or a LoadBalancer service. That's basically whether it's purely internal, whether it's exposed on the nodes themselves, or whether it's actually exposed outside to the rest of the network. I'll sketch what those three look like side by side in a moment.

Then on top of services, we have these other abstractions: Ingress and the Gateway API. Some people are saying stop using Ingress and start using the Gateway API. I don't know if I'd go that far, but the Gateway API is definitely something you should learn about. And basically every service mesh, you're going to find, has a service in there somewhere, probably a LoadBalancer service. A lot of these things do Layer 7 stuff; in literal terms, a service mesh is just a big complicated application operating at Layer 7.

So we've talked about what we have built; now let's talk about how it is built and what it's built on top of. Actually, I want to go back to this slide, because if you notice this big beautiful house with the really incredible architectural detail, look what it's built on top of: this crude foundation, which might even have been a previous building's foundation. So the foundation could even be older than the house. But again, just because it's old doesn't mean you have to stop using it.
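Here's that quick sketch of the three service layers, for when you do have API access; it assumes a Deployment named web already exists:

```
# Purely internal virtual IP, reachable only inside the cluster
kubectl expose deployment web --name web-clusterip --port 80 --type ClusterIP

# Same thing, plus a port opened on every node
kubectl expose deployment web --name web-nodeport --port 80 --type NodePort

# Same thing, plus an external load balancer from your cloud provider
kubectl expose deployment web --name web-lb --port 80 --type LoadBalancer

kubectl get services   # compare the CLUSTER-IP, EXTERNAL-IP, and PORT(S) columns
```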
So, host networking interfaces: like I said before, you'll see them both inside and outside of your containers. And when we're talking about CNI plugin traffic going from host to host, you will often see that reflected in the routing tables. You'll see routes that direct traffic to a particular interface that is managed by your CNI plugin. Flannel did that; it was the flannel0 interface back in the day.

Now, iptables is where you will see your service network show up. It doesn't have interfaces, but what it does have is a whole bunch of iptables rules, or, if you set your cluster up this way, a whole bunch of IPVS (IP Virtual Server) rules. These are both kernel mechanisms, but they are kernel mechanisms that you can see from the host without ever having to go into a container.

Now we start talking about namespaces, okay? Remember I said some of this stuff goes back way further than people think it does? Kernel namespaces were added to the Linux kernel way back in 2002, and the first one was the mount namespace. Several others have been added since then. For a long time there were seven namespaces, and then just yesterday, when I was looking up the seven namespaces for this talk, I saw they'd added an eighth: as of 2020, there is now a time namespace. But the three you're mostly going to concern yourself with when debugging containers are the network namespace, since that's what we're here to talk about; the PID namespace, the process IDs; and the mount namespace, the file system that is exposed to the container. You don't necessarily need the last two all the time, but depending on the kind of debugging you're doing, they may come in handy.

Now, when we start talking about things above that level, there are some more exotic things coming down the pike; these are actually brand new, relatively speaking. The big one is probably eBPF. Anybody here who has not heard of eBPF yet? Okay, a couple of people. It's all over KubeCon: there are tons and tons of sessions about adding eBPF to products and diagnosing eBPF issues. Definitely go to that stuff. It is a new tool, and it is starting to take over the world. It is going to, I assume, go through a phase where initially it's very hard to debug, because eBPF is literally an opaque, compiled piece of bytecode running in a virtual machine in the kernel. You don't necessarily have all the usual mechanisms you're used to for poking and prodding at it and asking what's going on in there. But somebody's going to have to come up with ways to do that.

Let's talk about getting your can opener down into the host so that you can see what's going on in all these containers. I said before, your basic host tools still work; they will still tell you useful things, and they work the same way. Interfaces are still interfaces, and you can still use the same interface tools, like ip and ifconfig. You can still use the route command for routing. You can use the iptables command, like I mentioned, to see all the rules in the NAT table for your services. There are other higher-level things you may also be able to use. Another one that I use frequently when debugging container stuff is OpenSSL: the openssl s_client mode is great for debugging TLS issues. But you may need to use it inside a container; run on the host itself, in the regular host namespaces, it may or may not tell you what you need to know.
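Here's roughly what those CNI routes can look like in the routing table; the CIDRs and interface names are illustrative and vary by plugin:

```
ip route
# 10.244.0.0/24 dev cni0 proto kernel ...         <- pods on this node
# 10.244.1.0/24 via 10.244.1.0 dev flannel.1 ...  <- pods on another node,
#                                                    sent via the plugin's
#                                                    overlay interface
```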
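And the service network's kernel incarnation looks roughly like this; KUBE-SERVICES is kube-proxy's entry-point chain in iptables mode, and ipvsadm covers the IPVS case:

```
# iptables mode: the entry point for all service virtual IPs
sudo iptables -t nat -L KUBE-SERVICES -n | head

# IPVS mode: list the virtual servers and their real backends
sudo ipvsadm -Ln
```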
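For the namespaces themselves, util-linux ships an lsns command, and /proc exposes namespace handles directly; the PID here is illustrative:

```
# Which namespaces does this process belong to?
sudo lsns -p 8487

# The raw view: each of these entries is a handle you can pass to
# tools like nsenter
sudo ls -l /proc/8487/ns
```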
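And here's the sort of openssl s_client invocation I mean; the address is illustrative, and as I said, from the host namespaces it may or may not reach the thing you actually care about:

```
# Open a TLS session and print the certificate chain and handshake details;
# redirecting stdin makes it exit after the handshake
openssl s_client -connect 10.96.0.1:443 -servername kubernetes.default < /dev/null
```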
And that's the big downside of sticking to the host namespaces entirely: some of the things you need to talk to live in containers. Some of your basic tools will tell you about the host, but you don't just need to know what's on the host; you need to know what's going on inside the container network.

So how do you get into the container? Well, the good old standby: you can try to exec into it. If that container was built with a distro base image, or somebody included a shell in it, that may succeed. But getting a shell is only half the battle. You have to make sure that you have the tools in that container that you need, and depending on the container and the particular situation, there may not be a way to get those tools in there if they're not already present, especially in a minimized container. I've seen a lot of minimal containers where somebody said, yeah, it's basically just our app binary, but we're going to throw bash in there just in case we need to debug. What are you going to debug with? There's not that much you can debug with bash alone.

So how do you debug that kind of thing? Well, you can run a debug container. Every container runtime has its own way of handling this, but the idea is you spawn a new container and attach it to the namespaces of an existing container. Kubernetes does a weird hybrid version of this. When you start up a pod, who knows what the first thing to start up in a pod is? Pause, I heard it over there. The pause container starts up. Pause is a little do-nothing container; it's just there to set up the network namespace, and it sits there and idles forever. The other containers then come in and join the network namespace of the pause container, but they don't join all of its namespaces; they each have their own independent PID namespace. So it's kind of halfway: they're not all the same from a namespace perspective, but they do share some namespaces in the pod. When you're spawning your own container, of course, you can pick and choose which namespaces you want to try to share.

If you're trying to debug something that lives inside a minimal container, this is one of the best ways to do it; I'll show you the other one in just a second. This is actually something Kubernetes now supports directly: in 1.25 it went beta and in 1.26 it went GA, I forget the exact timeline, but there's now a kubectl debug command that you can use to spawn an ephemeral container in a pod, and you can target a specific container, because they don't share PID namespaces by default. There's a quick sketch of that command coming up below. The downsides are, A, you need to be able to spawn a container through some mechanism, and B, it's kind of heavyweight. You're adding the load of a whole container, and if you're doing debugging, you're probably running some kind of distro container; those can get pretty big and take a lot of memory. So you don't always want to do this, but if you have the room to do it, it's often the most convenient way to go.

When you can't run the debug container but you still need to get into the namespaces, there is the classic nsenter command. It runs another command, joining it to one or more namespaces of a targeted process ID, so you can attach any namespaces you want. Now, for some namespaces, attaching may not give you a result that's particularly useful. But you can do this.
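Here's that quick kubectl debug sketch, for when you do have API access; the pod, image, and container names are illustrative:

```
# Spawn an ephemeral container in the pod, sharing its network namespace,
# and join the PID namespace of the container named "app"
kubectl debug -it mypod --image=docker.io/library/fedora:latest --target=app
```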
You can pick and choose your namespaces with nsenter. You can run a shell that will be attached to that process's namespaces, or you can run any other debug command you need to. These are commands on your host, shells on your host, but they'll be running in the container's namespaces. It is a little bit harder than just doing a docker run, or, if you've been working with containerd, a ctr run.

So at this point, if you'll bear with me for a moment, I'm going to try to mirror my screen so I can show you this demo of debugging a faulty Ingress. And we're going to do this without touching kubectl once. Nope, it's not letting me do it. I don't know why, but it's not. Let me bring this up front. Okay. If you'll bear with me, this will be slightly tedious, because I'm going to be copying and pasting back and forth, but we will get there.

So I am already on a cluster node. The scenario is: there's an Ingress with two routes. One of them is working fine and returns what it should return; the other gives you a Bad Gateway error. Now, this is running on an EKS cluster in AWS, and when you have a connectivity issue with anything to do with AWS, I know people already know the answer; I want to hear you all state it in unison. I'm going to count down from three. The first thing you check for a connectivity problem in AWS is: three, two, one. There you go. Actually, I lied. The first thing you do is say, hey, anybody want to debug these security groups for me?

Okay. So we are on the node, and it's running containerd. A little bit of background here: containerd is sort of patterned after Docker, because that's what it came out of, but it's a little bit less friendly. It actually has its own namespacing of containers, and when you set it up as the Kubernetes container runtime, it puts all the Kubernetes containers in a namespace of its own. This is not related to kernel namespaces; it's just containerd's own namespacing, and the namespace is called k8s.io. So what I'm doing here is listing all the containers in the k8s.io containerd namespace. Okay, I've got a bunch of containers. Now, if I scroll back up here, I've got this container that is running httpd. I know I'm trying to debug httpd, I know the problem container is on this host, and this is the only container that says it has httpd running. So let me come back here and get info on that container. That spits out a whole bunch of JSON, and I'll save you some time: it's not super helpful for what I'm trying to debug.

So what I'm going to do instead is run a debug container that I can attach to it. This is, again, slightly less friendly than Docker. If you do a docker run and you don't have the image, Docker just goes and pulls it for you. containerd doesn't do that; you have to pull it manually. So I pull this Fedora container image, and now I can run it. I want to go over this command real quick: we are attaching to two namespaces, the PID namespace of the existing container and the network namespace. We're also going to bind-mount in the host's resolv.conf. I didn't know I'd need that going in, I had to learn it the hard way, and I'll explain why right after you see the commands.
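The whole sequence looks roughly like this; the PID and names are illustrative, and the --with-ns and --mount flag syntax is ctr's as best I can reproduce it here:

```
# List the Kubernetes-managed containers (containerd's own k8s.io namespace)
sudo ctr -n k8s.io containers list

# ctr doesn't pull on demand the way Docker does, so pull the image first
sudo ctr -n k8s.io images pull docker.io/library/fedora:latest

# Run a debug container joined to the target's PID and network namespaces,
# with the host's resolv.conf bind-mounted in so DNS resolution works
sudo ctr -n k8s.io run --rm -t \
  --with-ns pid:/proc/8487/ns/pid \
  --with-ns network:/proc/8487/ns/net \
  --mount type=bind,src=/etc/resolv.conf,dst=/etc/resolv.conf,options=rbind:ro \
  docker.io/library/fedora:latest debug-shell bash
```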
When we get into the container, we're going to need to install some tools, which means downloading packages over the network, and without a resolv.conf we can't do that. So that's what's going on in that command: we're attaching to a couple of namespaces and setting up DNS resolution.

Okay, now we're in the container. And I'll save you some time: I learned the hard way, again, that this container does not have iproute2 pre-installed, so we install it. iproute2, by the way, is the replacement for a lot of the commands people have been using for a long time, like ifconfig and netstat. There are successor utilities to those in the iproute2 package that you should get used to using; learning new tools is good, even if you still continue to use the old ones. This does take a slight bit of time... okay, there we go, it's just about done. So now we have iproute2, and we can run the ss command and see what is going on. ss -anp tells me my httpd process in the container is listening on localhost. There's the problem. Okay, so we found the problem.

Now we can go on and do some additional debugging to determine, more or less, how the problem came to be. So we exit out of this container and clean it up by deleting it. I hope everybody can see this fine, by the way. So now our debug container has been deleted. Now, let's go at it with stone tools: a plain old brute-force grep for httpd. Okay, so we've got some processes here. I'm going to pick that 8487 process at the top of the list, because it's the lowest process number, so it probably spawned off the others. So now I nsenter and attach to it. If I don't give nsenter a command to run, it'll just run a shell by default. I'm attaching to that process's mount namespace and network namespace. So now I'm in, not a container, but a shell joined to the container's namespaces.

Just for grins, because I suspect I'm going to need to know, let's check what kind of image this container is built on. It's a Debian container. Okay, so instead of running dnf, we're going to run apt commands. This will be real quick; we are almost done with this demo, by the way. Now we install our iproute2. On Debian the package is called iproute2 as well; it is actually still the same package. Now, just to prove that we can, we repeat our socket statistics command, ss -ant, and we see the same thing we saw before: we're listening on localhost.

But now we're going to go beyond that. We know this is an Apache container. Who knows what the default config file for Apache is called? I hear it back there: httpd.conf. I don't know where it is in this container, so I'm just going to go looking for it. Okay, it's under /usr/local. So now let's take a look at that file in the running container and see if we can figure out what the problem might be. Boilerplate there, more or less boilerplate there... and that's the problem. Somebody misconfigured Apache. We kind of suspected that, right? But now we can actually prove it. We can say the reason your Apache is only listening on localhost is that it's right there, explicitly, in your httpd.conf file.

Now, how did that get there? It could have gotten there a number of ways. The image could have been built with that file already in it, or it could be mounted in from some kind of volume, which is in fact the case here. Let me recap that nsenter sequence real quick, and then I'll tell you how I set this up.
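Roughly, that second half of the demo boils down to this; the PID is illustrative:

```
# Find the httpd parent process on the host
ps aux | grep httpd              # say the lowest PID turns out to be 8487

# Enter its mount and network namespaces; with no command given,
# nsenter just starts a shell, resolved inside the container's filesystem
sudo nsenter -t 8487 -m -n

# Inside: this image is Debian-based, so apt rather than dnf
apt update && apt install -y iproute2
ss -ant                          # 127.0.0.1:80 LISTEN -> localhost only
grep '^Listen' /usr/local/apache2/conf/httpd.conf

# Variation: join only the network namespace and run a single host tool,
# like a packet capture, without installing anything in the container
sudo nsenter -t 8487 -n tcpdump -i any port 80
```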
What I did when I was setting this demo up was pull the httpd.conf out of this container, well, another clone of this container, modify it, create a ConfigMap from it, and then start this container up with that ConfigMap mounted as the config file.

Actually, I'll show you one more thing, since I have a little bit of time to burn still. If I do iptables -t nat -L... oh, I'm still in the container's shell. Never mind. Okay, now that I'm back on the host, if I do that... okay, I think I got that right. Ah, see all those KUBE-SERVICES entries? That's where all your services are happening, all of that mess up there. If you want to go parse through it, have at it; you may have to do that one day, so it's useful to know where you can. That concludes the demo.

So, wrapping up: I have about five minutes left, and I'm going to wrap this up real quick so we have some time for Q&A. If there's one thing I want you to go away from here with, it's the idea that just because you're a beginner with Kubernetes doesn't mean that you don't know anything useful. Everything you knew about host networking and Linux networking before continues to be useful. You're going to need to learn new tools; I'm going to learn new tools. But sometimes you're still going to want to use the tools you've got, and that's fine, because the thing I always tell people when they say, well, I don't really know how to use that new tool, is: use the tool you know how to use, if it does what you need. The right tool is the tool that gets the job done most efficiently.

I'm not going to go through all of this slide, but I did want to mention specifically that there's a link to Tim Hockin's lightning talk at the top right, and a link to Kubernetes the Hard Way at the top left for the many of you who have never done it; I highly recommend it. There are links to some other videos, and a list of related sessions. The Tuesday sessions have of course already gone by, but of the Wednesday sessions, there's one right after the break from this one, so if you head on over to W178, you can kind of start rolling through those. That's all I've got. The QR code there goes to the same URL the tinyurl points to, and the QR code on the right is for feedback. I am open to any questions you have at this point.

Oh, yes, I have done that, many times. Actually, that's one of the ways I verify whether anything is actually listening in a container, because sometimes it's a little hard to navigate the process space in a busy container, so it's sometimes easier just to do a tcpdump. The question was: have I ever done a tcpdump packet capture, and is that useful to do? Yes, it's entirely useful to do. Anybody else? We've got a microphone over there. I think it's just the one right there.

Yeah, so you spoke to the OSI model not really being rigidly implemented. Has there been any effort in the space of the Kubernetes network stack to standardize, to ensure long-term interoperability? Can you say that again? I couldn't quite make that out. Yeah, has there been any effort to solidify some sort of model of the Kubernetes networking stack to ensure long-term interoperability, instead of some of the chaos that happened with OSI, where vendors didn't implement things? Oh, yeah. There's a lot of stuff going on with interoperability and multi-cluster federation and multi-cluster networking.
One of the talks that I went to yesterday was about multi-networking in Kubernetes; I think I have it linked from the previous slide there. There are lots of plans to enable multi-networking for containers, which is going to be useful for people who are, like, running...

So that last demo, you were on the EC2 instance, and then you used nsenter to get into the container? Yes. In the second half of that, we exited out of the container we had spawned, and then, from the host, without spawning a new container, we just attached a process to the namespaces the container was using, so that we could observe what was going on in the container without needing to run another container ourselves.

First of all, great demo, thank you very much. Last time I had to troubleshoot some network connectivity issues, it actually ended up being eBPF-related, with some MACsec package. Is there a strategy you could suggest if you had an issue like that? And the second part is: those commands you showed for containerd, how does one actually look them up and learn them? I had to look all those containerd commands up in the containerd documentation, and I did a lot of ctr --help at the command line. Now, as far as the other one, strategies for dealing with eBPF-type stuff: to be quite frank, I don't know a whole lot about that. It is still an emerging area. I think at this point your best bet is to do more observational types of things, like "I can see traffic is or is not flowing," and then maybe narrow that down, and if you can determine that it is something related to an eBPF module, then I guess at that point you have to go back to whoever wrote it and say, this is not working, why is it not working? Thank you.

Hi. First, I can see that inside the containers IPv4 is, you know, entirely sufficient, but when it comes to IPv6 on the host, what kind of issues can there be in publishing, you know, the Kubernetes... So, it's kind of interesting, because until fairly recently, Kubernetes did not support IPv6, and, we just ran out of time, so I'm going to wrap this up real quick, it did not support IPv6, but it does now. As far as any kind of issues you would run into, I don't know of anything specific to Kubernetes. I think it would just be the regular sort of IPv6 issues you would see anywhere, like when you don't have IPv6 connectivity but something is trying to prefer it, and so your networking suddenly dies and you don't know why.

I'm going to wrap up here, but like I said, I'll be hanging out in the hall for a little bit if anybody has any more questions for me.