Hello everyone, good afternoon. Welcome to this talk on root-causing Kubernetes networking problems in an automated way. I wish this session were in-person and interactive. I'd still like to keep it as interactive as possible, so keep your questions coming in on chat and I'll answer them as we go. I'll also leave as much time as possible at the end for Q&A. I've debugged network problems in over a hundred customer clusters in production, so whatever you see here today, as well as what you'll see from a k8snetlook perspective, comes from that hard-won experience.

So what is this talk all about? Have you been in a situation where everything looks okay, all of the pods are running, but your app just won't communicate, or it works most of the time but fails some of the time? If you look at the cartoons on the slide, that can turn an excited, happy person running Kubernetes and apps within Kubernetes into a very frustrated one, especially when a networking problem is involved. By the end of the talk, I'm hoping you feel less intimidated by networking issues and a lot more comfortable with Kubernetes networking, and that I leave you equipped with a couple of tools and ways to root-cause a Kubernetes networking problem in an automated way, or at least to narrow it down to the point where you can find the right help.

It's always beneficial to know some basics of Kubernetes networking; rest assured, most of the hard stuff is going to be done by k8snetlook. So I'll talk about Kubernetes networking briefly. It's a big topic in itself and could be a whole talk on its own, so think of it as a crash course. We'll then look at the problems you'd normally see with Kubernetes networking when running pods, and how to classify those issues. We'll look at what people do today to resolve these issues; it's very interesting what people do, and that slide is fun. After that, the important part: we'll look at some open source tools, how we can use them to debug any networking issue, and how to go about that debugging process. Then we'll look at k8snetlook in detail, give an introduction to it, and do a quick demo of how to work with it.

All right, without further ado, let's get started. Kubernetes networking, to keep it simple: there are three golden rules when it comes to communication within Kubernetes. The first is that all containers within the Kubernetes cluster talk to each other without any kind of NAT, which means that the IP address the container sees when you run ifconfig, ip link show, or ip address show is the same address that another pod or container within that cluster would use to reach it. The second is that all nodes are able to reach these pods or containers without NAT, and vice versa. So "without NAT" is the important construct here.

Now, to understand network problems and how to debug them, it helps to know which components play into Kubernetes networking. So let's quickly look at the components responsible for everything from connecting your pod to your service and everything else within Kubernetes. The first one is the CNI backend. Most of you will be familiar with it: you run Calico, Flannel, Cilium, or one of the many other CNI backends out there. That's the component that provides pod connectivity.
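To make that first golden rule concrete, here is a minimal sketch of checking that the IP a pod sees for itself matches the IP the cluster reports for it; the pod and namespace names are placeholders, not from the demo.

```bash
# Hedged example; "my-app-pod" and "default" are placeholders.
kubectl get pod my-app-pod -n default -o wide             # the pod IP the cluster knows about
kubectl exec -n default my-app-pod -- ip addr show eth0   # the IP the pod sees on its own interface
# The two addresses should match, and another pod can reach this IP directly, with no NAT in between.
```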
It connects multiple pods together, either on the same host or across hosts. It also does NAT: if a pod needs to reach the internet, it NATs that pod's traffic out to the internet. The next component is kube-proxy. It's responsible for the Service construct. Whenever you create a Service within Kubernetes, kube-proxy comes into play, and it works with iptables or IPVS to give you that internal, east-west load balancing. Then there is the load balancer implementation: if you're running a cluster in the cloud, you have AWS ELB, for example; MetalLB is an open source project used in production to deploy a software load balancer within a Kubernetes cluster. Then there is Ingress. If you want to connect multiple services and route through a single entry point, either by URL paths or ports, you can do L4 or L7 ingress; there are various implementations there again. As you can see, as we go down this list, complexity increases. There is service mesh, which is a lot more complex: you have Istio, Linkerd, Consul Connect. And underneath all of it sits the rest of the Kubernetes networking stack: iptables, IPVS, the Linux networking stack, and so on.

So let's look at how communication works across the cluster, say for two pods in an east-west manner. Say you have two apps, a blog app and an index app, with two copies of each running on two different hosts in your Kubernetes cluster. Now if pod A, the blog app, wants to talk to pod B, the index app, through a Service, assuming there's a Service backing the index app, what does that look like? The first call is to DNS, to resolve the Service name. If you're doing an HTTP GET using the Service's DNS name, the resolution gives you back a cluster IP. Once you have that cluster IP, the packet hits iptables, which looks up which pod it needs to get to: it matches on the Service IP, translates it to a destination pod IP, and the packet goes to that particular pod. On the way back, conntrack keeps track of these connections and replaces the source and destination IPs accordingly.

When it comes to north-south, you have endpoints, NodePorts and load balancers. North-south can mean either that a pod is trying to reach out to the internet, or, in most cases, that a client is trying to connect to an application or service you're running within Kubernetes. How does that happen? You deploy a LoadBalancer Service within Kubernetes; you create a load balancer object that fronts your pods or your app. It goes through something called a NodePort. When a client tries to connect to your app, they hit the load balancer IP; the load balancer selects one of the hosts that's part of your Kubernetes cluster, either the one where the pod is running or any of the other worker hosts. The traffic reaches that host through the NodePort, a high-numbered port, and from there it goes through the cluster IP and then to the pod. So that is the basic working of east-west and north-south traffic within Kubernetes. Again, this was a very short Kubernetes networking introduction.

With that, let's start looking at what problems can exist within Kubernetes. The way I see it, when you have a networking issue, it's good to think about it in two ways. One is to think about the impact: what kind of an outage is it?
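To make that east-west flow concrete, here is a hedged sketch of tracing it by hand. The service, pod, namespace names, and the cluster IP are placeholders, and the KUBE-SERVICES chain assumes kube-proxy is running in iptables mode.

```bash
# 1. DNS: the Service name resolves to a cluster IP.
kubectl exec -n default blog-pod -- nslookup index-svc.default.svc.cluster.local
# 2. The Service and the pod IPs behind it.
kubectl get svc index-svc -n default
kubectl get endpoints index-svc -n default
# 3. On a node: the kube-proxy NAT rules that translate cluster IP -> pod IP
#    (kube-proxy tags rules with the namespace/service name in a comment).
sudo iptables -t nat -L KUBE-SERVICES -n | grep index-svc
# 4. conntrack entries that rewrite addresses on the return path.
sudo conntrack -L | grep 10.96.0.42    # placeholder cluster IP
```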
Is it a complete service outage, a partial outage, or degraded performance? Placing the problem you see into one of these three buckets helps. If it's a complete service outage, and you have multiple pods backing your service, that means either there is a problem with your app or service, or there is something critically wrong within the Kubernetes cluster; it's a larger problem. Whereas if it's a partial outage, maybe only a few pods of your service aren't working, the problem is much more localized: it's mainly your app, or even if it's the subsystem, it has something to do with the specifics of that one app. Degraded performance is the most difficult problem to debug and fix. As you can see, the difficulty of debugging increases as you move toward the right. Debugging performance is hard; we'll see what tools are there and how to do it. It's always good to classify your problem along these two axes. One is the impact severity, or class type as I call it. The other is where the problem is occurring, or what type of problem it is.

It could be an application problem, meaning your application itself isn't working. Some of the things people do there: look at the logs to see if there are errors in your app; maybe there was a code problem and the process crashed; check whether the app ran out of file descriptors, or was simply doing so much that it ran out of CPU or memory. The second type is platform components: a problem within the Kubernetes subsystem itself, such as DNS, the load balancer, or kube-proxy and Services; something could be going wrong there, and you're not able to get DNS resolution, load balancer access, or Service access. The third bucket is configuration problems, which are very common. In my experience, there is often a problem with the MTU setting somewhere along the path from source to destination that causes an issue. Or there are routing issues, or simple IP address exhaustion: your pod is not able to get an IP address and just fails to come up.

Now, how do people resolve these? This is a good segue into what you do today without any of the debugging tools. The first one is the big hammer approach. There are two ways people apply it. One is by just hammering the pod, the deployment, the service, or the ingress: just kill it, let Kubernetes do its job, re-spin it or reschedule it somewhere else, and everything magically starts working. Cloud-native architectures are meant for that, meant to tolerate failures. But it may not be as feasible for stateful apps, or it takes longer to fail over. I've also seen people go one step bigger and delete the node. If there's a problem on a node affecting multiple apps, rather than fixing it, it might just be easier and quicker to delete it, especially if the Kubernetes cluster is running in the cloud: just delete it and let AWS or Google or Azure recreate that VM for you if it's in a scaling group. The other approach, which actually works in most cases, is asking for help. Even there, knowing which component to ask for help about goes a long way.
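Here is a hedged sketch of what that big hammer typically looks like in practice; the deployment, pod, and node names are examples, not from the demo.

```bash
kubectl delete pod my-app-7d9f8-abcde -n default          # kill one misbehaving pod and let it be rescheduled
kubectl rollout restart deployment/my-app -n default      # or re-spin every pod of the deployment
# The even bigger hammer: drain and delete a suspect node and let the cloud
# provider or autoscaler replace the VM.
kubectl drain worker-2 --ignore-daemonsets --delete-emptydir-data
kubectl delete node worker-2
```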
So if you know there's a problem with Calico, Flannel, or whatever CNI backend you run, or if you know there's a problem with kube-proxy, you can reach those specific groups. People are very helpful, so on Slack or GitHub you'll be able to get help. I've also seen this happen: people hit a hard networking problem and revert, saying, hey, let's just use VMs, or run this workload outside of Kubernetes, or not use a load balancer or an Ingress; let's keep it simple and do something differently. That happens quite a bit too. And for people who are more curious and like to explore, they troubleshoot the live setup manually, and that's where today's talk should help.

Now, the troubleshooting workflow; this is an interesting topic. What I wanted to do here was put down whatever I've seen out in the field with respect to problems, how I've gone about solving them, and how I built k8snetlook to start looking at them in an automated way. The first thing I'd say is: if there's a problem, see what the problem actually is. If a pod is crashing, it's a lot more obvious. But if everything appears to be running and you still don't have communication, it's a lot harder. So see whether other apps are affected. More often than not there are other apps that are still working, which tells you the problem is not cluster-wide; it could be limited to one app, or it could be specific to a node. If it's specific to a node, then it's probably the subsystem or the Kubernetes components on that node that are at fault. If it isn't, and it's an app-specific problem, the first step is to rule out your app itself: look at logs, look at health metrics. In all of these cases, it's much better if you have Prometheus and Grafana set up and time-series data being collected. There's kube-state-metrics, and node-exporter for node statistics; there are a bunch of tools out there that let you monitor your Kubernetes cluster. Let me also add a warning here: in a lot of cases I've seen a network policy within the Kubernetes cluster cause issues; I've seen firewalls outside, in your environment, causing issues, which means external traffic does not come in or go out; and I've seen certs or TLS causing issues, which could come down to MTU, or again a firewall, or just your app.

Anyway, coming back to the flow chart: if other apps are down too, look at your subsystem. In either case, what we need to do next is identify the problem type or class. Like I said, is it a partial outage, a complete outage, or a performance degradation? In the case of both partial and complete outages, the next step is to see on which traffic path the problem lies. Here again we classify into three things. One is east-west, by which I mean applications within the Kubernetes cluster talking to each other; somewhere there's a problem there, and that's normally through a Kubernetes Service. The second is north-south, which can itself be divided into two. One is pod-to-external, which means the pod is trying to reach an object store or a database server running outside Kubernetes; for example, if it's running in the cloud, that could be RDS.
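As a rough illustration of ruling out the app first, here is a minimal sketch; the pod and namespace names are placeholders, and kubectl top assumes metrics-server is installed.

```bash
kubectl logs my-app-7d9f8-abcde -n default --previous   # errors or crash output from the previous run
kubectl describe pod my-app-7d9f8-abcde -n default      # events: OOMKilled, probe failures, image pulls
kubectl top pod -n default                              # CPU/memory pressure
kubectl get networkpolicy -A                            # any policy that might be dropping traffic
```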
The other case is when clients are trying to connect to a service running within the Kubernetes cluster from outside of it, either from the internet or from your intranet, whatever it may be. So first figure that out.

Then let's look at the east-west case, where there's a problem with app-to-app communication. What would you do? You'd first run something like ping to see if pod A of your app A is able to reach pod B of your app B by IP, using the specific container IP. If that doesn't work, there's definitely a problem either in the underlay or the overlay, which is your CNI backend. You're now localizing the problem, which is a very important step, and then you'll be able to get help. Second, if that works, look at DNS: try pinging or reaching the destination service through its DNS name. If that doesn't work, it's probably a CoreDNS/kube-dns issue, or again it could be an underlay or overlay issue. If that works too, then look at the cluster IP: if there's a Service backing your destination app, pick up the cluster IP for that Service and see if you're able to connect to it. If that doesn't work, it's kube-proxy: it could be a problem in the iptables rules, or it could just be the underlay. The bubbles I've put on the slide show where the problem class could lie, which component; and the blue boxes are the steps you'd do manually to figure out whether that's exactly the problem. Going down, you would then look at pod-to-service by DNS name. If that works, you'd look at the iptables rules, or ipvsadm if you're using IPVS mode for kube-proxy. And then of course look at the path MTU: if the network only allows packets up to a certain size and you're trying to send larger packets, that's a problem.

Coming to north-south, you do something similar. In addition to what you do for east-west, you'd also check pod-to-gateway: the packet leaves the pod, gets NATed out of the node, and reaches the gateway on your physical network. Are you able to reach your gateway first? If yes, is the path MTU fine? If that's a yes too, then maybe there's a problem between your gateway and the internet, which could be a firewall. If that works too, then check the NAT rules, the iptables rules, to see whether there's a problem there. More often than not with Flannel or Calico, if the rule isn't present in the iptables NAT table saying that packets leaving a pod destined for something external to the node need to be NATed out, it's not going to work. So that's another thing to check. And if all else fails, you can always tcpdump and look at packets, which is a lot more involved.

So in the interest of time, quickly moving on: open source tools. There are a bunch of open source tools you can use to figure out where the problem lies. Some that I've used or found helpful: Netshoot. Netshoot is a container image that essentially bundles all of the binaries and programs that are useful for debugging: tcpdump, ip, ping, iperf. It even has tools to look at syscalls, various counters, performance metrics, and so on. This is useful because you may have an app image that doesn't even have a shell; so how do you debug that live app that's not working, when you need to get into that pod's context to try things out?
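Here is a hedged sketch of those manual checks; the IPs and names are placeholders, and the in-pod commands assume an image that ships a full ping and wget (a BusyBox ping, for example, lacks the -M flag used in the MTU probe).

```bash
kubectl exec -n default app-a-pod -- ping -c 3 10.244.2.15                          # pod-to-pod by IP
kubectl exec -n default app-a-pod -- nslookup app-b-svc.default.svc.cluster.local   # DNS resolution
kubectl exec -n default app-a-pod -- wget -qO- http://10.96.0.42                    # cluster IP via kube-proxy
kubectl exec -n default app-a-pod -- ping -c 3 -M do -s 1472 8.8.8.8                # path MTU: don't-fragment probe
# On the node: is there a MASQUERADE rule NATing pod traffic that leaves the node?
sudo iptables -t nat -S POSTROUTING | grep -i masq
# Last resort: capture packets on the node.
sudo tcpdump -ni any host 10.244.2.15 and port 80
```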
What you can do is run Netshoot, attach it to that particular pod, and then run commands as if you were inside that pod's network namespace. k8snetlook, the way I've built it, does something similar: it's a binary, you run it on the node, specifically the node where the source pod is running; it's a source-side debugging tool. It runs a couple of checks on the node to verify that communication and configuration on the node are fine, and then it enters the pod network namespace you've specified and runs checks as if it were running within the pod. So you're able to see and figure out problems from inside the pod that actually has the problem.

The other way of debugging is by using probes or monitoring tools. kubenurse is a tool that deploys agents on multiple nodes; all of these agents talk to each other and provide metrics about the communication paths and their performance. It's a synthetic probe, so it tells you whether the Kubernetes cluster is healthy and whether the pod network is healthy. But it doesn't tell you why your pod isn't working, because it could be that other apps running on the same node are fine, so you'd still get a healthy result while your app is still failing. That's where k8snetlook, to some extent, comes into play. k8s-netchecker is another probe-based tool, but it hasn't been updated in a long, long time, so I'll leave it at that.

What to use when? This is important. Going back to the slide where we looked at problem classification and common problems, I've put down what you would normally use when you see each kind of problem. We've already walked through this in the flow chart, so I'll go over this slide quickly. For example, if it's a complete outage, you're probably better off with continuous monitoring and probes. I'll share the slides, and each of these links will take you to a GitHub repo where you can get the tool, install it, and play with it yourself.

Now, for the automated way: how do you do all of this in an automated way? That's where k8snetlook comes in. So why k8snetlook, and what does it do? It's a standalone binary. It also has a Docker image; you can run it within Kubernetes, but I prefer to run it outside of Kubernetes, because you don't want to run your debugging tooling inside the platform that may itself be having the issue. So it's an external-to-Kubernetes binary. It does source-side debugging, and it needs to be run on the node that runs the source pod. It imitates the pod, as I said earlier. Why did I build it this way? Because, as you know, networking is fundamentally hard to debug, the network is usually the first thing blamed, and most issues need to be debugged from the viewpoint of the app that's seeing the problem. The aim of this tool is to make network debugging easy, so people don't shy away from it or just restart things and move on. Time is of the essence when debugging: the problem might correct itself, and knowing that there was a problem is still important, because Kubernetes is built to fix itself, but if a problem keeps recurring and compounding, it can end in large-scale downtime. And automating a lot of the steps you saw on that flowchart is hard.
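For reference, here is a hedged sketch of attaching Netshoot to a running pod with kubectl debug; the pod and container names are placeholders, and this assumes a Kubernetes version with ephemeral containers enabled.

```bash
kubectl debug -it my-app-7d9f8-abcde -n default \
  --image=nicolaka/netshoot --target=my-app -- bash
# The ephemeral container shares the pod's network namespace, so tcpdump, dig,
# curl, ss, iperf, etc. see exactly what the app sees:
#   tcpdump -ni eth0 port 80
#   dig app-b-svc.default.svc.cluster.local
```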
You'd probably need to know a whole set of networking technologies and tools. The idea with k8snetlook is that those steps are automated: you run it easily and get feedback, essentially self-service debugging. It's inspired by node-problem-detector; you should check that out too.

What can it do today? It works with both IPv4 and IPv6 stacks. There are a bunch of checks it runs. On the host side, when it's running in the host network namespace, it runs connectivity checks to the gateway and to the API server; if there are multiple API servers in a highly available cluster, it checks all of them. On the pod side, it again checks the gateway, checks whether the pod can talk to the API server, checks DNS with a DNS lookup, and does path MTU discovery, which is a big one that I've seen cause issues. It also checks each individual endpoint IP of a service: if a service is backed by 20 pods, it's going to reach each of those 20 pods and see whether everything's working, because if there's a partial degradation where only 18 out of 20 pods work, you want a tool that tells you that. This tool will do that for you.

Let's quickly go to the demo; I'm hoping I can give it about five minutes. What you'll see in the demo is three nodes, two workers and a control plane node. The k8snetlook binary is installed on all of the nodes. There's a BusyBox pod that I've applied, as well as the hostnames pod, which is my app, backed by a Service. So let's switch to the demo and take a look.

What you can see here are the nodes, and I have a bunch of containers running. If you look at kubectl get all -o wide, you'll see that the hostnames app is deployed, BusyBox is deployed, and I have a Service backing hostnames; it's a ClusterIP service, not a LoadBalancer. There's also Traefik, which you can ignore. We'll see two cases here, both about east-west communication and how k8snetlook helps. The first case is where communication stops working due to, say, a network policy. Let's first check that things are working fine now. I'm just going to run an HTTP GET against the DNS entry for my hostnames app from my BusyBox pod, so app A to app B. And if you look at it, it works: it returns the hostname of one of the backing pods, and if you run the same command again, you get a reply from another one. So it seems to be working fine. BusyBox is running on worker one, and on worker one I have k8snetlook installed; here is the k8snetlook binary. If I run the host checks, I get a bunch of checks back, and it says my cluster looks healthy, everything looks good. You can run it with the debug flag to get more information about what it did and how it ran all of those tests, and you can make it silent to get a JSON blob in case you want to consume it programmatically.

Now let's look at a case where there's a problem on the node. I'll create it by applying a network policy that basically denies all traffic. Now if I run a pod check, let's see what happens. It runs, and it gives a result that says the host checks were good, but it's not able to connect to the destination pod.
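The policy applied in the demo is a deny-all of roughly this shape; this is a hedged sketch, and the policy name and namespace are placeholders.

```bash
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-all
  namespace: default
spec:
  podSelector: {}                        # selects every pod in the namespace
  policyTypes: ["Ingress", "Egress"]     # and allows nothing in or out
EOF
# Remove it again with: kubectl delete networkpolicy deny-all -n default
```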
The path MTU check is also failing. Let's look at the debug information: it tried to run the destination pod connectivity check and it failed; it tried path MTU discovery and that failed too. That's the result you get. Let's remove the network policy and run the same test again. Things are back to good. So this tool helps you figure out what's going on, and in the debug output you can see that it does a binary search to find the right MTU; it tells you the path MTU is 1480 and things are working.

Let's look at an MTU problem next. I'm going to do a kubectl exec -it into the BusyBox pod and change its MTU to something bigger. 1500 is the normal default size, but we're running an overlay with Calico in IPIP mode, so it needs 20 bytes for the encapsulation header. If I set 1500 and run the check again: the connectivity check works, but something failed. The path MTU check for the destination IP failed, because packets of that size don't get through; if you look at the path MTU discovery, it got a reply for 1424 bytes but nothing for 1500. So there you go. Now let's fix it back to 1480 and run the check again: the MTU to the destination is 1480 and everything looks good. We can do the same thing for an external IP as well; let's try 1.1.1.1. More checks get added, and you can see that the external IP connectivity check also passes. I have an issue on my laptop, where I'm running Vagrant boxes for this demo, so if I try external IP 8.8.8.8, there is a failure, and you'll see that reported by the tool.

That's pretty much it for the demo. You can run it as a Docker container, or you can run it within Kubernetes as an app; all of that information is in the GitHub repo. With that, I hope you've had a good session and learned something about Kubernetes networking, looked at the various problems that can happen when you're deploying pods or apps, and hopefully k8snetlook will help you get to that first level of debugging, at least finding out where the problem could lie and then getting to the right people to fix it. As always, we're looking for contributors; there's a lot left to do. The tool doesn't automate the external-to-pod path today, and it isn't CNI-aware today, so there's a lot more we can do. If anybody's interested, please do look at GitHub or reach me on Slack or GitHub; the link is on the first page. Feedback for the session is welcome. And now, Q&A.