Let's get started. Thank you, thank you for coming. Hope you're having a great KubeCon. I'm Daniel Lipovetsky. Find me as dlipovetsky on Kubernetes Slack and GitHub. I've been building software that creates and manages Kubernetes clusters for more than a few years now. I build it upstream in the Cluster API project, and as an engineer at D2iQ I work a lot with Cluster API. Now, Cluster API is a complex Kubernetes application, so today we're going to learn what it takes to debug it and other Kubernetes applications.

Now, "Kubernetes application" and "debug" are not exact terms. When I say Kubernetes application, I mean an application that runs in one or more pods. When I say debug, I mean set breakpoints in code and step through. It's also called interactive debugging.

So first, let's quickly look at some technical details. Then I'll demonstrate a debug session. Then we'll look at the challenges that I faced and the solutions that I found, then what work remains, and finally I've got a call to action for all of you in here. If you've already downloaded the slides: I uploaded a newer version, so you're welcome to download the latest one from the site.

This presentation came out of my own attempt to step through code running in multiple pods. If you've been working in Kubernetes for a few years, you may be wondering: why didn't I just use Squash, which is an open source tool built to debug Kubernetes applications? That's a good question. Two reasons. One, it was designed before ephemeral containers were available. Two, it didn't solve some of the challenges I faced. That said, Squash was a source of knowledge and inspiration.

You may also be wondering why I didn't use Telepresence. That's a CNCF project that lets you run an application locally and makes it appear as though it runs in a cluster. In my experience, that works well if you're developing locally and running in one pod. But I want to debug an application that runs in multiple pods and is already deployed on a cluster.

All right, let's take a closer look at two things: Cluster API, the application I want to debug, and ephemeral containers, the Kubernetes feature that helped me. Raise your hand if you know Cluster API, if you know what it is. Okay, some of you. That's good. Good. Okay. So Cluster API manages Kubernetes clusters, and it's also a Kubernetes application. I've been working with it since its start in 2018, and two words describe it: complex and powerful. It's composed of more than 14 controllers running in more than four pods. While debugging Cluster API, I faced most of the challenges you can expect to face debugging any Kubernetes application.

This diagram shows the relationships between some of Cluster API's custom resources. Don't worry about the small font; the details aren't important.
What's important is that the relationships are there. They mean that multiple Cluster API controllers work together, for example, if you want to add a Machine to a cluster. So I wanted to follow along as these controllers did their work, and to do that I needed to set breakpoints in multiple pods. Then I heard about ephemeral containers.

So: I needed to set breakpoints in pods. To do that, I needed to run the debugger in the pod, and I needed to give the debugger some special privileges. And to be honest, I had already done that a couple of years ago, but back then I had to build, publish, and deploy my own Cluster API images just to include the debugger. It felt like a lot of work, and I didn't recommend it to others. Ephemeral containers let me do what I need with just a little work, and that makes it easier for you to reproduce what I've done. I don't have time to cover ephemeral containers in depth today, but I strongly recommend watching the "Seeing Is Believing" talk on ephemeral containers.

Let's see what we can achieve. Raise your hand if you've heard of the demo gods. Okay, good. Good. Okay, so the people who have heard of the demo gods will understand why this demo is recorded. So let me perform some magic and switch to the video.

Okay, so this is a VS Code session, and I'm going to debug processes in three pods. The breakpoints that I've set are going to help us follow along as Cluster API creates a new machine. And you can reproduce this demo yourself. There is a link in the slides, and I hope you do reproduce it yourself.

So, let's start. In the lower left-hand corner, I've set breakpoints in three different controllers, and I'm attaching the debugger client to three different pods. Now I'm going to scale the number of worker machines in this cluster up by one, and Cluster API will go to work to create a new machine. First, we get stopped at a breakpoint in a controller that creates some data that is used on the machine at first boot. Then I continue, and we stop at another breakpoint, in another controller, in a second pod. This one is what actually creates the machine itself; in this case, it's a Docker container. Then I continue. It takes a little while for the machine to get created, but then we end up at a third breakpoint, in a third controller, in a third pod. This one is responsible for reconciling some metadata, associating the new node with the machine. When the machine joins the cluster, there's a new Node resource, and we want to make sure that it's associated with the Cluster API Machine resource. And voila! So I hope you thought that was cool. That is what's possible. Now, if I can only find my mouse pointer...

To make all this work, I had to solve a few challenges, so let's go through them. Some of the examples you'll see are related to Cluster API and its Go implementation, but the challenges apply to any Kubernetes application, especially compiled ones.

So this is the starting point. Okay, we've got our debug client and source code on the local machine, and then we've got a node, a pod, and our target container. Okay, first challenge: there's no debugger in the pod. The debugger needs to run in the same process namespace as its target. I could just run it in the container if the container had the debugger executable, maybe a shell, some other utilities. But it doesn't, and that's good: you don't want those in the container image. They're not application dependencies, so they waste space. Most of the time, if they have any CVEs, you're going to get false positives on a CVE scan. I think you know the drill. So what can we do? We can build an image with the debugger executable, a shell, whatever we need. Then we can create an ephemeral container in the pod and use everything in our image.
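As a rough sketch, building such an image can look like this. The image name, tag, and base versions here are hypothetical, not the exact ones from the talk:

```sh
# Sketch: build a small debugger image with the Delve debugger for Go.
# Image name/tag and version choices are illustrative assumptions.
cat > Dockerfile.debugger <<'EOF'
FROM golang:1.21-alpine AS build
RUN go install github.com/go-delve/delve/cmd/dlv@latest

FROM alpine:3.19
# binutils and file are handy for inspecting executables and debug info
RUN apk add --no-cache binutils file
COPY --from=build /go/bin/dlv /usr/local/bin/dlv
EOF

docker build -f Dockerfile.debugger -t example.com/debugger:dev .
docker push example.com/debugger:dev
```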
Here's the debugger image I used. It's got the Delve debugger for the Go language. It's based on an Alpine Linux image, so it has lots of utilities, including a shell, and I've included some other utilities for working with executables and debug information.

After I built my debugger image, I created an ephemeral container in my pod. I gave it the name "debugger" so I could refer to it when I ran kubectl exec, for example. The ephemeral container shares the process namespace of my target container; ordinary containers can't share the namespace of just one target container that way. On the other hand, ephemeral containers cannot be restarted or replaced, so once the last process in the container stops, you need to create a new ephemeral container with a new name. That can be inconvenient. For that reason, I use sleep to keep the container running, and then use kubectl exec to run more processes in it. So this is what the environment looks like now: we've got an ephemeral container with our debugger.

All right, are we done? Well, not quite. Once I had my ephemeral container, I ran my debugger, and it wouldn't attach. After reading the documentation and looking at the pod events, I realized the debugger container had inherited the pod's restrictive security context. The debugger needs the ptrace capability, and it needs to run as root or with the user ID of the target process. The solution seemed straightforward. First, I defined a security context for my debugger container, giving it the SYS_PTRACE capability. Second, I allowed the debugger to run as root. I could have also run the debugger with the user ID of the target process, but I didn't want to take the extra step of matching the user ID, which I don't necessarily know ahead of time, and it was more convenient to use the root user with the Alpine Linux image.

On at least one node where I did this, the error didn't go away, and I discovered that the YAMA Linux security module running on that node was denying the debugger's ptrace system call. Because I had privileged access to that node, I could reconfigure the module. Just something to watch out for.

And finally: today it isn't that easy to set the security context on your ephemeral container. It's not yet supported by the kubectl debug CLI command. You have to send an HTTP request to the API server, or patch the kubectl CLI to do it in a slightly different way. But it will be easier in the future.

All right, so this is where we are now: we've got a debugger in our pod, and we're attaching to the target process in the target container.
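To make that step concrete, here is a minimal sketch of adding the ephemeral container with the privileges Delve needs, by writing to the pod's ephemeralcontainers subresource directly, since kubectl debug can't set a security context yet. The namespace, pod, image, and target container names are assumptions, and it assumes jq is installed:

```sh
# Sketch: add an ephemeral container that runs as root with SYS_PTRACE.
# Namespace, pod, image, and target container names are assumptions.
NS=capi-system
POD=capi-controller-manager-abc123

kubectl -n "$NS" get pod "$POD" -o json \
  | jq '.spec.ephemeralContainers += [{
      "name": "debugger",
      "image": "example.com/debugger:dev",
      "command": ["sleep", "365d"],
      "targetContainerName": "manager",
      "securityContext": {
        "runAsUser": 0,
        "capabilities": { "add": ["SYS_PTRACE"] }
      }
    }]' \
  | kubectl replace --raw "/api/v1/namespaces/$NS/pods/$POD/ephemeralcontainers" -f -
```

The sleep command keeps the container alive so you can kubectl exec into it repeatedly, which matters because ephemeral containers can't be restarted.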
Even after you give the debugger the right capability and user ID, it may fail to attach. If you see this error, it means the executable does not have debug info, which the debugger needs to understand the structure of a compiled program. (So interpreted languages are not affected.) Fortunately for me, Cluster API publishes its executables with their debug info. Thank you! Other projects remove it; for example, debug info is not in the Kubernetes API server, controller manager, and scheduler executables.

If your target executable does not have debug info, you may be able to provide it yourself. Sometimes that debug info is published separately. If it is, use that; if it's not, create it. The debug info must match the executable in the pod, so you'll have to build your own executable from the same source code revision, using the same compiler and linker used to build the executable in the pod. Then you can extract the debug info to its own file, and that's what I've done here with this command-line utility. After you've got that debug info, you can copy it into the container so that the debugger can read it. And finally, you'll need to add a special link to the executable to help the debugger find this debug info.

Now you may be wondering: containers in a pod don't share a mount namespace and don't see one another's root filesystems, so how am I able to write to the executable in another container? That's thanks to the proc pseudo-filesystem. The /proc/<pid>/root path provides the same view of the filesystem that the process with that PID has. Here, process one is the example process. The debugger container has the SYS_PTRACE capability, which allows it access to this special path. I've only tested this with Go executables, but the principles are the same for other compiled languages. So this is what our environment looks like now: we've got that debug info there.

All right. Once the debugger found the debug info, it attached to the process. But now I was debugging in a terminal, and I want to debug in VS Code. It's called remote debugging, and it usually works with a client connected to a server using TCP or UDP. The pod didn't expose ports I could use for this connection, and ephemeral containers can't expose ports. I had a few options, but the best by far was the port-forward API, which allows you to forward TCP from your machine to any port in a pod. It works by encapsulating TCP packets in SPDY streams that go via the Kubernetes API server and the kubelet on the pod's node. So this is what the environment looks like now: I've got a tunnel to my debug container. All right, I finally had remote debugging working, and I was excited.
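Here's a minimal sketch of that tunnel, assuming the ephemeral container from earlier. The port number and the way the target PID is found are illustrative assumptions:

```sh
# Sketch: run Delve as a headless server next to the target process,
# then tunnel to it with the port-forward API.
kubectl -n "$NS" exec -it "$POD" -c debugger -- sh -c '
  PID=$(pgrep -o manager)   # find the target process (name is an assumption)
  dlv attach "$PID" --headless --listen=:2345 --accept-multiclient --api-version=2
'

# In another terminal: forward local port 2345 to the pod
kubectl -n "$NS" port-forward "pod/$POD" 2345:2345
```

With the tunnel up, a debugger client on your machine can connect to 127.0.0.1:2345 as though Delve were running locally.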
I started to set breakpoints, and none of them worked. The breakpoints identified lines in source code stored at paths on my machine. But the executable was built on a different machine, which used different paths, and so its debug info had those paths, and the debugger had no idea what to do with my source code paths.

I could provide the debugger with a map to help it match my breakpoint locations with the debug info. But I had no idea what the right map looked like, because I didn't know the paths in the debug info. So I looked them up: the readelf utility can list the source paths in the debug info. I read through the paths and found a pattern that I was fairly confident in, and based on that pattern I made a map and gave it to the debugger. For example, on my machine I might have something in my home directory, say example/project/example.go; in the debug info, that file would actually be stored under a path made of the module and its version, and then the source file.

So, finally, let's recap. One: we created the debugger container. Two: we gave it the SYS_PTRACE capability. Three: we created and uploaded the debug info, if we needed to. Four: we created a tunnel for the debugger client to reach the server. And five: we showed the debugger how to translate the local source code paths to the paths in the debug info.

So at this point, I was happily setting breakpoints in Cluster API, when I noticed my debug sessions would mysteriously stop. Looking at the pod events, I found the kubelet was stopping the target containers because they weren't responding to liveness probes while I was at a breakpoint. When I removed the liveness probes, I noticed that the debug sessions would still stop, but now just after I resumed execution from a breakpoint. So I looked at the pod logs and found that the controllers were losing leader election while I was at a breakpoint, and were terminating themselves when they resumed execution. Finally, Cluster API controllers make Kubernetes API requests, and I noticed those requests often failed as I stepped through the code. I found that the requests were being sent to admission webhooks. These webhooks ran in the same process as the controller, and they weren't responding while I was at a breakpoint. Worse, the readiness probes eventually failed, and the requests were not even reaching the webhooks.

So these challenges may or may not apply to your Kubernetes application, but I wanted to mention them all the same. In my case, I disabled leader election and removed the liveness and readiness probes, and to do that I did have to make a small change to the Cluster API deployment. I could have avoided that by using logpoints, but I really wanted to step through the code.
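Putting steps four and five of the recap together, a minimal VS Code sketch might look like this. The "from"/"to" paths and the module version are hypothetical examples, not the real ones from the talk:

```sh
# Sketch: a VS Code launch configuration that attaches through the
# port-forward tunnel and maps local source paths to debug-info paths.
cat > .vscode/launch.json <<'EOF'
{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Attach to controller in pod",
      "type": "go",
      "request": "attach",
      "mode": "remote",
      "host": "127.0.0.1",
      "port": 2345,
      "substitutePath": [
        {
          "from": "${workspaceFolder}",
          "to": "example.com/project@v1.0.0"
        }
      ]
    }
  ]
}
EOF
```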
So there's still work left to do. First, to add support for profiles to kubectl debug, so that users can get defaults for a security context that will work, for example, for attaching a debugger to a process. And I think there's also work left to be done to make it easier to get the debug info of a containerized application. In the Linux distro and system package space, there is something called debuginfod, but I'm not sure that solution fits well for containerized applications. Maybe there's something simple we could do, like deciding on a convention where we have an image with the suffix "-debug". TBD.

So, finally, a call to action for all of you in here: set breakpoints in your Kubernetes applications. Learn to do it, feel confident in being able to do it, and teach others. If you've got any questions, or if you want to collaborate on this topic, reach out to me: dlipovetsky on Kubernetes Slack. I'll be at the D2iQ booth tomorrow morning.

Yeah, I just want to say, you know, thank you all for coming. Thank you to my wife and son, my D2iQ colleagues, and the many people who maintain the software that I used to get this work done. So if you have any questions, fortunately there's time left for questions, and there are mics on both sides. Otherwise, if you've got any feedback, scan the QR code. And thanks again.