So this is Jorge Salerno. He likes monitoring things, including the cloud, including stacks of Raspberry Pis with sensors, and he is going to tell us how to troubleshoot Kubernetes. Okay. So welcome, Jorge.

Thank you. Okay, so today has been an intensive day for me; this is my third talk. Has anyone seen one of the previous ones? Can you raise your hands? Whoa, cool. So, you know, from Sysdig, I've shown this slide before. If you're here, it's probably because you are running Kubernetes or you're planning to do so. Until we got things like containers and Kubernetes, when we had to troubleshoot and find out what was wrong, what was broken on our servers, we used tools like these. But then when containers came around, things stopped working. Basically those tools were not aware of namespaces or cgroups. We had to tweak them. They were not working as well as they were working before. And actually, this is a real problem. I've included here an issue from the Kubernetes GitHub project, someone trying to troubleshoot, and the answer from the developers is: well, you know, try to run bash, do some prints, see if somehow you can get visibility into what's going on inside the containers.

We love containers for multiple reasons. Probably four to mention: they are simple. They are small, most of the time, not always. They are isolated from a security point of view; that's nice. They have fewer dependencies: we ship everything, we don't have to mess things around. But they are black boxes. We cannot see what's happening inside as we used to do before. Or we have to break that isolation and install the troubleshooting tools inside, which is not the desired or the best option.

That's what we are trying to fix with Sysdig. Sysdig is like a combination of all those tools you were using before to troubleshoot your servers, htop, vmstat, iostat, all of them, all together in one single tool, available as open source. We also have a commercial product, but that's not for today. And what we do differently is two things; it's two-fold. On one side, we are able to keep complete visibility of everything that's happening inside the containers. We have managed to do that by instrumenting the kernel. We install a kernel module that basically captures all the system calls. We copy them into a ring buffer, and then we let a user-space process pick the data from the buffer. And suddenly we have everything in user space, where we are more comfortable manipulating things, grouping and segmenting and everything.

So yeah, we install the kernel module, the agent; this kernel module captures everything and we move that into user space. It can either be a daemon, or we can even run it inside a privileged container, because we need to do an insmod to install that kernel module. That kernel module is built dynamically using DKMS if we don't have a binary included for the distribution that you are probably using. If you are using macOS, we can still open the capture files, but we cannot capture. And we can dump everything into a file.

This approach basically has some benefits, some advantages. Since that process, Sysdig, can see everything happening on your host, we can automatically discover every container you run. So there is no need to instrument or configure anything for that. Oh, I scaled up a replica or something and now there is a new container, or there are fewer containers and I need to reconfigure things? Forget about that.
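A minimal sketch of running Sysdig as that privileged container, close to the project's documented invocation (exact mount points may vary between versions):

    # Run sysdig from Docker Hub as a privileged container so it can
    # insmod the sysdig-probe kernel module on the host.
    docker run -it --rm --name sysdig --privileged \
        -v /var/run/docker.sock:/host/var/run/docker.sock \
        -v /dev:/host/dev \
        -v /proc:/host/proc:ro \
        -v /boot:/host/boot:ro \
        -v /lib/modules:/host/lib/modules:ro \
        -v /usr:/host/usr:ro \
        sysdig/sysdig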
No instrumentation, so there is no need to install an agent in every container or funky things like that, and full visibility. The con, the trade-off you have to pay, is that you need to install a kernel module. If you haven't been in my previous talks, I have mentioned this; the reason I'm going to answer it now is that otherwise it would probably be a question you'd be asking me at the end: when we started Sysdig, eBPF didn't exist, or didn't exist as it is today. So we had no chance to use it. It has been evolving, and for a period of time it had some limitations. For example, we have to copy everything into the buffers, and eBPF had limits on the amount of data you could copy out from system calls. Those limitations have been solved, so maybe in the future we'll change.

Then the other problem with Kubernetes, or with containers and microservices, is that we don't have single hosts with one, two, three, some fixed number of services running inside anymore. We have a scheduler and it's moving things around all the time. So when we need to filter, when we need to organize things, we would like a view as close as possible to the one where we can say: I have my Cassandra service, or I have my Redis, and I want to filter things by that. Well, the other very cool thing we do in Sysdig is talk to your container orchestration platform. Whether it's just plain Docker or Kubernetes, OpenShift, DC/OS, Mesos, and even a bunch of other platforms, we can talk to them, understand how you are deploying your containers, how you schedule them, how they are related to each other, and use that information to filter. Because at the end of the day, what Sysdig does is capture all these events, the system calls, with some context, so we can understand better what they are doing, filter them, and run scripts to generate reports and find out what these things are doing. We can do this live, or we can dump things into a file, which is similar to tcpdump pcap files. We have container support, I have mentioned this before, and we have a command line interface but also an ncurses interface, something similar to htop.

And this is basically everything I have for slides. Now we are moving into the dangerous part of this presentation, which is the demo. I have a very cool use case, a story that happened to us when running Kubernetes. I wanted to fully show it live, but there was some last-minute problem with the network, because it depends on something external. But no worries, because I have a Sysdig capture and I'm going to be able to show you exactly the same thing. But first of all, we are brave and we are going to show you something live. So let me move this terminal background; you can forget about it. And notes, so I don't forget things. So basically I have here a Kubernetes instance. It's running different containers; I installed everything just by pulling the different Docker containers. First of all, I want to show you how we can use Sysdig to understand how Kubernetes works. And I'm going to illustrate this with a very simple example of a service. So the first thing I'm going to do is create a namespace, which is "critical app". Very, very critical; that's why I'm running it on Kubernetes. Then inside this namespace I have a service with a deployment, and I'll be deploying three replicas of an NGINX container. Simple as that. So let me deploy that. And let's see. Okay, the image was cached.
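A rough reconstruction of that setup with the kubectl of that era; the names (critical-app, backend) are taken from how they appear later in the demo, so treat them as assumptions:

    # Create the namespace for the demo app.
    kubectl create namespace critical-app

    # Run three NGINX replicas inside it (older kubectl creates a
    # deployment from this).
    kubectl run backend --image=nginx --replicas=3 --namespace=critical-app

    # Expose them behind a single service IP, the load balancer Kubernetes
    # puts in front (the .241 address mentioned later in the talk).
    kubectl expose deployment backend --port=80 --namespace=critical-app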
So that's running already. Cool. One of the very interesting things Kubernetes does for us is managing the load balancer. I'm probably not going to tell you anything new: we deploy a service, Kubernetes places a load balancer in front and distributes the traffic between the number of containers we have there. There are multiple, different strategies. In this case I'm just using a load balancer on one fixed IP address. We could do this with DNS round-robin, but in this case we keep it simple. So we know that our backend service gets that IP address ending in .241, and I've got these three different endpoints, my NGINX pods. What I'm going to do now is launch one client, just with curl, and I leave that in the background. And I'll open a new shell so we can follow along. Perfect. So this is my client. So now if I do curl backend, this works; basically it returns the default home page for NGINX. Cool. Not that this is a surprise. But do we really know what's happening behind the scenes here? We can imagine, if we read the documentation. But the first step to getting familiar with a troubleshooting tool and using it properly is to understand how the infrastructure really works.

So what I'm going to do is leverage Sysdig for that, as I told you already. One of the very interesting things about Sysdig is that we can use Kubernetes entities or objects to filter things. I'm showing here the different labels we understand from Kubernetes. If you are using custom ones, we can also filter on them, but these are the defaults: pods, namespaces, the usual stuff. What I want to do now is show you the ncurses interface. So I'll be using csysdig; actually, I wanted to show you this. Basically what we do here is tell csysdig to connect to the Kubernetes API, so we know how things are wired inside. So these are all the processes running on my machine. This could be htop; well, it's not. Because if we look here, this is probably one of the most interesting sections: there are different views that we have prepared for you. I showed some of them before, in talks like using tracers or using this for security. But if you have a look here, we have prepared views for different orchestration tools. In this case, we will focus on Kubernetes. So if I click here, I'm able to see the different namespaces I've got. So I have my critical app and also kube-system, which is the internal namespace. Well, let me stop here for a second: I could get this using kubectl. What's interesting here is that I'm able to show you some metrics, CPU, memory, file, network, aggregated by that information. So I could see what the memory usage was for the namespace, or the network usage. In the same way I do this for namespaces, I can do it for deployments, with my backend deployment, or for services. And what's interesting is that this is like a tree; there is a hierarchy here. So if I go into my backend and hit Enter, I automatically go to the level underneath and I see all the pods that are running inside that service, all the time showing the metrics. And if you pay attention here to the filter, I can see the filter that csysdig is applying. This very same filter can be applied on the command line.

So one of the things I want to show now, as I mentioned before, is what really happens behind the scenes when I do curl backend, and how all that connection is handled, all the load balancing and everything. So what I'm going to do is open sysdig with a few filters.
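For reference, launching that ncurses interface against the local Kubernetes API might look like this sketch (the API address is an assumption for a single-node setup of that era):

    # csysdig ships with sysdig; -k points it at the Kubernetes API server
    # so views can group by namespace, deployment, service and pod.
    sudo csysdig -k http://127.0.0.1:8080

    # Inside csysdig, F2 opens the views list, where the Kubernetes views
    # (namespaces, services, pods, ...) live; Enter drills down the hierarchy.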
In this case I'm using the command line interface. So -k for the Kubernetes API. Then I'm filtering on event type open, to see open system calls and files. And then I want to see every file that's under /etc, and also I want to see everything coming from the client pod. So if I go back here and I do curl backend, I automatically see all the files that were opened in my client container under /etc. Remember, I'm able to see this from outside; sysdig is running on my host in this case. Okay, that's cool. I can see the different open parameters and everything. But I'm curious, and I would like to know what it is I'm actually reading from those files. And in this case, what I can do is leverage chisels. Chisels are Lua scripts we've got that are fed all the system calls, all the events and all the related information. And with those Lua scripts we can reformat the information, aggregate it, manipulate it, and for example generate a report or format the output.

So I'm going to do -c echo_fds. All right, come on, let me do a little check. Now, I was doing spy_file; it would be the same thing. Okay, so if I run this command again, nothing happens. Probably I'm doing something wrong. I will be using echo_fds. I don't know why; I have all my notes. What's wrong? This is the first step of the demo. There we go. Now, this is what I showed you before. Otherwise I'll move on to the next command I wanted to show you; I'm not messing with this any more. The other thing I wanted to show you, spy_file, that should be working. If I use this other chisel... there, it's working. I can see exactly what I was reading; it's very similar to echo_fds. So when I did curl, curl tried to resolve that name: it opened nsswitch.conf and read that file, then it opened host.conf, then it opened resolv.conf, then it read hosts. And with all that information... it didn't read anything else under /etc. So this is a nice way to see what my application is actually reading from those files, because they could be changing automatically, things like that.

Another use case of chisels, another example I wanted to show you, is spy_users. spy_users is a chisel that will print everything executed, simple as that. Come here, execute again, and I can see how the root user executed curl. So these are the kinds of things that are going to be very convenient for troubleshooting your Kubernetes when you have an issue. One more example before we move on to something else: I'm using httplog. So guess what, this chisel decodes the system calls going through the sockets, decodes the HTTP protocol, and shows me here the requests, including the HTTP method, the response code, latency, things like that.

So this is very interesting, but still I would like to see everything that's happening. And as I mentioned before, Sysdig can see everything. So, okay, let's see everything. I'm going to explain this command. Again, I'm connecting to the Kubernetes API. I'm using -pk, which prints which container each event is coming from. And then I'm applying some filters, and I can already foresee this is going to fail... and then I'm applying some filters. My notes were not as good as I thought. Oh, that should be okay. I'm filtering all the network traffic with fd.type, so file descriptors that are actually IPv4 or IPv6 sockets.
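As a sketch, the commands in this part of the demo would look roughly like this; the API address and the client pod name are assumptions:

    # Watch open() syscalls on files under /etc coming from the client pod.
    sudo sysdig -k http://127.0.0.1:8080 \
        "evt.type=open and fd.name contains /etc and k8s.pod.name contains client"

    # Same idea, but dump what is actually read/written on those FDs.
    sudo sysdig -c echo_fds "fd.name contains /etc and k8s.pod.name contains client"

    # Print every command executed on the system, per user.
    sudo sysdig -c spy_users

    # Decode HTTP requests/responses seen on the sockets.
    sudo sysdig -c httplog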
And then I'm using some Kubernetes filters saying: okay, everything in the namespace critical app, or anything that is SkyDNS, which is the resolver I'm using. So if I run my command, I can see everything as curl executes. So let me show you from the beginning. This is a bit complicated, but we'll go through it; I'm just going to highlight the most important parts. Okay. So we get the socket. First thing, we see that this is going to port 53, so it's the DNS resolution, and the DNS resolution goes to my DNS server; we can see the IP there. And we see that it's coming from curl. Then we see that the next message is actually coming from SkyDNS: we read from the socket, but we open a new connection to this guy on port 4001. So we know that that guy is etcd. And what we do there is a REST request to etcd, with this URL, trying to find out what IP address is associated with that domain name. We did that, and we get the reply over here. I'm not going to get into the details. Let me find it... there we go. You see there, a JSON reply with the IP address in it. Then that goes back to SkyDNS, SkyDNS sends a DNS response, and then we see curl here getting the IP address that we need to connect to.

One very interesting thing that I actually don't want to miss: what is this .241? If we go back to my Kubernetes, kubectl, not this one here, kubectl describe backend: we see that .241 is the IP address of the load balancer. So what etcd replied with was the IP address of the load balancer. Let's go back here. We see curl connecting to that IP address. Let's find it. There we go. So it's connecting to this IP address, .241, port 80. But then suddenly, when we keep reading and we see the NGINX pod on this line, we already see that this reply is not coming from the same IP address. This is different: 172.17.0.5, which is the IP address of the pod. This is the trick. So we see the reply back, or rather we see how NGINX reads the GET request, does the write back, which arrives at curl, and curl prints it. Okay. So that's how the system works. Yes, we got a question. ... Yes, the question was whether there is any way to export to pcap. We don't have a way now, but probably most if not all of the filters that you would be using with tcpdump can be used here already.

Okay. So we did this, and we saw the system calls, all the sockets and everything, writes, reads, files being read, to understand the DNS resolution. But we found out that there were some changes to the IP address. And actually, if we look at iptables, we'll see how Kubernetes has... am I in the right place? Here it is. We have one chain, KUBE-SERVICES, for all the traffic that goes into the backend service. The screen is too small, so it cannot be seen properly, but if we read this carefully, we will see how Kubernetes creates a chain in iptables so that all the traffic that goes to that IP address is split into three different chains, with some probability from the statistic module, to load-balance between the different containers, the different pods, and the traffic goes to a different IP address. Okay. That's more or less how it works, and Sysdig is very useful for this. And if we have some more time, I've got something else prepared, which is one issue I experienced. And we have 45 minutes, right? So we have 10 minutes more. Okay, perfect. So this was live. This is very helpful to understand what Kubernetes is doing behind the scenes.
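The full trace command would look something like this sketch (the API address is assumed, and the SkyDNS pod is matched loosely by name):

    # -pk prints Kubernetes/container context for every event; the filter
    # keeps only IPv4/IPv6 socket activity from the app namespace or SkyDNS.
    sudo sysdig -k http://127.0.0.1:8080 -pk \
        "(fd.type=ipv4 or fd.type=ipv6) and (k8s.ns.name=critical-app or k8s.pod.name contains skydns)"

And the iptables side can be inspected like this; the per-pod chain names are generated hashes, so those shown in comments are placeholders:

    # List the NAT rules kube-proxy programs for services.
    sudo iptables -t nat -L KUBE-SERVICES -n | grep backend

    # Each service chain splits traffic across per-pod chains with the
    # statistic module, roughly:
    #   -m statistic --mode random --probability 0.333  -j KUBE-SEP-xxxx
    #   -m statistic --mode random --probability 0.500  -j KUBE-SEP-yyyy
    #   -j KUBE-SEP-zzzz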
But actually, the title of this talk was how we can troubleshoot Kubernetes. And as I said before, I wanted to do this live, but there are some external network dependencies. So instead, I'm bringing a capture file of an issue that happened to us. We were running exactly the same scenario, but curl, instead of returning immediately, was taking close to 10 seconds, waiting there, and then it was working. So what is the worst thing that can happen to us as operators of an infrastructure? That things fail? No, that's great, because we can fix it. The worst thing that can happen is that things work very slowly and we don't know why. So I took a capture of that scenario, and I'm going to show you how we found the problem using Sysdig.

So I'm moving back. Hopefully this is big enough. What I'm going to do here is, again (I think you're going to hate this command, I've typed it already), open a capture file, again go down to the IPv4 and IPv6 sockets, and filter by, again, my critical app and SkyDNS. Can you still see this? The font is very small. It's very interesting here; I'm going to go quickly. It's similar to what we saw before: we are going to see curl trying to resolve a hostname. It will go to etcd... sorry, it will go to SkyDNS, SkyDNS will try to resolve through etcd, and if the entry is not found in etcd, it's going to try different search domains, and at the end we'll see that something unexpected happens. So let me go directly to the points I want to show you.

So we can see first that my request, or the request that goes to etcd, is very, very long. Oh, sorry, I forgot something from the beginning. What we are doing differently now is, instead of using curl backend, I'm using a fully qualified domain name, because, at least according to the documentation, if I use curl with the DNS name of my service, then my namespace, then svc.cluster.local, Kubernetes offers that DNS entry by default. And for some reason, I see in the syscalls and all this tracing that we are crafting a very funny query; we are searching for a very funny domain name. So it's probably the search domains doing funky stuff. We'll see here: we do this request, which obviously is not going to be found. We do the request again, with some parts removed, so the search domain is losing different pieces. Again, not found. It does it one more time with less; not found. So we might think that this is where we are losing the time. The original problem was that this was taking around 8 to 10 seconds to execute. But if we look at the timestamps: at the beginning this was 41 minutes 39 seconds, and here I'm still at 41 minutes 39 seconds. So what's the problem? Why is this not working?

If we look here, let me find... if we look here, we start to see DNS requests that are appending at the end something we are not expecting: localdomain. So let me keep searching, because I actually want to find the last request we do here. You see, we're trying with localdomain; I'm already at the end. And this is not working. So that's crazy. So what happened? Let's start to look at the timestamps on the left, so we go faster: 41:47, 41:43... so what happened in between? Oh, I lost it. So what's happening here? Look what we can see here. What's happening is that SkyDNS is going to etcd trying to get... sorry, the DNS request is going to SkyDNS.
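Reading that capture back with the same filters might look like this sketch (the file name is an assumption, and it presumes the capture was taken with -k so the Kubernetes state is embedded in it):

    # Replay the capture instead of tracing live; filters work the same.
    sudo sysdig -r slow-dns.scap -pk \
        "(fd.type=ipv4 or fd.type=ipv6) and (k8s.ns.name=critical-app or k8s.pod.name contains skydns)"

And the shape of the problem, the search list inside the pod, would be something like this (a typical example, not the actual file from the demo):

    # /etc/resolv.conf inside the pod; note the stray "localdomain" suffix
    # inherited from the developer machine.
    #
    # nameserver 10.0.0.10
    # search critical-app.svc.cluster.local svc.cluster.local cluster.local localdomain
    # options ndots:5
    #
    # ndots:5 makes libc try every search suffix even for names that
    # already contain several dots, which is why the FQDN still gets
    # "localdomain" appended before being tried vanilla.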
SkyDNS sees this request with localdomain appended at the end, and SkyDNS says: whoa, this is a domain I don't have in Kubernetes, so this must be some external domain; let's send it out to my upstream DNS server. And because my upstream DNS server is not answering, is answering very slowly or is not even available, we wait for a full DNS timeout, almost four seconds. So this happens one time, it happens two times, and it happens three times, until libc decides that it has had enough of trying to remove pieces of my full search domain and just tries to resolve the fully qualified domain name, vanilla, as I gave it to curl, and then obviously it works. So this is what happens at the end.

So this is a very good way, when we have different services talking to each other, to get visibility of things on one side and the other. Sysdig and the filters are very convenient. And you will say: well, but this was all network; I could have done this with tcpdump, and with more patience, using my filters. And I'd say: okay, yes, but you want to be completely sure of what was happening in your client. So as I showed you before, Sysdig allows you to filter not only on your network traffic but on any file descriptor. So I'm using echo_fds as I showed you before, and now I'm going to repeat it, because now it will make more sense to you, if I don't hit the same problem I had before. So I can see that there was something appended at the end, which was localdomain. In the end we found out why: this was being executed on a developer machine, and the developer had that localdomain added by probably some network manager crap or who knows, and that was being transmitted into Kubernetes, and Kubernetes was creating the Docker containers with that search domain appended.

And how can you be sure that this is happening? Well, again, we can use Sysdig to understand how Kubernetes is instructing Docker to create the containers. I'm not going to look for localdomain here, because this is not live in my example, but I'm going to look just for "search". So I run a command similar to the one before, but this time I'm filtering on unix sockets, and if I launch a new container, I'll see how Kubernetes uses the Docker API over a unix socket to tell Docker how it needs to create this container according to the pod description.
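A sketch of that last command; the grep string and the socket path are taken from the demo's context:

    # Watch what the kubelet writes to the Docker API socket when it
    # creates a container, and look for the DNS search settings.
    sudo sysdig -c echo_fds \
        "fd.type=unix and fd.name contains docker.sock" | grep -i search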
So I'm going to run this, and now I'm going to create a different container with a different name, there we go, a foobar container. And something is failing here again... well, you will have to believe me; probably there is something wrong with my filtering, and I want to keep five minutes for questions. But this is the idea. When you leave this talk in five minutes, what I would like you to remember, to take away, is that you can use Sysdig to troubleshoot anywhere; you can hook in anywhere on your hosts. If you compare this to other eBPF-based tools, this is way simpler to use: you don't need to know exactly what kind of system call you want to hook, and you don't need to code any script. Using something similar to what would be a tcpdump on steroids, and system-call aware, you can very quickly see a lot of things. Obviously it's not as complete, it has different use cases, but you can see the network, you can see file descriptors, everything that has a file descriptor attached.

And this is everything I have for today. I'm leaving here three links; this very same talk is fully explained in the last one. So if you're very tired and you want to look at the syscalls more carefully, at home, more relaxed, go through it. You have the same capture, so you can play with the filters yourself and everything. Totally recommended. I hope you liked it. We have five minutes for questions now.

... The question was how you can run this on any system, like DC/OS. Well, in this case it was a command line tool that was installed through a package, but Sysdig can be installed just by pulling the container from Docker Hub. Any place where you can run a Docker container, you can run Sysdig. The requirement is that it needs to be a privileged container, because we need the kernel module inserted, and we need to execute insmod.

... The question was: when I have a multiple-machine scenario, how can I correlate the events, the system calls, between all of them? At the moment there is no tool to merge the different capture files. Yes, we have the commercial tool that does something similar, running on multiple hosts. It shouldn't be very difficult: in the same way we have tools to merge two pcap files together, you could write a very simple tool to merge these and then open them. The only thing to take into account is that these files tend to grow very, very quickly. Remember that they store all the system calls with all the I/O buffers, everything being executed on your machine. So just a few seconds on a production server can be a few gigabytes. So you need to be careful with filters, to keep only the information that you are interested in.

Well, if you have more questions, send me a tweet, a message or anything, and I'll be happy to answer. Thank you.
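As a sketch of that last piece of advice, capturing with a filter and a truncated snaplen keeps the files manageable (the flag values and file name are illustrative):

    # -s limits how many bytes of each I/O buffer are saved; the trailing
    # filter keeps only events from the namespace of interest. -k is needed
    # at capture time for the Kubernetes fields to resolve.
    sudo sysdig -k http://127.0.0.1:8080 -s 256 \
        -w critical-app.scap "k8s.ns.name=critical-app"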