Hello everyone, and welcome to this talk about who needs an API server to debug a Kubernetes cluster. I'm Jose, a senior software engineer at Microsoft. The main question for this talk is: how do you debug a Kubernetes cluster if the API server goes down? I will try to answer that question by running a couple of demos. The first one will be to debug an API server issue using BCC and standard Linux tools; I will show the main difficulties of using those tools in a containerized environment. That will help me introduce the local-gadget project, and I will run the same demo again, this time using local-gadget. To finish, I would like to share the roadmap and our plans for the local-gadget project.

Okay, let's start with the demo. Here we have a cluster with a single node. It is running containerd as the runtime, and we have a couple of pods running on it. Everything is running fine, so let me SSH into the node. I would like to start this demo by showing how we can use the BCC tools to debug containers. If we execute execsnoop like this, it captures the exec events from the host itself and also from all the containers running on this host. If we want to filter this output we can of course pipe it through grep, but that is not very efficient: we are collecting a lot of information in the kernel and sending it to user space only to keep a couple of events. The BCC tools already provide some ways to filter the events; it is possible to filter by the name of the command or by its arguments. But if, for example, we are running a pod like this one that executes different commands, those filters do not let us get all the events happening in our container.

What we can use instead is the --mntnsmap option; we have been collaborating with BCC upstream to introduce it. The idea is to have an eBPF map that contains the list of mount namespaces representing the containers we want events from. When we load the execsnoop program, its eBPF code checks, for every event, whether the mount namespace of that event is in the list. If it is, the event is forwarded to user space and shown to the user; otherwise it is discarded. That is much more efficient, because all the filtering happens in kernel space and only the information that was actually requested is sent to user space.

Let's try to do it ourselves. First of all, we need to consider that the eBPF map has to be pinned, so the only thing we have to provide is the path where the map will be pinned; let's define it like this. Then we run execsnoop again, but this time we pass the --mntnsmap option, so it will use that map to decide whether to keep or discard each event. As you can see, we are not capturing any events. That is because the eBPF map is empty right now; let's check it. Yes, it is empty. What we need to do is put the mount namespace of our container in there. I usually use ps for that: I look for my container, and this is our container indeed, so let's copy the PID. With the PID we are able to extract all the namespaces of that process.
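As a rough sketch, the commands used up to this point look like the following; the pin path and the workload name are illustrative, and the tool locations depend on how BCC is installed on the node:

```sh
# Unfiltered: captures exec events from the host and from every container.
sudo /usr/share/bcc/tools/execsnoop

# In-kernel filtering: only events whose mount namespace ID is a key of the
# pinned eBPF map are sent to user space (the pin path is just an example).
sudo /usr/share/bcc/tools/execsnoop --mntnsmap /sys/fs/bpf/mnt_ns_set

# In another terminal: the map starts out empty, so no events show up yet.
sudo bpftool map dump pinned /sys/fs/bpf/mnt_ns_set

# Find the PID of the container's main process (the workload name is made up),
# then list its namespaces; the mnt one is the ID we need.
ps aux | grep my-workload
sudo ls -l /proc/<PID>/ns
```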
Here is the information: the mount namespace is the one we want to put in the eBPF map, but we need it in hexadecimal, and this command converts it to hexadecimal. Now let's update the map. We are adding an entry whose key is the mount namespace of our container; the value is not relevant, because the eBPF program only checks whether the mount namespace is a key of the map. We run it, and immediately we start seeing events, because now they are captured and not discarded: the mount namespace ID is present in the map. That is what we were looking for, and we can do the same with any other BCC tool.

Now let's break, not fix, first break our API server. The API server is now down. Actually, all the containers are still running; we have all of them running, but we are not able to reach the API server. Let me close this one and look at the logs of the API server container. There is not too much information, but let's look again: it was running, and now it is not running anymore, so it is being restarted. Maybe we will see some logs about resources, so let's try the oomkill tool and see whether that is the reason the container is being restarted. For this kind of situation, the filter I showed before cannot be used, because when the container is restarted its mount namespace changes every time; by the time we add the mount namespace to the map, the container has already failed and been restarted again. That is the problem. The other problem is that we are not able to filter by the name of the container: we are using the mount namespace directly, and it changes every time the container is restarted. And here we captured the event: the container is indeed being killed because it ran out of memory. If we open the configuration with vi, we can see the cause was pretty simple: I had just limited the memory it can use. Let's fix it; after fixing it, we no longer have the limit here. Okay, it's running again; actually, it's not running yet, it's being created. Now it's running, so let's continue with the presentation.

What have we seen? We had to manually retrieve the container information: we used ps to get the PID, then went through the /proc directory to get the mount namespace ID. For the filtering, we were able to filter by container, but only through the mount namespace, which means that every time the container is restarted we need to update the map. That is not workable in cases where the container keeps restarting, because the mount namespace keeps changing; it is difficult to do manually. And if we want to do something like dumping the sockets of a container: each container runs in its own network namespace, unless it explicitly uses the host network, so to get the sockets we need to move into the network namespace of that container and then run our tools. There are still a lot of things we have to do manually.
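To recap, the manual part of this demo looks roughly like this; the mount namespace ID is an example value, and the key/value sizes and byte order depend on the map definition, so check them first with `bpftool map show pinned <path>`:

```sh
# Mount namespace of the container, printed as e.g. "mnt:[4026532280]".
sudo readlink /proc/<PID>/ns/mnt

# Convert the inode number to hexadecimal.
printf '0x%x\n' 4026532280          # -> 0xf00001b8

# Add it as a key of the pinned map. bpftool takes the key byte by byte
# (least significant byte first on a little-endian machine); the value is
# irrelevant, since only the presence of the key is checked.
sudo bpftool map update pinned /sys/fs/bpf/mnt_ns_set \
    key hex b8 01 00 f0 00 00 00 00 \
    value hex 00 00 00 00

# With the API server down kubectl is useless, but the runtime on the node
# still answers, e.g. via crictl on containerd.
sudo crictl ps -a | grep kube-apiserver
sudo crictl logs <container-id>

# Watch for out-of-memory kills with another BCC tool.
sudo /usr/share/bcc/tools/oomkill
```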
It is all doable, but you need a good background on Linux namespaces to do it. So let me introduce local-gadget. The idea is that local-gadget allows you to trace local containers using eBPF. It is a single, statically linked binary, so it is easy to install. Another problem we saw before is that we do not have the Kubernetes information; we are working at the level of PIDs. I'm not saying that BCC and the standard tools do not work correctly, just that we are using them in a different context from the one they were created for; we are trying to adapt them to a different environment. With local-gadget, what we are trying to do is design a tool that is meant to work in a containerized environment. Another nice thing about local-gadget is that it can trace both Kubernetes and non-Kubernetes containers. These are the available gadgets: some of them come from BCC and some others were developed by our team for use cases that we found. Let me describe in more detail how local-gadget works. We have three main tasks. The main one, of course, is collecting the insights; the tracers are in charge of that. We also need to enrich the data with the Kubernetes metadata, and we would like to avoid all those manual filtering steps, and of course be able to filter even when the container keeps restarting.

Regarding the tracers, this is a very simple example of running a trace. We are reusing the eBPF code from the BCC tools, but we have written our own control plane in Go, because we would like it to be easy to integrate in cloud-native environments, where most applications are written in Go. We are currently defining this API for our tracers. For example, for the exec tracer you define a config, where mainly you have an eBPF map (you can already guess that it will be the filtering map we created manually before), an enricher, which is an interface that is called for every event we want to enrich with Kubernetes metadata, and of course the event callback. This example is kept very simple: it does not define a map or an enricher, to show that a tracer can work independently, without the enricher or the filtering feature. Here is an example of the output: we print the command and the PID, but of course we have all the other information; we are just printing those two fields.

Regarding the container collection, its main tasks are to enrich the events and to notify us about the creation and deletion of containers. Everything starts with the runc notification module, which notifies us every time a container is created or deleted. It is very important to point out that this is a synchronous call, which lets us get this information while the container is being created but is not yet running. That matters because we can prepare everything to start tracing right before the container runs, so we will not lose any events from the container's start-up.
So every time a container is created we get its ID and PID, and we use a set of enrichers to gather more information about it. The idea is to keep a list of containers with all this information: we take the Kubernetes metadata from the container runtime, and of course we also collect all the Linux namespaces, which we later use to match events. For example, when a tracer has a new event and wants to enrich it with Kubernetes metadata, it calls the Enrich API and passes in the mount namespace. With the mount namespace we can find which container the event belongs to, and then we can enrich the event with the Kubernetes metadata. As I said, this module also provides notifications about containers being added and removed, using the container structure we have defined. So this module, like the tracers, can be used independently if you just need that kind of notification with all this information. The enrichers are optional: you can add all of them or just a couple, and we have some others as well.

Our last module, the trace collection, uses the notifications about the container lifecycle. When the user asks, for example like this, to trace the exec system calls and passes a filter, say the container name mycontainer, we ask the trace collection to add a tracer for that filter. If, by the time we ask, the container has not been created yet, the eBPF map is created empty; but when we receive the notification that a new container has been created, we check whether that container matches the filter we were asked for, and if it does we add its mount namespace to the eBPF map. This is exactly what we were doing manually before, but now it is automated: we receive the notification as the container is created and update the map immediately. Given that these notifications are synchronous and the "added" notification arrives before the container starts running, we can prepare our eBPF map before the first event arrives. So when we really receive the first events, everything is in place, and instead of discarding those very first events we deliver them to the user.

Okay, those were the internal modules. If you want to know more about them, we have written a couple of blog posts, and we have more examples for each of them that you can run easily. I just want to mention the use cases we have in mind for local-gadget. The first one, of course, is debugging without the API server; we have seen that with the BCC tools this is difficult, and some things are not even possible. That was the first thing I thought of for a situation like this, where the API server is down. Another use case: if you are implementing a tool that needs to get insights from the node, we can offer you two options. One is to include the local-gadget binary in your container image and run it there; the other is to use our modules directly: maybe you just need the notifications about containers, or you can run only the tracers. Again, remember that we are collaborating with the BCC repository to keep that eBPF code updated with new features, and we have written the control plane in Go so that it can be easily integrated with your own code.
And the last use case is observing and debugging containers outside of Kubernetes environments: we are also able to trace containers that were not created through Kubernetes.

Okay, let's run the second demo. This time let me show you local-gadget. These are the available commands. The first one I would like to show lists the containers: we can show the list of containers together with their Kubernetes metadata. You can see that it talks to the local container runtimes here; it also tries to communicate with CRI-O, but that is not available on this system, so that one fails; you can run it only for the runtimes you need. Now let's try to do the same as before: we run the local-gadget exec tracer, and we want to filter by the pod we created before. This time we don't need to manually get the PID and then the mount namespace; we can directly pass the name of our container and start debugging. Like before, we can also enrich these events with the Kubernetes metadata: here you have the Kubernetes namespace, the name of the pod, and the name of the container.

Now let me cause the same issue we had before and break the API server again. Even though the API server is now down, we are still able to get everything and continue debugging. Let's also capture the out-of-memory kills again; this time we can filter directly with local-gadget. We just wait to get the same event, and as I mentioned, this event can also be enriched with the Kubernetes metadata.

I would also like to add that you can snapshot the sockets of a container, for instance by container name. To get this information manually with ss or netstat, you would have to enter the network namespace of the container, because each container runs in a different network namespace. That is also something local-gadget does for us, avoiding all the extra work I mentioned before.

One more thing: the feature that lets us capture events from the very beginning of the container's life. For instance, here we run a Docker container like this that fails immediately, because the nice command needs a capability that is not in Docker's default capability set. In this case we are able to trace those capability checks, and again we can filter by mycontainer, which is the container name; we don't need the mount namespace, which is just as well, because it changes every time the container fails and is restarted. If I run the container again, we can see that we also capture the capabilities used by runc during the container start-up, and we can see that the capability being denied is the one nice needs, which is the reason it is failing. Okay, I think that's all for the local-gadget demo.
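Roughly, the local-gadget commands used in this demo look like the following; the subcommands and the --containername flag follow the demo narration and the kubectl-gadget naming, so the exact spelling may differ between local-gadget versions, and the container names are just the ones from this demo:

```sh
# List the containers on this node, enriched with their Kubernetes metadata.
sudo local-gadget list-containers

# Trace exec() in a single container, selected by its Kubernetes container name.
sudo local-gadget trace exec --containername mypod

# Watch for out-of-memory kills while the API server is down.
sudo local-gadget trace oomkill

# Dump the sockets of a container without manually entering its network namespace.
sudo local-gadget snapshot socket --containername mypod

# Trace capability checks for a non-Kubernetes Docker container, including the
# ones performed by runc during container start-up.
sudo local-gadget trace capabilities --containername mycontainer
```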
To wrap up, a few notes about local-gadget and the demo. We were able to debug a Kubernetes container even though the API server was down; we enriched that information with the Kubernetes metadata; there were no manual steps; and we did not lose any events during the container start-up.

What is the future of local-gadget? On the roadmap, now that we have all the Kubernetes information about the containers, we would like to be able to filter by other container fields as well: for example, all the containers in a given Kubernetes namespace (the eBPF map in that case will hold a list of mount namespaces), or the containers of a pod, or the Kubernetes container name, which, remember, can be different from the name given by the runtime. We would like to support non-Kubernetes containers created by other runtimes, since today we only support Docker for those. We would like to add more and more gadgets; we will start with the ones already available in Inspektor Gadget, but if you have any use case you want to share, we are open to listening, and we would like to get people involved in this project. Thank you very much.