Hello everybody. So this is the tutorial about BPF: understanding what happens inside Kubernetes clusters using BPF tools. Thanks for coming. My name is Alban. I'm a co-founder and director at Kinvolk Labs. We do consulting and engineering around open source projects related to Linux and Kubernetes. Kinvolk Labs is a dedicated team: we focus on innovation in our own open source projects, and we collaborate with other software companies on upstream projects.

Hi, I'm Marga. I also work at Kinvolk with Alban. I'm on the Flatcar Container Linux team. Flatcar is a container-optimized OS, and it's often used as the OS when you run Kubernetes clusters; Flatcar is one of the options for the OS that you can run on your nodes. And I'm very excited about joining Alban today in this BPF tutorial, because I think BPF tools are really cool.

Thanks. As I mentioned before, our tagline is building the 100% open enterprise alternative stack. Marga mentioned Flatcar Container Linux, and we also have Lokomotive, our Kubernetes distribution, but everything we will show today is not tied to our products; it applies just as well to other Linux distributions and other Kubernetes distributions.

All right, so we are going to do a very interactive tutorial, and this tutorial is stored in a Git repo. In order for you to follow along, you should clone the Git repo that we are showing here, install things on your computer, or on a VM if you don't want to run it on your computer, and follow all the instructions that we have there. We published the URL for this repo in advance, so maybe you already did this; but if you didn't, don't panic, you can still do it now and follow along with the tutorial.

There's a bunch of requirements that we list here: Minikube, and the latest versions of Inspektor Gadget and kubectl trace. We will install these requirements right now, right after this slide. If you don't want to use Minikube and you want to use a different Kubernetes distribution, that's also possible. The tricky thing is that you need to have the kernel headers available. So if you want to run this on GKE or EKS or whatever, you need to make sure that the kernel headers matching the running kernel of your nodes are installed; otherwise you can't run the BPF tools, because the BPF tools require access to the kernel headers. We provide a Minikube image that already has the headers, so if you are just getting ready right now, I strongly recommend that you go with this Minikube image, and hopefully everything will just work for you. But maybe you did this in advance and you want to use a different Kubernetes distribution, and that's also possible.

All right. So the first hands-on task is to get ready and install Minikube. Just a second, I'll share my screen. Okay, so the first thing we will do is clone the repo: git clone, and then the HTTPS URL. I'm going to do this with you, so if you are doing this right now, you have time to type the URL and clone the repo. All right, we've cloned the repo and we can look at the contents. It has a bunch of directories with all the steps of the things that we will do. The first thing we will do is go into the 00-getting-started directory, where we have a bunch of scripts that can help us get started.
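If you're following along, the first steps look roughly like this; the repo URL isn't spelled out in the recording, so the address below is a placeholder:

    # placeholder URL: use the repo address shown on the slides
    git clone https://github.com/<org>/<bpf-tutorial-repo>.git
    cd <bpf-tutorial-repo>/00-getting-started
    ls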
There's this ubuntu-deps script; actually, it also works for Debian and a bunch of other Debian/Ubuntu-like distributions, and it installs a bunch of dependencies that maybe you already have on your computer, maybe you don't. And then there's start-minikube and get-inspektor-gadget. We will execute all of these first. Let's go with ubuntu-deps to make sure that we have all the dependencies, because we are running on an Ubuntu machine. Okay, so we had everything already; but if you were missing some dependency, this would have installed it.

Let's have a look at what we are actually executing. Before, I just executed it, but let's look at the start-minikube script to see what it does. What it does is first download a couple of Minikube components. As I mentioned, these components already come with the kernel headers, which is the one thing that can be tricky when dealing with BPF. It verifies that they are correct, and then it asks us whether we are sure that we want to do this or not. This is because if you already have Minikube running on your computer, this could cause your Minikube to stop; so it lets you know: be careful, this could make your Minikube go away. And then, if we say yes, it deletes whatever Minikube you had running and starts a new one.

There's this option to use the "none" driver, which is reserved for when you are running this inside a virtual machine. For example, if you are running this on a Linux virtual machine inside a Windows machine, or in the cloud in a virtual machine given to you by some cloud provider, you might need to use this flag. Otherwise, if you are running this on your laptop, workstation, or any other bare-metal machine, you just don't use that flag. And you can see here that Minikube is started with the ISO file that we just downloaded, and that everything is configured so that we can start using our BPF tools.

So let's execute this script. Okay, it's downloading, and now it's asking me if I want to actually do this. Yes, I want to actually do this. This will now take a while, because it downloads a bunch of components and starts the Kubernetes cluster. And if you have any issues, just make sure you ask on the chat and read the documentation to see if you can solve them; hopefully everything will work out, but if it doesn't, we will try to help you get there.
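To make the script's behavior concrete, here is a minimal sketch of the Minikube steps it automates. The exact flag spellings (for example --vm-driver vs. --driver for the "none" driver) depend on your Minikube version, and the ISO URL is a placeholder; the script itself is the authoritative version:

    minikube delete                                       # removes any existing Minikube instance
    minikube start --iso-url=<iso-with-kernel-headers>    # custom ISO that ships the kernel headers
    # inside a VM or a cloud instance, add: --vm-driver=none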
So while that is running, we will go back to the slides and keep talking a little bit about why we are here and what's the problem that we are trying to solve. As the title of the tutorial says, the goal of the tools that we will demonstrate today is to understand what's going on inside our Kubernetes clusters. Debugging in general is hard; it's an art, and it takes time to master. But as our applications get more complex, and in particular when they are distributed applications running inside a Kubernetes cluster, it becomes really, really hard to debug what's going on. So we can benefit from tools that exist out there; in particular, the BPF tracing tools have been developed to help us understand what's going on inside our applications. They were developed to debug local machines, so it's kind of hard to run the existing BPF tools inside a Kubernetes cluster, and the tools that we will demonstrate today do exactly that: they help us run BPF tracing expressions and programs inside our Kubernetes clusters in a way that is much easier than if we were trying to wire all of this up ourselves.

We will demonstrate two tools. The first tool is called Inspektor Gadget; the second tool is called kubectl trace. We will see both of these tools in action. They are both very useful and they have different use cases, so we will try to see the different use cases and how we can use these tools to solve our problems. And to help us understand more about what BPF is, Alban will now explain a little bit more of how all of this works. Alban, you are muted.

Thanks, Marga. So, to introduce BPF: BPF was initially created for tcpdump. Initially the use case was filtering on sockets. That means when you want to capture your network traffic, tcpdump used a BPF program to decide which packets to capture. That was the first use case. Since then, it was extended to many more use cases, and it became called eBPF, or extended BPF, more recently, around 2013. The additional use cases fall into different categories. There is networking, like tcpdump or network traffic shaping. There is security: seccomp filters what kind of system calls you can do, and there are new security modules, with BPF programs designed to decide what you can do on your system. And there is tracing. Tracing is the topic of today, and we will focus only on that in this talk. BPF programs for tracing come in different subcategories: there are BPF programs for tracepoints, which are hooks at specific points that have been introduced into the Linux kernel, where you decide to execute something; there are kprobes, where you can trace any function in the Linux kernel; and there are uprobes, and so on.

So, in a nutshell, it looks like this: when you want to write a BPF program, you usually don't write the bytecode directly, but write a program in C, or at least something that looks like C, a subset of the C language. Once you have written your code, it can be compiled by Clang and LLVM, and you get an ELF object containing the BPF bytecode, the compiled BPF. Once you have your compiled BPF, you can upload this program into the kernel with the bpf() system call.

And then, what does the kernel do with that? Well, the first step is to verify that the program is okay, that it's safe to execute. As you might know, executing things in the kernel can be dangerous: if you write a bad kernel module and execute it, there is a risk you crash the machine. With BPF, it's a bit different: there is a verifier that checks that only safe instructions are executed, so the program will not access random memory in the kernel or make bad calls. Once the kernel has assessed that the BPF program is safe to execute, it is installed: the BPF program can be attached to a specific subsystem. For example, if it is for capturing network packets with tcpdump, it will be executed for each packet traveling on the network; if it is for tracing with a kprobe, it can be executed on every call of a given kernel function. This BPF program cannot do everything: it can only execute safe instructions, as if in a sandbox, and it interacts with the tracer application through BPF maps.
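Before we get to maps, here is a minimal sketch of the compile-and-load pipeline just described. File names are illustrative, and in practice a loader library such as BCC or libbpf usually wraps the load-and-attach steps:

    clang -O2 -target bpf -c trace.bpf.c -o trace.bpf.o   # C-like source -> ELF object with BPF bytecode
    bpftool prog load trace.bpf.o /sys/fs/bpf/trace       # bpf() syscall; the in-kernel verifier runs here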
BPF maps are kind of like global variables that are shared between the kernel and the application in user space. These variables come in different types, for example hash maps or arrays, and so on. Both the BPF program and the application can read and write these variables, these BPF maps, and that's how they communicate. And finally, the BPF program cannot execute arbitrary kernel functions, even though it runs inside the kernel; it is only allowed to call specific BPF helper functions, safe functions that have been designed for that purpose.

So, using BPF for tracing: there are a lot of different projects that do that. I'll put them in two categories here. First, on the left, the Linux tracing tools, things that are not cluster-aware; they run on a single Linux machine. There is bpftrace, a cool project where you can write one-liners, or a small amount of code, and do cool tracing using BPF. Another project is BCC, the BPF Compiler Collection, which is a really interesting resource for learning how to do BPF, because it has a lot of tracing tools; I don't know exactly how many, but you can look at the examples and their source code to see how different things are done in BPF. So that's a really cool learning resource. It has a lot of tracing tools, and it's also a library for BPF: if you write code in C or C++ or Python (there are bindings for Python as well), there are libraries to do that. The next project I'll mention today is traceloop; I will talk about that in more detail later, but that's another way of tracing your programs. And I mention a few others. Everything on the left side traces on a single Linux machine, but there are equivalents for tracing at the Kubernetes level: there is kubectl trace, which uses bpftrace at the Kubernetes level, and there is Inspektor Gadget, which uses a bunch of different tools to offer a user experience similar to kubectl on Kubernetes, so you can manipulate things at the cluster level.

All right. So, going back to our Minikube installation, we see that everything worked correctly and that we now have Minikube running. We can, for example, run kubectl get pods -A, and we see that we have a basic Kubernetes system running on our node. So that's good. One thing I'm going to do now, because one of the things we did was download this minikube command that we will use later, is copy this minikube binary to the bin folder in my home directory. I've already set up the PATH variable to include that bin folder, so this minikube command is now in the path, and later, when we execute it from somewhere else, it will be found. Otherwise we would need to type this 00-getting-started folder name each time we want to execute it, which is not a big deal, but it makes things easier.

All right, so that's installed. The next thing you need to make sure of is that you have kubectl. I already showed that I do have kubectl. If you don't have kubectl installed, you need to install it, because otherwise you will not be able to interact with your Kubernetes cluster. And then the next step is to install Inspektor Gadget. To do that, we will use this handy get-inspektor-gadget script, and we are going to look at it before we execute it, so that we make sure it's doing what it's supposed to be doing.
Well, the first thing it does is check that kubectl is installed, and then it checks that Inspektor Gadget is not already installed, because if it's already installed there's nothing to do. Then it either uses krew, if krew is available, because that's the recommended way of installing it, or it downloads the file and installs it, copying it to the same path where kubectl is installed. So wherever you have kubectl installed, you will also get the gadget plugin installed. And at the end it verifies that it worked. Let's execute this now; I think, because we already tested this, it will just say that it's already installed. Yeah, since we tested this before, it's already installed. But if it was not installed, it would be downloading and installing now.

We can run, for example, kubectl gadget version and see that it's there, and then we can run help. We can see that this kubectl gadget plugin has a bunch of subcommands that we can use, and these commands are the ones we will be demonstrating; well, not all of them, but a bunch of them, throughout this tutorial. So that gives us a little bit of an idea of what all of this is.

The first subcommand that we will use is this deploy command, whose description here says "Deploy Inspektor Gadget on the worker nodes". What this does is deploy a container onto the worker nodes that will run the BPF commands for us. So first let's run kubectl gadget deploy piped into less, so that we see what it's doing. We see that it will create a cluster role binding, a daemon set, a bunch of stuff, and it downloads a container image with Inspektor Gadget in it. Everything looks sensible, so we can now apply it. All right, so this created the service account, the cluster role binding, and the daemon set, and we can see that it's actually running. We see now that we have this gadget pod that is being created, and if we wait a few more seconds it should be done. Yeah, so now it's running and we have the gadget pod running in our cluster. We can check the logs to see that everything is correct, and we use the label because, yeah, why not. And we see that everything started and there are no errors. So our gadget pod is running correctly in the cluster and we are ready to do interesting things with Inspektor Gadget.

So, what can we actually do with it? Let's go back to the slides. Okay, the first gadget that we will look at is called the network policy advisor. This gadget is designed to help us understand the network policies that are necessary for a project when we don't really have a clear picture of what the project does. Imagine that you just joined a team that has a big Kubernetes deployment. You know that it has a lot of pods, services, daemon sets, everything communicating with everything else, but the project doesn't have any network policies whatsoever; everything is completely open. If anything malicious were to break into the system, there would be no gating of communication; everything would just be communicating everywhere. Worse, you're the one tasked with coming up with network policies to make this better, and you're new, and everything is a mess, and you have no idea where to start. So this is where the network policy advisor comes in.
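Before moving on, here are the Inspektor Gadget setup commands from this section in one place, as run above:

    kubectl gadget version
    kubectl gadget deploy | less                # review the manifest first
    kubectl gadget deploy | kubectl apply -f -  # service account, cluster role binding, daemon set
    kubectl get pods -A | grep gadget           # wait until the gadget pod is Running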
This is not such a rare story; it's pretty common that network policies are an afterthought, and this is why it made sense to have a gadget that helps us understand what's going on inside our clusters and how we can write the right network policies.

For our hands-on example, we will use this microservices demo from Google. It's a demo of how you write a microservices application for an e-commerce site, so it has a lot of different components: a database, a checkout service, a shipping service, a bunch of stuff. We don't really care about all those components or about seeing them in practice; what we care about is that it already has a bunch of containers, a bunch of pods and services, that are transmitting data from one side to the other, and it doesn't have network policies. So it's a great example for seeing the network policy advisor in action.

All right, so I think it's time for our next hands-on task, which we've actually already started. We will now move to the other directory, the 01-network-policy-advisor directory. In this directory we see this Kubernetes manifests YAML file, which is a copy from the Google example I was talking about. Let's have a quick look at it; I'm just scrolling through it very quickly. You see it has a ton of different things; we can just look at the kinds of things it has. It has a bunch of deployments and a bunch of services, and all of these deployments and services have a bunch of configuration in them. So it's complicated, and we don't want to spend a lot of time thinking about it.

So what we will do is... oh, I need to split my screen. First let me start tmux. What we will do first is start the network policy advisor in monitor mode. To do that, I will run kubectl gadget network-policy monitor, then tell it to monitor the namespace demo, because that's where we will deploy this, and then store the output in a file that we will call network-trace.log. So this is now monitoring for new connections in this demo namespace, and it's going to store all of that in this network-trace.log. Now we need to actually create these new connections; let me split the screen. There. Okay, so what I'm going to do now, on this other side, is first create the demo namespace and then apply this file that we were looking at before into the demo namespace. Okay, so it created all those things that we said would be created. This will take a while for everything to get ready. Everything, or almost everything, is now in ContainerCreating, so it will take a while until it's ready, but we can just tail the log and wait to see if we start getting something. If we don't get anything, it means I made a mistake, but I think I did it correctly. Yes. Okay, so we are getting some entries in the log, and it will take a while until everything is ready and the log is complete.

So, while this is running, we can go back to the slides and explain a little bit more about what's going on. These were the commands that we ran, just to highlight them: what we did was first start the network policy advisor in monitor mode, because we need it to be running early so that it can capture all the connections that happen. If we started it late, some connections might not have been captured. That's why we ran it first and then ran the other command.
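Put together, the monitoring side looks roughly like this; flag spellings are as used in this version of Inspektor Gadget and may differ in other releases:

    # terminal 1: start monitoring before the workload exists
    kubectl gadget network-policy monitor --namespaces demo --output ./network-trace.log
    # terminal 2: bring up the demo application
    kubectl create namespace demo
    kubectl apply -n demo -f kubernetes-manifests.yaml
    tail -f network-trace.log   # watch the connection events arrive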
And then, as we said, this is going to store everything in a log. All right, but let's look a little more at how this works inside. Here we see a diagram of how Inspektor Gadget works. One of the things we want to highlight is that everything goes through the Kubernetes API: it's not talking directly to the nodes, or directly to the kernel running on the nodes where the BPF tracing is going to happen; rather, everything goes through the API. What we did was deploy this gadget pod, one pod per node. In our case we only have one node, but if you are doing this in a cluster that has more than one node, you would see one gadget pod per node, because the gadget pod is the one in charge of installing the BPF program into the kernel and running the BPF tracing, and that needs to run on each node where we want to do some tracing.

So what we did with the network policy monitor was ask this gadget pod to load a BPF tracing program that traces TCP connections. The program captures whenever there's a new TCP connection and stores that in one of the maps that Alban talked about. The log it creates looks somewhat like this: the entries are either "accept" or "connect"; they are tracing the TCP accept and connect calls. They include a lot of information, like the names of the pods, the ports, the IPs, and the labels involved. All of this information is kind of a lot to parse, so it's not something that you would want to parse as a human; rather, the network policy gadget comes with a subcommand called report that processes this information and generates a basic network policy from it. Of course, this network policy is not intended to be used blindly. It's a starting point that we can use to make sense of what is going on with the application, not something to just apply to our cluster as-is.

All right, let's now go back to our console and see if our pods are running. We can see that they are all running, and they've been running for a while, so we probably already have enough data. We can stop our monitor and, as I was saying, run the report. So what we will do is run kubectl gadget network-policy report, then --input and the file that we created, network-trace.log, and then redirect the output to network-policy.yaml. All right, this created a file; let's close the other one and look at this file. It's a long file, because there were a lot of network policies to create, but basically we can see that, for each of these different services, it says who needs to talk to whom and on which ports. We see the different labels and the different ports, either for ingress or for egress or for both.

As I said, the idea of this file is that you can use it as a basis. If you need more, maybe you need to generate additional traffic: not just bring up the pods, but generate extra traffic that actually covers all the possible situations. And even then there might be some situations that you miss, so you need to apply some thinking about whether these are the right policies. Okay, I hope that was interesting. This demo is now finished; I'm going to delete these pods.
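For reference, here is the report step and an illustrative excerpt of its output. The service names and port below come from the Google microservices demo and are only an example of the shape of the generated policies:

    kubectl gadget network-policy report --input ./network-trace.log > network-policy.yaml

    # excerpt of the kind of policy the advisor generates (illustrative):
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: cartservice-network
      namespace: demo
    spec:
      podSelector:
        matchLabels:
          app: cartservice
      policyTypes:
      - Ingress
      - Egress
      ingress:
      - from:
        - podSelector:
            matchLabels:
              app: frontend
        ports:
        - port: 7070
          protocol: TCP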
And now Alban can tell you more about other gadgets. Yes, so I will tell you more about the next gadget, which is traceloop. To give a technical summary: traceloop is about tracing system calls in cgroups, using BPF and overwritable ring buffers. That sounds like a very complicated sentence, so over the next slides I will detail what it means and what it is for.

The initial idea behind traceloop comes from the strace use case. As a developer, I really like to use strace to debug my applications, to see what kind of system calls they make, and to have a trace of what's happening. However, it's a bit difficult to use strace with Kubernetes. On one side, strace can be slow, and it's not possible to run strace on all the pods and all the processes running on a Kubernetes cluster; that would slow things down too much and would not work. Another issue: consider the use case where something crashed in production. Then it's too late to trace it, because the program has crashed, so the process doesn't exist anymore; it's not possible to strace a program that is no longer running. That makes it difficult to debug unreproducible crashes: when things crash a bit randomly, it's hard to use strace. The idea that follows from that is a flight recorder: a ring buffer that permanently records all the system calls made by all the pods running on your Kubernetes cluster. But instead of displaying them to the user, they are just recorded, flight-recorder style, and whenever something crashes, we can look back into that record, into that log, to see what the problem was and try to get hints about why it crashed.

Now I can compare strace and traceloop. Both can be used for debugging your applications, but they work in different ways. First, the technology used is different: strace uses the Linux system call ptrace, while traceloop doesn't use ptrace; it uses BPF and tracepoints, which makes it different and makes traceloop faster. The granularity is different too: with strace, you trace one process or several processes together, and you specify which processes you want to trace; with traceloop, you specify which cgroups you want to trace, and a cgroup can group together several processes, in one systemd unit or, for example, one Kubernetes pod. Another difference is that strace is slow while traceloop can run fast, but strace is the more established program. strace works in a synchronous way and cannot lose events: when you see the lines output by strace, you necessarily see everything. traceloop works in an asynchronous way: it just records into a buffer, so it can lose events, and sometimes it can fail to read some parameters of your system calls.

Okay, so now I'll go into a bit more detail on how traceloop works. The first thing it does is use a BPF program attached to the sys_enter tracepoint. sys_enter is the tracepoint that is executed every time an application makes a system call on Linux. What this BPF program does is first look at the cgroup, to identify which container, which pod on your Kubernetes cluster, is making the system call. Depending on that, it redirects the execution flow to different other BPF programs, which store the data about the system call in a different, per-pod ring buffer.
That means that every pod on your Kubernetes system has its own ring buffer to store the events about its system calls. And that ring buffer is never actually read unless the user asks for it: whenever the user wants to debug something that happened before, the user can read that ring buffer, using Inspektor Gadget for example.

Okay. So the user interface looks like this: we use kubectl gadget with the traceloop gadget, and we have a few subcommands to list the existing traces, to see a given trace that has been generated, or to look at specific pods. Okay, so now I will show the same thing on my terminal; I will share my terminal, I hope you can see it. I will first go to this traceloop directory, where you can follow along and run the same commands as me if you're following from the repository. First, I will look at which pods exist; right now, across the different namespaces, there are a few in general. Then I will use this kubectl gadget traceloop command, and from there, there are several subcommands; I can show them like this. You can list the traces that have been captured. By default it only looks at the current namespace, so, in a similar way to other kubectl commands, you can specify the -A flag on the CLI to say: I want to look at all the namespaces. That's a lot of traces; maybe if I make the font a bit smaller you can see the list. Here you see a lot of traces that come from the demo namespace: that's the demo Marga was running before, with the shopping application. So even though the pods are not running anymore (if I show the list of pods, there is nothing in the demo namespace), we still have these traces; they exist for a little while, and I can see they're from ten minutes ago.

I will make the font bigger again so you can see something. Let me show you things in the kube-system namespace. You can see there is one trace that has been recorded, and I can show that one to see what it looks like. If I take the trace ID, I can feed it to the kubectl gadget traceloop show command, and this will dump the last few system calls that were executed by this pod. Okay, let's see if it also works for some of the demo pods. If I take one trace ID, from the ad service for example, I can show this. And here, in the same way, I see it was running a Java process, and it made different system calls. Of course, that's not every system call it ever made, because the amount of memory available for recording is limited, but you can see the last few system calls, and if it crashed, you can try to understand why it happened.

Okay. So, I will go to my next example. I will split my terminal like this, and do this next demo, which is about tracing a pod that crashed. So I will just run a shell command here. Oops. Okay, I will start a new pod that runs just a shell script. This shell script will execute some multiplication, save the result into a file, and then attempt to display the result of the multiplication. And, of course, this shell script doesn't work: there is a bug in it, and I cannot see the result of the multiplication. If you look at the pod, you see it ended in an error and didn't manage to print the result.
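To recap, the traceloop commands used so far:

    kubectl gadget traceloop list -A                  # traces from all namespaces
    kubectl gadget traceloop show <trace-id> | less   # dump one pod's syscall ring buffer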
Moreover, if I decide to delete the pod, or if it gets deleted, because it comes from a ReplicaSet or a Deployment that restarts the pod, for example, it may be that the pod doesn't exist anymore. But even if the pod doesn't exist anymore, I can still check whether the trace still exists. With kubectl gadget traceloop list, I see that about one minute ago there was this multiplication pod, and there is still a trace available. So let's try to see what was there. Let me make the screen bigger. I see the last system calls made by all the processes in that pod; some of those processes were cat and the shell. Let's look with less: if I go back to the multiplication, I see this busybox program doing the multiplication and trying to write the output. So I recovered the system calls and the result that was lost: even though the pod was deleted, I can still recover the trace shortly afterwards. Okay. So, let me go back to the slides. That was the demo of the traceloop gadget.

Next gadget: I will talk a bit about the gadgets based on BCC. In Inspektor Gadget there are a lot of available gadgets, but many of them are not implemented from scratch; they are taken from BCC and adapted to make them work on Kubernetes. For example, opensnoop, execsnoop, and a lot of others come directly from BCC. They allow you to inspect different aspects of the operating system, or the network, and other things.

Okay, so what kind of user interface do we want for those gadgets? When we work at the Kubernetes level, we usually don't really care about the specific PID or the specific node that the application is using; we want to work at the pod level. That's the basic unit that we care about on Kubernetes: we don't want to specify PIDs, we want to specify pods. And we want to use Kubernetes-native concepts like labels and namespaces, instead of doing things like SSH and so on. So the interface I want is something based on kubectl that doesn't use SSH, so developers don't need to SSH into a specific node, and they can specify what they want to trace using labels and pods.

Taking the example of execsnoop, it looks like this: you run the execsnoop gadget, you specify which pod you want to trace, and you can use one of those many different flags, either one of them or several together. For example, you can say "I want only these specific labels", so all the pods that match will be selected; or only a specific namespace, or a specific pod, or a specific node, and so on. And if you have a pod with several containers inside, you can specify which container you want to trace as well. That makes it useful, for example, when you have a Deployment: a Deployment generates several pods, and it adds a randomly generated suffix that you don't know in advance. With a Deployment it's quite useful to be able to select by labels, because you know which labels it will have, while you don't yet know which names the pods will get.

So how does it work? It's a bit difficult to implement, because when you run kubectl gadget and you want to select on a label, the set of pods that match can change over time: maybe at the beginning no pod matched, and then pods are created, and then they're destroyed, and so on. So the set of pods to monitor is dynamic. In this example, we have one pod that is monitored by kubectl gadget, and so on. So how does it work behind the scenes?
The gadget has a component running in the gadget daemon set called the gadget tracer manager. The gadget tracer manager is a daemon that implements a gRPC API with a handful of methods, and it can be informed about new containers that are created and about new tracers.

So, we need to be informed of the set of containers that are running, and to do that, it uses a feature called OCI hooks: prestart and poststop. With OCI containers, we have these hooks that can be executed whenever a container starts or stops, and Inspektor Gadget uses them to feed information to the gadget tracer manager, so it knows which containers exist, with which labels, and so on. The gadget tracer manager also needs to know about the different tracers, that is, which gadgets you are running at the moment. So when you run kubectl gadget, it executes on the node a shell script that first informs the gadget tracer manager: "I want to run this gadget with this label selector", for example. From this, the gadget tracer manager knows which containers it should trace, because it has the information about the container labels. With that information, it updates BPF maps: there is one BPF map for each tracer, and the content of the map is the list of containers that need to be traced. Then, when we actually execute the BCC tool, like execsnoop in this example, the BPF program first checks whether it's actually running in a pod it needs to trace, or whether it should discard the event: it looks at the map and checks whether the current cgroup is one selected by this map. If not, it just returns zero without doing any tracing; if yes, it captures the event. So that's how it works. If we have time at the end of this presentation, we can go through this together in more detail, but that's the picture.

Okay, so now that you have some explanation of how the BCC-based gadgets work, we will try to apply that in practice. Let me go back to the terminal. Now let's go to this next section about snooping, and let's see what we can do here. First, I will start the execsnoop gadget, and we will see how it works. Okay, I'll start a new screen and execute this gadget, execsnoop. As I mentioned before, there are many different selectors you can use to choose what you want to trace; here I want to select only the default namespace and only the pods with the label run=cooking. When I run that, it starts to show, in real time, the executions of new processes in this pod. But we don't have any pod matching these criteria yet, so it doesn't show anything so far.

Next, we will actually start a container with this cooking label. For the purposes of this demo, we will use the anti-pattern of curl-ing a script and executing it. Here we won't care too much about security; it's just for the purposes of the demo that we execute something there. Okay, so if I run this script, it's quite difficult to know what it will do, because we cannot see the script before it gets executed. First it will download the container image, and then, when that's done, it will start to display something. Okay, here it executes the shell script: at the top you see it installing something, and at the bottom you can see, from the execsnoop gadget, the list of commands that have been executed. So that's quite useful.
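The gadget invocation at the start of this demo was roughly the following; flag spellings may vary slightly between Inspektor Gadget versions:

    kubectl gadget execsnoop --namespace default --selector run=cooking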
Back in the gadget output, I can see that it executed the rpm command, then installed and deleted files, and so on. So this execsnoop gadget lets you see the new processes that are executed, and that lets you debug what's being run.

I will show the next example, based on an nginx application. Here I have the nginx application: at the top I have the content of the website, then the configuration for nginx, and then the deployment that will actually install nginx with a few replicas. If I install that... okay, now I see that I have the pods: three different nginx pods, and if I wait a few seconds they should all be running. Then I will try to access it, and, as I will show, there is a bug in this nginx pod; I will show how to debug it with Inspektor Gadget. Let me go back to the text. So, that's what I've done so far: I've deployed the nginx application. And because I run on Minikube, the way to access the service on Minikube is to issue this minikube service command. So I will do that; I will use the other screen, sorry. And here I see I can reach the nginx endpoint, but I get a 404 error. Actually, the URL I was supposed to access is this hello.txt, but when I request that, it still doesn't work; I still get a 404 error.

So the question is why, because when I look at the content of the website, I'm supposed to have a hello.txt file served by nginx, but when I curl it, it doesn't work. To debug this, I will use the opensnoop gadget, selecting the nginx application. You don't need to pay attention to the warning at the top; I don't have the right kernel feature, but don't worry, it will still work. This will display, in real time, the list of files that nginx opens while I try to curl it. So when I issue the curl command, I reach one of the nginx pods, and I see that nginx tried to open this file, /etc/nginx/html/hello.txt, and I see that the open system call returned an error, -1. So it means the file probably doesn't exist, or it couldn't be opened.

With this information, I can check again in nginx whether this file in /etc/nginx/html was really there. Here I have a config map called nginx-data, and this nginx-data is served via a volume called nginx-volume in this pod. And this volume is mounted at /var/www. Unfortunately, that's not the directory where nginx looks for the file; nginx looks in this other directory. So I will replace this with the correct directory and see if it fixes my problem. One way to do this is kubectl edit: I can edit a deployment, so I will edit this deployment that is running. I get the editor where I can look at the current definition of the pods, and here, in this definition, for the volume, I see it's mounted at /var/www. I will replace that with /etc/nginx/html. Now kubectl tells me that this has been edited. Oh, I see that nginx will restart, so I get new pods; if I do get pods, I see the previous deployment's pods are terminating and the new ones have started. Okay, so let's try the curl again to see if it works. It works this time: the hello file returns its content, as you can see here. And at the top, I see that every time I hit... well, not every time, that seems to be a demo effect, but it's supposed to be that every time I hit the nginx server, it opens this file, and we see this information.
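The fix, in deployment YAML terms, is a one-line change to the volume mount; the names here are as used in the demo:

    volumeMounts:
    - name: nginx-volume            # volume backed by the nginx-data config map
      mountPath: /etc/nginx/html    # was /var/www, a path this nginx config never serves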
Okay, so that's the end of this demo, using opensnoop to debug an nginx deployment. Now I will just delete that, so I can go back to the slides.

Yeah, so up until now we've been seeing how to use Inspektor Gadget, which is one of the tools we said we were going to demonstrate today, and now it's time to look at the other tool, which is kubectl trace. kubectl trace is similar in many ways to Inspektor Gadget, in that it's a kubectl plugin, it creates its own pod, and it allows us to run BPF traces in our clusters; but it's also different in how it works. So we will first explain a little bit about how it works, about the syntax and everything involved, and then we will see some hands-on exercises of kubectl trace in action.

kubectl trace is kind of a wrapper around another tool called bpftrace, and bpftrace has its own domain-specific language that allows us to write BPF expressions that do what we want: they run our traces without us having to write a C program. So we don't need to do all those complex things that Alban explained, writing the C program, compiling it, installing it in the kernel; we just write these bpftrace expressions, and bpftrace gets all the rest of the work done for us. Of course, these expressions are not simple, so they are a little bit complicated, a little bit scary at the beginning if you haven't looked at them before, but once you understand the general structure, it's a lot easier.

So first I will go a little bit into what the syntax of bpftrace looks like. This is what a bpftrace expression looks like; this is just one expression, but it always follows the same pattern of probe, filter, and action. The probe represents what it is that we want to trace. There are many different probes available, and in this example the probe is a kernel function called do_nanosleep; the trace gets activated each time this kernel function is called. The filter part is actually optional, so we could have an expression without any filter; what the filter does, of course, is filter the events according to whatever criteria we give it. In this example we are checking that the process has a PID larger than 100, but we could also check things related to the command name, for example. And then the action is what we want to do when the probe gets activated. This can be counting, printing, creating a histogram, calculating the time spent, and a lot of other things, and this is usually where things can get a little more complex.

The example here is actually kind of complex for so few characters. What we're doing is using the special variable comm, which represents the name of the command being executed. The at sign with the square brackets means that we are creating a hash map, using comm as the key, and the action is adding one to the value of the map at that key. So basically, when this trace finishes running, it will have counted how many times each command with a PID larger than 100 called the do_nanosleep kernel function, and we will be able to see this in a table, command per command. This is just the basic syntax; as you can imagine, these expressions can get really complex, and we will see a few more. We will cover the basics here; we will not go into a lot of detail.
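The expression being described is, written out, the following; this is a standard bpftrace example, runnable on any Linux host with bpftrace installed:

    # probe: kernel function; filter: PID > 100; action: count per command name
    bpftrace -e 'kprobe:do_nanosleep /pid > 100/ { @[comm]++; }'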
But if you want to learn more, because you think this is super cool, workshop.bpf.sh is a pointer to a great workshop that goes into a lot of detail about bpftrace.

This diagram shows some of the BPF probes and how they relate to what's going on in the operating system. In the example on the previous slide we were using a kprobe, which traces a kernel function, as I said, and here we can see all the different probes we have available: we can trace things in user space using uprobes or uretprobes, and we can trace syscalls, operations on block devices, network packets, CPU cycles, and a lot more.

So let's look at a couple more examples, to get a better idea of what kinds of things we can do. These two are taken from the bpftrace README. In this case we're using tracepoints. Tracepoints are points in the kernel code that kernel maintainers or kernel module writers have identified as interesting for debugging issues. There are a lot of different tracepoints, and if you want to know which tracepoints you have available, you can use bpftool, which lets you figure out what is available in your currently running kernel; depending on the kernel version, you might have different tracepoints available. These tracepoints are maintained by the kernel developers, so they tend to be a little more stable than kernel functions. If we are using a kernel function, the kprobe we saw before, it can happen that the kernel function changes; in name, maybe not so often, but the parameters it receives and what it returns might change. Tracepoints are usually more stable, because a kernel maintainer decided "okay, this is an interesting debugging point", and then it's probably going to stay that way. It's not guaranteed, but it's more stable in general.

In these two examples we are using the syscall tracepoints, and this sys_enter point fires when a syscall starts; so basically, this is how you would trace syscalls. The first one is very similar to the one we saw before: it's creating a hash map, indexing by the command name, and using this count() function instead of the ++, but the effect is the same; it counts how many syscalls were made per command. The second example is more complex. It counts the overall number of syscalls regardless of the command, so we are no longer splitting it command per command, and then it clears the map once per second. So every second the map is cleared and restarted, and if we ran this on the command line, we would see, every second, how many syscalls were made, and then the next second, and the next second, and so on.

Like I said, these are only a couple of examples; there are a lot more in the GitHub repo of bpftrace. You can see a lot of one-liners, and also more complex files, not just one-liners, and you can use these examples to get an idea of what you can do. You do not need to understand all of the syntax of bpftrace to start experimenting with it. So that's just a pointer to more examples that you can look at.
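For reference, the two README examples discussed above are:

    # syscall counts per command:
    bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
    # total syscalls per second, printed and cleared every second:
    bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @ = count(); } interval:s:1 { print(@); clear(@); }'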
All right, so that was bpftrace; now how about kubectl trace? kubectl trace works a little bit differently from Inspektor Gadget, because it doesn't have the handy namespace or label selection that we saw: we need to tell it on which node to run, or on which pod to run. We cannot say "all the pods that have this label" or "everything in this namespace"; we can't use those nice selections, we need to name either a node or a pod. The two examples we have here show that we can give it either an expression, the kind of expression I showed you before, or a file. The files have basically the same syntax, the same expression language; it's just that when it gets complex, it's no longer a one-liner, you may have a three-line "one-liner", and then it's better to have it in a file where you can have comments, which makes your expressions a little more understandable. In the end it's the same; it's just a question of whether it's complex enough to be stored in a file.

This is a diagram of how kubectl trace works, and it's extremely similar to the one we saw before for kubectl gadget, because the idea is very, very similar. It goes through the Kubernetes API; it deploys a pod, and the pod is the one that runs the BPF tracing. The difference is that this trace runner pod is deployed on demand, when we want to run our traces, instead of making one deployment up front: in the gadget case, we had one gadget pod that ran the traces; with kubectl trace, a trace runner pod is created each time we want to run a trace, so we will create several of them. But except for that, the general architecture is very similar.

All right, it's time for another hands-on, and for that I want to go back to my console. All right, so now we are in... sorry, yes, we are in the tracing directory. Well, the first thing we need to do is get kubectl trace. We have this handy get-kubectl-trace script, which is very similar to the kubectl gadget one: it first checks whether it's installed, and if it's not installed, it downloads it with krew, or, if krew is not available, it downloads the file. So, basically the same thing; this just installs the plugin on your machine if you don't have it. Let's run it; well, it was already installed, because I had run this before.

As I mentioned earlier, one of the things that is tricky when working with BPF is that you need to have the kernel headers available; if you don't have the kernel headers on the node, you won't be able to run traces. We said we are running an instance of Minikube that already comes with the headers, which is great; but these headers are compressed in the xz format, and unfortunately this latest version of kubectl trace does not include the xz decompression tool, so it can't access the headers, because it doesn't have the tool. What we need to do is use a modified version of the container that kubectl trace runs, one that does have this tool, and we have prepared that already. If you're following along in the GitHub repo, there's an alias that I'm going to copy and paste; well, maybe I can do what Alban did. This is in the README, in the part that says "alias". This creates an alias called kubectl-trace-run, which calls kubectl trace run and passes a container image that is modified to include the decompression tool, so that kubectl trace can find the kernel headers. So I'm going to set this alias here; now, whenever I run kubectl-trace-run, it will use the right container. Otherwise, I would need to pass this parameter every time, and that would be very annoying. All right.
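The alias looks roughly like this; the image name below is a placeholder, and the repo's README has the real one:

    alias kubectl-trace-run='kubectl trace run --imagename=<trace-runner-image-with-xz>'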
So we've basically finished the setup, and now we can start running our traces. The first thing I will do is look at which nodes I have. As I mentioned earlier, we are running Minikube, and Minikube runs with only one node; if you are running a different cluster, you may have more nodes here. We will run our traces passing "minikube" as the name of the node where we want to run the trace.

So let's run our first trace: kubectl trace run, then we pass minikube as the node name, and then we write an expression. We will write an expression related to syscalls, but a different one from the ones we saw in the examples: tracepoint, then syscalls, that part is the same, but now, instead of just sys_enter, we will say sys_enter_ followed by a star, and this will get expanded to all the different syscalls. And in the action we will create a hash map, but instead of keying per command name, we will key per probe, which will be each of the syscalls, and then count how many times each one got executed. Okay, so we have our expression, and when we run this, it tells us that a trace was created. So this is running, and we can do kubectl trace get to see the traces, and we see that this trace is running.

We can attach to this trace to see its contents: it's running in the background, and we can leave it running for as long as we want, and then, when we want to actually see its contents, we attach to it. We do that by running kubectl trace attach and then the trace ID. Okay, we are attached, but the kind of trace that we wrote only prints a table at the end, when it finishes, so we are not seeing anything yet. To make it finish, we press Ctrl-C, and it prints the table. Sometimes this fails, and maybe it will fail in another of the examples; if you see an error that says "signal: killed", it just means it failed and you need to do it again.

All right, we have this table, and we see a bunch of different syscalls that were executed different numbers of times. It's very interesting, all the things that are going on in our cluster, even though nothing is actually happening. So, looking at this long table of syscalls, I'm interested in write: why is anything writing, when we don't actually have any pods running? So I will do a second Ctrl-C, and I will modify my trace to capture just the write syscall, and, instead of counting per probe, because now it's only one probe, I will count per command. Okay. So this trace is created; I wanted to show another thing, but I forgot, so it doesn't matter. The trace is created, I will now attach to it, and again press Ctrl-C to see the table.

Okay, so in this table there's a bunch of different commands that are calling the write function: we see kubelet, etcd, kube-apiserver. All of these commands are writing something somewhere; we don't know yet what they are writing, but they are writing something somewhere. So let's have a look at one of them and at what it is writing. I will pick the coredns command and try to figure out what it is writing. To do that, I will modify my expression once again; this time I will add a filter. Up to now, none of the expressions had filters; this time we'll use one.
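Written out, the traces from this part look like this; the first one is also an example in the kubectl-trace README:

    kubectl trace run minikube -e 'tracepoint:syscalls:sys_enter_* { @[probe] = count(); }'
    kubectl trace get                 # list running traces
    kubectl trace attach <trace-id>   # attach; Ctrl-C prints the table
    # second iteration: only the write syscall, counted per command
    kubectl trace run minikube -e 'tracepoint:syscalls:sys_enter_write { @[comm] = count(); }'
    # third iteration, built in the next paragraph: what is coredns writing?
    kubectl trace run minikube -e 'tracepoint:syscalls:sys_enter_write /comm == "coredns"/ { printf("%s\n", str(args->buf)); }'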
So let's add a filter that says that the command needs to be coredns. And in the action, instead of counting, because we already saw the counts, what we want is to actually see what the command is writing. So in this case I will do a printf and print the contents of the buffer that the syscall receives: I want args->buf. This will print the contents of the buffer each time the syscall is called. This time we are not generating a table that gets printed at the end; this time it is interactive, because each time the function is called, we see the output. And one thing I wanted to show is that we can directly attach by saying --attach: if you know that you want to stay attached, if you don't want to go do something else while the trace is running, you can just pass --attach and you are immediately attached.

So, we see that the coredns process is writing "404 not found". Something is calling it, I don't know what, but something is calling it and trying to GET slash, and coredns says "I don't have any slash": 404 not found. Probably this is some kind of liveness check, and probably it is working as expected, but this is the writing that coredns does, every second or so. All right, so that was interesting, I hope.

Let's move on to the next example. In the next example, we will actually use one of those files that I mentioned, the ones that make things easier when the expressions get complicated. We will look at this bashreadline.bt file, which includes some comments explaining what it is; it was taken from the bpftrace repo, where all this information lives. And here is what the actual code does: it uses a uretprobe, which is one of the probes that apply to user space, and it binds itself to the readline function and gets the return value of the readline function. But to do that, it uses this interesting variable called $container_pid that we see here. We write this variable in our scripts, and then, through the kubectl trace pipeline, it eventually gets replaced by the process ID that is actually running in that container; so it does not reach bpftrace as $container_pid, it reaches it as an actual number. The reason it needs this is that bpftrace needs access to the actual bash binary, the actual file that contains the symbol of the readline function; without the symbol of the readline function, it cannot attach itself to this function to get the return value. That's why it needs this variable here, to be able to find this file. And the action it takes is printing the timestamp and then printing the return value of the function.

Okay, so let's see this in action. First, let me split my screen. The first thing we will do is run a pod where we will have an interactive bash session: kubectl run, -it, --restart=Never, because we only want to run this once, --image=ubuntu, and we will call this pod testpod, because we need the name of the pod in order to run traces on it; and then we will run bash in the pod. Okay, that wasn't it. No. What did I do wrong? I don't know what's wrong. Oh, it's just one dash, thanks. Okay. All right, so this should download the Ubuntu image and eventually give me a bash prompt.
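While the image downloads: the bashreadline.bt file we'll use looks roughly like this. This is a sketch based on the version in the bpftrace repo, adapted with the $container_pid variable as described above; the file in the tutorial repo is the authoritative version:

    // trace the return value of readline() in the bash running inside the target container
    uretprobe:/proc/$container_pid/exe:readline
    {
      time("%H:%M:%S  ");            // timestamp
      printf("%s\n", str(retval));   // the line the user typed
    }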
Yes, I have a bash prompt and it's working. Okay. And now we can run kubectl trace run with -f, because we are passing a file, bashreadline.bt; we say we are running on this test-pod, and we pass --attach so that we are already in interactive mode. Okay, the trace is created, it's attached, and we are not seeing anything because we are not interacting with the bash yet. But now, whatever we type in this bash gets printed over there. And it doesn't matter whether the command exists or not: whatever we type gets printed on the other side. This is simply because we are attaching ourselves to the readline function, so this is the output of readline; it doesn't matter what bash does with that output afterwards.

Okay, so maybe you're wondering why we are attaching ourselves to the readline function; this was not meant to be a hacking-your-friends'-pods tutorial. But this is actually very useful for debugging. It's very common, when debugging, to print stuff: you edit your program and add a printf to print something, but for that you need to edit the program and recompile it, and maybe that's a lot of work. Instead of doing that, you can attach yourself to a function and print what the function returns, and you get more information about what's going on.

To see a different example of how you would use this for debugging, we'll now stop here: we'll quit this pod and stop this trace, and move on to the last kubectl-trace example, which is this caturday YAML file we have here. This file deploys an application called caturday; what it does is show an image of a cat each time you reach the container. It's a very nice, funny application, and a very simple deployment: we just have a deployment and then the service, and that's it.

So what we will do first is deploy this application. Oh, something is already running that is using this port, so maybe I should run it on a different port. Okay, now it's running correctly. So we have this application running, and similar to what Alban showed earlier about accessing nginx, to access this application we will run minikube service caturday, passing the caturday namespace, because it's not in the default namespace. Now we have a port where we can reach this caturday application. We can curl this URL and port, and we see that it's working. Cool. If you're doing this at home on your laptop, you can actually visit this URL with your browser, and you should see a very nice, cute cat picture; every time you reload, you get a different cat picture, which makes it even nicer.

All right, so this is working. Now, let's say we are trying to debug a problem related to the counter functionality of this application: the application has a counter, and we think it might not be working correctly. What we are going to do is run a trace that attaches itself to the function behind the counter (there's a function called counterValue) and see whether it's returning the right number or not. Maybe the problem is there, maybe the problem is somewhere else, but with this we can see what's going on.

I don't have the alias here. I also don't have the alias here. Okay, with all the opening and closing of terminals, I lost my alias, so I'm going to paste the alias again. All right, so there, there's my alias.
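To recap this part, the commands look roughly like this (the manifest filename and the namespace are assumptions based on the repo step; check the repo for the exact names):

    # attach the bashreadline.bt program to the test pod
    kubectl trace run pod/test-pod -f bashreadline.bt --attach
    # deploy the caturday demo app and curl the URL minikube prints for it
    kubectl apply -f caturday.yaml
    curl "$(minikube service caturday -n caturday --url)"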
And now what I'm going to do is... okay, first I want to get the pod name of my caturday application. Then I want to run the trace on that pod and attach myself to it; -a is the short flag for that. I'm going to use a uretprobe again, like we did in the bash example, with the same /proc/$container_pid trick. This time it's not bin/bash but exe, the path to the executable, because this is a Go binary, and the function is called main.counterValue. And what we want to do is similar to the readline example: we just want to print. This time it's a number, retval, the return value of the function. So the expression is uretprobe:/proc/$container_pid/exe:"main.counterValue" { printf("%d\n", retval) }.

Okay, the trace is created and running. Now, on the other terminal, I'm going to do the curl, for which I forgot the URL, so I'm going to just get it again. Okay, I'm going to curl this URL. And you can see up here that it printed 2, because I only did two curls. Each time I hit this page, the counter gets incremented. So if we were debugging this application and we thought there was a problem with the counter, we can see that the counter is actually working correctly and the problem must be somewhere else.

All right, so with that we've seen a few examples of kubectl trace. As I said, there are a lot more things you can do; it's very flexible, and it can be very useful when you're trying to debug something and don't really know where to start, because it lets you dig deeper and deeper. And now Alban has a few other interesting things to share with us.

Thanks, Marga. Yes, I have a few extra things to show. I don't know if we will be able to go through all of them, but if not, you can go through them on your own by following the documentation. So let's go for it; let's go to the extra directory. What I will show first is this: I will start an Inspektor Gadget trace and then try to understand how it works behind the scenes. To do that, I will demonstrate the bpftool command and some BCC tools as well.

Let's start with this command: let's run the execsnoop gadget with the selector role=extra. At the moment, I don't have anything running with this label: if I do kubectl get pods, there is nothing with that label. So, to understand what the gadget does, I will go into the node with minikube ssh, and there I will try to understand what the execsnoop gadget does in order to trace only the pods with this label.

So let's see: I will try to find the execsnoop process. Here it is, I see one Python program executing execsnoop, and it selects the role=extra label by using this option here. What the option does is specify a BPF map that contains the list of containers that execsnoop should trace. If I look at this map, it looks like a regular file, but actually it's not: it's on the BPF filesystem. The BPF filesystem is a virtual filesystem that exposes maps that have been pinned to that place; it's not a regular file. If I try to cat it, I see that I cannot just print its contents. That's because, to interact with BPF maps, you need to use the bpf() system call. On the command line, you can do that with the bpftool command. Unfortunately, bpftool is not installed on Minikube, so what I will do is run bpftool using a Docker container. So I will show... okay, sorry, I will divide the screen again so I can copy-paste the command, because it's a bit long. Here it is.
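It is roughly this (the --privileged flag and the bpffs volume mount are assumptions needed to reach the pinned map, and the pinned path placeholder must be replaced with the path that the execsnoop process on your node references):

    # run bpftool from a container, sharing the host's BPF filesystem
    docker run --rm -ti --privileged -v /sys/fs/bpf:/sys/fs/bpf kinvolk/bpftool \
        map dump pinned /sys/fs/bpf/<map-used-by-execsnoop>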
So what I will do is run this docker run command, and the container I will execute is kinvolk/bpftool, a container that just packages the bpftool program. From there, I will run the bpftool command and display the contents of this map, the one used by my execsnoop gadget. So let me do that now. It needs to download the Docker image first, and then it runs the bpftool container and displays the contents of the map. But as I mentioned before, since I don't have any pod with this label, the map should be empty: at the moment it should not display anything, because it has nothing to trace. Let's see how it looks first. So bpftool says "Found 0 elements": there is nothing in this map, so there are no pods to trace. And if I run it again, it does the same thing again.

Okay, so now I will start some pods with this role=extra label. Sorry, I will take the command from here. Let me start a first container; as you can see, it has this role=extra label. Okay, I have a shell inside this container, with the correct label. I will start a second container, still with the same role=extra label, but I will call it differently. Okay, so now I have a second shell. And just for completeness, I will start a third shell, this time without the role=extra label, just to check that this one is not traced.

Okay, so now I have three containers, and two of them should be traced by this Inspektor Gadget execsnoop command. Let me execute some commands in them. In this one, I execute echo shell1; in this one, echo shell2; and in the third one, you can guess it, echo shell3. And we see that execsnoop only traced the first two commands, but not the third one, because the third one didn't have the correct label. It's still running with this filter here.

So now, if I look at the contents of the map, I see that it contains two entries, and those should be the two containers that are traced: the container for the first shell and the one for the second shell, but not the third one. How do we actually check this? What was the filter, once again? It was this --mntnsmap option. What it does is specify a map that contains the mount namespace IDs of the containers to trace. The mount namespace is something you can see: inside a container, /proc/self/ns lists the different namespaces, and one of them is the mount namespace. You can see that this container has a mount namespace with this ID. If I look at a different container, it has a different mount namespace ID, and this one, oops, gets yet another mount namespace ID.

So this mount namespace ID is the content of the key here. But the map key is shown in hexadecimal, while here it's shown in decimal, so I need to convert it to make sense of it. I prepared a command to show how to do that: this printf command. What it does is take the mount namespace ID (the inode number) and print it in hexadecimal. And because I'm running on a little-endian machine, the map dump displays the least significant byte first, but the printf representation is the reverse.
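As a sketch, with a hypothetical namespace ID, the conversion looks like this:

    # print this container's mount namespace ID (an inode number, in decimal)
    readlink /proc/self/ns/mnt                # e.g. mnt:[4026532283]
    # print the inode in hex, then reverse the byte order with fold + tac,
    # so it matches the little-endian key bytes that bpftool dumps
    printf '%016x\n' 4026532283 | fold -w2 | tac | tr -d '\n'; echo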
So, yeah, I need to use this tac command to reverse the order of the bytes and display them in the correct order. If I do this correctly, I get the representation of the mount namespace ID in hexadecimal, in the correct byte order, and I can do that for the three different containers. Here it is. So in my map, I should have this one and this one, but not this one. And let's see: yes, the IDs of the first two containers are in the map, and the third one is not.

So in this way, I showed how it works behind the scenes. It uses a BCC tool, but the BCC tool doesn't actually know about Kubernetes labels. The only thing it does is look at the contents of the map to know whether it should capture the events from a given container or not. And the contents of that map are filled in by the Inspektor Gadget manager; that's what I showed in my earlier slide with the gRPC API.

Okay, I don't know if there is time for more examples. No, I think we're basically out of time. Okay, cool. So I will stop here, but you can try this for yourself and continue the guide to understand more about how it works behind the scenes. And we will stay around in the chat answering questions, so if you have questions, feel free to ask them now.