Hello everybody, thanks for coming. This talk is about Linux tracing with BPF, BCC, and more. My name is Alban. I'm a co-founder and director at Kinvolk, where we do consulting and engineering around open source projects related to Linux and Kubernetes. We have a team dedicated to innovation on our own open source projects, and we also do consulting for other software companies on the same kinds of projects. Hello, I'm Mauricio. I also work as a software engineer at Kinvolk together with Alban, and these are my social network handles, just in case you want to reach out. In fact, we have products, a Linux distribution and a Kubernetes distribution, but nothing we will say today is tied to them, so you can try everything on any Linux distribution or Kubernetes distribution. Okay, so this is the outline of the presentation. We will give you an introduction to Linux tracing, then an introduction to BPF and BCC. We will also give you a quick demo of some BCC tools, and we'll show you how to customize a BCC tool. Finally, Alban will give you an introduction to tracing in the cloud: he'll show you the different tools available for that and give a quick demo of Inspector Gadget. So I think we are ready to go. Let's get started with the introduction to Linux tracing. Before going into the details of tracing in Linux, I want to give you a quick introduction to what tracing is. Tracing is a mechanism that allows you to get information about the execution of a program. We could say that there are two different use cases for tracing: debugging and troubleshooting. Debugging is the use case programmers usually have. For instance, if you are developing an application and something is not working, you can use tracing to find out what is going on.
For instance, you can get information about which functions are being called, what their arguments are, and so on. On the other hand, an administrator can use tracing for troubleshooting. In this case, something is not going well with an application, for instance there is a performance problem. The administrator can enable tracing for that application and get details about what is going on. Performance is very important in tracing, because we want the overhead to be as low as possible, so it should be possible to enable or disable tracing at compile time or at runtime. Okay, so let's go into some details about Linux tracing in particular. We could say that Linux tracing can be divided into three different layers. The first layer is the data sources. Those are the components that provide information about what is going on: they are connected to the applications or to the kernel and collect the information. They can be divided into two kinds: probes and tracepoints. Probes are mechanisms that allow modifying the machine code of an application at runtime. Basically, an application is running, and these mechanisms patch its code to introduce tracing on the fly. There are two kinds of probes: kprobes are for the kernel, and uprobes are for user space applications. The other kind of data source is tracepoints. These are defined statically by the programmer. The programmer says, okay, something important happens at this point of my application, so I want to inform the world that it is happening here. There are two kinds of tracepoints: one for the kernel, and the other for user space applications. The latter are called User Statically Defined Tracing (USDT) probes.
Okay, so the second layer takes the information from the data sources and makes it available to the third layer. There are many different options here, but I don't want to go into their details. They use different technologies: some of them are integrated into the Linux kernel, some are external modules, and so on. What I want to make clear is that there are different options that can be chosen according to the needs of the user. Usually the data sources and data collection layers run in kernel space, and the third layer, the front ends, runs in user space. The front end is the layer that presents the tracing information to the user. Once again, there are different options and different tools here. Okay, the compatibility between these three layers is not that straightforward: some front ends are only compatible with some data collection layers, and so on. I don't want to draw all the compatibility arrows here; I just want to show you that there are many different options, and depending on the front end you choose, you have to choose a matching data collection component, and so on. In this talk we will concentrate on BCC and BPF. I'm not saying that this is the only option or the best one, but we want to present it because we think it has some nice features and advantages. Okay, so why are we talking specifically about BPF and BCC here? The first reason is that they are fully open source. BPF is a very flexible and efficient technology, and it is included in the Linux kernel, which means that you don't have to install an external kernel module to make it work.
As for BCC, it has many great ready-to-use tools, so in most cases you don't have to modify anything. And even if you want to modify them, they are very flexible and easy to modify; I will show you a demo of that later on. BCC also has libraries and bindings for several programming languages. An important factor for us is that there is a healthy community around these projects, so if you hit an issue or have a question, there is always someone willing to help you. So now let's introduce BPF. BPF itself has existed for quite a long time. It was initially designed for tcpdump, but more recently it was extended into eBPF, extended BPF. One of the changes is that, while it was initially designed for tcpdump to capture network packets, it can now do much more than that: there are new kinds of programs, for example for tracing. It has a bpf() system call, which was not there before and which I will explain a bit later, and it has something called BPF maps, which I will explain afterwards as well. Finally, it has the BPF filesystem, which can be useful for managing BPF objects. Okay, here I want to present the general workflow, a bit of what happens behind the scenes when you write a BPF program and load it. Let's look at the top first. When you write a BPF program, you write some code in C, and that C program can be compiled with Clang/LLVM into BPF bytecode. This bytecode can then be loaded into the kernel with the bpf() system call. Once your bytecode is loaded into the kernel, the first thing the kernel does is verify it: there is a verification step, performed by the verifier, which checks that your program is safe. This ensures that it will not crash your kernel and will not do anything it should not do.
I will come back to that a bit later. If the verifier says the program is safe to run, it is loaded in the kernel and can then be attached to one of the hooks I will describe in a moment. The BPF program can call kernel functions through BPF helper functions and interact with BPF maps, which is the last mechanism I will explain. Then, while the BPF program is executing, it interacts with the user space application only through BPF maps. That's the communication mechanism used to retrieve information from the kernel into the tracing application. As I mentioned before, there are now different kinds of BPF programs, since BPF is no longer only for tcpdump but covers more use cases. There are three categories: networking, security, and tracing, but in this talk we focus only on tracing. So it's about using BPF programs of type kprobe, tracepoint, uprobe, and a few others that were shown before. In that way, a BPF program can be attached to different hooks in the kernel. BPF programs can use BPF maps. A map is a kind of global variable in the kernel, and that variable is accessible both from user space programs, using the bpf() system call, and from different BPF programs, using BPF helper functions. I'm not going to go into the details of the API; I just want to show that it is the mechanism BPF programs use to transmit information to the user space program. There are many different kinds of BPF maps. They evolved along with the Linux kernel: initially there were only a few maps for basic data structures like arrays or hash tables, but in the latest kernel I checked, there are 27 different map types. So there are many different kinds of data structures, including different kinds of ring buffers to transmit information from the kernel to user space.
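To make the map mechanism concrete, here is a minimal sketch using BCC's Python bindings, assuming BCC is installed and the script runs as root. The names `counts`, `count_execve`, and `top_n` are hypothetical. BCC compiles the C text with Clang/LLVM and loads it through bpf(); the BPF program increments a per-PID counter in a `BPF_HASH` map each time the execve() kprobe fires, and user space reads the same map afterwards. Only `top_n()` runs without BCC.

```python
"""Count execve() calls per PID in a BPF hash map, then read it from user space.

Sketch only: main() needs BCC installed and root privileges.
"""
import time

# BPF C program; BCC compiles it with Clang/LLVM and loads it via bpf().
BPF_PROGRAM = r"""
BPF_HASH(counts, u32, u64);                      // map: PID -> execve() calls

int count_execve(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;  // upper 32 bits: process ID (TGID)
    counts.increment(pid);                       // BCC map helper: add 1 to the entry
    return 0;
}
"""


def top_n(entries, n=5):
    """Return the n (pid, count) pairs with the highest counts."""
    return sorted(entries, key=lambda kv: kv[1], reverse=True)[:n]


def main():
    from bcc import BPF  # imported lazily so top_n() works without BCC

    b = BPF(text=BPF_PROGRAM)
    b.attach_kprobe(event=b.get_syscall_fnname("execve"), fn_name="count_execve")
    time.sleep(5)  # let the kernel side fill the map
    # User space reads the map that the BPF program has been writing to.
    entries = [(k.value, v.value) for k, v in b["counts"].items()]
    for pid, count in top_n(entries):
        print(f"{pid:>8} {count}")


if __name__ == "__main__":
    main()
```

The kernel side never prints anything here: all communication goes through the map, which user space reads once after a few seconds.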
So many of these map types are specialized, for example longest-prefix-match tries, queues, and so on. How do we use maps? There is the bpf() system call that you can use to create a map. But from the terminal, you can use the command line tool bpftool to create a map, and then you can pin it to make it visible on the BPF filesystem. Thank you, Alban, for that nice introduction to BPF. So I will continue with the introduction to BCC. BCC stands for BPF Compiler Collection, and it is a toolkit for creating efficient kernel tracing and manipulation programs. This means that BCC makes it easier to write BPF programs. When you are writing a BPF program, you usually have to take care of some low-level kernel details, which sometimes makes it difficult. BCC tries to make it easier by providing wrappers: it provides a C wrapper around LLVM and lets you define and access BPF maps in an easier way. BCC also provides libraries for Python, Lua, and C++, and Go is supported externally by a separate project called gobpf. What this means is that you still have to write your BPF program in C, but you can load it and access its BPF maps from other languages like Python, Lua, and so on. So the first part of BCC is providing the user with a framework, a library, to create, load, and manipulate BPF programs. That's only one part of the BCC toolkit. The other part, which I would say could be the most important one, is that it has a lot of different tools and examples that are ready to use. This is a diagram of BCC showing the different kernel subsystems and some of the BCC tools that are out there, so we can see how the different tools interact with the different kernel subsystems.
As you can see, there are many different tools; there is a tool for almost every subsystem. In this talk I will give you a quick demo of some of these tools, specifically opensnoop, execsnoop, and tcpconnect. If you want more information about these tools, you can go to the BCC repo and see the list of tools there, each with a short description. If you click a tool's name, you get the full documentation for that specific tool, and there are also examples of how to use it. Before going into the demo: there are different options to install BCC. The first option is to install a package for your distro: if you are running Ubuntu, Fedora, or another popular distro, there are packages available. If you don't want to install anything on your system, or you are running a different Linux distribution, you can run BCC in Docker. In this case you need to provide some special flags, so I put the command here in case you want to run it. One small detail: there is no official BCC Docker image, so at Kinvolk we have some images that we maintain from time to time with the BCC functionality. The last option, if you want to develop BCC or try a feature that has not been released yet, is to install it from source. In this case you need to pay some attention to the dependencies, especially the kernel: for the newest BCC features you may need to be running a recent kernel. Okay, so it's time for a demo. Let me switch to my terminal. Here I have the different BCC tools on my system. This is not a packaged BCC installation: I have a git clone of the BCC repo here, and as you can see, these are the different tools that BCC includes. Okay, let me show you opensnoop. If we want more information about a tool, we can always run it with this option to get some help.
For instance, we can see a short description of what the tool does, information about the arguments it accepts, and so on. The opensnoop tool gives information about files being opened on the system: it shows when a process tries to open a file. The simplest way to run a BCC tool is as root, without any arguments. So let me just type my password. It takes some time until it loads. Okay, as you can see, there are a lot of different processes opening a lot of different files on my system. If I want to trace only the files opened by a given process, I can use this option, which traces only processes that have "cat" in their name. So let's run some cat commands here. As you can see, we get information about this execution of the cat command: the PID of the process, the name of the command, the file descriptor number inside that process, whether there was an error, and the path the process was opening. In summary, opensnoop lets you see which processes are opening which files on the system. Okay, let me show you another very simple tool called execsnoop. This tool gives us information about new processes being created on the host. Once again, let's execute it with root permissions, in this case without any parameters. Okay, and let's run some commands. Let's do a cat, and another command. Okay, here we have the information about those commands: the name of the command, the PID of the process, the parent process ID, and additional information such as the arguments used in the command, and so on. So this tool gives you information about the different processes being created on your host.
Okay, finally, I want to show you another tool called tcpconnect. This tool gives us information about the TCP connections being initiated from the host. In other words, when a process calls the connect() system call for the TCP protocol, it prints that information on the screen. Again, we can run the tool without any arguments and try to open some TCP connections. Let me try to connect to the Kinvolk website, and to Google. Okay, we have some information here: the PID performing the operation, the command, the source address, in this case the address of my computer, the destination address, and the destination port of the request. So this tool is useful to understand which processes are opening TCP connections to remote hosts. Okay, as you can see, there are many different tools available; you can go to the repo and get information about them. In most cases you will find a tool for the task you need, but it could happen that you want to do something special that none of those tools implements. Maybe you need to create a new tool, or you need to customize an already existing tool. So I will show you quickly how to customize an existing BCC tool. Before showing you the code, I want to present how a BCC tool is built. BCC tools are composed of two different parts: the BPF program and the user space script. The BPF program is the piece of code that runs in the kernel and captures and filters events: it's attached to a kprobe and so on, takes the events from the kernel, performs some filtering, and then sends those events to the user space script through a buffer, in this case a perf buffer, which is a kind of BPF map. As for the user space script, the first thing it does is parse the user options.
It then customizes the BPF code according to the user options; we will see this in detail later on. Then it attaches the BPF program to the kernel, and finally it runs a loop, polling the BPF map and printing the events in a human-readable way. Here we have the two components of a BCC tool, one in user space and one in kernel space. The user space component uses the BCC Python bindings to create and compile the BPF program and then to attach it. When the tool is attached, a BPF program is created in the kernel and attached to the tracepoint, kprobe, and so on. This program takes the events and saves them in the map, and the BCC tool then polls the map to get the information about those events. Let me show you a quick demo of how to modify an already existing tool. I'm going to use the last tool I presented, tcpconnect. As you can see here, with tcpconnect we can filter the events based on the PID and based on the port, but we cannot filter them based on the destination IP address. Let's suppose that for some reason you want to filter these events by destination IP address. I will show you how to extend the tool to do that. Okay, this is the tcpconnect tool, a Python script containing the whole tool. I will quickly show you how it is built. We have the imports here for the different functionality we need, some from BCC and some from generic Python libraries. Here we have the definition of the options exposed to the user. And here we have the BPF program. The BPF program defines some maps to send information to the user space application, and so on. We have some structure definitions. It's not important for you to understand all of these details; I only want to show you a general overview of the program.
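The step of printing events in a human-readable way can be sketched with ctypes, assuming a hypothetical event struct (`pid`, `saddr`, `daddr`, `dport`, `comm`) that the BPF program would submit to a perf buffer. This is not tcpconnect's actual layout: the field order and types here are illustrative and would have to match whatever C struct the BPF program really defines. The BCC wiring appears only as comments, since it needs root and a loaded program.

```python
import ctypes
import socket
import struct

TASK_COMM_LEN = 16  # kernel task name length, matching char comm[16] in C


class ConnectEvent(ctypes.Structure):
    """Hypothetical IPv4 connect event; must mirror the C struct field by field."""
    _fields_ = [
        ("pid", ctypes.c_uint32),
        ("saddr", ctypes.c_uint32),   # source address, network byte order
        ("daddr", ctypes.c_uint32),   # destination address, network byte order
        ("dport", ctypes.c_uint16),   # destination port, network byte order
        ("comm", ctypes.c_char * TASK_COMM_LEN),
    ]


def ipv4_str(addr):
    """Turn a u32 holding network-order bytes into dotted-decimal text."""
    # Packing in native order gives back the bytes exactly as the kernel stored them.
    return socket.inet_ntop(socket.AF_INET, struct.pack("=I", addr))


def format_event(raw):
    """Decode one raw perf-buffer record into a human-readable line."""
    ev = ConnectEvent.from_buffer_copy(raw)
    return "%-7d %-12s %-16s %-16s %-5d" % (
        ev.pid,
        ev.comm.decode("utf-8", "replace"),
        ipv4_str(ev.saddr),
        ipv4_str(ev.daddr),
        socket.ntohs(ev.dport),   # convert the port back to host byte order
    )

# With BCC, the polling loop would look roughly like this (requires root):
#   def handle(cpu, data, size):
#       print(format_event(ctypes.string_at(data, size)))
#   b["events"].open_perf_buffer(handle)
#   while True:
#       b.perf_buffer_poll()
```

Keeping the decoding in a plain function makes it testable without a kernel: any bytes with the right layout can be fed to `format_event()`.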
Okay, these are some of the functions that are called each time an event happens. For instance, each time there is a call to connect(), this function is executed, and it contains the logic to collect the information for that call. As you can see here, there are some placeholders for filtering that look like C macros. These placeholders are the filtering mechanism: the tool replaces them to perform the filtering if the user enables the corresponding option. I will show you in a second how it works. Okay, and here we have some more code for getting the full information of the event from the kernel. And again, here we have more and more code. So basically, the code is divided into different pieces, and the BCC Python script in user space puts everything together, compiles it, and loads the BPF program into the kernel. As you can see here, there are some substitutions based on the user options. For instance, if the user only wants to count, we use this code; otherwise we use the default one. And if the user wants to filter by PID, we replace this placeholder with an implementation that performs the filter. Usually a filter in BPF consists of checking a condition and, if the condition is not true, returning zero to avoid capturing the event. So we have a condition here, and if it is not true, we just return zero to avoid capturing the event, right? So let me show you how to modify this. What we want to do is filter by destination IP address. The first thing we have to do is add an option for the user to specify the IP address to filter on. We can reuse the port option by copying and pasting it. I'm going to call it address, right? And here we just have to put some documentation. Okay, so I have the option here.
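The placeholder technique can be sketched in plain Python. The BPF fragment below is a trimmed, hypothetical imitation of the tcpconnect.py style, and `customize()` is an illustrative helper rather than code from the tool; the real script performs the same kind of `str.replace()` on its BPF text, but its filters differ in detail. Each filter follows the pattern described above: if the condition for keeping the event is not met, return 0.

```python
import socket

# Trimmed, hypothetical BPF fragment: FILTER_PID and FILTER_PORT are plain
# text placeholders that the Python script rewrites before compiling.
BPF_TEXT = r"""
int trace_connect_return(struct pt_regs *ctx)
{
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    FILTER_PID
    /* ... look up the struct sock *skp saved by the entry probe ... */
    u16 dport = skp->__sk_common.skc_dport;   /* network byte order */
    FILTER_PORT
    /* ... fill in the event and submit it to the perf buffer ... */
    return 0;
}
"""


def customize(bpf_text, pid=None, ports=None):
    """Turn each placeholder into a C filter, or remove it if unused."""
    if pid is not None:
        # Keep the event only for this PID; otherwise bail out with return 0.
        bpf_text = bpf_text.replace(
            "FILTER_PID", "if (pid != %d) { return 0; }" % pid)
    if ports:
        # skc_dport is big-endian, so compare against htons(port).
        cond = " && ".join("dport != %d" % socket.htons(p) for p in ports)
        bpf_text = bpf_text.replace(
            "FILTER_PORT", "if (%s) { return 0; }" % cond)
    # Unused placeholders must disappear, or the C code will not compile.
    return bpf_text.replace("FILTER_PID", "").replace("FILTER_PORT", "")
```

For example, `customize(BPF_TEXT, pid=1234)` turns the `FILTER_PID` line into `if (pid != 1234) { return 0; }` and simply deletes `FILTER_PORT`.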
Then I need to look in the BPF program for the place where the destination address is available. If we go here, we have the filter for the port, and here we have some code for the IPv4-specific case, where the destination IP address is available. So what we can do is add an IPv4 filter placeholder here. As you can see, I'm not implementing the filter here; I'm just adding this string, which the script will replace before the program is loaded. Okay, so what I have to do now is define this filter. Again, we can use the same logic as for the port. If the user passed an address, we compute the IP, which is going to be equal to something; I'll show you this in a second. Then we replace the placeholder: if the IP is different from that value, we use the same logic to avoid capturing the event. So this is going to be the IP, and instead of the port placeholder, we use the IPv4 one. So, what is the IP? The destination IP address we have here is an integer stored in network byte order. It is probably easier for the user to provide the destination IP in dotted-decimal notation, so we have to convert from dotted-decimal notation to network byte order. I'm just going to copy and paste that to avoid losing time on it. Basically, we take the argument from the user, convert it to an IP address object using the ipaddress library from Python, convert that to an integer, and finally convert that to its network byte order representation. We also have to replace the placeholder with nothing in case the user doesn't want to filter by destination IP address. So I think we are almost ready to go. The only thing missing is to import the extra functions we are using: htonl, host to network long, and also ipaddress.
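Here is a sketch of the destination-address filter, assuming a hypothetical `FILTER_IPV4` placeholder, a `--address` option, and a C variable named `daddr` holding the destination address. These three names are illustrative, not taken from tcpconnect. The conversion follows the steps just described: parse with `ipaddress`, convert to an integer, then apply `socket.htonl()`.

```python
import argparse
import ipaddress
import socket
import struct


def ipv4_filter(address, var="daddr"):
    """Build a C filter keeping only connections to `address` (IPv4 only).

    `var` is the hypothetical name of the BPF C variable holding the
    destination address; it must match the variable in the BPF program.
    """
    ip = ipaddress.ip_address(address)        # validates the dotted-decimal input
    if ip.version != 4:
        raise ValueError("this sketch only handles IPv4 addresses")
    # The kernel stores the address as a u32 holding network-order bytes.
    # htonl() yields the integer with that same in-memory layout on this host.
    value = socket.htonl(int(ip))
    return "if (%s != %d) { return 0; }" % (var, value)


def value_to_dotted(value):
    """Inverse conversion, handy for checking the value placed in the C code."""
    return socket.inet_ntoa(struct.pack("=I", value))


def apply_address_filter(bpf_text, address=None):
    """Replace the hypothetical FILTER_IPV4 placeholder, or drop it if unused."""
    if address:
        return bpf_text.replace("FILTER_IPV4", ipv4_filter(address))
    return bpf_text.replace("FILTER_IPV4", "")


parser = argparse.ArgumentParser(description="tcpconnect-style option sketch")
parser.add_argument("--address", help="trace only connections to this IPv4 address")
```

Using `htonl()` matters because the kernel keeps `skc_daddr` in network byte order; on a little-endian machine, comparing against the plain integer from `int(ip)` would generally not match.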
Okay, we've modified the tool, so we can try it again. Before trying anything, I just want to run the tool without any arguments to be sure it is still working. First we get the addresses of these domains. We curl this address, and we can see the event here. We also curl this other address, and we have the information there too. Now let's suppose that we only want to trace this one. There is a small error; let me check. Okay, I know what went wrong: in the filter I'm using the IP, but this should be the destination IP address, which is saved in this variable. So instead of the IP, we have to use this variable, the one defined in the BPF code. Let's try again. Okay, if we curl this address, we can see that there is no event. If we curl this other one, we can see that the event is there. Even if we curl a different port, the event is also captured. As you can see, it's not that difficult to modify a BCC tool: if I show you the diff, you can see that only a few lines of code changed. So you can modify the tools very easily. That's all from my side; I hope it was more or less clear how to modify a BCC tool. I hand over to Alban, who will show you how to use BCC in a cloud environment. Thank you, Mauricio. So what we have seen so far is using BCC on a single machine. What I want to talk about now is using it on a cluster, meaning when we have many machines to inspect. In this case, it's a bit more complicated, because in this kind of scenario the user will not necessarily want to SSH into an individual machine to trace things. That's something we want to avoid. We also don't want to trace only at the granularity of one process or even one node. It can happen, for example, that you run nginx in different pods.
So you have many replicas of the nginx application running, and you might want to trace all of them. But if you have a load balancer, you don't necessarily know which node the traffic is going to hit. Those are some of the limitations if you want to use BCC directly in the cloud, and we will see how we can overcome them. On the left side I put different Linux tracing tools, including BCC, that are designed to run on one machine. On the right, I put some tools that work at the Kubernetes level, for when you have many different machines that you want to trace using high-level tools rather than connecting to a specific machine. For example, you have kubectl-trace, which uses the bpftrace tool. In the case of BCC, you can use Inspector Gadget, which supports some of the tools Mauricio showed before. I will not make a big demo of that, because there is actually another talk at this conference about it, a tutorial on Kubernetes and BPF. It already took place on October 27, but you can watch the recording. Or if you want to do it yourself, you can go to this repository, where you can reproduce the workshop and learn how to use Inspector Gadget, kubectl-trace, and so on. But for today, I will just show you a very short demo of Inspector Gadget. Just one second. Here it is. Here you see my terminal split in two. At the bottom I have a Kubernetes cluster. Actually, it's just one node, which I started with Minikube, a tool to start a machine on your laptop with a Kubernetes cluster running in it. Here I can see with the kubectl command that I don't have any pods running. In the other terminal, at the top, I will run the kubectl gadget tool with the execsnoop command. This execsnoop command is taken directly from BCC: it's the execsnoop tool that Mauricio demoed before.
On top of that, the kubectl gadget command allows you to specify which pods you want to trace. There are different ways to select the pods: by pod name, and so on. In this case, I will select them by specifying the Kubernetes namespace, the default one, and the label run=cooking. So let's try that and see what happens. As mentioned before, I only have one node in my Kubernetes cluster, called minikube in this case, and here it has started the execsnoop command. So far it doesn't trace anything, because no pod matches this label. So let me start a new pod. This pod is called cooking, so it will have the run=cooking label, and it will run this script. It uses the curl-piped-into-bash anti-pattern, just for the purpose of this demo. Let's see whether we can use execsnoop on this pod. Here you see the script is doing something. At the top of the screen, you see all the commands that this script is running, for example cat, and so on. Here I see it's installing something, and I can see it's using RPM to do that. So that's what you can do with Inspector Gadget: it runs BCC tools behind the scenes on the different nodes of your Kubernetes cluster. Thank you. Okay, let me share my screen again. Before finishing this presentation: in the slides we have some reference material in case you want more information. There you can find a lot of information about how to use BCC, BPF, and Inspector Gadget, and the other tools and technologies we showed you in this presentation. So I think this is all. Thank you very much; I hope you enjoyed this presentation.