Okay, hello everybody, welcome to this presentation, and thank you so much for attending. I'm Mauricio, I work as a software engineer at Microsoft, and today I will be presenting how to collect low-level metrics by using eBPF.

In today's agenda I first want to introduce the concept of metrics. I hope many of you are already familiar with it; otherwise this is just a quick introduction to metrics, and the same goes for eBPF: I will cover very quickly what eBPF is and the concepts we need in order to follow the rest of this talk. After that I will talk about the relationship between metrics and eBPF, and finally I will present some of the projects we can use to collect metrics using eBPF.

Okay, so here we are: metrics. By definition, a metric is a measurement of a service captured over time. We can think of a metric as a number that represents the performance or the health of our service. Examples of metrics are the percentage of CPU our system is using, the amount of RAM our system is using, or the error rate of the responses of our system. Basically, any numeric measurement you can take on your system that represents its performance.

So why do we need metrics? By using metrics we are able to understand whether our service is available, and also what its performance is. Probably many of you get alerts when a metric changes on the system or when there is an outage; we can configure those rules to understand when there is an issue with our system. Another thing we can use metrics for is to trigger scheduling decisions, for example if we need to allocate more resources to our system, or if we have too many resources and want to remove some of them.

There are different kinds of metrics. The first one, and the simplest one to understand, is the counter. A counter is a numeric value that can only go up. We can use counters to represent the number of packets being sent on a system, the total number of requests being processed, and so on. The second one is gauges. A gauge is also a single numerical value, but it can go up or down. An example of a gauge could be the CPU usage on our system: it can go down or up. It's the same for memory, since our system can consume more or less memory over time. And the last one, well, I forgot to say that these are the definitions according to Prometheus; if you look at OpenTelemetry there is a slightly different definition, but in the end the core concept is the same. Histograms are the most difficult one to explain. The idea is that we take a measurement, we divide the range of possible values into different buckets, and when we have the measurement we increase the counter for the matching buckets. Histograms are generally used for latency measurements; they provide statistical information about what is going on with our system, like what the average response time is or whether there are any outliers.

Okay, the other concept I want to introduce very quickly about metrics is dimensions. When we take a measurement, we don't only care about the numerical value; we also attach additional information to that metric.
Those dimensions are called labels, or also attributes. For instance, if we are counting the number of packets being sent on a system, we don't only care about the total number of packets, we also want to attach additional information to them: for instance, which network interface the packets are being sent on, which IP protocol is used, and so on.

Dimensions are important because they let us aggregate the data. In the example we have here, if we only care about the network interface, we can aggregate the data: we can remove the IP protocol label so that we get data per network interface. We can also perform filtering: for instance, if we only care about IPv4, we can filter out all the other values.

The cardinality of a metric refers to the number of combinations of its labels. In this case the cardinality is four, because we have two different interfaces and two different values for the IP protocol. So what is the point about cardinality? When we capture metrics we would like to keep the cardinality as high as possible, because we want to have as much information as possible. But when we have high cardinality, our observability system needs more resources, especially memory. So we have to find the right balance: we want information that is as detailed as possible, but at the same time we don't want to consume a lot of resources in our observability application. This is going to be important in a moment, when I show you how to collect metrics.
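To make that cardinality example concrete, this is roughly what the series of such a packets counter would look like in the Prometheus text format; the interface names, protocol values and numbers are purely illustrative:

```
# Illustrative series for a packets counter with two labels.
# Two interfaces x two protocols = cardinality 4.
packets_total{interface="eth0", protocol="ipv4"} 1027
packets_total{interface="eth0", protocol="ipv6"}  312
packets_total{interface="eth1", protocol="ipv4"}  455
packets_total{interface="eth1", protocol="ipv6"}   89
```

Dropping the protocol label would aggregate these four series down to two, one per interface, which is exactly the trade-off between detail and cardinality described above.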
Okay, I know that was a very fast introduction to metrics. Let's switch to the second topic, which is eBPF. So what is eBPF? eBPF is an in-kernel bytecode virtual machine. Basically, with eBPF we can take programs provided by the user and run those programs in the context of the kernel. What this means is that we are able to change the kernel's behavior by running programs provided by the user. There are different use cases for eBPF: tracing, networking, security. In this talk we are mostly interested in tracing, because we want to get information about what is going on in the kernel.

Just to give you a quick idea of why eBPF is so popular right now, I would say there are three things. The first one is that eBPF brings flexibility to the kernel: we are able to change the behavior of the kernel without having to recompile it, so we can do something like implementing new features in the kernel without recompiling it or installing a new version of it. That is very powerful. The second one is that it is efficient: it uses a just-in-time compilation approach, which means it translates from eBPF instructions to machine instructions on the fly, and that gives us very good performance because there is no emulation involved. And the last one is of course also very important: eBPF is safe. The kernel has a mechanism, the verifier, that makes sure the eBPF programs we run are safe, so they cannot crash the kernel and cannot access memory they are not allowed to. We can say that eBPF programs run in a sandbox, in a way, so they are safe to run.

Okay, I want to introduce a couple more concepts just to make sure we can follow the next topic. eBPF programs are event driven, which means that when something happens in the kernel, our eBPF programs are executed. The sources of those events are called hooks. What are examples of that? For instance, network devices: we can attach our eBPF program to different network devices, and when a packet is received or sent on that device, our eBPF program is executed. But for this specific talk the most interesting ones are the first ones: kprobes and tracepoints. Those are hooks in the kernel that allow us to attach eBPF programs to almost any function within the kernel. For instance, if we need to understand what is going on in a specific kernel function, we attach a program there, and each time that function is executed our eBPF program runs. From there we are able to see what the parameters of the function are, what its return value is, and which process or user is executing that specific function. The interesting thing is that we can attach to almost any point in the kernel that we want. There are other hook types, but I will skip them for now because they are not so interesting for this specific use case.

This is how it looks: we have an observability application, which is the application that does the observability work. It loads different eBPF programs into the kernel, attached to different hook points, for instance storage, networking, or syscalls. Then we have the processes we are monitoring on the system. Those processes need to interact with the system through syscalls, so by having those eBPF programs in the kernel we are able to understand the activity those processes are performing: sending network packets, calling functions in the kernel, accessing the disk, and so on.

Another concept I need to explain: with those eBPF programs we are able to capture information from the kernel, but then we need a place to store that information. That is what eBPF maps are for. We can think of maps as key-value structures that are used to share information between different eBPF programs, and also between eBPF programs and user-space applications. Basically, the idea is that the eBPF program runs, gets some information from the kernel, stores that information in a map, and then our application in user space pulls that data. Okay, I know it was a very quick introduction to eBPF, but I just wanted you to have a general idea before I go to the next topics.
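To make the maps idea a bit more concrete before we look at the projects: this is a minimal sketch of what the user-space side of that flow can look like with libbpf, assuming the kernel-side program has pinned a hash map of counters (u32 key, u64 value) at /sys/fs/bpf/syscalls. The pin path and the map layout are assumptions for this example, not something taken from the projects shown later.

```c
// Minimal sketch: dump a pinned BPF hash map (u32 key -> u64 counter) from
// user space. The pin path and the key/value layout are assumptions.
#include <stdio.h>
#include <bpf/bpf.h>

int main(void)
{
    int map_fd = bpf_obj_get("/sys/fs/bpf/syscalls");
    if (map_fd < 0) {
        perror("bpf_obj_get");
        return 1;
    }

    __u32 key, next_key;
    __u64 value;
    __u32 *cur = NULL;  // NULL means "start from the first key"

    // Walk every key in the map and print its counter.
    while (bpf_map_get_next_key(map_fd, cur, &next_key) == 0) {
        if (bpf_map_lookup_elem(map_fd, &next_key, &value) == 0)
            printf("key %u -> %llu\n", next_key, (unsigned long long)value);
        key = next_key;
        cur = &key;
    }
    return 0;
}
```

An exporter like the ones we are about to see does essentially this in a loop and turns the key/value pairs into Prometheus series.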
Okay, so what is the relationship between metrics and eBPF? With eBPF we can get very deep insight into what is going on in the kernel. As I was explaining before, we can attach to almost any kernel function, so we can get very low-level information there. And, as mentioned before, it is flexible, efficient and safe, which makes it the perfect tool for getting these low-level details from the kernel. There are different projects that provide metrics by using eBPF; in this talk I will cover three of them, but there are certainly more that I'm not aware of.

Okay, so let me show you the first one. It is called ebpf_exporter and it is by Cloudflare. The definition we can find on their website is that it is a Prometheus exporter for custom eBPF metrics. It is important to understand that this is for custom eBPF metrics: the idea is that users write their own eBPF programs to collect the information, and then ebpf_exporter exposes that information as Prometheus metrics. It supports counters and histograms.

So, as mentioned before, when you want to use this project you need to create two things. The first one is the eBPF program that collects the metrics. The second one is a configuration file that defines the format in which those metrics are stored in the eBPF map. This is what the configuration file looks like: we have metrics, we have counters, and in general information about the metric, like the name and the help text. I will show you in a second what labels are about. Depending on the kind of metric there are more parameters, especially for histograms, regarding the bucket configuration and so on.

There is something important to understand here: when we capture information from the kernel using eBPF, many times we only capture numerical identifiers of things, so we need a way to convert those numerical identifiers into a human-readable version. The ebpf_exporter project does this by implementing something they call decoders. The idea is that we provide a number, and by using a decoder we are able to convert it to a human-readable version. One example is the cgroup ID: that is an integer in the kernel, and by using the cgroup decoder we are able to get the cgroup path from that cgroup ID. This is how we configure that in the labels: we have the decoders, first we convert the value to an integer and then that is converted to a cgroup path.

Okay, so time for a demo. If we go to the ebpf_exporter website we can see that they already provide some examples. For each of the examples they have the eBPF program and the configuration file, which is a YAML file. As you can see, they have a bunch of different examples ready to use. In this presentation I want to show you this specific one about syscalls. It provides a counter for each syscall that is executed on the system.

This is the structure of the eBPF program. We have an eBPF map to store the metrics: the key of that map is an integer, the number of the syscall, and the value is another integer, which is the counter. And then this is the eBPF program we use to capture when syscalls are executed on our system; the only thing that program does is increase a counter in the eBPF map. Then we have the configuration file for it: we define a counter that we call syscalls, and in the labels we only have a single label, which is the syscall. Again, in this case we have to use a decoder in order to convert the syscall number into its name.

Let me show you how we can run it. I don't have time to go into all the details of this YAML manifest, so I will only show you the most important parts. In this case I'm using the official container image, and here I'm specifying the directory where the configuration files are stored and which one I want to run, in this case only the syscalls one. Let me apply that. The DaemonSet was created, Prometheus was also deployed, and if we go to the Prometheus interface we can see that our metric is available there. If I query the metric I get the different values: we have the name of the syscall and the counter for it.
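For reference, a counting program along the lines of the one in that demo could look roughly like this. This is a hedged sketch written against libbpf and vmlinux.h, not the exact upstream example; the map and function names are illustrative.

```c
// Sketch of a syscall-counting BPF program (illustrative, not the exact
// ebpf_exporter example): key = syscall number, value = how often it ran.
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, u32);
    __type(value, u64);
} syscalls SEC(".maps");

// Runs on every syscall entry on the system.
SEC("tracepoint/raw_syscalls/sys_enter")
int count_syscalls(struct trace_event_raw_sys_enter *ctx)
{
    u32 id = (u32)ctx->id;
    u64 one = 1;
    u64 *count = bpf_map_lookup_elem(&syscalls, &id);

    if (count)
        __sync_fetch_and_add(count, 1);   // existing entry: atomic increment
    else
        bpf_map_update_elem(&syscalls, &id, &one, BPF_NOEXIST);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

And the matching ebpf_exporter configuration is, roughly, a counter plus one label that runs the syscall decoder. Field names here are from memory and may not match the current schema exactly, so treat this as an approximation of what was shown on the slide:

```yaml
metrics:
  counters:
    - name: syscalls_total
      help: Number of syscalls executed, by syscall name
      labels:
        - name: syscall
          size: 4            # the u32 key from the map
          decoders:
            - name: syscall  # converts the syscall number to its name
```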
Yeah, so there are a lot of different syscalls being executed on my system. But what I want to highlight here is that we don't get any information about the Kubernetes pods: we have the syscall, but we don't have the pod or the container that is performing those syscalls. This is because the ebpf_exporter project is not integrated with Kubernetes, so it doesn't provide that information.

Okay, the other project I want to talk about today is Tetragon. The definition of Tetragon is a flexible, Kubernetes-aware security observability and runtime enforcement tool. By default Tetragon traces different events, like when processes are executed, syscall activity, and I/O activity, which includes networking and file access. Kubernetes-aware means that it understands the different components of Kubernetes; in other words, it is able to provide information about the Kubernetes pod, the Kubernetes container, and so on.

Let me show you a demonstration of this project. In this case I already have Tetragon running on my system: there is a Tetragon pod and a Tetragon operator running, and Prometheus is configured to scrape the metrics from their endpoint. If we go there, we can see that it produces different metrics; the one we are interested in is called tetragon_events_total. If we query the metric, we can see the information there: it provides a counter for when a process was executed and when a process finished execution, and it also provides a label for the binary that was executed. And, looking here, for some of them it also provides Kubernetes information: we have the namespace and the name of the pod where that activity was happening.

Those are the metrics we get by default when we deploy Tetragon. But what is interesting about Tetragon is that we can configure it to collect other metrics if we want. Tetragon defines a TracingPolicy custom resource, and by using that custom resource we are able to say: hey, count a metric on this or that specific point in the kernel. Let me show you that. Actually, I should have shown this before applying it, but anyway. This is the tracing policy I have configured: I'm telling Tetragon to attach a probe to a tracepoint on the raw syscalls subsystem, on this specific event. So this is another way to tell the tool what to do: in this case we don't need to write eBPF code, we are able to configure the metric by only writing this YAML file. This example is very similar to the previous one, we are counting syscalls, but in this case we don't have to care about the eBPF code.

Okay, let me go back to Prometheus. If we list the available metrics, we see that now we have this Tetragon syscalls metric. If we query it, we can see that the result is very similar to the ebpf_exporter one, but the difference is that we get information about Kubernetes, or, well, that is what I'm trying to find here. Yeah, there it is. Right, so as you can see, it provides information about the Kubernetes components.
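For reference, a tracing policy along the lines of the one in that demo looks roughly like this. This is a sketch of Tetragon's TracingPolicy custom resource written from memory, so check the Tetragon documentation for the exact fields:

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: count-syscalls
spec:
  # Attach to the raw_syscalls:sys_enter tracepoint, i.e. every syscall entry.
  tracepoints:
    - subsystem: "raw_syscalls"
      event: "sys_enter"
```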
Okay, and the final project I want to show you today is called Inspektor Gadget. As a disclaimer, I'm one of the maintainers of this project, so I will try to stay as neutral as possible, but of course this is the one I like the most.

Inspektor Gadget is a tool designed for the creation, deployment, and execution of eBPF programs, both on Kubernetes and on Linux machines. We can think of Inspektor Gadget as something like a Docker runtime for eBPF: the idea is that you, as a developer, create your own eBPF program, put it in an OCI image, give it to Inspektor Gadget, and Inspektor Gadget takes care of injecting and running those programs in the kernel. Another interesting thing about Inspektor Gadget is that, as I mentioned, when we get information from the kernel we usually get low-level information: we get the PID, we get the user ID, but there is no container concept in the kernel. Inspektor Gadget provides that mapping, adding context about the container, the pod name, and so on.

Now, specifically regarding metrics, there are two different ways to collect eBPF metrics in Inspektor Gadget: one is in user space and the other one is in the kernel. With user-space metrics, we have different tools, gadgets, that already provide events, and the idea is to count the events generated by those existing tools. Each time something happens, those tools send an event to user space, and there we perform the counting, aggregation, and filtering of those events. For sure this solution is less performant, because we are sending all the events from the kernel to user space, but it was also easier to implement, so that was the initial Prometheus support we added. And, of course, it is up to the user to configure how to count and how to aggregate the events.

This is configured using a configuration file that is based on the ebpf_exporter one. As you can see, we have metrics and some generic information about each metric, like its name and the type of the metric. The category and gadget fields refer to the existing tools that we have; if you want to use this, you have to go to our website, check the existing gadgets, and see what information they provide in order to understand whether they emit events that are useful for you. Then we have selector and labels. The selector is there to filter out events we don't want; maybe we only care about events in a specific namespace or a specific pod, so we can configure that. And the labels define the granularity we want for our metric: in this specific example, we are going to have a label for the pod name and for the container name.

Let me show you how it works. This is a YAML manifest with all the configuration. This is the configuration file for Inspektor Gadget, stored as a ConfigMap on the cluster. There I am defining an executed-processes metric that is going to use the trace exec gadget, one of our existing tools. In this case I didn't configure any filtering, and these are the labels I configured for this example: the Kubernetes namespace, the Kubernetes pod, the container, and the name of the process that was executed. And this is just to show you how we run it: I'm using one of the official Inspektor Gadget container images, and this is the command line we have to use, we say this is Prometheus and we pass the path to the configuration file. And that's it. So I deploy that; again, a DaemonSet was created and Prometheus is running.
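Since that configuration file went by quickly, here is a reconstruction of it based only on the description above; the field names are illustrative and may not match the real Inspektor Gadget schema, so please check the project documentation before using them:

```yaml
# Reconstructed from the description in this talk; field names are illustrative.
metrics:
  - name: executed_processes
    type: counter
    category: trace          # family of existing gadgets
    gadget: exec             # the trace exec gadget mentioned above
    selector: []             # no filtering in this example
    labels:                  # granularity of the resulting metric
      - k8s.namespace
      - k8s.podname
      - k8s.containername
      - comm                 # name of the executed process
```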
This is really similar to the ebpf_exporter case from before. In this case it takes a little bit longer until the metrics are available, but as you can see, we got our metric there, and it has the labels we configured: the Kubernetes namespace, the pod, the name of the process, and so on. What is interesting about this approach is that, again, you don't have to worry about writing eBPF; you can use our existing tools to get some metrics. But it is less performant, so there is a trade-off.

The other way we can collect eBPF metrics in Inspektor Gadget is to do the counting directly in kernel space. In this case the user has to develop their own eBPF code and define the granularity they want to use, so it is very similar to the ebpf_exporter case, and for sure it is more performant than counting in user space. The idea we have is to provide some tools that support common metrics, so in some cases you won't have to write the eBPF code yourself.

Let me show you a demonstration of this one. This is the same example, counting syscalls. As you can see, there are two files: one is the eBPF code and the other one is a configuration file. This is the eBPF code we use to gather the information from the kernel, very similar to the ebpf_exporter case. The only difference is that we are providing this mount namespace ID: this is a number we can associate with each specific container, and Inspektor Gadget will automatically map this number to the container, the pod, the namespace, and so on. Again, this is the eBPF map we are using, and this is the eBPF program that gathers the data from the kernel; it is really similar to the ebpf_exporter case.

Let me show you the configuration file. In this case it is simpler: we have the name of the tool, and we say, okay, this tool provides metrics, and the metrics are stored in an eBPF map called syscalls. This allows Inspektor Gadget to understand which eBPF map it has to read in order to provide the metrics. And again, this is how we deploy it. In this case I'm using a custom image I built for this presentation, with the ig binary and the compiled version of the eBPF code that I showed you before, and this is the command line we use to run this tool.

Okay, so let me deploy all of that, very similar to the other cases. If we go to Prometheus and check again, we can see that we have our syscalls metric, and as you can see, we provide a lot of information related to the container: not only the namespace, but also more details like the container ID and so on. One thing we are still missing is that we only provide the number of the syscall; we would like to also provide the name, but this is something we are still working on, and it should not be that difficult to implement.
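To illustrate the mount-namespace-ID part, this is a hedged sketch of how such a program can key its counters by both the syscall number and the mount namespace ID of the calling task. It uses plain CO-RE reads rather than Inspektor Gadget's own helpers, so the names and the exact mechanism are assumptions for the example.

```c
// Sketch: count syscalls per (mount namespace, syscall number), so the
// user-space side can later map the mount namespace ID back to a container.
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

struct key_t {
    u64 mntns_id;   // mount namespace inode number, used as the container key
    u32 syscall_id;
    u32 pad;
};

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10240);
    __type(key, struct key_t);
    __type(value, u64);
} syscalls SEC(".maps");

SEC("tracepoint/raw_syscalls/sys_enter")
int count_syscalls(struct trace_event_raw_sys_enter *ctx)
{
    struct task_struct *task = (struct task_struct *)bpf_get_current_task();
    struct key_t key = {};
    u64 one = 1, *count;

    // Mount namespace ID of the current task (ns.inum), read with CO-RE.
    key.mntns_id = BPF_CORE_READ(task, nsproxy, mnt_ns, ns.inum);
    key.syscall_id = (u32)ctx->id;

    count = bpf_map_lookup_elem(&syscalls, &key);
    if (count)
        __sync_fetch_and_add(count, 1);
    else
        bpf_map_update_elem(&syscalls, &key, &one, BPF_NOEXIST);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```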
Okay, so I think this is the most important slide of the presentation. The things I want you to take away from this talk are: metrics are used to understand the health and performance of our systems, and eBPF is a really powerful mechanism for collecting data from the kernel. There are different projects that provide metrics based on eBPF, and depending on the abstraction level you want, you can choose one or another: maybe you want to write your own eBPF code, or maybe you only want to configure a YAML manifest. It also depends on the labels that you need: if you only need operating-system-level labels you can use ebpf_exporter, and if you also need Kubernetes information then you will have to use something like Tetragon or Inspektor Gadget.

As usual in my presentations, I like to prepare some reference material, so all the things I explained today are covered in more detail there; if you want more details, please check it out. Actually, the last reference could be the most interesting one: all the YAML manifests and all the files needed to reproduce the demos I showed you are available in that repository.

Okay, and finally, this is really important: if you have any feedback about the presentation, whether you liked it or not, please go ahead and give us some feedback. And yeah, that's it, thank you. Happy to take any questions. Thank you.