Hi, my name is Christen Fekete, and this presentation is about finding out what your processes are capable of with the help of eBPF. On this slide you can see the agenda. First we will cover what eBPF is. Then we will take a look at capabilities, and at the end we will see how you can integrate this into a proper monitoring tool to get some insight into your actual processes.
As I mentioned, I'm Christen Fekete and I work as a field engineer at Solo.io. Previously I worked as an infrastructure, SRE, and DevOps engineer. Here at Solo I really like helping companies design service mesh solutions and solve their infrastructure challenges. And I also really like working with eBPF.
A few words about Solo. We like to think of Solo as the next step in the cloud journey. As you can see, we are well funded, we have tons of satisfied customers, and we use these leading open source technologies to provide a scalable, secure platform that is really nice to use, to solve application and footprint challenges.
So let's start with eBPF. What is eBPF? eBPF is basically a way to inject custom logic into the kernel, and with eBPF you can do it in a safe, fast, and flexible manner. The origins of eBPF date back to the days of tcpdump. If you have been in the game long enough, you have most probably already used tcpdump to troubleshoot network-related issues. There are multiple eBPF use cases, such as security, tracing, profiling, networking, and observability. In this talk we will cover the security and observability use cases. All the other use cases are quite interesting too, but I think observability is where eBPF can really shine, so it's quite natural to cover this use case in the context of security, for example.
On this slide you can see my favorite diagram related to eBPF itself. If you have an eBPF program, you need both a user space program and a kernel space program.
The user space program handles the lifecycle of the eBPF code that will be injected into your kernel. This is where you interact with the program, and this is the program that displays all the information it extracts from the kernel. On the right-hand side you can see the kernel box. Here you have a verifier step, and that's quite important because, as I mentioned, eBPF enables you to inject custom logic into the kernel safely. With the help of the verifier it's quite hard to crash an actual kernel with these little experimental code snippets. So that's the first step when the code is loaded into the kernel. Then we have the actual eBPF logic. It's event-based logic: every time a certain kernel event happens, which can be related to kernel probes (kprobes), user probes (uprobes), tracepoints, or certain other events, it can trigger the kernel space logic that you injected into the kernel. Then, to exchange data between user space and kernel space, we have maps. These are data structures that provide this exchange between the two sides.
Okay, so now that we know what eBPF is, what are capabilities? You can see the official man page for capabilities here, but I would really like to show you a simplified version. Let's think of capabilities like this: they are aimed at adding superhuman capabilities to programs without the need to use the root user. Hopefully you are not using the root user for everything. I'd like to focus on the words that are underlined here. Capabilities can be quite dangerous as well, because they can give superhuman capabilities to our processes and you don't actually need root for that. So it's better to have proper monitoring for these capabilities.
Here's a concrete example of capabilities. If you look up where the ping tool is located on your machine, you will probably see something like this.
In this case, I have ping at that location. As you can see, the root user and the root group are assigned to it. And with getcap, you can check what capabilities this tool has. You can see that it has the CAP_NET_RAW capability attached to it. You can also see that I have a user called notroot, and I'm able to use ping to ping localhost. So this is what capabilities can do, but what actually is CAP_NET_RAW in this example? I went back to the man page of capabilities and grepped for this specific capability. You can see that it can use RAW and PACKET sockets, and it can bind to any address for transparent proxying. What does that mean? It means that packets can be forged and the actual senders can be faked this way. And if you take a look at the "bind to any address for transparent proxying" line, you can also guess that man-in-the-middle attacks are possible this way.
But that's not the highest power you can have with capabilities. There are others, and if you have processes using these capabilities, that can lead to dangerous situations. So let's take, for example, CAP_NET_ADMIN. You can see all the various operations that it enables for your processes. You could have full control over your network interfaces and you can alter firewall rules, so it can be pretty bad if these capabilities are abused.
So how would you monitor these capabilities? That's not easy without eBPF, but the kernel itself keeps track of all the capability checks that your processes trigger. There's the cap_capable function, and there's also a kernel probe (kprobe) that can be attached to it. So you just need to watch this kernel probe in your kernel and make the data available in user space. If you have this power, you can really understand what capabilities are actually used by your processes and applications.
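To make the capability numbers and bitmasks concrete, here is a minimal Python sketch that is not part of the talk. It decodes a capability bitmask, such as the CapEff field of /proc/<pid>/status, into capability names. The function names `decode_caps` and `read_effective_caps` are hypothetical, and only a subset of the numbers from linux/capability.h is listed. Note that this shows which capabilities a process holds, not which ones it actually exercises; the latter is what tracing cap_capable provides.

```python
# Subset of capability numbers as defined in linux/capability.h.
CAP_NAMES = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    7: "CAP_SETUID",
    10: "CAP_NET_BIND_SERVICE",
    12: "CAP_NET_ADMIN",
    13: "CAP_NET_RAW",
    19: "CAP_SYS_PTRACE",
    21: "CAP_SYS_ADMIN",
    39: "CAP_BPF",
}


def decode_caps(mask_hex):
    """Turn a hex capability bitmask into a list of capability names.

    Bit N set in the mask means capability number N is present.
    Numbers missing from CAP_NAMES are reported as "CAP_<number>".
    """
    mask = int(mask_hex, 16)
    names = []
    bit = 0
    while mask:
        if mask & 1:
            names.append(CAP_NAMES.get(bit, f"CAP_{bit}"))
        mask >>= 1
        bit += 1
    return names


def read_effective_caps(pid="self"):
    """Read the effective capability set of a process from /proc (Linux only)."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("CapEff:"):
                return decode_caps(line.split()[1])
    return []


if __name__ == "__main__":
    # 0x3000 has bits 12 and 13 set.
    print(decode_caps("0000000000003000"))
```

For example, `decode_caps("3000")` yields CAP_NET_ADMIN and CAP_NET_RAW, because bits 12 and 13 are set in that mask.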
And if you put some monitoring in place based on this information, you can get notified if certain applications are using capabilities that they shouldn't. You might think: okay, but how would I get this capability? I'm not a kernel engineer. The good news is that you don't need to be one; kernel engineers before us created tools to tackle these issues. On this slide you can see the BCC collection. All of these are eBPF-based tools that can help you solve various issues regarding, for example, your runtime, schedulers, or network devices. All of these areas have tools already available for your use. Brendan Gregg, who was working at Netflix and is now at Intel, is one of the most famous people in the eBPF ecosystem and community. He created the tool called capable, and he was monitoring Linux security capabilities with this tool. You can see the output of that program.
To modernize this application and make it cloud native, we can use Bumblebee, which is an open source project by Solo.io. With it you can build, publish, and run eBPF-based tools, and it can also turn these events into Prometheus metrics. The promise is that you don't need to take care of the user space code, because that's not very exciting. Bumblebee lets you just reuse the kernel space code; it will generate the user space code for you and additionally expose all the events as Prometheus metrics. The user experience is really nice. You can use Docker-like commands such as build, run, and push to handle these OCI images. You can also take existing eBPF tools as input, port them to Bumblebee, and expose their events as metrics. If you go to this folder in the BCC repository, you will find two or maybe three files per tool, and you just need to take care of the .bpf.c version of the tool.
That's the kernel space code. All the other ones should be handled for you. So let's see this in action. Here you can see a virtual environment. I have a Kubernetes cluster up and running, and you can see that I have three nodes. I have Bumblebee already installed; you can see these commands here. Now, I already have a capable.c file here. This is the kernel space code. Here you can see what kind of information we are interested in for all the kernel events that invoke the cap_capable call. And this is the map that we will use to exchange data between user space and kernel space. The generation of Prometheus metrics is quite transparent to you: we have this .counter suffix added to this map, and after that all the data will be exposed as a counter, which is a Prometheus metric type. And that's it.
Now that you have the code, you can run build, reference the local file, and compile it to an image. After this, we can push it to a local or remote registry. I have one running locally, so I will use that. And after that I can run the actual program. For that we have a text-based user interface, which shows the output of the program. This is what we will expose as Prometheus metrics.
So for that, let's deploy Bumblebee as a DaemonSet. As you can see, I'm using a regular DaemonSet with the Bumblebee image. That's the same CLI that I just ran a few seconds ago. We are actually working on an operator to make this work better for Kubernetes workloads, so check out our GitHub repository to see the work on that front. And I'm passing the OCI image as an argument to this container. So this is already deployed. I can check the pods that I have. I have three pods running; it's a DaemonSet, so I have one for every node. And now that we have this, we also have Prometheus installed, and you can see the data here.
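The counter semantics described above can be sketched as follows. This is an illustration in Python under stated assumptions, not Bumblebee's implementation: the event tuples, the metric name `capable_events_total`, and the label names `comm` and `cap` are all hypothetical. It counts capability-check events per (process name, capability) pair and renders them in the Prometheus text exposition format.

```python
from collections import Counter

# Small subset of capability numbers from linux/capability.h.
CAP_NAMES = {12: "CAP_NET_ADMIN", 13: "CAP_NET_RAW", 21: "CAP_SYS_ADMIN"}


def count_events(events):
    """Aggregate (comm, cap_number) events into per-key counters.

    Each distinct (process name, capability) pair gets its own
    monotonically increasing count, like a labeled Prometheus counter.
    """
    counts = Counter()
    for comm, cap in events:
        counts[(comm, CAP_NAMES.get(cap, str(cap)))] += 1
    return counts


def to_exposition(counts, metric="capable_events_total"):
    """Render the counters in the Prometheus text exposition format."""
    lines = [f"# TYPE {metric} counter"]
    for (comm, cap), value in sorted(counts.items()):
        lines.append(f'{metric}{{comm="{comm}",cap="{cap}"}} {value}')
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical events, as they might arrive from the kernel-side map.
    sample = [("ping", 13), ("ip", 12), ("ping", 13)]
    print(to_exposition(count_events(sample)))
```

Because each label combination is its own time series, a query can later filter on a single capability or a single process name without any change to the collection side.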
You can see that we have all the capabilities listed here, and you can see them over time. Here is a different illustration of these capabilities: a histogram, if you are interested in the noisiest capabilities being invoked by your applications. And here we are filtering for the tasks that are actually using them. So let's say, for example, you are interested in number 12, CAP_NET_ADMIN. You can use the selector to filter for CAP_NET_ADMIN, and you would see which applications and processes are using this capability. You can see that one of them is more frequent than the others. So it's quite a nice way to monitor all the capabilities that are being used. And you can set up alerting based on this, because these are just regular Prometheus metrics.
You can see the roadmap here. These are the tasks we have on our roadmap: we want to have Kubernetes integration, new metric types, and log support. So keep this project in mind and feel free to experiment with it on our GitHub repo. You can also visit academy.solo.io, where we have free workshops on all these open source technologies, for example eBPF. You can also find out more about ambient mesh, which is also quite interesting and exciting, and we cover other open source projects as well. Feel free to join us on Slack as well. I was happy to be here. Thank you for watching this session, and feel free to reach out with questions.