So, hello, welcome everyone to one of the last talks today. It's going to be about securing the superpowers: who loaded that eBPF program? So basically, auditing eBPF programs. Who is speaking today? I'm Natalia, a security product lead at Isovalent, and here with me is John, who is the Tetragon lead and a Cilium maintainer, as well as a staff engineer at Isovalent.

I wanted to start with the background and motivation for this talk. As you already know, eBPF is on the rise, or has already risen; it has become one of the leading Linux kernel technologies today. More and more projects are using it under the hood for different use cases: for example networking, observability, and security. This is just a small subset of projects that are using eBPF, but it gives you a glimpse of what can be done in different areas.

For example, in networking it can be used for high-performance load balancing in modern data centers as well as in cloud native environments. One good example is Katran, a high-performance load balancer from Facebook that they created to replace IPVS, another software-based load balancing solution. They switched to eBPF and saw a massive performance increase. It's open source, so you can go to the GitHub repository and check it out. It's Facebook-specific in places, but useful for other Linux-based software infrastructure as well. Then we have Cilium, which provides networking, load balancing, and security for Kubernetes services.

On the observability side, in the middle, we have BCC and bpftrace. These are for application profiling and tracing: for example, understanding what my application is doing, or why my application is not behaving the way we expect — say, how much block I/O it is using, and so on. We have Hubble, the visibility component of Cilium, which can be used for things like network policy troubleshooting and auditing: for example, which network flows are denied by which network policy, what the source pod is, and so on.

And then for security — we will dive into these use cases later — we have Tetragon and Falco, which apply BPF to container runtime security: inspecting kernel functions and system calls and figuring out whether malicious behavior has happened. We can also implement least-privilege policies: we can create profiles based on the observed events, say these are the only events allowed for that source pod, and prevent everything else. We can also use BPF for preventive security, terminating on kernel functions or system calls inside the kernel instead of having a user space agent observe them.

So for example, here are two security use cases we can cover. One is data exfiltration: a security team can observe a Kubernetes namespace and figure out which pods were the biggest outbound talkers in the last hour. Is it expected? Who sent out the most traffic? Could it be data exfiltration? We can also do file integrity monitoring: which pods or workloads opened sensitive files, with which binaries, who was root, and is it even expected? We can also do capability and namespace access monitoring. A security team can monitor certain namespaces and ask questions like: which pods were running with CAP_SYS_ADMIN or CAP_NET_RAW? Do we expect this? Which pods had host PID or host network namespace access? Do we even expect this?
How long ago, who started this pod, and so on?

eBPF also became cross-platform. It landed on Windows machines with the eBPF for Windows runtime recently. It's available on most Linux distributions, and all the major cloud providers support it as well. So as motivation: since eBPF became so powerful, security teams need to answer the question of who is watching eBPF. To remain secure, it's important to keep track of and audit which BPF programs were loaded and which BPF maps were created. What does audit mean exactly? Who loaded it — which Kubernetes workload, which process, which binary, from which ancestors? When was it loaded? Should this program be expected? Have we seen this program or process before? And should this process touch BPF at all? These are the questions we are trying to answer today with Tetragon. Great.

Hey, everyone. Awesome. All right. So first I wanted to talk about Tetragon. That's our tool, and we saw some of the use cases it solves and some of the dashboards you can create. I'm going to talk a little bit about the architecture from this cartoon — at least it gives you the big bullet points — and from there we can dive into how we address the BPF monitoring piece, the specific part this talk focuses on.

First, what is Tetragon? It's our security observability and runtime enforcement tool. What this means is you take BPF and you hook the Linux kernel, and you can hook all of these locations: the TCP/IP stack, system calls, process execution, file systems, and you get all of your cgroup and namespace monitoring. What Tetragon can do inside of Kubernetes — since we're at the CNCF here — is take all of that kernel-level data and put the Kubernetes metadata on top of it, to get something that operators and people who monitor and manage Kubernetes environments can make sense of. Because if you didn't do this and just said, OK, I'm going to put a BPF probe here — maybe running bpftrace or BCC tools, if you're familiar with those lower-level BPF tools — what you would really get out of your system is: some random PID opened some file, FD 3, and by the way, it's in some random cgroup on some random node in your system. What you really want to do is up-level all of that, and that's what Tetragon is really good at: taking this low-level data so you can monitor all your files, monitor all your execution, and create a nice execution trace for anywhere in your cluster, along with timestamps, into a data stream. Then you can analyze all of that data. You can ask questions like: what executed three days ago on this system, what programs did it run, what files did it open, who did it connect to? You might even ask what the DNS entries are for those IP addresses I saw connects to — and, most interesting for today, what BPF programs did it launch? Because that's what we want to answer.

So when we talk about BPF, the next thing to talk about is: what is a BPF program? Typically when I think of an application program, you're thinking of an executable, something that you launch and so on. But BPF is slightly different. For one, it runs in the kernel.
That's also interesting, but what is really interesting for the question of what's running is that BPF is not just a set of instructions; there's an entire runtime around it. You can think of it as a set of instructions plus all this other stuff. And the other stuff includes CO-RE (Compile Once – Run Everywhere). What CO-RE does when you load your BPF program is act as a load-time rewriting framework. It takes the program, loads it in, and replaces a bunch of the instructions with kernel specifics. Think about wanting to read a field in a structure, maybe the PID of a task. When you do that, you're going to say: read the task struct, then read at some offset into that task struct. That offset is not the same across all kernels. So when you load your program, the CO-RE infrastructure rewrites those instructions so that they actually read what you're interested in. For one, that's interesting because now the set of instructions you loaded from your program file is not the same set of instructions you actually run. That's one point on why your BPF program is different from just a set of instructions.

In addition, you have a bunch of maps. Maps are the piece between the kernel and user space that communicates between the different programs. Your kernel piece might write to a BPF map, but multiple user space applications might also write to that map, or multiple kernel pieces might write to it. If you think about it, it's almost like a message bus between different programs and applications. It can be memory-mapped, for example. So which maps a program has attached will influence what the system is doing. And then there's what you are connected to, what type of program you are, and so on. All of that together, in my mind, is what you want to monitor. You want to monitor that whole bundle of BPF stuff, because a BPF program connected to the wrong map is going to be buggy, or worse, malicious; and a BPF program attached to the wrong function isn't going to work the way you expect either. So you really want to get that full profile.

This is a diagram that Brendan Gregg made, which we modified slightly — Daniel Borkmann, one of my colleagues, also modified it. What it shows is the flow graph of what loading a BPF program involves. On the left side you see the BPF application, and what we have there are the instructions, like we talked about. You have a compiler, since at some point you need to turn, probably, C code into byte code, and then you create your maps and all that kind of stuff. Between the left and the right side is the boundary between user space and the kernel, and you see a bunch of different objects. At the bottom you have the perf ring buffer, the BPF ring buffer, and shared maps. The perf ring is just a high-performance way to get data out of the kernel, between kernel and user space. Shared BPF maps are any of the other map types we have: hash maps, array maps, stacks, and so on — people are always adding new things there. And the top is where you actually load your programs, which is done through a system call with a load command.
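As a quick aside, here is what that CO-RE rewriting looks like from the program author's side — a minimal sketch, not from the talk, assuming libbpf and a generated vmlinux.h. The BPF_CORE_READ below compiles to a relocatable field access; the loader patches the instruction with the right offset for the running kernel, which is exactly why the bytes that execute differ from the bytes on disk:

```c
// Minimal CO-RE example: read the current task's PID through a
// relocatable field access. The offset of task_struct->pid is fixed
// up at load time by libbpf's CO-RE machinery.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

SEC("kprobe/do_sys_openat2")
int probe_open(struct pt_regs *ctx)
{
    struct task_struct *task = (struct task_struct *)bpf_get_current_task();
    int pid = BPF_CORE_READ(task, pid);   /* offset patched at load time */

    bpf_printk("open() by pid %d", pid);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

So when we think about monitoring this, the question is: how do we want to monitor it?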
Well, we usually don't want to hook the syscalls directly, and there's a good reason for that. One reason is that we don't want to hook every possible syscall that interfaces with BPF. So what we do is hook the BPF verifier. The verifier is the piece that, when you load your program, makes sure your program is safe to run in the kernel: that it's not writing to random data, and making sure it's not reading random data. It also does other safety checks — divide by zero, those kinds of things that would normally cause segfaults in normal programs. So one reason to put the hook there is that it's the central place all BPF programs come through.

The other advantage of having it there is that it's after the user-to-kernel-space copy. One thing about BPF: if you hook a syscall and try to read user memory — which is just a pointer into user memory — that memory is entirely owned by the user space process at that point. From a security standpoint, we don't want to read user memory and race with the user, because the user owns the memory and can change the data. So if we're trying to enforce a security property and we say, I want to read your instructions or the name of your BPF program, and do some analysis of that program from the BPF side — or even copy it out to user space for post-analysis — I need to make sure I can't do that copy and then have the user just change the data. There's also a more fundamental reason: user space memory can simply fault. You try to read user memory, it's not resident, it faults, and because BPF in most instances cannot sleep, it won't be able to pull that page into memory. What you'll get is just an error — no data — and now you have a big glaring gap in the security analysis tools presumably running behind this.

So that's why we hook the BPF verifier. The summary is: we see every BPF program that's loaded. We put a BPF program right on that call, so any time you load something we get a callback, and I'll talk in the next couple of slides about what we do with that callback. We also hook these other entry points — you have to be a little careful for the same syscall reason, so they're actually done on the other side of the copy, inside those syscalls. They're not technically syscalls, but in this box we just show them on the edge. Those are for all your maps. And the important thing is, if you look at that dotted line: everything going from user space to kernel space across that dotted line can't get to the core networking part in this example — we have other slides that show kprobes and other things — can't get to the far right without going through one of our red boxes from left to right.

Yeah — oh, okay, so the text in red is where we put the hooks, BPF hooks, to capture the BPF loads. Sorry if I wasn't clear: when you look at this flow chart going from left to right, we want to make sure that we go through a red box before we get to the far right where the programs actually run, to ensure that our analysis tools running in BPF get an event that something happened from the application to the kernel side. Is that clear? Great, perfect, thanks — sorry about that if I didn't say it up front.
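To make the verifier hook concrete, here is a stripped-down sketch of the idea — a kprobe on bpf_check(), the verifier entry point — in plain libbpf C. This is an illustration, not Tetragon's actual implementation (Tetragon generates its hooks from policy and attaches far more context); the event layout is made up for the example, and note that bpf_check()'s first parameter is a struct bpf_prog ** (a double pointer):

```c
// Sketch: fire an event on every BPF program load by hooking bpf_check().
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

struct load_event {
    __u32 pid;
    char comm[16];
    char prog_name[16];   /* BPF_OBJ_NAME_LEN */
};

struct {
    __uint(type, BPF_MAP_TYPE_RINGBUF);
    __uint(max_entries, 1 << 16);
} events SEC(".maps");

SEC("kprobe/bpf_check")
int BPF_KPROBE(on_bpf_check, struct bpf_prog **progp)
{
    struct bpf_prog *prog;
    struct load_event *e;

    /* Dereference the double pointer to reach the program being loaded. */
    bpf_probe_read_kernel(&prog, sizeof(prog), progp);

    e = bpf_ringbuf_reserve(&events, sizeof(*e), 0);
    if (!e)
        return 0;

    e->pid = bpf_get_current_pid_tgid() >> 32;
    bpf_get_current_comm(e->comm, sizeof(e->comm));
    BPF_CORE_READ_STR_INTO(&e->prog_name, prog, aux, name);

    bpf_ringbuf_submit(e, 0);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

A user space reader on the ring buffer would then enrich each event with process ancestry and Kubernetes metadata, which is the part Tetragon layers on top.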
So once we get this flow chart instrumented with BPF, we have a way to get these events. Then the question is: what are we actually looking for, and what do we care about? The first case in our cartoon picture here is the good case, where you have some application — I call it Alice here. That application probably has a hash associated with it, like a SHA-256 of the executable. I called it "foo" here, but usually it's a SHA-256 or SHA-512 or whatever you set your system up for. And then it has a program and some maps: alice.o is the BPF executable here, and map A and map B are the two maps that let you push data between kernel and user space. That application is then going to call bpf(), the system call to load a BPF program. If you go back a slide, that causes the BPF loader block in the top middle to call the BPF verifier with alice.o, which then triggers our event system so that we get a notification that a program has actually been loaded.

On the bottom here I put what our system spits out as that event. We see the event, saying a BPF load happened, annotated with the Kubernetes namespace — bpf-ns — the pod, alice in this case, along with the application that loaded it, /sbin/alice. We can put the SHA-256 there if you have the system set up for it; we gave a talk at KubeCon about that, so I won't go into details here. We give you the program type, which tells you what kind of program it was — if you're familiar with BPF, there are kprobes and networking hooks and security hooks and all that, so you want to know that as well. And we give you some details about the instructions that were loaded: they have a name, alice_func here, and an instruction count. Because the BPF program has access to the entire BPF load operation as it happens, we can put arbitrary other data there as well — for pretty printing we picked some things we think are interesting, but we actually have access to the entire set of BPF instructions. So you can do things like copy the program out, so that you have a copy of every BPF program that was loaded. They can be kind of large — up to 4K instructions — but loading BPF programs is usually not something you're doing very frequently, so copying a few kilobytes is usually negligible; it depends on what your use case actually is. So this is the good case: the program gets loaded, events happen, nice pretty printing of the auditing flow.

Now let's expand our example slightly — let me check my time — expand our example slightly and add Eve and Bob here. Of course they have their SHAs, and I think we called them "bad" and "bar" there. The next question is: what happens if Eve tries to load something and we don't expect Eve to load anything? Well, just like before, the tooling doesn't care — it's going to give you the event for both. What's a little extra interesting about this is that if Eve is running in its own pod, I think you could argue you should never have given it CAP_BPF if it wasn't meant to load BPF programs — basic capability hygiene for pods. But what's extra interesting is that even if Eve happens to be running inside Alice's pod, which you gave CAP_BPF because you needed Alice to load a BPF program, you still get the audit.
So if for some reason something inside your pod is loading BPF programs besides what you expect, you'll get an audit record for it. The next interesting thing is that with Tetragon we have enforcement, which allows you to match on binaries, or on SHA-256s if you want — it's easier to look at program names than SHAs. What you can say is: if I see this call and it's not Alice, stop it from happening. So even if they're in the same pod, so the pod needs CAP_BPF, but you know exactly which programs should be loading BPF programs, you can encode that into your policy — that's a CRD in the Tetragon world. Then if anything else inside that pod tries to load a BPF program, we block it outright. An extra layer of protection against things trying to load BPF programs on you. And of course, like I mentioned, you could also use SHAs, but that's a different talk — it requires a little extra setup.

The next question would be what happens if you have a file system problem. Maps inside BPF are always file descriptors, and sometimes those file descriptors are pinned into a file system. So you might say: Eve tries to access a file from Alice — this could be something like a policy file, it might have sensitive data in it, and so on. Your first layer of defense should be: don't let Eve access the file at all; maintain your mounts. But if you can't do that for some other reason, Tetragon can also monitor file access, so you'll be able to see which processes access which files. Those files correspond to pinned maps, so you can see Eve accessing map B there, and again you can enforce, or just create an audit trail for it.

So — this is the other discussion, Bob loading BPF, and I'll just skip this one for now. In summary, what we've covered: Alice is the good case — Alice loads a program and we get a nice audit log for it. Eve is the malicious, or at least erroneous, program — it tries to load something and gets blocked, even when it's in the same pod, because we built this policy. And then we have the file monitoring.

There is one case I wanted to call out that we're working on inside the eBPF Foundation — there's a working group around this if you're interested. What if Alice, or Bob in this case, is allowed to load BPF programs, but loads a different BPF program than it's supposed to? Meaning: you launch Bob, you did some integrity check to make sure it has the right SHA, and somewhere along its lifetime it decides to load some random instructions that are not the instructions you expect. Which it's allowed to do — but perhaps these are instructions you didn't want it to load, like a malicious program, because maybe Bob has been compromised somehow. There are a couple of things you can do. One thing we've suggested is: if you just keep a log of all these programs that are loaded — copy the instructions out — at least you'll know what programs are loaded, and you can do post-analysis and say, this application is loading this BPF program, it's not the one I expect, it's a new one I've never seen before, please fire an alert off to my alerting tools.
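As a rough sketch of that keep-a-log idea — not what Tetragon does verbatim (Tetragon captures the program at load time from the verifier hook), but a complementary user space audit pass — stock libbpf can walk every loaded program and copy out its post-verifier ("xlated") instructions for offline comparison:

```c
// Walk all loaded BPF programs and copy out their rewritten
// instructions. Needs privileges (e.g. CAP_SYS_ADMIN) to run.
#include <bpf/bpf.h>
#include <linux/bpf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    __u32 id = 0;

    while (bpf_prog_get_next_id(id, &id) == 0) {
        int fd = bpf_prog_get_fd_by_id(id);
        if (fd < 0)
            continue;

        struct bpf_prog_info info = {};
        __u32 len = sizeof(info);
        if (bpf_obj_get_info_by_fd(fd, &info, &len) == 0 &&
            info.xlated_prog_len > 0) {
            /* Second call fills in the rewritten instructions. */
            __u32 insn_len = info.xlated_prog_len;
            void *insns = calloc(1, insn_len);
            memset(&info, 0, sizeof(info));
            info.xlated_prog_insns = (__u64)(unsigned long)insns;
            info.xlated_prog_len = insn_len;
            len = sizeof(info);
            if (bpf_obj_get_info_by_fd(fd, &info, &len) == 0)
                printf("id %u name %s: %u bytes of xlated insns\n",
                       id, info.name, info.xlated_prog_len);
            free(insns);
        }
        close(fd);
    }
    return 0;
}
```

There's also a working group going on about how to put a SHA on the BPF program itself.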
And this is actually more difficult than it may sound on the surface, because if you go back to when I talked about CO-RE: the application is not static, it's being rewritten by the runtime. So if you were to just take a SHA of the program up front, it wouldn't match the SHA after the program has been rewritten to load on your kernel. What we're doing is trying to get the kernel to do a lot of those rewrites and have a consistent SHA before and after the modifications. Find me afterwards if you want to talk about it — it's really quite interesting, but it's a work in progress. I don't think anyone has it fully deployed yet, although some of the newer kernels can support most of the base functionality for it.

And in summary, like you said: Alice is good to load and we get an audit; we can block Eve; we can watch the files; we have at least a plan to audit the erroneous program loading bad instructions; and in the future we'll have a full solution for signing BPF programs.

Yeah, so in the last five, ten minutes I will show you a quick demo of how we can actually do this with Tetragon. How we will do it: we will introduce a test environment and apply a security policy which observes BPF program loads and map creations. Then we have a simple use case from a test pod — it's called bpfdroid — and the last one will be seeing what kind of BPF programs Cilium loads and what kind of BPF maps Cilium creates during its boot-up.

How does the test environment look? It's a one-node GKE cluster with Tetragon deployed on it as a DaemonSet, and we apply a security policy which generates events on BPF program loads and BPF map creations. If you had a multi-node cluster, you would deploy Tetragon on each node and use a gRPC collector to observe all of these events.

This is how the policy looks. We are going to observe three main kernel functions: bpf_check — this is the verifier checking the program before loading it; security_perf_event_alloc — this is creating a perf event to transfer events between user space and the kernel; and security_bpf_map_alloc — this is basically when we create a BPF map.

Then we have a simple pod. What it does is load a BPF program via bpftool (bpftool prog load), sleep for 30 seconds, create a map — a tetragon-bpf map with hash type — and then sleep for another 30 seconds. And this is how the events look. We export them as JSON events, and this is just the CLI pretty-printing them. In the first row we can see that the bpftool prog load process started; we can see the Kubernetes namespace and the pod — the washington namespace and the seattle-bpfdroid pod. On the third row we can see that our program was loaded: we can see the function name and the instructions. A couple of rows later we can see when the map was created: it's tetragon-bpf, we can see the key and value sizes, and we can also see the hash map type.
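What the demo pod does with bpftool can be written as a few lines of libbpf C. This is a hypothetical equivalent — map name, sizes, and entry count are made up to mirror the demo output — and it is exactly the operation the security_bpf_map_alloc hook reports:

```c
// Create and pin a small hash map, like the demo pod does via bpftool.
// Assumes bpffs is mounted at /sys/fs/bpf.
#include <bpf/bpf.h>
#include <linux/bpf.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = bpf_map_create(BPF_MAP_TYPE_HASH, "tetragon_bpf",
                            4 /* key size */, 8 /* value size */,
                            1024 /* max entries */, NULL);
    if (fd < 0) {
        perror("bpf_map_create");
        return 1;
    }

    /* Pinning keeps the map alive after this process exits. */
    if (bpf_obj_pin(fd, "/sys/fs/bpf/tetragon_bpf"))
        perror("bpf_obj_pin");

    sleep(30);   /* mirror the demo pod's pause */
    return 0;
}
```

Pinning is also what makes the map visible as a file under /sys/fs/bpf, which is what the file-monitoring scenario earlier was about. So for Cilium it's a bit more complicated.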
So I will just highlight certain events and steps that are executed during the boot-up. Cilium is going to probe a bunch of features and map types that are available in the current kernel version. Then it's going to check and remove some iptables rules. Then it executes a script that figures out which programs to load and which network devices to load them on. Then it compiles those programs, and finally it loads those programs for each pod on the node. This is a screenshot captured during the demo, and we will also see it live. These are all the programs that are loaded for each pod on the node. We can see here, for example, a program named send_drop_notify — this sends a notification whenever a packet is dropped — and we can see its instruction count. We can also see the tail-call handling programs, and handle_policy, which relates to how Cilium handles network policies, and so on.

So I will just switch to the terminal and show it live, if it works. I'm connected to the GKE cluster and I should have Tetragon running on it. All right, here we go. Let me start observing the events related to the bpfdroid test pod first, and after that I will show the Cilium use case as well. I'll apply the pod, and once it gets created we should see some events. All right, so we see the bpftool prog load: we can see the program type here — BPF_PROG_TYPE_KPROBE — the functions and the instructions, then the sleep, and the Kubernetes information — the washington namespace and the seattle-bpfdroid source pod. Then it will sleep for 30 seconds, and then we see the map creation: the BPF map alloc event, the hash map — we can see the type, the map name, the key and value sizes, and max entries.

So I will stop this and start observing the events from Cilium. I have Cilium set up in a different namespace — it's running in the cilium namespace — so I will just restart the agent and we will see all the BPF program loads and map creations. It's going to probe a bunch of features: which features are available on that kernel version, and which BPF map types are available on that kernel version. So we will see a bunch of probes in the beginning. Okay, so we see the termination, and then the boot-up. These are all the probes that Cilium runs to figure out which features and map types are available. We will also see some iptables work — figuring out which iptables rules are loaded and which it should delete. Then I will highlight the ones related to program loads. There is always a recurring pattern here: it always starts with a tc filter replace, and it ends when the command exits.
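One aside on those probes: each feature probe is itself a tiny throwaway program or map load, which is why the audit stream lights up the moment the agent restarts. Cilium has its own probing code, but the same idea is available in stock libbpf (recent versions) — a rough sketch:

```c
// Feature probing: each call loads (and immediately discards) a
// minimal BPF program or map, so every probe shows up as a load event.
#include <bpf/libbpf.h>
#include <linux/bpf.h>
#include <stdio.h>

int main(void)
{
    printf("kprobe programs: %s\n",
           libbpf_probe_bpf_prog_type(BPF_PROG_TYPE_KPROBE, NULL) == 1
               ? "supported" : "unsupported");
    printf("sched_cls (tc) programs: %s\n",
           libbpf_probe_bpf_prog_type(BPF_PROG_TYPE_SCHED_CLS, NULL) == 1
               ? "supported" : "unsupported");
    printf("LRU hash maps: %s\n",
           libbpf_probe_bpf_map_type(BPF_MAP_TYPE_LRU_HASH, NULL) == 1
               ? "supported" : "unsupported");
    return 0;
}
```

Back to the live events and the per-pod program loads.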
So, for example, these are all the BPF programs that are loaded for each pod: we can see the send drop notification here, all the tail-call related ones, and handle_policy, which is how Cilium handles policies — and it does this for each pod on that specific node. So this is a real-world example of how you could create an audit list of programs — per pod, per binary — and keep track of what has been loaded and whether you should expect it or not.

Let me switch back to the slides. All right, so as I wrap up: eBPF is on the rise, and we see more and more applications for networking, observability, and security. It's basically cross-platform, and it's important to keep track of, and audit, who is watching eBPF — which programs and maps were created, by whom and when — and we saw an implementation of how Tetragon can do it.

So, a bit more on how to contribute. There we go — how to contribute, yeah. Join the Slack channel, come to the GitHub page. We're always looking for more use cases. If you go to the Tetragon repo there are the CRD examples; it's a sort of incomplete list of things that we care about — mostly things that I thought were interesting — so either contribute, or just file an issue and say, I have this use case. The nice thing about getting them in there is that we generally run those before we do a release. So if you're using Tetragon for something, or you want to use Tetragon for something, put your use case in there; that's a really good way to make sure we know if we break it somehow. Even better would be to write a CI test case, but hey, we do run those before we release, so that's kind of your first gate. And any feedback — if you find a bug, let us know, of course; we are always fixing things and adding new things. So if you have a new use case let us know, and documentation and all that good stuff would be great. So thanks a lot.

Okay, thank you. Any questions? We have just a few minutes for questions if anybody's got anything. If not — yeah, go ahead. Yeah, okay, I'll just repeat the question. The question is: does Tetragon use BPF LSM for anything, to preemptively block things? No, we do not use BPF LSM. We do preemptively block things — we have a synchronous way to kill processes. So instead of blocking the call, we kill the process if it violates the policy. The action will not have gone through, because the signal send happens inline in the kernel. We kill the process from the BPF program, and the application sees a terminating signal that it cannot block, because it's a SIGKILL — your application cannot catch or block a SIGKILL from the kernel. A SIGKILL from the BPF probe? From a BPF probe, yeah — there's a BPF helper to do it. And so that does it. There is some slight caution: you can't do that from anywhere in the kernel, but when we deploy this we ensure that we do it from the right places. So it requires some care when using, but it is doable.
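For the curious, the helper in question is bpf_send_signal(), available since roughly kernel 5.3. A hypothetical minimal sketch of the kill-the-loader idea — matching on the process comm purely for illustration; real Tetragon policies match binaries or SHA-256s, and comm is trivially spoofable:

```c
// Sketch: SIGKILL any process other than our allow-listed loader that
// reaches the verifier. The loader is killed with an uncatchable
// signal, so it never gets to attach or use the program.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("kprobe/bpf_check")
int BPF_KPROBE(enforce_bpf_load)
{
    const char allowed[] = "alice";   /* illustration only */
    char comm[16];
    int i;

    bpf_get_current_comm(comm, sizeof(comm));
    for (i = 0; i < sizeof(allowed); i++) {
        if (comm[i] != allowed[i]) {
            bpf_send_signal(9 /* SIGKILL: cannot be caught or blocked */);
            break;
        }
    }
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

Anything else? Maybe I'll just add: earlier, when you saw us stop the BPF program load, that's how we did it. Anything else? Cool. All right, well, thanks for coming. Yeah, thank you for coming. Thank you.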