I'm KP Singh, and I work for Google Security in Zurich. Today, Leonardo and I are going to talk about how LSM and BPF change everything. We're going to talk about why we wrote a BPF LSM and what we can do with it, and then Leonardo is going to show an awesome demo of the BPF LSM in action.
So why did we write a BPF LSM? I work for a group called Detection and Response. We try to keep an eye on Google's internal systems and detect when unusual things happen from a security perspective. Google has loads of Linux systems, all the way from servers that run key services down to the very laptop that I'm using right now. These endpoints are constantly piping security-relevant telemetry data to our software pipeline, which does some clever stuff on top of this data. This information is then used by security analysts for responding to incidents. My team builds the agent that runs on these internal Google-owned machines at the very beginning of the pipeline.
So what is the background? What was going wrong? Why did we make the BPF LSM? The traditional way of doing this would be using auditd, and auditd was not very flexible. I remember when we wanted to audit environment variables, and the amount of work that needed to be done on audit was quite high: we needed to patch the kernel and the audit user space. We tried using kernel modules, and kernel modules are a pain to maintain. Even with regular tracing and BPF programs, the number of places one can attach to is so high that one doesn't really know what's the best place to observe a particular behavior. This will be very relevant once Leo shows his demo. And of course, policy enforcement was out of the question using any of these systems.
Security can be broadly classified into two approaches, which work hand in hand: monitoring what is happening on a system, and enforcing, which is taking action based on the monitored data.
Prior to BPF, one could monitor using Linux audit, use perf or other performance APIs or tracepoints, write custom kernel modules, or use kprobes. Enforcement was completely separate from all these systems. One could do MAC policy enforcement using security modules like AppArmor or SELinux, or do some sandboxing using seccomp, which is based on the syscall layer. The BPF LSM, which we also sometimes call KRSI, sits comfortably in the middle and provides a unified and flexible way to do both enforcement and monitoring.
Now, there are about 200 LSM hooks. These LSM hooks are a layer of abstraction higher than the system call API. For example, one can execute a process using both the execve and execveat system calls, but they both call the same set of LSM hooks. The LSM hooks correspond to changes in kernel objects and are placed strategically in the lifecycle of the object and the ongoing operation. This strategic placement ensures that an attacker cannot use attacks like time-of-check to time-of-use to trick the security logic into believing something other than what is actually happening. Leo is going to show the importance of this in his demo.
The BPF LSM can happily coexist with other LSMs like SELinux and AppArmor and does not do anything by default. One can load an LSM eBPF program to implement custom logic for an LSM hook. In this example, the BPF LSM for process execution gets invoked after all the other LSMs, and of course this order can be changed with the CONFIG_LSM config parameter or on the kernel command line. The LSM program can then write audit data back to user space using ring buffers and maps and can also make policy decisions, as we talked about. LSM programs are compiled once and run everywhere. They have verified pointer accesses, which is made possible by the BTF information the kernel is compiled with.
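
The hook stacking described above can be pictured with a toy model in plain Python. This is only a sketch under assumptions: the function names, the handler table, and the `/tmp/` deny policy are hypothetical and are not kernel or libbpf APIs. The two ideas taken from the talk are that `execve` and `execveat` reach the same hook, and that the LSMs are consulted in a configurable order.

```python
from typing import Callable, Dict, List

EPERM = 1  # errno for "Operation not permitted"

# A handler models one LSM's implementation of a hook: 0 allows, negative denies.
Handler = Callable[[str], int]


def allow_everything(path: str) -> int:
    """Stand-in for another LSM whose policy allows this execution."""
    return 0


def bpf_deny_tmp(path: str) -> int:
    """Hypothetical BPF LSM policy: refuse to execute anything under /tmp/."""
    return -EPERM if path.startswith("/tmp/") else 0


def call_exec_hook(order: List[str], handlers: Dict[str, Handler], path: str) -> int:
    """Consult each LSM in the configured order; the first denial wins."""
    for name in order:
        rc = handlers[name](path)
        if rc != 0:
            return rc
    return 0


HANDLERS: Dict[str, Handler] = {"selinux": allow_everything, "bpf": bpf_deny_tmp}
LSM_ORDER = ["selinux", "bpf"]  # models an ordered LSM list with bpf last


def execve(path: str) -> int:
    """Toy execve entry point: funnels into the shared hook."""
    return call_exec_hook(LSM_ORDER, HANDLERS, path)


def execveat(dirfd: int, path: str) -> int:
    """Toy execveat entry point: a different syscall, the same hook."""
    return call_exec_hook(LSM_ORDER, HANDLERS, path)


print(execve("/usr/bin/ls"), execve("/tmp/payload"), execveat(-100, "/tmp/payload"))
```

Because both entry points share one hook, a policy written once covers every syscall that leads to a process execution, which is the abstraction benefit the talk points at.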
We also added local storage, or security blobs, for inodes, task structs, and sockets. These blobs live and die with the kernel object, which makes memory management much easier, and they can be used to tag security metadata. These blobs are especially useful in conjunction with the atomics API we added to BPF, in particular for generating custom unique identifiers, for example namespace IDs or container IDs. Think of implementing custom logic for container IDs and sending a patch to the kernel. People have tried that, tried for multiple years to convince people of what a container is, and have not succeeded.
LSM programs can also sleep. This is especially useful if you want them to read data from user space, for example command-line arguments. The kernel needs to page in memory, which requires the program to be put to sleep while the memory is paged in. We also added helpers to get the IMA hashes of files, which can be useful for binary fingerprinting. And of course, we have all the other BPF goodness, like the BPF ring buffer and maps, which help maintain state and send data back to user space.
So what's happening next? My colleague Flora is working on DNS auditing, which has some interesting challenges, and we would like to do a presentation about it at some point. We also want to add helpers for MAC policy enforcement and really make the BPF LSM powerful enough to be a major LSM, or to be able to implement a major LSM in the kernel. We also want to provide more places that BPF LSM programs can attach to, which is especially useful for auditing events like mmap or kernel module loads. And we also want to make the BPF LSM easier to use by providing an API, like bpftrace or something else that is more abstracted from the kernel internals. Right now you still need to understand that bprm_check_security means a process is being executed, which is not ideal.
I'm now going to hand over to Leonardo, who's going to show an awesome demo of the BPF LSM in action.
Hello everybody, I'm Leo. I guess it's my turn now. First of all, I want to thank KP for all the wonderful explanations and for introducing me. And now, to explain why I believe LSM BPF will change everything, and for the better, let me first tell you a story.
Some months ago, while I was still working on Falco on a daily basis, a security researcher contacted me presenting a very cool bypassing technique. As some of you may already know, Falco, just like many other security tools out there, does its job thanks to eBPF, specifically, in its case, thanks to the syscall tracepoints. The Falco BPF probe basically attaches little BPF programs to the kernel syscall tracepoints; for any syscall being called on your machine, they extract its arguments plus other data and put all of it in a ring buffer. Then Falco rules assert conditions against that data. For example, the default Falco rule set contains rules to detect cryptomining activity on your host by looking at the outgoing connections happening on your machine. Such rules simply trace the sendto, sendmsg, or connect syscalls and assert conditions on their arguments, alerting if the destination IP, for example, or the destination domain matches those of known mining pools out there.
Just to give you a sense of what I'm talking about, let's take a look at those rules. Let's look at this net_miner_pool macro used by the rule "Detect outbound connections to common miner pool ports", as you can see here. This macro basically asserts a condition on the sendto and sendmsg syscalls and on these other macros, which are defined above here. If we inspect them, we can see that the minerpool_other macro checks for the destination port and domain name to be in lists of well-known mining pools out there. For example, this miner_domains list contains things like this.
Okay, so in this case, when Falco detects specific syscalls with specific arguments containing these values, it alerts the user about an outbound connection to mining pools. It's also worth taking a quick look at this pretty generic and pretty useful outbound macro that a lot of Falco rules use. As you can see, it basically checks for the event to be either a connect, a sendto, or a sendmsg syscall, for the connection to be in progress and not towards private networks or local addresses, and for it to be IPv4 or IPv6. All right, so far, so good.
The problem is that the code that Xiaofei (thank you, my friend) presented to me was able to reliably bypass those Falco rules, 100% of the time, by exploiting a TOCTOU in the kernel. I was like, oh my, how is that even possible? Well, the code was exploiting the time window between the moment the BPF tracepoint reads the argument, for example the destination IP, and the moment the kernel actually executes the syscall with its arguments. The core issue is that the syscall arguments are read at two different points in time. The delta between those two moments is the time window in which the attack replaces the malicious argument with a good-looking one that the BPF tracepoint will eventually read. It's important to note, anyway, that the beauty of this bypass was that it worked against any tool using BPF syscall tracepoints as a means of detecting malicious activities and threats, not only against Falco, which I'm using here because, having worked a lot on it, I'm very familiar with it.
Let me say this aloud before people accuse me: I will not explain the bypassing technique in detail here because it's not the goal of this talk, but I strongly recommend you go watch Xiaofei's talk at DEF CON for more details on it. It's worth it, trust me.
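
The two-reads problem described above can be shown with a deterministic toy model in Python. It is a sketch, not the real exploit: the class and function names are hypothetical, the decoy address is an arbitrary documentation address, and the order of the two reads (kernel first, probe second) is an assumption made for illustration. The real attack wins the race with userfaultfd, which this model replaces with a simple flag.

```python
import ipaddress
from dataclasses import dataclass

MALICIOUS = ipaddress.IPv4Address("1.1.1.1")
BENIGN = ipaddress.IPv4Address("192.0.2.10")  # hypothetical decoy address


@dataclass
class UserBuffer:
    """Models the socket address that lives in attacker-controlled user memory."""
    dest: ipaddress.IPv4Address


@dataclass
class Observation:
    kernel_used: ipaddress.IPv4Address     # where the connection really went
    tracepoint_saw: ipaddress.IPv4Address  # what a user-memory reader reported


def connect_once(buf: UserBuffer, attacker_wins_race: bool) -> Observation:
    # Read 1: the kernel copies the argument out of user memory and uses the copy.
    kernel_copy = buf.dest
    # Race window: the attacker rewrites the user buffer between the two reads.
    if attacker_wins_race:
        buf.dest = BENIGN
    # Read 2: the monitoring probe dereferences the same user pointer, later.
    probe_read = buf.dest
    return Observation(kernel_used=kernel_copy, tracepoint_saw=probe_read)


for wins in (False, True):
    obs = connect_once(UserBuffer(MALICIOUS), attacker_wins_race=wins)
    print(f"race won={wins}: kernel={obs.kernel_used} probe={obs.tracepoint_saw}")
```

When the race is won, the probe reports the decoy while the kernel has already used the malicious address. Any check that reads the kernel's own copy is not affected by the later rewrite.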
So, long story short, this attack connects to IP 1.1.1.1, which we are considering malicious just for the sake of this demo, while the Falco BPF probe detects another IP and, for this reason, doesn't trigger any alerts at all. But let's see it in action. I feel pragmatic right now.
First thing, we need a Falco rule that detects connections to 1.1.1.1. I've already prepped one. Let's take a look at it. Okay, as you can see in this little YAML file, I basically copied the outbound macro I showed you before, and I wrote a Falco rule called "Unexpected outbound connection destination" that basically uses this macro together with a check on the destination IP. So, if this is an outbound connection and the destination IP is 1.1.1.1, Falco should alert. Okay, let's check it. Falco starts. Now, let's simply curl 1.1.1.1, and it works. Look, as you can see, Falco detects the outbound connection to the target IP in normal situations.
But what if I run the bypass I was mentioning before? Notice that for this bypass to work, that is, for us to be vulnerable, the machine needs to have unprivileged userfaultfd activated. Let's check its sysctl knob. It's one. Okay, in this case, we are vulnerable. But first, let me run tcpdump, because why not? And now let me run the attack. As you can see, the attack succeeded in connecting to the 1.1.1.1 IP. This is its integer representation, and tcpdump catches it while Falco doesn't. Ay ay, BPF syscall tracepoints.
Okay, in Falco 0.29.1, I implemented an initial workaround, adding support for tracing the userfaultfd syscall, which is what Xiaofei's code uses to swap the syscall arguments under the hood. In fact, if we run Falco 0.29.1 with the default rule set, which contains a rule for it, and then the attack, we can see that Falco emits a critical alert about userfaultfd getting called.
But aside from the fact that there are also other means, which I will not present here, to meet the same preconditions that this bypass requires, as it is at the moment my patch just makes Falco scream about the userfaultfd syscall getting called; it does not provide any detail or deep visibility into what is really happening after it. Furthermore, with little modification, this attack could also work for other syscalls like rename, unlink, creat, sendto, you name it.
So what would be an actual and truly meaningful fix? Or rather, is there even a way to avoid this swapping at all? This is the question. In my opinion, asserting conditions against the actual arguments used by the kernel would be the best way to go for detection tools. But first of all, how do we grab them? To do so, we first need to understand a little better the syscall flow in the Linux kernel.
Let's take a look at this image of the flow of the open syscall that I have on my laptop. The flow for other syscalls is, in the end, very similar, and Xiaofei also provided a bypass for open. So when a process in user space calls the open syscall on a file path, the syscall is dispatched and the path string is used to obtain a kernel file object and an inode object. If the parameters are incorrect, an error is returned as usual. Otherwise, the normal DAC (discretionary access control) file permission checks are performed, and if they are satisfied, the LSM framework enters the game. It calls each of the file LSM hooks for all the enabled LSMs, including the BPF ones, if there are any. Finally, if all those security checks pass, the file is opened for the user space process and a new file descriptor is returned to it. It's important to note how deep in the syscall flow the LSM hooks are, which is not the case for the BPF syscall tracepoints.
Not convinced yet? Want to take a look at the kernel code for the connect syscall and just one of the various LSM network hooks?
Okay, you asked for it, don't blame me. Well, taking a look at the kernel is always the right thing to do. So here, in the lsm_hook_defs.h header, you can find all the hooks the LSM framework provides. For example, this one at line 208 is the one we are interested in for blocking Xiaofei from connecting to 1.1.1.1, isn't it? In fact, let's take a look at the definition of the connect syscall in the socket.c file. Here: SYSCALL_DEFINE, __sys_connect; __sys_connect calls move_addr_to_kernel and then __sys_connect_file if the arguments are moved successfully to the kernel; and __sys_connect_file calls this function, security_socket_connect. If we inspect what this function is and does, we suddenly realize it is the function that calls the socket_connect LSM hook. So, does this ring a bell for you?
As we have seen, LSM hooks sit very deep in the syscall flow and happen after the arguments have been moved to the kernel. So to prevent our friend Xiaofei, or anyone using this fantastic technique, from connecting to 1.1.1.1, and to provide a real security layer, we may want to use this LSM hook and attach a BPF program to it. It's easier than you think, but just be aware that this feature is new and requires kernel 5.7 or newer. Moreover, ensure your kernel is configured with the options CONFIG_BPF_LSM and CONFIG_DEBUG_INFO_BTF. You also need to have the BPF LSM activated. To check that, take a look at the lsm kernel boot parameter or at the CONFIG_LSM configuration. So basically like this.
Okay, we can write some LSM BPF. Time to get our hands dirty. Finally, please meet restrict connect, this little BPF program that I've written to attach to the socket_connect LSM hook. As you can see, it's simpler than you think: the SEC helper macro marks the following function as the handler for the socket_connect LSM hook.
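
The prerequisites listed above (kernel version, the two config options, and `bpf` being present in the LSM list) can be checked with a few pure functions. This is a hedged sketch in Python: the function names are hypothetical, and the inputs are plain strings so the logic can be tried anywhere. On many systems the active LSM list can be read from `/sys/kernel/security/lsm` and the build configuration from a file such as `/boot/config-<release>`, but both locations are assumptions about the machine, not something shown in the talk.

```python
import re
from typing import List

REQUIRED_OPTIONS = ("CONFIG_BPF_LSM", "CONFIG_DEBUG_INFO_BTF")
MIN_KERNEL = (5, 7)  # first kernel release that includes the BPF LSM


def kernel_is_new_enough(release: str) -> bool:
    """Compare the major.minor prefix of a release string like '5.10.0-8-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= MIN_KERNEL


def missing_options(config_text: str) -> List[str]:
    """Return the required options that are not set to 'y' in a kernel config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.endswith("=y"):
            enabled.add(line.split("=", 1)[0])
    return [opt for opt in REQUIRED_OPTIONS if opt not in enabled]


def bpf_lsm_active(lsm_list: str) -> bool:
    """lsm_list is a comma-separated list of LSM names, e.g. 'lockdown,yama,bpf'."""
    return "bpf" in [name.strip() for name in lsm_list.split(",")]


sample_config = "CONFIG_BPF_LSM=y\n# CONFIG_DEBUG_INFO_BTF is not set\n"
print(kernel_is_new_enough("5.10.0-8-amd64"))
print(missing_options(sample_config))
print(bpf_lsm_active("lockdown,yama,apparmor,bpf"))
```

If `bpf` is missing from the list, the usual remedy is to add it to the `lsm=` boot parameter or to CONFIG_LSM, as the talk mentions.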
Then, by using the BPF_PROG macro, I gain the benefit of not having to mess around with registers, and I can simply declare the signature of the function to contain the same parameters as the socket_connect hook that we inspected before in the kernel. Notice also that I'm appending a ret parameter that I check in the function. In fact, since various programs can be attached to this LSM hook, I have to respect the "cannot override a denial" rule, meaning that if this function is not the first one attached to the socket_connect LSM hook and the previous one returned a value different from zero, then I have to exit and respect the decision of the previous program attached to the LSM hook.
The rest is pretty straightforward. I ignore non-IPv4 protocols because this is just an example, I obtain the destination IP, and I check it against the value that I want to block, a value that I've hard-coded here for the sake of simplicity; here we could also look it up in a BPF map. Please notice this is the same number that the attack connect program outputs when it succeeds, so it's the integer representation of 1.1.1.1.
Let's run it now. A lot of relocations. Let's attach to the trace pipe. Let's first simply test it by running this command, curl. Connect... whoa, look. Okay. But let's be serious and try it against the bypass. Error connecting, operation not permitted, LSM blocking. LSM blocking, oops, operation not permitted. Connection to 1.1.1.1 blocked. Sorry, my friend. This is good, isn't it?
In the repository I'm going to share at the end of the talk you'll also find the C code to attach restrict connect to the LSM hook, thanks to libbpf, plus a Makefile to compile everything and stuff.
But now that we are deep into this rabbit hole, let's keep the focus. What if we do not want to block connections to 1.1.1.1, but we want to receive the correct syscall arguments, so that we have deep visibility into the destination IP address?
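
The decision logic described above can be modelled in a few lines of Python. This is a sketch of the logic only, not the BPF C program from the demo: the function names and the way programs are chained are hypothetical. What it keeps from the talk is the order of checks: honour a previous denial, ignore non-IPv4 traffic, and compare the destination against a hard-coded integer. The 32-bit integer form of 1.1.1.1 is 16843009 (0x01010101), which happens to be identical in host and network byte order.

```python
import ipaddress
import socket
from typing import Callable, Iterable

EPERM = 1  # errno for "Operation not permitted"
BLOCKED_IPV4 = int(ipaddress.IPv4Address("1.1.1.1"))  # 16843009 == 0x01010101

# A program receives the hook arguments plus the previous program's return value.
Program = Callable[[int, str, int], int]


def restrict_connect(family: int, dest_ip: str, ret: int) -> int:
    # "Cannot override a denial": keep whatever an earlier program decided.
    if ret != 0:
        return ret
    # Only IPv4 is handled in this example; everything else is allowed.
    if family != socket.AF_INET:
        return 0
    # Compare against the hard-coded value (a map lookup could replace this).
    if int(ipaddress.IPv4Address(dest_ip)) == BLOCKED_IPV4:
        return -EPERM
    return 0


def run_attached_programs(programs: Iterable[Program], family: int, dest_ip: str) -> int:
    """Toy chain: each program sees the return value of the one before it."""
    ret = 0
    for prog in programs:
        ret = prog(family, dest_ip, ret)
    return ret


print(run_attached_programs([restrict_connect], socket.AF_INET, "1.1.1.1"))
print(run_attached_programs([restrict_connect], socket.AF_INET, "8.8.8.8"))
```

Swapping the constant for a set or dictionary lookup would mirror the BPF map variant mentioned in the talk.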
We've seen that BPF syscall tracepoints are not that reliable in our use case. So let's take a step forward and try to answer this question. The first idea I came up with is: let's attach a kernel probe, a kprobe, to the security_socket_connect function that calls the socket_connect LSM hook that we saw before in the kernel. Do you remember? The syscall arguments at this stage should have already been moved to the kernel, so we should be able to see 1.1.1.1 and not another IP. But we need to try. I needed to try; I was not sure.
I wrote this other little BPF program called kprobe connect. Yeah, the name is misleading, but who cares? Let's take a look at it. In this case too I use the SEC helper macro. Please notice that instead of the BPF_PROG helper I'm using the BPF_KPROBE one here. It serves the same purpose as BPF_PROG, but it is specifically for kprobes. Thanks to it, I can declare a signature like this and easily get the arguments of the security_socket_connect function without going crazy. In a way very similar to the previous BPF LSM program, this kprobe BPF program checks the protocol. Okay, I only care about IPv4 at the moment. And then, in this case, I restrict the output of this program in the trace pipe depending on the program causing the connection: attack connect, okay? The rest is simple. I basically cast the address to an IPv4 one and I print it out. Let's run it. The attack: ready, set, go. All right, fine, correct destination.
Hey, please wait a minute. Let's not get carried away so easily. Are we really sure this approach with kprobes works 100% of the time? Are we sure it's reliable? Or does it get easily fooled by the attack, just like the BPF syscall tracepoints? I'm short on time, and I'll probably write some tests to assess the reliability of the various approaches soon. In the meantime, take note of the repo. It's this one. You can find it on my GitHub.
Here you'll also find other BPF programs to play with, to understand how secure and correct the info they give you is. Now, let me also tell you why I wanted to show you various experiments, and make some considerations. The main reason was to show why LSM BPF programs are here to stay and are the way to go. They will reshape, and improve, the way that we currently do security, both on the prevention and on the detection side. But they are a fairly new feature that is not available in all kernels. So I thought that showing you a kprobe approach, given kprobes exist on older kernels too, was worth a try. Even so, pause a second to consider that attaching a kprobe to the function calling an LSM hook makes it somewhat dependent on this new feature that's not available in older kernels. So maybe it's better to experiment with kprobes on something else, like the tcp_v4_connect function, but this is another exercise for the audience.
Furthermore, what does it take to use LSM BPF programs for auditing? It's just a matter of changing the restrict connect program we've seen before a bit: making it always return zero, so that it doesn't block connections to specific IPs, and taking note of the detected destination IPs, maybe putting them in a BPF ring buffer or in a BPF map. I prepped this audit connect example that basically implements what I've just said. Let's check it out. As you can see, I'm always returning zero. I'm just filtering depending on the binary making the connections, and I'm outputting the destination IP in the trace pipe. As you may notice, it's very similar to the restrict connect BPF program shown previously, and it's an LSM one.
It's time to test it. Let me run tcpdump to see where packets go. Attack time. Okay, the bypass was not able to use the race window and it connected to 13.107, et cetera, and the LSM is catching it correctly. Let's try it again.
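
The auditing variant described above differs from the blocking one in two ways: it always returns zero, and it records what it saw. A minimal Python model of that idea follows. It is a sketch under assumptions: the bounded deque only stands in for a BPF ring buffer, and the process name `attack_connect` and the capacity are hypothetical values chosen for illustration.

```python
import ipaddress
import socket
from collections import deque

# Bounded queue standing in for a BPF ring buffer (hypothetical capacity).
EVENTS = deque(maxlen=1024)
WATCHED_COMM = "attack_connect"  # hypothetical name of the binary to audit


def audit_connect(comm: str, family: int, dest_ip: str) -> int:
    """Record IPv4 destinations for the watched binary; never deny."""
    if family == socket.AF_INET and comm == WATCHED_COMM:
        # Normalise the address before storing it as an audit event.
        EVENTS.append((comm, str(ipaddress.IPv4Address(dest_ip))))
    return 0  # auditing only: the connection is always allowed
```

A user space consumer would then drain the events, for example with `while EVENTS: print(EVENTS.popleft())`.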
Oh, the attack connected to the malicious IP, as you can see also from tcpdump and from its output, and the LSM is getting the actual destination IP, the malicious one. Let's try again. Not able to use the race window. Not able to use the race window. It's working. Easy peasy lemon squeezy, isn't it? So not only is using LSM BPF for auditing possible, it's also, in my opinion, the recommended and future-proof way to go.
Further work would be to investigate whether BPF_PROG_TYPE_TRACING programs, a newer kind of tracing program, are vulnerable to this family of TOCTOU attacks, or to investigate the network tracepoints. In the repository I put an example for you all to try out. Now, go ahead and try them, and please ping me back with the results of your experiments. I'm so curious. My DMs are always open.
Recap time. Today we've learned various things. First, that by providing Linux with a standard API for policy enforcement, LSM aims to enable the widespread deployment of security-hardened systems. The protections provided by LSMs do help protect your system from being hacked when an attacker exploits flaws in one of the running programs. LSMs are definitely an important layer in any serious defense-in-depth strategy on Linux machines. We've also learned that BPF syscall tracepoints are not as reliable as we may have hoped, and also that LSM hooks happen at a very, very deep level in the syscall flow and can be used for both prevention and auditing with little to zero effort. So I'd say that's all for today. Thank you for joining us. Ciao.