Hello everyone, welcome to our next talk in the Security devroom at FOSDEM. Our next speaker is Florent Revest, and his talk is about Kernel Runtime Security Instrumentation. Let's welcome Florent.

Can you hear me? Great. Thanks for applauding. Maybe the talk is bad, so wait until the end, and then you can applaud. So yeah, my name is Florent Revest. I am a software engineer and I work for Google, which is a company you might have heard of. I work in a security team on a project called Kernel Runtime Security Instrumentation. But as you know, engineers like to use acronyms to sound more intelligent than they actually are, so we like to call it KRSI, and for even more acronyms we even say BPF plus LSM equals KRSI. By the end of this talk we will understand what that means, hopefully.

Motivation. How did we come up with this thing? As I said, I work for Google, and we have a huge fleet of Linux machines. I want to stress that those are the corporate computers that the engineers use, the laptops and the workstations, not consumer products. We happen to care about the security of those machines, and we have teams dedicated to monitoring their security. We basically do two things. First, we monitor what happens on those machines. Second, we want to enforce policies: when we detect something wrong happening on the fleet, we want to easily deploy a rule to all the machines that prevents this action from happening again. And since the fleet is so big, around 200,000 machines, there are some strong scalability requirements. You can't require every engineer to reinstall their machine, and so on.

So on one side we want to gather signals. Those are pieces of information indicating that something fishy is going on on a machine. I just wrote some examples of signals here. One example could be a process that tries to delete its own executable. It doesn't mean it's bad, but it's usually a bad sign, so you want to catch that.
You want some detection rules that raise a flag. Other examples: a kernel module gets loaded and then removes itself from the modules list so that it doesn't appear in lsmod anymore. It doesn't mean it's bad, but it's fishy. If you have suspicious environment variables, if someone is doing something with LD_PRELOAD, you want to know about it, and so on and so forth.

On the other side you have mitigations. Once you have detected a behavior that you want to prevent, you have lots of different ways to prevent it. For example, you could prevent known vulnerable binaries from running with blacklists. That's just an example. You could also have a whitelist of kernel modules, whatever. You could chain signals and then, based on that, write a MAC policy.

The way you have to do that currently in the Linux kernel is to go through lots of security subsystems. On one side you have the signaling part, with subsystems such as audit and perf, where the kernel lets you know about events happening on your machine. On the other side you have mitigation subsystems like SELinux and AppArmor (we'll come back to them later), or you may have heard about seccomp, and so on. The problem is that the place where you get the data from in the auditing subsystem is not the same as the place where you enforce the policy. And the language is not the same either. When you get data from audit, you get it from a certain place in a certain format. And when you want to prevent the action with SELinux, you need to do that in a different place with a different file format. For example, if you want to add a detection policy for an environment variable, you will need to modify audit in kernel space and also the user-space audit program. And then, once you detect something happening on your fleet and you want to deploy your mitigation, you need to write a policy in another language, for example for SELinux, et cetera.
So what we wanted to do is bridge those two worlds, the signaling world and the mitigation world. And that's how we came up with KRSI. There are two things I want to talk about first.

The first thing is LSMs. LSM stands for Linux Security Module, and it's the kernel subsystem that is the basis of SELinux and AppArmor. The way it's implemented, every time there is an important security-relevant behavior happening in the kernel, there is a security hook, an LSM hook, that gets called. And all the LSMs have a say on whether the action is allowed or denied. So let's say there is an execution event. All LSMs will be notified of it, and then they can allow or deny the operation. I want to stress that LSMs work at a different level than syscalls. We heard several talks about syscalls today, with Falco for instance. LSMs work at, I would say, a higher level of abstraction where, for example, you work on the execution event, not on the exact syscalls. We used to work with syscalls before and, for example, we missed the execveat syscall, which was unfortunate. So those LSM hooks are implemented in each LSM, and the return value of the function specifies whether the operation is allowed or denied. With that you can implement MAC, mandatory access control.

Now for something completely different. I want to talk about BPF. It's the third talk today that talks about BPF, I'm sorry, but I will try to quickly introduce BPF for those of you who don't know about it. At its core, BPF is a bytecode that can be JIT-compiled inside the kernel into executable pages. From user space, you write programs in C. You can also write them in assembly if that's your thing, but usually you would write them in C. You compile them with LLVM, for instance. Then you get an object file, and this object file can be loaded into the kernel and attached to hooks.
The nice thing about BPF is that when you load a program into the kernel, the kernel does static analysis on your bytecode. So, for example, the kernel can verify that you only have read-only access to memory. It can also check the number of instructions in your BPF program; there are some restrictions that make sure BPF programs terminate.

One last thing I would like to say about BPF is that you can exchange data with user space, and there are several ways to do that. One way is to use the perf ring buffer, which is just a ring buffer that you can use to output big buffers. For example, if you have argv pages that you want to send to user space, you will typically send them over the perf ring buffer. Then you also have simpler mechanisms, like maps, which are better for the sort of structures that you want to share with a user-space program. And now maps are even encapsulated as global variables. So from the BPF program you can write into a global variable, and then user space can read it. I will show you an example later.

So what KRSI is, is the combination of LSM and BPF. KRSI is a new LSM, similar to SELinux and AppArmor, but the policy is implemented as eBPF programs, so that the user can create more flexible policies in C. And the nice thing about it is that you can also do auditing in the exact same place where you write your security policy, all of that in C. We are heavily pushing this upstream. We are now at v3 of the patch set on the Linux kernel mailing list, and we are quite optimistic about its future.

The reason I'm here today is that we are really interested in finding new users for it. As I said, we use it internally for our corporate fleet, but it can be used in lots of different contexts. For example, at another conference we heard from an automotive company that was interested in restricting access to the CAN bus with BPF programs.
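
To make the LSM hook model from earlier in the talk concrete, here is a minimal user-space sketch in plain C. It is a simplified model, not kernel code: the `exec_event` record, the two example hooks, the path `/tmp/known-bad`, and `run_exec_hooks` are all hypothetical names invented for illustration. It only shows the idea that every registered module is asked in turn, that a return value of zero means "allow", and that a negative errno from any module denies the operation.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified event record. Real LSM hooks receive
 * kernel structures, not this user-space stand-in. */
struct exec_event {
    const char *path;
    unsigned int uid;
};

/* A hook returns 0 to allow, or a negative errno to deny. */
typedef int (*exec_hook_fn)(const struct exec_event *ev);

/* Example module 1 (hypothetical): audits only, always allows. */
int audit_only_hook(const struct exec_event *ev)
{
    (void)ev;
    return 0;
}

/* Example module 2 (hypothetical): denies one hard-coded path. */
int deny_path_hook(const struct exec_event *ev)
{
    return strcmp(ev->path, "/tmp/known-bad") == 0 ? -EPERM : 0;
}

/* Ask every hook in order; the first non-zero result denies the
 * operation and stops the walk. */
int run_exec_hooks(const exec_hook_fn *hooks, size_t n,
                   const struct exec_event *ev)
{
    for (size_t i = 0; i < n; i++) {
        int ret = hooks[i](ev);
        if (ret != 0)
            return ret;
    }
    return 0;
}
```

In this model, KRSI would correspond to one more entry in the hook array whose body is supplied at run time as a BPF program rather than compiled into the kernel.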
So I will walk you through a very simple dummy example, just to give you an idea of what it takes to write a KRSI policy and what you can do with it. Let's start with something simple. There's code. Please concentrate.

The first thing you want to know is what you want to monitor. Let's say you are interested in mprotect events in the kernel. You go to the LSM framework and you find the LSM hook that corresponds to mprotect operations. There is one called file_mprotect. Then you open your text editor, you create a C file, and you start writing your BPF policy. You use some macros that define the ELF section in the eBPF object file. The section tells the kernel where to attach the program. Then your eBPF program gets the same parameters as the LSM hook in the kernel, so the signature of the LSM hook is exposed to eBPF. In this case, you get a pointer to a vm_area_struct, whatever that is, and two unsigned longs. And then you just have to return a value. For now, we return zero.

So what can you do once you are inside the BPF program? One of the nice things you can do is use BPF helpers. Since eBPF is so restricted by the verifier, there are operations that you cannot do within BPF. For those, there are functions that are implemented in the kernel and exposed to eBPF programs, and eBPF programs can just call them to do something interesting. For example, if you want to get the current PID, there is a BPF helper called bpf_get_current_pid_tgid, and this one returns the PID and TGID. What I want you to remember here is that there is a nice API that you can use to access interesting information in the kernel. But if you want to do more in-depth introspection in the kernel and you really want to access the fields that are relevant to you, helpers don't scale. Every time you want to add a helper to the kernel, you need to create a new function. It has an opcode, and it takes time and effort.
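
The helper mentioned above packs both IDs into a single 64-bit return value: the TGID in the upper 32 bits and the PID in the lower 32 bits. The sketch below shows how a program would split that value. It is ordinary user-space C, and `fake_get_current_pid_tgid` is a hypothetical stand-in that only mimics the packing, since the real helper can only be called from inside a BPF program.

```c
#include <stdint.h>

/* The two IDs carried in the helper's 64-bit return value. */
struct pid_tgid {
    uint32_t tgid; /* thread group ID: what user space calls the PID */
    uint32_t pid;  /* kernel task ID: what user space calls the TID */
};

/* Split a packed value: upper 32 bits are the TGID, lower 32 the PID. */
struct pid_tgid split_pid_tgid(uint64_t raw)
{
    struct pid_tgid out;
    out.tgid = (uint32_t)(raw >> 32);
    out.pid = (uint32_t)(raw & 0xffffffffu);
    return out;
}

/* Hypothetical user-space stand-in for the kernel helper. It only
 * reproduces the packing so the split above can be exercised. */
uint64_t fake_get_current_pid_tgid(uint32_t tgid, uint32_t pid)
{
    return ((uint64_t)tgid << 32) | pid;
}
```

Inside a real policy the same shift and mask would be applied directly to the value returned by the helper.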
So there is a new feature in the bpf-next tree called BTF. Sorry about the acronyms again. What BTF allows you to do, essentially, is access structure fields by their name instead of their offset. That way, if you migrate your eBPF program to a newer kernel where the structure layout has changed, or to a different architecture or whatever, you haven't hardcoded the offset of the field in the structure. You access it by name, and then once you load the eBPF program into the kernel using, for instance, libbpf, the memory access gets relocated based on BTF debug information. It's similar to DWARF: you have access to the structures, fields, and padding. The way it works for the eBPF program writer is that first you define the part of the structure that you are interested in, as if you were actually defining the vm_area_struct that exists in the kernel, but you only declare the fields that interest you. In that case, vm_start. And then from the BPF program, you can just access the struct like any other struct.

Another thing I want to talk about is how you exchange data with user space. I said there were different mechanisms, and here we show the simplest example. Again, I think it's still in bpf-next and maybe not in the mainline tree right now, but it will land there soon. You can define global variables that are actually shared between the eBPF program and user space. So in this case, let's say we have an mprotect count, and every time we go through this function, we just increment the counter. Then from user space, when you load the program, you can look up the symbol of that global variable, and you can read the mprotect count value anytime you want. I believe that the way it's actually implemented is via BPF maps, but I'm not really familiar with the implementation.

And then the last thing that is important is how you do MAC. You just change the return value. So it's a really dummy example.
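
The shared counter and the return-value decision can be simulated in ordinary user-space C. This is a sketch under stated assumptions, not the program shown on the slides: `mprotect_policy` and `MPROTECT_LIMIT` are hypothetical names, the global variable stands in for the one a BPF program would share with user space, and the limit of 100 calls is the figure used in the talk's own example.

```c
#include <errno.h>

/* Forward declaration only: the sketch never dereferences it. */
struct vm_area_struct;

/* Stand-in for the global variable that a BPF program would share
 * with user space. Here it is just a process-local global. */
unsigned long mprotect_count;

/* Number of calls allowed before the policy starts denying
 * (the figure used in the talk's example). */
#define MPROTECT_LIMIT 100UL

/* Same parameter list the talk describes for the file_mprotect hook:
 * a vm_area_struct pointer and two unsigned longs.
 * Returns 0 to allow, a negative errno to deny. */
int mprotect_policy(struct vm_area_struct *vma,
                    unsigned long reqprot, unsigned long prot)
{
    (void)vma;
    (void)reqprot;
    (void)prot;

    mprotect_count++; /* the auditing side: count every call */

    if (mprotect_count > MPROTECT_LIMIT)
        return -EPERM; /* the MAC side: deny past the limit */
    return 0;
}
```

The point of the sketch is that auditing (the counter) and enforcement (the return value) live in the same function. A real BPF program running on several CPUs would likely also need an atomic increment, which this single-threaded model ignores.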
It's quite stupid, but let's say that you want to deny mprotect after the 100th mprotect call. Then you can write a simple condition like this, and you can deny the operation or allow it.

So thank you very much. I am very eager to learn what you will be interested in building with KRSI. Any questions?

So which parts of those functionalities are already in mainline? You mentioned the patch v3, but then you mentioned something about bpf-next. Which parts are already accepted and which are under discussion?

Okay, so there are things that are BPF features. When I talked about global variables that you can access, this is part of BPF. We don't upstream those. We only work on the LSM itself. What I wanted to show you with this talk is how you would use our LSM. What we upstream is the LSM itself, and that is the patch v3 that was sent to the mailing list.

Okay, but the feature you mentioned for accessing the structure...

The structure fields, BTF. This is in bpf-next, and it's written by the BPF maintainers. We don't develop this ourselves. It's a BPF feature.

Okay, but the kernel part of that, I guess, must exist. Is it already there?

It is in bpf-next, which is a branch that will land in mainline soon.

And another question: that seems quite similar to what kprobes could do. Did you try to use them, or... I mean, did you not even want to use them because, well, they are...

We did look into kprobes, but we thought that the LSM hooks map very well to what we are interested in. And also, if you try to hook into LSM hooks with kprobes, then you wouldn't... So there are things I haven't talked about here, but LSMs are a bit more than hooks. They are hooks and also security blobs. The LSM itself can store data within important data structures. So if you want to store something in task_struct, LSM has an infrastructure for that. And if you use kprobes, you can't do those things, not nicely, certainly.
And also, the way LSMs are called inside the LSM hook cannot be replicated with kprobes. So in the end, we landed on this design, where an LSM was the better decision, we thought.

So, some other questions?

Do you provide example policies of what you managed to do with this new LSM?

Yeah, there is an example that we sent with the patch set where we... So it's also an mprotect example, but it actually does something interesting. I just made the shortest code that I could...

Yeah, I guess that was the short version, but I'm more interested in knowing if you have more policies that could show...

It's still quite early in the upstreaming process, so we will publish example policies after it gets upstreamed and everything is finalized.

Okay, and my other question is: is it stackable with SELinux or AppArmor? Can you still use them if you use this LSM?

I think so, yeah, you really can, yeah.

All right, thank you.

I am sure you can.

So, any other questions? Someone else? If not, I guess we can finish the talk, and if you've got some other questions, you can talk to Florent after the talk. Thank you.