I've just recently written a new report about eBPF. Hopefully we will be giving away copies later this week; apparently they have been released from customs and are on their way here now, I hope. So if you want to learn more about what eBPF is, I hope that report will be as useful to you as some of you have told me the Container Security book was. Today I want to talk about how we can use eBPF for runtime security. How many of you were here for the lockc talk just a couple of talks ago? OK, so this talk will dive into some of what was discussed there in a little more detail. I'm going to show you some different uses of eBPF and a new tool that we've added to the Cilium family. A lot of the discussions at SecurityCon have been about supply chain and dependency management, things that happen before you deploy your workloads. What I'm talking about now is the real-time detection and prevention of malicious activity while your workloads are running. So we need a way of detecting, in real time, when something suspicious is happening. At a minimum we need to be able to report it: perhaps that generates alerts, or goes into a SIEM, or triggers some other notification that something suspicious or malicious has happened. Even better is if we can spot when something suspicious is about to happen and prevent it. The kind of activity we're talking about falls largely into four main categories.
First, we want to spot network traffic to a suspicious destination or from a suspicious address. Second, file access: we want to make sure that workloads only access the files we expect them to access. Third, we only expect our workloads to run a certain set of executables. I often use the example that if my NGINX pod suddenly starts running a cryptocurrency miner, that's probably not what it was supposed to be doing. Particularly in a microservices, cloud native environment, we can often reason about which executables we expect to run and what file and network activity we expect. Fourth, we're interested when a process appears to have, or appears to be trying to gain, more privileges than we expected, because escalating privileges is often how an exploit of a vulnerability takes hold. All of these types of activity require support from the kernel. So just to make sure we're on the same page about user space and the kernel: we write application code in user space. Whenever we want to do anything interesting, certainly anything that involves hardware, we have to ask the kernel to do it on our behalf, and we do that through the system call interface. So the categories I was talking about, file access, network access, even memory access, processes and privileges, all go through system calls. Whenever an application does any of those things, the kernel is involved. So if we can observe how the kernel is behaving, that gives us a mechanism for spotting potentially malicious activity. We've had several mechanisms for doing this over the years, and many of them are available in open source or proprietary tools today.
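As a rough illustration of how those categories map onto the system call interface, here is a minimal Python sketch. The syscall names are real Linux system calls, but the grouping into categories, and the `categorize`/`summarize` helpers, are our own illustrative assumptions, not how any particular tool classifies events.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable

# Illustrative, non-exhaustive grouping of Linux system call names into the
# categories of potentially suspicious activity discussed in the talk.
SYSCALL_CATEGORIES: Dict[str, FrozenSet[str]] = {
    "network": frozenset({"socket", "connect", "accept", "accept4", "bind", "listen"}),
    "file": frozenset({"open", "openat", "read", "write", "close", "chmod", "fchmodat", "unlinkat"}),
    "process": frozenset({"execve", "execveat", "fork", "vfork", "clone", "clone3"}),
    "privilege": frozenset({"setuid", "setgid", "setresuid", "setresgid", "capset"}),
}


def categorize(syscall_name: str) -> str:
    """Return the category a syscall name belongs to, or 'other' if unlisted."""
    for category, names in SYSCALL_CATEGORIES.items():
        if syscall_name in names:
            return category
    return "other"


def summarize(trace: Iterable[str]) -> Counter:
    """Count how many calls in a syscall trace fall into each category."""
    return Counter(categorize(name) for name in trace)


if __name__ == "__main__":
    trace = ["openat", "read", "close", "connect", "execve", "setuid", "getpid"]
    print(summarize(trace))
```

The point is simply that every one of these activities has to cross the syscall boundary, which is why watching the kernel is such a good vantage point.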
I'm going to run through these pretty quickly. First, LD_PRELOAD: is anybody here familiar with LD_PRELOAD? I'm guessing so; it's quite often used for red team work. Ptrace, I'm sure a lot of you have used. Seccomp, I'm sure you've all come across. And finally we'll turn to some things we can do with eBPF. LD_PRELOAD assumes that your application uses the standard C library to make system calls. Whenever you perform one of those operations that require support from the kernel, you actually do it by calling into the standard C library, which gives you an abstraction over the system calls. Because that library is typically dynamically linked, LD_PRELOAD lets us load an alternative library ahead of the standard one. That might just be a thin shim layer that does some detection and then passes the call through to the normal standard C library. That's all well and good, except for statically linked applications. For example, Go applications don't use the standard C library; they have their own statically linked interface to syscalls. So LD_PRELOAD is a perfectly valid way of trying to intercept what a user space application is asking the kernel to do, but it can be bypassed pretty easily, especially these days with so many Go applications around. That brings us to the category of tools that hook into the top layer, if you like, of a syscall being processed within the kernel. Ptrace, seccomp, and eBPF kprobes all hook into this initial processing of a system call, and they all have the same problem. It's been well understood in the kernel community for a very long time, and it's the kernel working as designed: when the system call parameters are first passed into the kernel, the kernel hasn't yet copied them into its own memory, so they can still change.
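To make the LD_PRELOAD bypass concrete, here is a toy Python model rather than real dynamic-linker code. Everything in it (`libc_chmod`, `make_preload_shim`, the `symbol_table` dict standing in for symbol resolution) is a hypothetical stand-in: a dynamically linked program resolves `chmod` through the symbol table, so a preloaded shim sees the call, while a statically linked program that issues the syscall itself never touches the shim.

```python
from typing import Callable, Dict, List

audit_log: List[str] = []   # what the LD_PRELOAD-style shim observes
kernel_log: List[str] = []  # what actually reaches the (simulated) kernel


def raw_syscall(name: str, path: str, mode: int) -> int:
    """Stand-in for the kernel's system call entry point."""
    kernel_log.append(f"{name} {path}")
    return 0


def libc_chmod(path: str, mode: int) -> int:
    """Stand-in for the C library wrapper, which issues the real syscall."""
    return raw_syscall("fchmodat", path, mode)


def make_preload_shim(real: Callable[[str, int], int]) -> Callable[[str, int], int]:
    """Build a shim that records the call, then passes it through to `real`."""
    def shim(path: str, mode: int) -> int:
        audit_log.append(f"chmod {path} {oct(mode)}")
        return real(path, mode)
    return shim


# The preloaded library's definition of 'chmod' wins over libc's own symbol.
symbol_table: Dict[str, Callable[[str, int], int]] = {"chmod": libc_chmod}
symbol_table["chmod"] = make_preload_shim(symbol_table["chmod"])


def dynamically_linked_program() -> None:
    symbol_table["chmod"]("/etc/passwd", 0o644)  # resolved via the symbol table


def statically_linked_program() -> None:
    # Like a Go binary: makes the syscall itself, never calling into libc.
    raw_syscall("fchmodat", "/tmp/x", 0o600)


if __name__ == "__main__":
    dynamically_linked_program()
    statically_linked_program()
    print("shim saw:  ", audit_log)
    print("kernel saw:", kernel_log)
```

The kernel sees both calls, but the shim only sees the first one, which is why tools that want full coverage move their hooks into the kernel.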
Oh, I've just remembered I was supposed to do a demo before going into this. So let's detect a system call with eBPF. This is the simplest of eBPF programs, and it's going to run in the kernel. I'm attaching it to the system call for changing the permission mode of a file, using a kprobe. A kprobe is a kernel probe, and here it's attached to the entry point of that function. I also have some user space code that loads this program, but that's not really important for this demo. So let me make this little tool. OK. The tracing it generates goes into a kernel trace file, and I'm going to start piping the output from that. And I'm going to run my little... oops, I have to be root to do this. Hopefully I can type. OK, that's running. If I look at the BPF programs that have been loaded into this virtual machine, we can see one here: a kprobe called hello. That's my program successfully loaded into the kernel. Now if I change the mode of a file, we see some trace generated. It's a very simple way to detect that the system call has been triggered. That eBPF program ran at the entry point to the system call, so it suffers from a well-known time-of-check to time-of-use (TOCTOU) issue that affects all of the tools that hook into that first point of system call processing. This was very well explained by Leo Di Donato and KP Singh at Cloud Native eBPF Day last year, and a demonstration of exploiting it, called Phantom Attacks, was shown at DEF CON 29 last year. It can affect all of these tools that inspect parameters before they've been copied into kernel data structures.
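Here is a minimal, deterministic Python simulation of that TOCTOU problem. It is not kernel code and not the actual Phantom Attack: in a real attack another thread has to win a timing race, whereas here the race window is passed in as a callback so the outcome is predictable. `entry_hook_allows` and `sys_chmod` are hypothetical names for a syscall-entry policy and a simulated syscall whose argument still lives in user memory.

```python
from typing import Callable, List


def entry_hook_allows(path: str) -> bool:
    """Policy run by a syscall-entry monitor: block chmod on anything under /etc."""
    return not path.startswith("/etc/")


def sys_chmod(user_buf: bytearray, race_window: Callable[[], None], log: List[str]) -> str:
    """Simulated chmod whose path argument lives in user memory (user_buf).

    Returns the path that the 'kernel' actually acts on.
    """
    # Time of check: the entry hook reads the argument straight from user memory.
    checked = user_buf.decode()
    log.append(f"entry hook checked {checked}")
    if not entry_hook_allows(checked):
        raise PermissionError(checked)
    # Race window: another thread sharing this memory can still modify user_buf.
    race_window()
    # Time of use: the kernel copies the argument in and acts on that copy.
    used = bytes(user_buf).decode()
    log.append(f"kernel acted on {used}")
    return used


if __name__ == "__main__":
    buf = bytearray(b"/tmp/harmless")

    def attacker_thread() -> None:
        buf[:] = b"/etc/passwd"  # swap the path after the check has passed

    log: List[str] = []
    print(sys_chmod(buf, attacker_thread, log))
    print(log)
```

The entry hook approved `/tmp/harmless`, but the operation was carried out on `/etc/passwd`.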
To avoid being vulnerable to this TOCTOU issue, we need to look at a system call's parameters after they've been copied into kernel memory, so there's no possibility that a parameter is changed before the kernel acts on it. There's a well-known way to do this: Linux Security Modules. The LSM API is a stable interface within the kernel that provides dozens of hooks at points where the relevant information has already been copied into kernel data structures. Essentially, each of these hooks is the kernel saying: here you are, here's a data structure that describes, perhaps, a file or a socket. This is what I'm about to act on. Do you have an opinion on that? Is it malicious in some way? Are you OK with me going ahead with this operation? So LSM hooks are a safe place that doesn't suffer from the TOCTOU issue. With BPF LSM, which you heard referred to in the lockc talk, we don't have to use a kernel module: we can attach eBPF programs to those LSM hooks. So we're using that stable interface, we have kernel data structures we can inspect, and we can dynamically load programs into the kernel, as we choose, to check whether activity looks suspicious. The other very cool thing about doing this with eBPF is that it protects any process, whether or not it was already running. When you load the eBPF program, it has visibility into everything running on that virtual machine, so we can protect against malicious behaviour in pre-existing processes. So let's look at another very simple eBPF program, this time using an LSM hook. I've actually got it in here already, commented out; I'll just uncomment it.
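The LSM approach can be sketched with the same toy model. Two conventions here reflect how LSM hooks behave (a hook returns 0 to allow or a negative errno to deny, and the first denial wins); the rest is a simplified assumption. `KernelPath` is a hypothetical stand-in for the kernel's `struct path`, and `deny_etc_chmod` is loosely modelled on a `path_chmod`-style hook. Because the check runs on the copy in kernel memory, later writes to user memory can't change what the hook sees.

```python
import errno
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class KernelPath:
    """Much-simplified stand-in for the kernel's struct path (already in kernel memory)."""
    name: str


# Like an LSM hook: return 0 to allow, or a negative errno to deny.
LsmHook = Callable[[KernelPath, int], int]


def deny_etc_chmod(path: KernelPath, mode: int) -> int:
    return -errno.EPERM if path.name.startswith("/etc/") else 0


def sys_chmod(user_buf: bytearray, mode: int, race_window: Callable[[], None],
              hooks: List[LsmHook]) -> int:
    # Copy the argument out of user memory first...
    kpath = KernelPath(bytes(user_buf).decode())
    # ...so anything user space does to user_buf from now on cannot affect kpath.
    race_window()
    # Ask each registered hook; the first non-zero return value denies the operation.
    for hook in hooks:
        ret = hook(kpath, mode)
        if ret != 0:
            return ret
    return 0  # allowed: the chmod would now go ahead on kpath


if __name__ == "__main__":
    buf = bytearray(b"/etc/passwd")

    def attacker_thread() -> None:
        buf[:] = b"/tmp/harmless"  # too late: the kernel already has its copy

    print(sys_chmod(buf, 0o777, attacker_thread, [deny_etc_chmod]))
```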
It's extremely similar to the kprobe eBPF program I showed you before. Again, all I'm doing is tracing out a message. But in this example, for path_chmod, which is a hook on the LSM interface, I get a path structure. Compare that to the system call hook before, where we just had a void pointer to a context structure. Here we have a data structure containing the kernel's pre-populated information about the file whose permissions I want to change, which makes it very easy for me to find the name of that file. So let me build this again. And this time... I didn't save it, did I? I'm not confident that did the right thing, so I'm just going to rebuild it. Let's go back to the tracing. Both my eBPF programs will be loaded this time. Let's run the application and check the programs that have been loaded: this time we've got the kprobe called hello and an LSM hook called path_chmod. So now if I change the mode of a file, we get two traces, one for each hook, and we can see the name of the file I was operating on. So that interface is super useful: it's stable, and it doesn't suffer from the time-of-check to time-of-use problem. But it needs a modern kernel; BPF LSM was only introduced in Linux 5.7. Unless you're running that kernel or newer, you don't really have the option of using eBPF with the LSM interface. Do we have an alternative? Yes, we do. Just because the LSM API is a declared stable interface doesn't mean there aren't plenty of other functions in the kernel that haven't changed for a long time. With eBPF, we can hook into pretty much anywhere in the kernel. So what if we pick other functions that are stable, not in the sense that someone has declared they will never change, but in the sense that they haven't changed for a long time and there's no expected reason for them to change, and hook eBPF programs into those?
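Since BPF LSM availability depends on the kernel, a quick check can be sketched in Python. This assumes two things: that the kernel is 5.7 or newer, and that `bpf` appears in the active LSM list, which Linux exposes as a comma-separated list in `/sys/kernel/security/lsm` when securityfs is mounted (the kernel also has to be built with `CONFIG_BPF_LSM`, which this sketch does not check). The helper names are hypothetical.

```python
import platform
import re
from typing import Tuple


def parse_kernel_release(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.15.0-91-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def bpf_lsm_available(release: str, active_lsms: str) -> bool:
    """True if the kernel is 5.7+ and 'bpf' is in the comma-separated active LSM list."""
    return (parse_kernel_release(release) >= (5, 7)
            and "bpf" in active_lsms.strip().split(","))


def read_active_lsms(path: str = "/sys/kernel/security/lsm") -> str:
    """Read the active LSM list from securityfs; empty string if unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    if platform.system() == "Linux":
        print(bpf_lsm_available(platform.release(), read_active_lsms()))
    else:
        print(False)  # BPF LSM is a Linux feature
```

If this prints `False` on an older kernel, that is exactly the situation where hooking other long-stable kernel functions becomes the practical alternative.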
That's what we're doing with a new project we've added to the Cilium family, called Tetragon. Tetragon is the open sourcing of something that we at Isovalent have included in our commercial offering for quite a while, so it has been used in production in some pretty large-scale deployments. We know it's useful and that the commercial version has really helped our customers detect malicious activity, but now we're making it an open source project so that everyone can use it. It uses our knowledge of which kernel functions are safe to use and de facto stable enough that we don't anticipate them changing; if they do change in the future, we'll deal with that, but there's no expectation that they will. And it can coordinate many different eBPF programs to provide security tooling. In Cilium we have knowledge of Kubernetes: all the contextual information about which processes are part of which pods and which namespaces they're running in, and we can use that knowledge to filter very efficiently, either in the kernel or in user space. You might be wondering why it's called Tetragon. This is why: Tetragonisca angustula is a kind of bee, and there are a few other bees in the Tetragonisca genus. If you look it up on Wikipedia, this particular bee is very small, builds unobtrusive nests, produces lots of honey, and is considered not to be a threat to humans. I feel that's a very apt name for a project that is small, efficient, not resource-intensive, and produces lots of really useful, sweet information for detecting security violations. So context is everything. When you define a policy for the kind of security event you're interested in, you also want to know the context in which that suspicious-looking activity took place.
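The idea of adding Kubernetes context to raw kernel events, and then filtering on it, can be sketched as follows. This is a toy model, not Tetragon's data model: `PodInfo`, `ProcessEvent`, `enrich` and `in_namespace` are hypothetical, and the static pid-to-pod table is an assumption made for brevity (a real agent would derive this mapping from sources such as cgroup and container runtime metadata).

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class PodInfo:
    namespace: str
    name: str


@dataclass(frozen=True)
class ProcessEvent:
    pid: int
    binary: str
    pod: Optional[PodInfo] = None  # filled in by enrichment


def enrich(events: Iterable[ProcessEvent],
           pid_to_pod: Dict[int, PodInfo]) -> List[ProcessEvent]:
    """Attach Kubernetes context to raw kernel-level events where it is known."""
    return [replace(e, pod=pid_to_pod.get(e.pid)) for e in events]


def in_namespace(events: Iterable[ProcessEvent], namespace: str) -> List[ProcessEvent]:
    """User-space filter: keep only events from pods in the given namespace."""
    return [e for e in events if e.pod is not None and e.pod.namespace == namespace]


if __name__ == "__main__":
    table = {101: PodInfo("default", "xwing"), 202: PodInfo("kube-system", "coredns")}
    raw = [ProcessEvent(101, "/bin/bash"), ProcessEvent(202, "/coredns")]
    for event in in_namespace(enrich(raw, table), "default"):
        print(event)
```

Filtering in user space like this is the simplest option; doing the same filtering in the kernel avoids shipping uninteresting events to user space at all.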
So we can see that some suspicious activity happened and combine it with information about the Kubernetes context it's running in. That is really useful both for determining whether it really is something to worry about and, more importantly, for forensics, to track down how the pod got compromised. The other thing that's really different from all of the other approaches we've seen is the ability to prevent malicious or suspicious activity. The kernel probes we've talked about previously, whether they're eBPF kprobes on syscalls, ptrace, or any of that family that hooks into the early point of system calls, notify a user space application, which can then take some preventive action. That could be as simple as killing the responsible process. But the problem with this approach is that it's asynchronous. If you have to notify something in user space and then take your preventive action, by then it may be too late: there may have been time to exfiltrate some data, or to change something that allows an exploit to persist. What's different with Tetragon is that we can synchronously send a SIGKILL from within the kernel, so as soon as we detect a malicious event, we can kill the process immediately. If the demo gods are with me, we'll see that in action. I have a kind cluster here with a few pods running. Tracing policies are what Tetragon calls the security policies it enforces. I'll start by showing you the raw logs from Tetragon. Whatever's been happening in the past will already appear in this log, so it's been generating a bunch of logs. They're not very human-readable, but you can probably make out the JSON structures describing each event. We can see here that this one is a process exit.
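The difference between asynchronous and synchronous enforcement can be shown with another toy Python model. `Task`, `Kernel.sys_write` and `user_space_agent` are hypothetical, and treating every write under `/etc/` as malicious is a simplifying assumption. In the asynchronous case the kill arrives only after the write has already landed; in the synchronous case the task is killed before the write takes effect.

```python
from dataclasses import dataclass, field
from typing import Dict, List

PROTECTED_PREFIX = "/etc/"  # assumption: any write under /etc/ matches the policy


@dataclass
class Task:
    pid: int
    alive: bool = True


@dataclass
class Kernel:
    files: Dict[str, List[str]] = field(default_factory=dict)
    pending_alerts: List[int] = field(default_factory=list)  # queued for user space

    def sys_write(self, task: Task, path: str, line: str, enforce_in_kernel: bool) -> None:
        if path.startswith(PROTECTED_PREFIX):
            if enforce_in_kernel:
                # Synchronous: kill the task before the write takes effect.
                task.alive = False
                return
            # Asynchronous: only notify user space; the write still goes ahead.
            self.pending_alerts.append(task.pid)
        self.files.setdefault(path, []).append(line)


def user_space_agent(kernel: Kernel, tasks: Dict[int, Task]) -> None:
    """Runs after the syscall has completed and kills whatever was reported."""
    for pid in kernel.pending_alerts:
        tasks[pid].alive = False
    kernel.pending_alerts.clear()


if __name__ == "__main__":
    line = "stormtrooper:x:1000:1000::/:/bin/sh"
    k_async, t_async = Kernel(), Task(1)
    k_async.sys_write(t_async, "/etc/passwd", line, enforce_in_kernel=False)
    user_space_agent(k_async, {1: t_async})
    print("async:", t_async.alive, k_async.files)  # killed, but the line was written

    k_sync, t_sync = Kernel(), Task(2)
    k_sync.sys_write(t_sync, "/etc/passwd", line, enforce_in_kernel=True)
    print("sync: ", t_sync.alive, k_sync.files)    # killed, nothing written
```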
By default, Tetragon tells you about all process starts and exits, so whenever a new executable is started or stopped, you get a log entry. We can pipe this output into a CLI that, internally, long before we knew we were going to call it Tetragon, we just called the amazing CLI. And I'm going to filter it so that we only see things happening in the default namespace. Again, there's some history of what was happening before. If I exec into, let's say, an X-wing fighter and run bash there, we immediately see that a bash process was started. Is that big enough for you all to see? I see nodding, that's good. OK, so let's say we want to protect files in the /etc directory, and I have a policy here that will do that. Did I use an underscore or a dash? Let's see. Oh, I'm in the wrong directory. We have a bunch of example policies, so let's take a look at the /etc one. Tetragon policies are actually pretty low level: we can think of Tetragon as an engine that lets us attach policies to these generic points across the kernel. In this example, we're using a kernel function called fd_install, which is essentially the point where the kernel has a file descriptor in its hands, and with this policy we're going to follow any actions that happen on that file descriptor. We're also interested in read system calls, close system calls and, if we go down to the end, write system calls. So essentially, with this policy, we can track when anybody opens a file. Oh, I missed the bit where it matches on /etc. OK, so we'll see open, read, write and close events on any file inside the /etc directory. So let me apply that policy. And if, in my X-wing fighter, I try to edit the password file... OK, we're allowed to.
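The "follow the file descriptor" logic of that policy can be sketched as a small state machine in Python. This is a toy model of the idea, not Tetragon's TracingPolicy format or implementation; `FollowFdPolicy` and its methods are hypothetical names. A descriptor is followed only if it was installed for a path under the prefix, subsequent read/write/close calls on it are reported, and close stops the tracking. Entries are keyed by (pid, fd) because fd numbers are only meaningful within a process.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FollowFdPolicy:
    """Start following an fd when it is installed for a path under `prefix`,
    then report read/write/close calls on that fd."""
    prefix: str = "/etc/"
    followed: Dict[Tuple[int, int], str] = field(default_factory=dict)
    events: List[str] = field(default_factory=list)

    def on_fd_install(self, pid: int, fd: int, path: str) -> None:
        if path.startswith(self.prefix):
            self.followed[(pid, fd)] = path
            self.events.append(f"open {path}")

    def on_syscall(self, pid: int, name: str, fd: int, nbytes: int = 0) -> None:
        path = self.followed.get((pid, fd))
        if path is None:
            return  # not a file descriptor we are following
        if name == "close":
            del self.followed[(pid, fd)]
            self.events.append(f"close {path}")
        else:
            self.events.append(f"{name} {path} {nbytes} bytes")


if __name__ == "__main__":
    policy = FollowFdPolicy()
    policy.on_fd_install(100, 3, "/etc/passwd")
    policy.on_fd_install(100, 4, "/tmp/notes")   # not under /etc/, so ignored
    policy.on_syscall(100, "read", 3, 1234)
    policy.on_syscall(100, "read", 4, 10)
    policy.on_syscall(100, "close", 3)
    print(policy.events)
```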
Nothing prevents us from doing it, but we can see data about it: here the file has been opened, then 1,200-odd bytes have been read from it, and then vi apparently closes the file while it's showing it to you. OK, so suppose I am the, I don't know, rebel commander, and I want to stop people from being able to change passwords in my X-wing fighters. What I should do is have a policy that not only tells me it's happening, but prevents it from happening. I have another version of basically the same policy, but this time I've added a kill action to the write part: when we match on a write operation, we issue a kill to this process. So let me apply this version of the policy. This one has the kill action, and when we come along and open the file, we see open, read and close detected. Let's say we want to add... maybe I'm on the dark side and I want to add an account for a stormtrooper. Let me try to write that file. We can see here, it's a kill. Did that happen in time? Let's find out: let's look at that file and go to the end. There is no sign of an account for the stormtrooper. So we were able to intercept that write operation before it took effect and kill the process that was trying to perform it. File operations are just one example of the kind of things we can build profiles to protect against with Tetragon, and we've got a ton of examples in the repo. We can protect against all of those categories of potentially suspicious behaviour: network activity, file access, memory access, privilege escalation. If you come and find us on the Cilium stand later in the week, we should have a demo of detecting privilege escalation. And there are example tracing profiles for a variety of different things you might want to protect against.
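Adding a kill action to the followed write can be sketched by extending the same toy model. Again this is illustrative only: `FollowFdKillPolicy`, `Kernel.open` and `Kernel.write` are hypothetical names, and the simple fd allocation ignores reuse after close. The important property is that `check` runs before the write takes effect, so a matching write kills the task and the data never reaches the file.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple


@dataclass
class Task:
    pid: int
    alive: bool = True


@dataclass
class FollowFdKillPolicy:
    """Follow fds for paths under `prefix`; kill the task on operations in `kill_on`."""
    prefix: str = "/etc/"
    kill_on: FrozenSet[str] = frozenset({"write"})
    followed: Dict[Tuple[int, int], str] = field(default_factory=dict)
    alerts: List[str] = field(default_factory=list)

    def on_fd_install(self, pid: int, fd: int, path: str) -> None:
        if path.startswith(self.prefix):
            self.followed[(pid, fd)] = path

    def check(self, task: Task, op: str, fd: int) -> bool:
        """Runs before `op` takes effect; returns False if the task was killed."""
        path = self.followed.get((task.pid, fd))
        if path is None:
            return True
        self.alerts.append(f"{op} {path}")
        if op in self.kill_on:
            task.alive = False
            return False
        return True


@dataclass
class Kernel:
    policy: FollowFdKillPolicy
    files: Dict[str, List[str]] = field(default_factory=dict)
    fds: Dict[Tuple[int, int], str] = field(default_factory=dict)

    def open(self, task: Task, path: str) -> int:
        fd = 3 + sum(1 for pid, _ in self.fds if pid == task.pid)  # naive allocation
        self.fds[(task.pid, fd)] = path
        self.policy.on_fd_install(task.pid, fd, path)  # fd_install-style hook
        return fd

    def write(self, task: Task, fd: int, line: str) -> bool:
        if not self.policy.check(task, "write", fd):
            return False  # task killed: the data never reaches the file
        self.files.setdefault(self.fds[(task.pid, fd)], []).append(line)
        return True


if __name__ == "__main__":
    kernel = Kernel(FollowFdKillPolicy())
    trooper = Task(42)
    fd = kernel.open(trooper, "/etc/passwd")
    print(kernel.write(trooper, fd, "stormtrooper:x:1000:1000::/:/bin/sh"), trooper.alive)
```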
One thing to be aware of, though: because it's quite a general-purpose engine, you could write a policy that just looks at the entry points to syscalls, and it would still have the same time-of-check to time-of-use issues we've seen with other tools. So you do have to be a little careful about which policies you write and how you apply them, but there are lots of examples there. So go check out Cilium Tetragon; put a star on it, that would be great. It's eBPF-based, so it's very lightweight and high-performance. It doesn't rely on a kernel so modern that none of you are using it; I'm pretty sure all of you will be using kernels that are compatible with Tetragon. It gives you contextual information about Kubernetes, and, I think most importantly and most excitingly, it can block suspicious events before they happen. So come and see us during the rest of this week, and check out the project on GitHub. I also want to mention a book that my colleagues Natália Réka Ivánkó and Jed Salazar worked on, called Security Observability with eBPF, which talks a lot about Tetragon and how you can use profiles to detect and prevent different types of security-relevant events. So with that, thank you very much. I hope you're excited about eBPF, because I really am, and do come and find me or my colleagues during the week, and we will talk your ears off about how excited we are about eBPF. So, thank you very much.
Do you want to do a question? Yeah, I think we have time for a question. Any questions? I see a question. Thank you. Thank you very much for the presentation. I have a question about context. Is it easy to link two types of detection, for example an interactive session and then the spawned process doing suspicious activity? Or does it need to be events happening only on that particular syscall?
Because Tetragon is so general-purpose, it comes down to how you write the profile. In the example I showed, the profile had three system calls because they're all related to that particular file descriptor, so it's not just any old file. It starts with fd_install, which says: here is a file descriptor I'm interested in because it matches /etc. Then, if I see any subsequent activity related to that file descriptor, be it a read, a write or a close, I'm interested. So it's all in the way the profile is constructed, how you link the context of those different events in the profile, if that makes sense. Perfect. Thank you very much. Any more questions? OK. Just a quick one: I'm not too familiar with the LSM API, but how does it handle buffers in the kernel? If you're writing to a file, say one byte at a time, is it able to inspect the entire buffer? I would refer you to the lsm_hooks.h header file, which describes all the different hooks. Actually, that does make me think of an important point: if you want to use the LSM interface, you can, but you don't have to. What we've used in Tetragon is not the LSM interface, because we've chosen to use some different entry points. Specifically about buffers, I'm sure we can find someone to answer that. Thank you. Yeah. Thanks, Liz, this was a great talk. Thank you.