Perfect, thanks. Thanks for coming back from lunch for my talk. I wanted to start with this, and then we'll get into some slides. What you see at the top is our CLI showing us all the BPF events, and on the bottom we have various recorders recording some things. On the left we're recording the SHA-256 digests of the binaries that are running in that pod. In the middle we have the applications that are making connections — you see curl is reaching out to 10.244.0.6 — and then we see all the BPF programs that are being loaded; we've created two, a func and a test, just for fun. I'll explain this later, but let's jump in. Let's see if I can find the right screen... there we go, cool. So my name is John Fastabend. This talk was actually prepared with Natalia. She's not here today, but she helped do a lot of the slides and the work to get it going, so let's give her some credit as well. What we want to talk about is you and your security profiles: how to get least-privilege profiles, implemented in our favorite tools, using BPF. So the problem statement is, how do we secure this? You look at your Kubernetes cluster and you have lots of stuff in it — probably some subset of those boxes; hopefully not every one of those boxes is in your cluster, but who knows. So we have all these things and we're trying to secure them, and we have this guiding principle of least privilege. The idea is that we want to give the pod only the privileges it needs, nothing more. And the question is, why is that hard? The first part of the talk will set up why this is difficult, then what tools we used, and finally how we solve this problem. So why is it hard?
Like we said, you have lots of things — that previous picture with all the stuff in it. You need to somehow figure out what your behavior should be. You don't necessarily know everything that's going to execute inside your pod, every file it's going to open, every network connection it's going to make. So if you want to create a minimal policy that has just the connections it needs, just the binaries it needs, just the files it needs, and so on, you need to somehow learn all of this to start with. You need some notion of identity: what is it that is doing this? Is it curl? Is it somebody who renamed their program curl and decided to use it? What version of curl is it? And so on. The other reason is that you just have lots of these things — you might have thousands of nodes and thousands of pods. You can't manually go in and strace every application to figure out what it's doing; you can't be expected, as a security person or an ops person, to know what every application in your entire cluster is doing. And you have lots of data even about a single application. Think about all the stuff BPF can do: we can trace networking, we can trace applications, syscalls, all this good stuff — and we saw a few talks today about the different types of data we can see. But then we also have all the metadata that comes with your environment: Kubernetes labels and pods and namespaces as well. Combine this all together and we just have lots of data and lots of information. And the last one I want to talk about is that there's a gap here. Your security and your ops teams care about pods, they care about your cluster. They probably don't care — like, you know, I'm a kernel developer — they probably don't care about the cgroup ID of that binary.
I might really be interested in the fact that there's a network namespace and a mount namespace and a PID namespace and all this good stuff that Linux is doing. But when you run kubectl and you look at your multiple clusters and your thousands of pods, do you care? No — you care about the pod, you care about the label, you care about the policy at that next level up. So what we need to do is bridge this gap between what BPF sees — just, you know, an inode on a system in a container — and what you see as an operator or a security person trying to write a policy for this system. Okay, so that's why it's hard. Hopefully you're convinced; nobody has stood up to say "no, that's easy" yet. All right, so what's our tool? Well, it's BPF day, so the obvious choice is BPF. And why do we like BPF? I think people have gone into this at length already, but there are lots of reasons. It's safe — we can't break your system, at least not easily, when we load our program. We can hook anywhere in the kernel, and we can read the kernel's data structures, so we can let the kernel do some of the work for us — we don't need to re-duplicate everything the kernel is doing. It's also quite feature-rich, especially if you think of a 5.4 or 5.5 kernel, or a 5.10 — I think I've even seen some 5.15 kernels running around. You have lots of data structures you can use: you can build trees, you can build hash tables, do some looping, all that good stuff. There are even locks and spin locks now. The next thing is that it's transparent and atomic. The atomic part is important from a security standpoint: you want to make sure that when you upgrade, you can do it seamlessly. You don't want to, for example, have to restart all your pods, or get hold of the network admin and tell him, "okay, please shut down the network, I'm going to restart."
You don't want to have these world-changing upgrade processes, and BPF gives you that: you can switch the BPF program on the fly, and you can do it atomically, so you're either on the old program or the new program — never in between, and definitely never without any BPF program. So there's no gap in your security model. Now, if we think about how things are done today — when we were setting out to do this talk, we looked around, and there are a lot of syscall profilers out there that will do syscall tracing and give you some security on the syscalls. You can build an allow list: imagine open and close are allowed, but I'm going to block setns, because I know this pod should never do a setns or a setuid — it should never change its UID — some of these standard syscall things. This has been around for quite a while; SELinux can do these kinds of things as well. And this is great, but the question we wanted to ask in this talk is: can we do better than just system calls? So what we did is turn to our favorite tool, which is Tetragon. I'm the maintainer of Tetragon, so it was a natural pick. And what does Tetragon do?
I'll talk about it in a bit more detail in a couple of slides, but at a high level, you run it on the node and it attaches BPF programs to all these points in your kernel to collect lots of data about your system. You can attach to the cgroup create hooks, so you know when cgroups are created; you can attach to the file hooks, so you know when files are created, accessed, read, or written; you can attach to the networking stack; and so on, all the way down the stack. And then it provides you an infrastructure to build those hooks: you can apply them with a CRD in your Kubernetes environment, and you can get the data back out through Fluentd or through a gRPC tunnel if you want to. So it gives you that infrastructure to start building tools on top of. And here are some key points for why we like Tetragon so much. The first one may sound obvious at first, but it's actually quite tricky: to always know what is running on your system. It sounds very basic, but let's talk about what that actually means. It means you need to have some sort of identity, and you need to have some sort of location — because it's not a single node we're talking about; we're talking about an entire cluster, possibly multiple clusters. So what is an identity? It's not just a binary name, because I can rename files. It's not necessarily a PID, because PIDs change over time. It includes all the libraries that are loaded — I want to know not only that I ran this binary, but that I ran this binary and it loaded this version of OpenSSL. Is that version of OpenSSL vulnerable or not? Does it have the patch for my fix? It's the args the binary was run with. It's the build ID.
This is something your compilers put in so you can trace the binary back to where it was compiled. And then we do a SHA-256 digest here, to tell you exactly what code is running. That gives you an identity, but that's still not enough — you also need the location. Inside Kubernetes, what is the location? Cluster, node, namespace, pod, container. And what else? Time. Time is important: you care about when this thing actually ran. Was it run yesterday, two months ago, five minutes ago? It matters. And when you put these two together, you get a unique ID. Now you can put that in a database and ask quite interesting queries on it: when did this thing execute, and at what time? What executed between these times on this node? You can slice the data any way you like and create interesting data sets. The other reason it's hard is the Linux kernel itself. When you execute things, it's not a straight tree like you might think. The kernel can do execs and it can do clones, which means a process can exec right over the top of its own program. Python libraries do this, Java libraries do this a lot, to save space — they don't want to create another stack space, so they exec over the top of the old image. And when you do that, you lose the parent-child relationship that you, as a user, sort of expect — it's not always there in the kernel. Also, when things exit: if a parent exits, that doesn't necessarily mean the children exit. In fact, in the kernel they get assigned a new parent. So there are all these semantics in the kernel where, if you naively take just the kernel's viewpoint, you lose the abstraction that a user wants.
You always want to know that tree. And then there are a lot of other types of execs that we can trace too — not just the straight, normal one where you have a binary path name and you execute it. Think of all kinds of other things: you open a file by name or by file descriptor, delete the binary, and then run it; shared memory; memfd files; and so on. There's a whole list of these, and if you want to play with this, come find me and let's chat — there are probably more ways than that, and I'm always happy to improve the tracing in Tetragon, so if we missed something, let us know. The next thing that's really important is that we can hook any kernel function. As long as it's a function in the kernel, we can hook it — the Tetragon CRD is generic, which is great for building tools. If I want to hook some function in the kernel, I can write a CRD and deploy it, and all of a sudden I get this data out on every node in the system — every node in the cluster will now have this new data, without having to restart, upgrade, or change my tooling. So new use cases come in easily. A related point is that we almost never hook system calls for security things. If you think about it: why are system calls good? They're convenient and they're stable. But they're not nice because — if you look at the diagram on the right — if you hook the syscall with BPF, and there's a pointer to user memory, and you read that user memory, it's still user memory. It's entirely possible that the user could then change that memory, at which point your security tool is no longer reading the right set of bytes; it's reading some other data. But what's actually even more common is that it will fault. If you run this in a big cluster, what you'll see is a bunch of faults happening, and you'll wonder: why am I getting faults on my system calls?
It's because, before that BPF program ran, the memory was paged out. BPF won't try to track down that memory — it will just fault on the read. So for security things we generally try to stay inside the kernel, not on these system call boundaries. And if you look at things like SELinux, they won't actually trace these either: they'll say, that's a user-memory pointer, I know that's not secure, don't write a policy against it — they'll stop you from doing that. The other reason this is interesting is because we can have virtual types, or slim types as we call them in some cases. What this means is: do you really care what the file descriptor of an open call is? Probably not, because it doesn't mean anything to you. If I'm in a multi-cluster environment and I tell the security operator "it opened file descriptor 5," he has no idea what file descriptor 5 is. So we do a lot of these virtual types, where we take the file descriptor and put the path name in there instead — now in your database you have the path name, which is much more useful. Maybe the inode number too — even more useful. It's the same with cgroup IDs: I can tell the operator that the cgroup ID was X, but what they really care about, most likely, is the label and the namespace. And the same goes for all these other things: if you want to look at a task structure in the kernel, it might be bytes and bytes and bytes of data. What do you actually want out of that task structure? You don't want to print random data; you want to print the things you care about. That makes it more performant, because you're not copying tons of data around, but it also makes it more useful. The other thing we have directly addresses the firehose problem: if I hooked fd_install, which is the kernel function that installs a file descriptor,
I'm going to get every file on the system. We've done this, and what happens is you get lots and lots of data — Linux systems like to open files; that's what they do. So you really need some way to scope this down. You need to say: I only want these files — here's a path name or a directory — or I only want files that belong to this binary, and so on. Think of network connections: do I really care about all the network connections my DNS server is making? Maybe, maybe not. There are going to be a lot of DNS requests in there, and a lot of them are going to be duplicates as you query the local node, the local pod, and so on down the line. So what we do is try to scope that down for you with a bunch of CRD selectors. We have matchArgs and matchPIDs and a few others, so you can start to say what you care about from a security standpoint. Maybe you only care about things that are privileged: tell me everything anybody with CAP_NET_ADMIN does — these types of things. And the last thing that I think is really interesting about Tetragon is that it does the actions in the kernel, in line with the call. If you want to kill a process, we kill it from the kernel side. What this means is there's no delay. You're not waiting for the event to get pushed from the kernel to user space, through some user-space logic, until maybe at some point it decides: okay, that was a bad application, I'd better stop it — by which time it's likely already done the thing you were trying to stop. Tetragon has the ability to do it inside the kernel. So we detect it —
it's trying to open a file we don't like, or open a network connection we don't like — and we kill the process right away. Okay, so this gives us our tool set; let's go and start implementing least privilege with it. I've got about ten minutes left, it looks like. So what do we want to do? What's our task list? First we need a test environment, so we'll build that up. Then we're going to collect some data and figure out what our pods are doing, because we don't actually know what they're doing. Then we'll create a policy, apply that policy, and see that it works. The way we do this is: we have a set of pods, and we have this gRPC collector, which is basically listening to the output of the Tetragon instance on each node and creating a list of the events you would want to watch for. All right — oops, sorry, I'm going backwards. I could just talk you through it, but I'm going to try to demo it, so let's do it and see how it goes. This is what I talked about at the beginning of the talk. This top thing is a monitor of our test pod, which we called BPF droid. And we have this tool — maybe I can scroll up so you can see it; unlikely, all right — anyway, we have this tool which pretty-prints a slimmed-down version of the events, just the relevant information; the actual events are quite large and have more data in them. Then we have our collectors on the bottom — I started three collectors here. You can see this top thing is doing stuff: it's basically on a 30-second timer, and it runs a bunch of stuff. And on the bottom we have our collectors. The left one, like I said, is the digests — a list of all the binaries that are running; each of those digests corresponds to a process at the top. Okay, the middle box is curl:
it looks like curl tried to reach out to something. And the right side is our little demo pod, where we try to load some BPF programs — a func and a test. The goal here is to make sure this is all that happens, and then, to make it a little more interesting, we're going to decide that we should never load test programs in our production pod — or at least our demo pod. So how do we do that? Let's go over here and talk a little about what I have loaded, because I've said a lot about CRDs. Here, for example, is a CRD for TCP connect — this is monitoring TCP sessions. What it basically says is that I want to hook tcp_connect. It's not a syscall — like we said, we want to avoid hooking syscalls if we can — and it tells us what the argument is: a sock argument at index zero. There are probably other arguments in that function call, but we don't care about them; we just care about the sock, because we're trying to figure out what it connected to. And then we care about tcp_close and tcp_sendmsg — so connect, send, and close, and I put them in the wrong order; the close should go below the send. Then I have a BPF collector. These are just trivial examples to show what you might expect — a production setup would have a whole list of CRDs, and they might be longer. This one hooks BPF load: when you load a BPF program, one of the things the kernel does is verify that it's correct. This hooks the verifier call and then checks: did it succeed, did it fail, and who did it — three things you want to know — and what did they try to load? What was the program name?
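To make that concrete, here is a sketch of what those two policies might look like as Tetragon TracingPolicy CRDs. This is a reconstruction from the description above, not the YAML from the demo: field names and selector syntax vary across Tetragon versions, and hooking `bpf_check` with a `bpf_attr` argument is based on Tetragon's published examples, so check the release you run.

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: tcp-monitor
spec:
  kprobes:
  # Not syscalls: we hook the in-kernel TCP functions directly.
  - call: "tcp_connect"
    syscall: false
    args:
    - index: 0          # struct sock * is the first argument
      type: "sock"      # Tetragon resolves this to addresses/ports
  - call: "tcp_sendmsg"
    syscall: false
    args:
    - index: 0
      type: "sock"
  - call: "tcp_close"   # close goes below send, per the talk
    syscall: false
    args:
    - index: 0
      type: "sock"
---
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: bpf-load-monitor
spec:
  kprobes:
  # bpf_check is the verifier entry point; hooking it and its return
  # value tells us who loaded what, and whether verification passed.
  - call: "bpf_check"
    syscall: false
    return: true
    args:
    - index: 1
      type: "bpf_attr"  # carries the program type and name
    returnArg:
      type: "int"       # 0 on success, negative errno on failure
```

Applying either document with `kubectl apply -f` would, per the talk, push the hook out to every node in the cluster without restarting anything.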
Pretty useful information if you want to know what's loading BPF programs on your system. All right, so there we go. And then, like I said, we have all this data from the bottom collected. So let's try something. Over here I have this droid thing — I can run clear; this is just a shell into that test pod, so I can run commands. It's not locked down. Actually, I shouldn't run too many commands, because they'll get sucked into the digest — see, the digest changed. But anyway, you see I can run commands. So let's do this: I'll show you what it looks like. Let's look at our exec guard. This is coming from the streaming data — I actually stopped it from streaming; this one is from slightly earlier, because in case I did what I just did and ran a bunch of commands, I didn't want them pulled in here. So there's our exec digest. Let's apply it. All right, now it's applied. Let's go back to our pod — here's our nice little pod, now locked down. It would be nice to run clear, but it probably won't work — see, everything is locked. I can't do much of anything. Maybe I want to try netcat: nothing works. All killed, and all synchronously — it's not like netcat actually got to run; it failed immediately, and here you can see my little CLI telling me: yep, netcat happened, got killed, tried to do bad things. And if you look at the background, it's still running, because those binaries are in the digest. I can show you here too: bpftool is fine — definitely in the digest. Interesting. Then — oops, sorry — we have a connect guard as well. This pulled in data from that middle box; I saved it from earlier, so hopefully I don't have to regenerate it and I still have the same DNS names. But let's give it a try and see.
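A connect guard in this spirit might look roughly like the sketch below: the same tcp_connect hook as before, plus a selector that kills any process connecting somewhere other than an allowed destination. The `NotDAddr` operator, the destination value, and the Sigkill action syntax here are my assumptions from Tetragon's documentation — the demo's actual guard was generated from the collected data, and selector fields differ between releases — so treat this as a shape only.

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: connect-guard
spec:
  kprobes:
  - call: "tcp_connect"
    syscall: false
    args:
    - index: 0
      type: "sock"
    selectors:
    # Kill any process whose destination is NOT in the allow list.
    - matchArgs:
      - index: 0
        operator: "NotDAddr"  # assumed operator name for this Tetragon version
        values:
        - "10.244.0.6"        # the one destination observed during collection
      matchActions:
      - action: Sigkill       # enforcement happens in the kernel, inline
```

The key point the talk makes is the last line: the kill happens in line with the call in the kernel, not after a round trip through user space.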
All right, so it's running, so we should be able to curl. Curl works, by the way, because it was listed as one of the commands in the digest. Can we curl other things? Let's see what Google has to say to us today. Google doesn't work, because it's not in the digest — sad, right? So we've locked down all the connects. The only connect that works is the one we allowed: the ebpf.io one, right up here. Or I guess in this case it's a slightly different domain, but it doesn't matter — the old pod we had used to go to ebpf.io. And maybe that's an interesting point: I don't actually have to know what this pod is doing, because I just collected the data and applied it. The last one — and then I'll wrap up — is that we'll try to lock down bpftool. If you look here, I have this test program running in my production pod — my demo pod — and I want to get rid of that. I don't want people to load test things in my pod. So let's add one more policy. When it tries to run, it'll fail — the policy does a SIGKILL on it. Or, since I'm an impatient person, let's just run it directly: steal the command out of this thing here and run it in the pod. And I can't even load this program anymore, because I tried to load a test program — so I'm pretty well stuck. But all my stuff that runs normally in the background keeps running, because it's part of my list. So that's the quick demo. I'm going to skip through the slides that explain the demo, because I think we just walked through it. Here's the same pretty-printer we just saw, and what it's telling you is that this pod does this thing where it connects to the world, and we can block it. If bpftool tries to load something we don't expect, it doesn't load. Curl still works, though — except when it goes somewhere
we don't want it to go. So let's wrap up, because I think I'm low on time — yeah, two minutes. What this gives us is least privilege. We don't have to know in advance what the pod does, and we go beyond syscalls: we're least-privileged in terms of what's executing, with the digest and a full identity. We believe this is usually very hard to do, because you have to somehow collect all this data. Without something like BPF to collect it, pull it into your system, generate these policies, and push them back out, it's going to be really hard — I'm not sure how you would do it, or even what tools you might use, if BPF weren't there to help you. By using BPF we can add all these filters, add these actions in the kernel, and get the least-privilege behavior we want — you saw that when I tried to use that shell, I couldn't even execute things that were no longer available to me. I'd like to end with this: if you want to help, I would love it — we're always looking for people to help, and there are lots of things you can do. There's the GitHub page, cilium/tetragon. You don't even have to be a coder: you can just use the tool and file bugs, or tell us where our documentation is not great. If you have a use case that I didn't think about, go ahead and file a feature request — can't hurt, right?
The documentation always needs to be improved — there's always something we miss, especially when the people writing the documentation are the users of the tool; a lot of times we forget things. If you have a use case you want, go ahead and add it. We have an examples directory for CRDs, where we just keep example policies, and we try to keep the bar of entry there pretty low. If you have something you want to trace — somebody sent me an OOM-killer policy the other day; I don't have a use for an OOM-killer policy, but we'll put it in there — then it lives with the code, we keep it up to date, and we test it. So if you have a use case, put it there; we'll test it, and if somebody tries to break it, well, you know, we revert their patch — makes it nice for you. If you have some use case that just doesn't work, that's interesting too — let's have a talk. The other thing is, if you have feedback on the CRDs or the UIs, that's always interesting for me. I'm coming from the kernel side, the networking side, and a lot of you are coming from the Kubernetes side. I talk with the Kubernetes people — that's where we get a lot of this good stuff about labels and namespaces — but in this room I think there's a lot of expertise that could be quite useful. And of course, if you want to fix bugs, more power to you — I'm not going to stop you. With that, I'll stop and say thank you.

Brilliant — a round of applause for John. I just want to mention we have a Cilium project meeting tomorrow morning; I don't know if you're going to be there? — Yeah, I'll be there. — Yes, so if anybody does want to come and chat about contributing to Tetragon, tomorrow morning's Cilium project meeting would be a good opportunity. Maybe Alex could come up and start getting ready for his talk, and while he's doing that, let's see if there are some questions for John. Good, I see a couple back there.
Hey, this is exciting work — I just have one question. You mentioned that you don't want to hook into the system calls, for good reasons, but many of the functions you would like to hook in the kernel are inlined and not available. Is there any guarantee that you catch things early, before anything can happen?

So is the question that sometimes you want to hook a kernel function, but the function is inlined, so you can't actually hook it — is that the gist?

Yeah, basically — whether there's a gap, or a guarantee that there's no gap, between the system call and the place where you're actually hooking.

Yeah, there's always this challenge where perhaps the function isn't available, so you have to go read the kernel code and try to figure out where the hook should be, and in the worst case maybe you need multiple hooks in the kernel to cover what would otherwise be one thing. But typically, for a lot of the things we want to hook, kernel developers like to reuse their code, so you can usually find a common function — like fd_install for the open case: everything that installs a file descriptor calls that one function. Same way with exec:
there are like a hundred different ways to exec, but kernel developers don't like to write a hundred different lines of code, so there are one or two function calls that actually exec things in the kernel. So usually you can find them — it can require some digging. There are a few cases I know about where it's just not possible, and what we do then is annotate the kernel so that the function no longer gets inlined, and we put a hook there. That's sometimes not very satisfying, because it means you have to get a newer kernel, but that's the worst case. I've only hit that case a few times, and usually we can find workarounds, even if the workarounds are a little less pleasant than you'd like.

Another question here. Thank you. One of the points you made on the slide about the benefits of eBPF from a user perspective — I was cheering for you the whole slide, because this is all stuff that I need to use — but at the end there was a line you kind of glossed over, which is that it can be made portable. I'm not asking you to give a presentation on portability here, but what I was interested in is: where are the sticking points you see about portability from that end-user perspective? What are the challenges to building something like Tetragon that you could make portable?

Right. So I would say, from an end user's point of view, with Tetragon we strive to handle all of that for you — as a user of Tetragon, you shouldn't have that problem.
That's the goal. BTF is sort of the catchphrase that solves all your problems: you get the function names, you get the structure offsets. Of course, if you look at the Tetragon code, what you'll see is that we pivot on some kernel versions — there will be some duplicate functions that say: if it's a 5.4 kernel we load this function, if it's 4.19 we load this one, and if it's 4.14 we load that one. So from the developer side there is some pain there, but I don't think it's that big a deal — usually the code is pretty well organized, and those functions tend to be pretty small. Kernel developers don't generally decide to rewrite the guts of the operating system on a day-to-day basis. If exec loading is going to change, it's going to change once, and we're probably not going to do it again for ten years. The same way with connect: the connect functions have been there since the 3.x kernels and haven't substantially changed — the functions change a little, but BTF handles the offsets. So the only real problem you'd have is if a function exists now that didn't exist then, or if somebody completely refactors the kernel. That does happen, but we deal with it and try to make it transparent to Tetragon users. That's the goal: end users shouldn't have that pain.