Okay, everybody, we're going to go ahead and get started. Today we're going to be learning how to survive eBPF deployment on Kubernetes with a new tool that we like to call bpfd. I'm Andrew Stoycos, a senior software engineer at Red Hat working in the Office of the CTO. I'm a bpfd maintainer, and one of my other cool projects is the Network Policy API, which we just finished giving a talk on, so if you want to check out the recording, that would be awesome. And I'm Shane Utt, a staff software engineer at Kong, a chair of SIG Network, a maintainer of Gateway API, and a contributor to bpfd, which kind of happened through the Gateway API project.

We're thinking most people here will be generally aware of eBPF, but if not, we'll do a high-level overview, and if you need to learn more, please feel free to talk to us afterwards. Basically, you can think of it as something similar to a kernel module: you write code and load it into the kernel, where it runs in the VM that BPF provides. So you can do things like networking, security, observability, and tracing by writing a little bit of code and loading it into the kernel, and it's a lot more lightweight than kernel modules used to be. There are a bunch of different projects that do this, a ton of them at this point, and a variety of different ecosystems have built up around them, which is part of what we'll talk about today. But more importantly, and more specific to the purpose of this conference, we're going to talk about why eBPF and Kubernetes, and how, actually.
So eBPF, like we said, is a very good framework for extending the kernel, for going into kernel space and doing the things you want to do, like observability and networking, but it comes with some costs. There are many different examples, all doing somewhat different things: Cilium and Calico are great examples of CNIs doing networking; Pixie is open source observability; KubeArmor does security; Blixt, which is a project Andrew and I work on that's connected to bpfd, is an L4 load balancer we started in the Gateway API project; and then there's NetObserv, which is network observability in OpenShift. There are lots more projects, and we'll have links to their sites, but there are issues with eBPF programs in Kubernetes; it's difficult right now, and we'll talk about some of those difficulties.

One of the big ones is security. Security is a big problem with eBPF today, especially in a Kubernetes environment. One of the top problems is that eBPF programs are not namespaced, so they can easily escape container isolation; they're not isolated, and they rely on highly privileged containers, so people use highly privileged containers to load these programs and work with them. They're also pretty vulnerable to supply chain attacks, which we'll be talking about more in upcoming slides.
So, digging in a little deeper on eBPF not being namespaced: eBPF has to be loaded from rootful containers, so the programs basically just live in the root namespace. Even with basic permissions we've seen evidence of privilege escalations in the past; there was one particular issue with unprivileged mode, which you can go look at if you want to, that's now patched and gone, but there will be more in the future. And since eBPF loaded from a container is able to modify the host kernel, it's just, as I've said a couple of times, not really contained; it's not a container in the usual sense, even though you have to use a container to load these programs. Some features just aren't restricted either, so some types of programs, which is what we have going on in this diagram, are going to be able to do a lot of things you might not want them to be able to do. Digging deeper into highly privileged containers: you at least need CAP_BPF, which you should generally just look at as granting root, more or less.
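To make the privilege problem concrete, here's a rough sketch of the kind of pod spec an eBPF-loading application typically ships with today. The image and names are purely hypothetical, and the exact capability list varies per application and program type; many projects skip fine-grained capabilities entirely and just set `privileged: true`:

```yaml
# Illustrative only: the privileges a typical eBPF-loading pod requests today,
# usually held for the entire lifetime of the application.
apiVersion: v1
kind: Pod
metadata:
  name: my-ebpf-loader          # hypothetical name
spec:
  hostNetwork: true
  containers:
    - name: loader
      image: example.com/my-ebpf-app:latest   # hypothetical image
      securityContext:
        capabilities:
          add:
            - BPF        # load programs and maps
            - PERFMON    # often needed alongside CAP_BPF for tracing
            - NET_ADMIN  # attach XDP/TC programs to interfaces
            - SYS_ADMIN  # frequently required in practice; effectively root
      volumeMounts:
        - name: bpffs
          mountPath: /sys/fs/bpf
  volumes:
    - name: bpffs
      hostPath:
        path: /sys/fs/bpf   # hostPath mounts are commonly restricted to privileged pods
```

Whether a capability set like this is enough, or whether the container ends up fully privileged anyway, varies per distro and per program type; the point is that none of it is fine-grained.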
In practice there are several different capabilities you might grant, like CAP_SYS_PTRACE, CAP_NET_ADMIN, and CAP_SYS_ADMIN, but in all cases it's a lot of privilege. You can't really get fine-grained enough to give your program least privilege, just the things it needs. There's also the problem that it's not common for people to acquire capabilities briefly, use them when needed, and then drop them; usually the application runs with these elevated privileges for its entire lifecycle.

There are a lot of examples of how you can do scary things with eBPF. Our friend Alessandro over in the Aya project built snuffy, which lets you snoop on all SSL traffic. You can do things like redirect and basically phish SSH connections. There are all kinds of keyloggers; that's just one example, and if you go on GitHub you can find others. You can also attach probes to things that are performance sensitive and lag the whole system, which is no good. And in particular, we're going to show you a demo of using the trampoline pod threat to exploit a cluster.

So, security issues, yes, but we also have functional issues. One is that a lot of programs operate in silos, or at least present themselves in a silo. With CNIs in particular, a CNI will be running a bunch of eBPF networking code with no awareness of whether there's another program out there. So if you're running a CNI that uses eBPF and then you write your own program, you might end up in a situation where you load that program and you don't really know which one runs first, and you don't really know if you're hijacking and breaking traffic for your CNI. It can become hard to understand interoperability problems, and this can lead to
instability. The one that hurts me the most is usually just general visibility and debuggability at the cluster scope: especially on large clusters, trying to understand all the eBPF programs running in there is really hard to do today. Then there are issues with fine-grained versioning between user space and kernel space: some people embed the eBPF bytecode directly into their project and some don't, so it's not always clear which version of which is running, and different kinds of eBPF programs might each handle that differently. In the same vein, almost every implementation does something different from every other implementation in how they actually load and manage their eBPF programs. So it's not a great ecosystem when you have lots of different eBPF programs from different kinds of applications all doing different things.

So, enter bpfd. This is where we started working on bpfd and thinking about some of these problems. bpfd is an open source project which started out in the Red Hat emerging technologies group. It's a program manager that manages the lifecycle, loading, unloading, and pinning of eBPF programs, and it removes CAP_BPF from the equation for applications. It enables program cooperation, so you can actually prioritize which networking application runs before another, know what else is there, and get visibility into it. It provides a central process for managing loading policy, for security and visibility. bpfd is built in Rust, on top of the Rust eBPF library Aya, and, important to this particular crowd, it includes a Kubernetes operator and API developed in Go with controller-runtime; we literally have an eBPF program API, which we'll show you
in some of the upcoming slides. Andrew? Sweet, thank you, Shane, for setting the stage. At a high level, before I hop into some of the core features of bpfd: we are really focused on eBPF in Kubernetes at a generic level. We're not targeting a single type of app; we aren't focused on just observability or just traffic shaping or any other specific use case. We want this to be a coming together of the community to figure out some of these problems and fix things.

So, some things bpfd provides today, starting with productivity. We work to avoid duplication of the loading and management stack across applications by taking care of it centrally: bpfd becomes the central process that loads and manages your BPF programs, if you want it to. We also have other use cases where bpfd doesn't have to sit in that load path, but the central loader is the cool, futuristic one we've explored furthest so far. One thing we do to help with that is allow users to distribute eBPF programs via OCI container images. This is not a completely novel concept; other projects have done similar things. Many of you may be familiar with Inspector Gadget, which packages its gadgets in OCI images, but that's constrained to their use case. We've tried to make ours agnostic to use case: we just want to package eBPF bytecode in OCI container images. We've written a spec for that, and we'd love to work with other groups who are doing similar things.

Security-wise: nowadays, if you're using bpfd to load, it's the only thing on your system that needs to run with CAP_BPF. That reduces your attack surface and helps curb the proliferation of privileged DaemonSets that we so often see with BPF-enabled applications in Kubernetes today. With this and the Kubernetes integration, we also get the benefits of Kubernetes RBAC. That doesn't really make sense
yet, but you'll see later on in our Kubernetes design slide that we have specific CRDs that allow users to load programs; those can be controlled by RBAC, and we have future plans to define even more policy around BPF. Lastly, we also have cosign integration that works in tandem with our OCI bytecode image spec, because now we can sign BPF programs shipped in images and verify the ownership of those programs. This is something the kernel community has been working on for a long time but hasn't gotten dialed in yet, so we said, let's try to figure it out in user space; that's what we're doing for today.

In terms of observability: this is outside the load path, and it's something we've been exploring more lately. We've heard from users and other folks that being in the load path isn't perfect for a lot of applications, not perfect at all. So we've also been exploring eBPF subsystem monitoring at a cluster scope, because there isn't really much of that in Kubernetes today. We do this by reflecting some eBPF subsystem state back up through the Kubernetes API, and you'll see more of that later in the demo. In terms of program support, we have native support today for XDP, TC, tracepoint, uprobe, and kprobe programs, along with uretprobes and kretprobes. That's where we're at today; we have plans to add more program types, but that's what we've been focusing on. Last but not least, we leverage the libxdp protocol to provide multi-program cooperation for XDP and TC programs. There has been great work going on in the kernel to provide multi-program cooperation for TC programs, so one day we hope not to have to maintain all of this ourselves; we want to move to kernel-native APIs if we can.

So that's some of the core features, and I just want to reiterate what the world looks like today. Today we have multiple BPF-enabled applications, in this
case Blixt, NetObserv, and KubeArmor, and they're all using their own BPF management libraries to interact with the BPF kernel subsystem. For those of you who may be newer to BPF: you're basically loading and attaching programs via BPF syscalls, and then the user space portion of your program interacts with those BPF programs running in the kernel via BPF maps. Everything here obviously requires CAP_BPF, and since the use of BPF has already exploded, the number of these privileged daemons running on every node is going to continue to explode, which we think will one day turn into a nightmare for distros and cluster admins.

Yeah, so this is an idea of the future with something like bpfd. In this scenario, your BPF-enabled applications basically just write YAML: they create a specific dedicated program CRD, one of our supported program types like XdpProgram or TcProgram, that defines the intent for BPF across the cluster. Then bpfd takes care of loading and attaching those programs and also sharing the maps with the BPF-enabled applications. So the main change for the applications is that instead of loading via one of the libraries, they create a CRD and then use one of those same libraries to interact with their maps via pins that we provide back to them. How we do that is with CSI, which I'll go into a little further on the next slide. The last thing I wanted to highlight here is that your BPF-enabled applications no longer need CAP_BPF; we've consolidated it into one privileged daemon, which is bpfd.

Okay, so I'm not going to go super deep into the architecture, but I'm pretty excited about it; I think it's kind of fun. If we start on, I guess for you all it's the left side of the screen, a user creates an explicit program CRD, and we have a process written in Go called the bpfd agent that is basically our Kubernetes
controller, and it interacts with bpfd over a Unix socket; all pretty standard stuff. It tells bpfd to load a certain program from a certain bytecode image, et cetera, which bpfd does: it loads and attaches that program and then also manages its pin points in a really cool way. We store those pin points on the node, but one of the big problems originally was that we needed a way to share the BPF maps and pinned maps with the eBPF-enabled applications without requiring privileges. It makes a lot of sense to just use a hostPath, right? But hostPath volumes require privileged containers. So we decided to actually implement the CSI spec in order to share BPF maps and pinned maps with the applications. Specifically, we use ephemeral inline volumes, as you can see in our pod YAML here. We really liked the API of ephemeral inline volumes; it's super simple. An application can say it wants to use the bpfd CSI driver, say which program it wants maps from, even specify the particular maps it wants from that program, and then simply mount that into its container for use via a typical loading and management library. None of that requires the container to be privileged. This is a pretty recent development on our side, and we're really excited about it. So that's the overview of the architecture.

Sweet, okay, so I'm going to hop into a demo. I'm going to move somewhat quickly because there's a lot to cover. If you want, this QR code takes you to a branch of bpfd where I've implemented the demo, so you can follow along if you'd like. Okay, so as we talked about before, most BPF applications today are privileged and maintain their own stack for loading and management, and that's exactly what we've done here: an example application using an XDP program attached to the main network interface on the node, and all it's doing is counting packets. User space and BPF are compiled together, as many of you are very familiar with.
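As an aside, the CSI ephemeral inline volume pattern described a moment ago might look roughly like the following sketch. The driver name, volume attribute keys, image, and program/map names here are illustrative and from memory, so check the bpfd.dev docs for the exact spelling:

```yaml
# Illustrative sketch: mounting bpfd-managed map pins into an
# unprivileged pod via a CSI ephemeral inline volume.
apiVersion: v1
kind: Pod
metadata:
  name: go-xdp-counter
spec:
  containers:
    - name: counter
      image: quay.io/example/go-xdp-counter:latest   # hypothetical image
      # Note: no privileged flag and no added capabilities needed here.
      volumeMounts:
        - name: bpf-maps
          mountPath: /run/xdp/maps     # wherever the app expects its pins
  volumes:
    - name: bpf-maps
      csi:
        driver: csi.bpfd.dev                          # bpfd's CSI driver (name illustrative)
        volumeAttributes:
          csi.bpfd.dev/program: go-xdp-counter-example  # which program's maps
          csi.bpfd.dev/maps: xdp_stats_map              # which pinned maps to expose
```

The user space side then opens the pinned map from that mount path with whatever BPF library it already uses, instead of loading the program itself.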
Today, almost everyone does it like that, and that BPF program is just running on the node counting packets, as we see here. Okay, so let's think about what it would take to turn this typical BPF program evil. For the purposes of this demo, I got really excited and implemented a service account token stealer. What it does is use BPF tracepoint programs to easily break out of the go-xdp-counter pod's container boundaries: it sits there and listens for any process on the entire host to open a service account token, since it's not isolated to its container boundary, and then we steal that token from kernel memory and write it out to standard out in our evil pod. This is an exploitation of the trampoline pod threat, which some of you may be familiar with: it's basically where evil actors can access service accounts which allow them to degrade a whole cluster from one single node.

Cool, sweet. So, looking at this in a little more depth: if the binary containing the user space and BPF programs is compromised, whether via a supply chain attack, privilege abuse, user spoofing, et cetera, exploitation is relatively simple. The application's container is already privileged, so it can load other malicious BPF programs alongside its original program. In this case, we've loaded four tracepoints, attached to sys_enter_openat, sys_exit_openat, sys_enter_read, and sys_exit_read, which allow us to do the service account token stealing. Now we'll take a little more of a look at that; I'm giving you a look ahead. I've implemented this running on a kind cluster, which we'll show next, and in our go-xdp-counter pod we're able to get information like this, some scary information, which we'll dive into more here. And one of the really scary parts about all of this is that it can basically be invisible. For the sake of our demo we're dumping stuff to standard out, so it's
obvious something is going wrong, right? We have all this extra standard out in our go-xdp-counter. But if you're just a cluster admin, your cluster isn't necessarily degraded. If you were more malicious than I was for this talk, you could just be opening a port and sending those tokens over the network, or writing them somewhere on disk; there are a lot of malicious ways this can happen.

Okay, so I'm going to go ahead and start our demo here, sped up because I'm a slow typer and you all did not want to watch me type. Starting from the beginning: I'm creating a kind cluster on my local machine. That kind cluster is pretty standard, it only has one node, it's not very exciting; that's not what we're focusing on. So that's up and running, and now we apply our evil XDP counter application, and you do have to watch me type. Okay, so at first it looks like everything's progressing normally, right? Packets are being counted, that number is going up. But if we watch a little further... oof, what is that? Oh gosh, something bad's happened. We're dumping a lot of privileged information in this pod. The service account token is at the top, and a service account token, for those of you who don't know, is just a JSON Web Token. We've actually parsed it down below, and it gives us a bunch of information that's a little bit scary, in this case relating to the kindnet pod: we get Kubernetes contextual information such as namespace, pod UID, et cetera, along with the PID, which in this case I decided to print out.

Cool. Another thing I'm going to show: for me, as a developer, if I thought something funky was going on, I would get onto my node and try to run bpftool, because that's how I learned to do BPF on day one. But of course bpftool doesn't exist on most distros' nodes by default, so that's tricky. So what we're going to do now is go back into our evil pod, see that a CoreDNS token has been captured, and copy that token and save it
for later. Great. So now we go into our go-xdp-counter pod, dump the service account token for that pod, and use it to try to list all the pods at cluster scope. This is not going to work; as you can see, the go-xdp-counter service account is forbidden from doing that. But now we can copy and paste that CoreDNS token we scraped and saved earlier, do the same thing, and... oh shoot, it works. That's kind of crazy: we've essentially stolen CoreDNS's identity in this cluster. And just to verify that, I show here something we couldn't do with the go-xdp-counter service account token, and it shows us: wow, we're in the go-xdp-counter pod, but the Kubernetes API thinks we're CoreDNS.

Sweet. Yeah, and that was just a really simple example, with CoreDNS being impersonated; imagine if you're one of those poor souls that has something running with cluster-admin. So, how does bpfd help? First off, we're going to start by just installing bpfd and checking out all of the BPF programs running on a given node. In this scenario bpfd isn't in the load path, but we can still use it to help us out. Installing is pretty easy: we start by installing the bpfd CRDs, then we install the operator, which takes care of deploying the daemon, and we make sure everything is up and running correctly. Yay, still coming up; we all love ContainerCreating. Things look good. Okay, the next thing we do is dump all of the BPF programs, which we have a CRD for. Awesome. This cluster only has one node, so it's pretty easy. We check this out and go, hmm, a lot of those are extraneous systemd programs running on my Fedora node. I also see my xdp_stats program at the very bottom, which is my example XDP counter application, and I see some other weird ones: enter_openat, enter_read, exit_openat, and exit_read. I don't really know where those are from, so I'm going to drill down into those
programs a little bit more here in a second; love my spelling there. Okay, so now we actually check out that program, and you can see we get a bunch of kernel-related information that you'd usually get with bpftool: things like its kernel ID, what time it was loaded at, the map IDs it's using, et cetera, along with what type of program it is. And this was a program not loaded by bpfd; we're outside the load path here, just providing observability. What I show next is finding the startup time of the go-xdp-counter application and correlating it to the loaded-at time of the BPF program, which helps us work out which actor was actually responsible for it. Cool, and that's what I wanted to show there. This slide just reiterates what I showed in the demo: correlating the loaded-at time to the pod's started-at time. At the end of the day, this is still really early in bpfd; we'd love to clean this up so it's a little more automatic, but this is how we do it for now.

Okay, so we've talked about discovery: we've installed bpfd and discovered something has gone wrong. How does bpfd actually provide some mitigation? First things first, we delete the evil application, then we redeploy it with bpfd: we write an XdpProgram YAML. If you notice, the bytecode image we're using is still the evil one, but it's not going to matter, because we're completely declarative in terms of our BPF behavior on the node. Another thing to notice now is that our go-xdp-counter user space program is no longer privileged, which was the goal all along, and you can see the CSI volume mounts I talked about a little earlier, the ephemeral inline volume mounts. The reason I can still load that evil piece of XDP bytecode is that we are only going to load the XDP program from it. There may be other tracepoint programs in there
as well, but because the go-xdp-counter service account does not have the ability to create tracepoint programs, they are not going to get loaded and attached by bpfd. Additionally, we're going to see how we use cosign to verify whether these bytecode container images are signed or unsigned; we'll see that again really quickly in the demo.

So first things first, we delete our evil XDP counter, and then there's one little step I want to highlight: we have to enable CSI support in bpfd. We're using v0.3.0, which we just released recently, and CSI is not enabled by default there; it will be very soon, so that's one little in-between step I wanted to call out. As you can see, bpfd now has three containers running in its pod, one of which implements CSI. The next thing I do is deploy the XDP counter application with bpfd; as you can see, that's working, it's up and counting packets again. We look at the XdpProgram Kubernetes object: it specifies things like our bytecode image, which is still evil; a priority, because we support XDP multi-program cooperation; an interface selector, where in this case you don't have to provide a specific interface, you can just specify the primary node interface; and proceedOn, which defines the behavior between ordered XDP programs. And obviously it's still pointing at evil bytecode. We can also triple-check that those evil tracepoints aren't there anymore: you can see our go-xdp-counter example is there, but the evil tracepoints we saw before are no longer among the BPF programs on the node. Yay, that's a good thing. Sweet. The last thing I'll show is that we can still see a sign that something's wrong, that we're still using the evil piece of bytecode, by looking at bpfd's logs: specifically, we can see that the bytecode image is unsigned. Awesome. This is really rudimentary; in the near future
we're going to have more policy around which actors we allow to deploy BPF onto your cluster. So the last thing I end up showing here, see, I'm still typing slowly, is editing your XdpProgram and hot-swapping the BPF bytecode to a container image that we now trust. Yay, away with the evil. Obviously it's not this easy in real practice, but for the purposes of the demo I think it's great, and you can see that bpfd reports that the new bytecode image is signed. Let me see if I can get to the next slide. Okay, I'm going to hand it back to Shane.

Thank you, Andrew. So yeah, that's what we've been working on, and we'd love to have other people get involved. If you want to get started with the project itself, there are a bunch of different ways you can do that: bpfd can be run as a standard Linux daemon and you can use bpfctl, but you can also use the operator on any kind of Kubernetes cluster. You can do it on a kind cluster locally if you just want to try it out real quick; we provide examples in the repository, and we also have a website, bpfd.dev, which has examples and guides and can get you started if you're interested in trying this out and seeing what it looks like.

We also work with it in SIG Network now: Blixt, the layer 4 load balancer we created in the Gateway API project, is actually using bpfd as the loader and manager for its TC ingress and TC egress programs. It's used mainly for CI and testing scenarios today, and we are the maintainers of that as well. So if you're not interested in just doing basic eBPF examples, but would rather see bpfd in action in a project that's actually using it right now, this could be a good place; you can use the QR code there to go check it out in kubernetes-sigs. And in general we have a community we'd love to see you join: we have weekly community meetings for bpfd on Thursdays, and we have the bpfd channel on
Kubernetes Slack, and if you need the community meeting links, they're posted automatically in that Slack channel. We also have an ebpf channel for general eBPF discussion in Kubernetes. There's also the Cilium Slack for eBPF and things like that, but there are quite a few people focusing specifically on eBPF and Kubernetes hanging out in the Kubernetes Slack channel, so if you need some help, please feel free to go in there and ask. We're also very active in the Aya and Rust communities, which are on Discord rather than Slack, so you can find us there as well; there are links to Aya in the bottom left, if you're interested specifically in writing eBPF in Rust, and to our community page with the channels and so on.

We have a bit of a roadmap; we literally have a GitHub project roadmap, because we're trying to provide some transparency on where we think we're going with this, so go take a look at what we have so far. We'd also love to hear from you if there are things you think we should be considering for the roadmap. We currently plan to apply as a CNCF sandbox project, and right now, literally during KubeCon, we're in the midst of moving to a daemonless design; the community is deliberating it and we're working on some design documents, but we're excited, it's an in-progress thing, so if you're interested, please check that out. In general, in this project we're trying things out, we're experimenting: we're seeing these problems that seem to be very universal eBPF problems, identifying them, and trying something to solve them. However, we're more interested in the high-level goals, more interested in solving these problems than in the exact way in which we solve them; we're not married to the solution we have today,
necessarily. So joining the community is particularly important if you're working in eBPF: we want to work with you, maybe even change course significantly, so that we're all doing it together and actually solving these security, functional, and ergonomic problems with eBPF, because there are a lot of them today. That's the bigger, highest-level goal of this project. Thank you, and we have a couple of minutes for questions, I think. We're done? Yeah, I think we have a minute.

Question: Is the cosign binary included in the program, in the DaemonSet? Answer: Nope, we're actually using the cosign Rust bindings, built into bpfd. Question: Is it possible to check against a private Sigstore stack, rather than just the public-good instance of Sigstore? Answer: To be completely honest, I am not the cosign expert in our community. I'm sure it is, but I don't have an answer for you today. Question: All right, I'll jump into your Slack, I'm sure we have enough there. Thanks, excellent talk. Answer: Thank you.

Comment from the audience: Hey, I just wanted to do a quick plug on a couple of things. In the IETF we're actually standardizing a bunch of things around how people load BPF programs, the BPF instructions, a whole bunch of stuff like that, and everyone is welcome to join; it's the IETF BPF working group, so you can Google it and you'll find it. Related, in the Linux community there are a lot of ideas around enforcement, around RBAC, so I think the RBAC thing was super interesting in terms of how you can say which BPF programs can load, you know, which tracepoint hooks, that sort of thing. That's a super active area of discussion there too, so we should definitely collaborate. Answer: Yeah, thank you, and we would really appreciate it if you would come join us and check in with us, because we want to work with you on these things. Thanks so much.