Well, hello everyone, welcome, and thank you for coming to the last session of the last day of the conference. This is Run Fast, Catch Performance Regressions in eBPF with Rust. I am Everett Pompey. I am the founder and maintainer of a tool called Bencher, which is an open source tool for catching performance regressions in CI, and we'll talk more about that later. I do ask that if you have any questions, you wait until the end.

Alright, so what is eBPF? At a high level, eBPF is a virtual machine within the kernel that runs a special kind of bytecode. You've got user space, and you've got the Linux kernel. You've got your source code here, you compile it into eBPF bytecode, and then it goes into the verifier. The verifier evaluates your eBPF code and basically has to solve the halting problem with it: figure out whether or not it's going to break your kernel. And if it isn't sure, it's not going to pass it. So your code has to get a check mark from the verifier before it can go into the eBPF VM and run as an eBPF program. Some of the things you can do with eBPF: you can set tracepoints, think tracing syscalls. You can set uprobes, which are user space probes, and kprobes, which are kernel probes. And you can do packet filtering, which is XDP, and which is actually where the name eBPF comes from. It stood for extended Berkeley Packet Filter, and it no longer stands for anything; it is just a jumble of letters. There's also LSM, Linux Security Modules, which you can use as well.

And what is Rust? So glad you asked. Rust is a programming language, one that I have fallen in love with. It is designed for performance, reliability, and productivity, and that makes it a great language to do systems programming in. Now, a lot of you may have heard that as of 6.1, Rust was added to the Linux kernel, but we are not going to be talking about that today. We're going to be talking about creating eBPF bytecode with Rust, not working on the Linux kernel in Rust. Different things, but buzzwords abound.

And what is a performance regression? At a high level, performance bugs are bugs: instead of a feature getting messed up, it's your performance getting messed up. Something that should not take that long takes way too long. And in eBPF, you can add up to 100 milliseconds of latency for a single call if you really try, so you really want to make sure you keep that in check. Production is the most costly place to find bugs, so we're going to try not to find them there and instead find them earlier. Performance regressions, if you're looking for them, can usually be detected in development. But most folks don't actually have anything to catch them in CI like they do for feature regressions, and that means they tend to get caught in production, when things are on fire.

So here's an overview of where we're going with this talk: we're going to go over a basic eBPF program written in Rust, then we're going to evolve that eBPF program, then we're going to look at benchmarking that eBPF program, and finally we'll add continuous benchmarking.

At a high level, for the eBPF tooling that exists out there: there's libbpf, which is written in C and which you use C to interact with. There's BCC, which supports C, Python, and Lua.
There's also some Go tooling, which allows you to use Go and C. And there's Rust, which, as you all might imagine, is what we're going to go with. Within the Rust ecosystem there's some choice too. There's libbpf-rs, which is basically a Rust wrapper around libbpf; you would still have to write your eBPF code in C, though. Then there's RedBPF, which allows you to write the code that gets compiled into eBPF in Rust, but it's no longer all that actively maintained, because its creator has moved on to Aya, a tool that lets you write both the eBPF code and the user space code in Rust, and that also gets rid of libbpf altogether and uses Rust for making the syscalls.

So now, on to that basic eBPF program. We're just going to do an XDP program, which means we're able to intercept packets and make decisions on them: basically give a thumbs up or a thumbs down on each packet. This is version zero of our eBPF app, and all we're going to do is log the IPv4 source address of the packets that come in. That's it, pretty simple. On the user space side, we're going to have a user space agent that spins things up and, at this point, doesn't actually do all that much. And then we have the kernel side of things: the eBPF code, which is going to get that XDP packet, get the source address, and log it. As we go through the code, the icons in the upper right hand corner will indicate where we are.

So now we're going to go into the eBPF side of things, the code to do that. This is just a function in Rust that takes in a context, which gives us all the information that the kernel provides. We use a macro from Aya to tell it that this is an XDP program, and we name it fun_xdp. So we're having lots of fun here. We have a helper function called try_fun_xdp that we'll get into in a second, and based on what that returns: if we get an Ok, we just return that return value; if we get an error, then we abort, which is just a way of telling the kernel something bad happened.

In that try_fun_xdp function, from the context we go in and get the Ethernet header. We have to bounds check using that ptr_at helper, which we'll look at in a minute. We only really care about IPv4 right now, so anything else we just go ahead and pass along. We extract the IPv4 header, again using that helper function, then we get the source address and we log it. And then we pass. Real simple. That ptr_at function basically just takes the start and a length, does a bounds check, and makes sure we're within memory; otherwise, that verifier we talked about might get upset that we're outside of our bounds. So that's all we're doing there, making sure we're good. And there's a panic handler, because Rust can panic, and again, that verifier is finicky: we make sure to handle any possibility of a panic in order to make the verifier happy.

So that's the eBPF side of our super simple app, and now we move over to user space. In here we have our main. We parse the arguments that are passed in, we init logging, we get the bytecode for our BPF program, init logging on it, load it into the kernel, and attach it to the interface that was passed in, with just the default flags. We wait for Ctrl-C, and then that exits our program. So that is the user space side of things.
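For reference, here is a minimal sketch of what that eBPF side might look like. The names fun_xdp, try_fun_xdp, and ptr_at follow the talk; the aya-bpf, aya-log-ebpf, and network-types crates are my assumption about the underlying pieces (the talk only names Aya), and macro syntax and struct fields vary between crate versions, so treat this as an approximation rather than the talk's exact source:

```rust
#![no_std]
#![no_main]

use core::mem;

use aya_bpf::{bindings::xdp_action, macros::xdp, programs::XdpContext};
use aya_log_ebpf::info;
use network_types::{
    eth::{EthHdr, EtherType},
    ip::Ipv4Hdr,
};

// Entry point: the `xdp` macro marks this as an XDP program named `fun_xdp`.
#[xdp(name = "fun_xdp")]
pub fn fun_xdp(ctx: XdpContext) -> u32 {
    match try_fun_xdp(&ctx) {
        Ok(action) => action,
        // Tell the kernel something bad happened with this packet.
        Err(_) => xdp_action::XDP_ABORTED,
    }
}

// Bounds-checked pointer into the packet, to keep the verifier happy.
#[inline(always)]
unsafe fn ptr_at<T>(ctx: &XdpContext, offset: usize) -> Result<*const T, ()> {
    let start = ctx.data();
    let end = ctx.data_end();
    if start + offset + mem::size_of::<T>() > end {
        return Err(());
    }
    Ok((start + offset) as *const T)
}

fn try_fun_xdp(ctx: &XdpContext) -> Result<u32, ()> {
    let eth_hdr: *const EthHdr = unsafe { ptr_at(ctx, 0)? };
    // We only care about IPv4; pass anything else along.
    match unsafe { (*eth_hdr).ether_type } {
        EtherType::Ipv4 => {}
        _ => return Ok(xdp_action::XDP_PASS),
    }
    let ipv4_hdr: *const Ipv4Hdr = unsafe { ptr_at(ctx, EthHdr::LEN)? };
    let src_addr = u32::from_be(unsafe { (*ipv4_hdr).src_addr });
    info!(ctx, "IPv4 source address: {}", src_addr);
    Ok(xdp_action::XDP_PASS)
}

// The verifier rejects anything that can panic, so give Rust a panic
// handler it can prove is unreachable.
#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    unsafe { core::hint::unreachable_unchecked() }
}
```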
And we are done with version zero, brave new world, our first application in eBPF. Pretty basic, not doing all too much here. So we're going to evolve this a bit and create another version using eBPF maps, which are how you communicate between user space and eBPF, and which really allow things to start getting interesting. We're going to implement our fizz feature here: push Fizz onto a queue if the IPv4 source address is divisible by three, otherwise just return XDP_PASS. This introduces a middle, common area here: the maps. From the kernel side we push onto the queue, and then we pop from that queue on the user space side.

So we'll start off looking at the maps. To implement this fizz feature, we have an enum in Rust, and basically the only message we're possibly sending over at this point is Fizz. We need to tell it that it can be represented as C. We also need to say that it can be copied and cloned, which allows us to implement the trait that marks it as plain old data we can send over a map. And we also derive Debug, so that we can log it on the user space side. So that is the shared map data part of it.

Now we're going to look at the eBPF side. The change on the eBPF side is that we have to create a map, and that's what this does here: it just creates a queue that we can push the source address messages onto. Then we update our try_fun_xdp function. It's very similar to what it was before, but instead of just logging, we now take a modulus three, and if that is equal to zero, we have Fizz, and we push it onto the queue. And no matter what, we go ahead and pass the packet.

Then on the user space side, we've got our main function. It stays pretty similar, except we now add a spawn_agent helper, which will be evolving throughout the rest of this talk. Again, we wait for Ctrl-C and then exit. Looking at that spawn_agent helper: we create the user space side of that map, essentially attaching to what we expect from the eBPF side, that source address queue. Then we loop over it and pop off of it, and whenever we get something, we just log it. So we're now basically just moving the logging from the eBPF side of things over to the user space side. So that is the user space side, and now we have our fizz feature: we're actually moving stuff across the boundary.

So let's add a simple update. Now, I think you might be able to see where this is going. Maybe FizzBuzz? Yeah, yeah, okay. So: push Fizz onto the queue if the source address is divisible by three, Buzz if divisible by five, and FizzBuzz if both. Otherwise, just return XDP_PASS. So now, again, we go look at the shared data that we're sending across. We had Fizz from before; we add Buzz and FizzBuzz. So that's that part. Now, over to eBPF: again, it stays very similar, we just implement FizzBuzz, so modulus three, modulus five, and return XDP_PASS. That gets pushed onto the queue, and on the user space side we actually don't have any changes. We are done, because it's just going to log whatever we send across. A sketch of these map pieces follows below.
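Here is a hedged sketch of the map pieces, shown at the FizzBuzz stage where this slide ends up. SourceAddr is the message name from the talk; SOURCE_ADDR_QUEUE, fizz_buzz, the queue size, and the aya/aya-bpf API details are assumptions, and the exact signatures differ between Aya versions:

```rust
// Shared crate: the message type that crosses the map boundary.
// repr(C) gives it a stable layout; Copy/Clone (plus an
// `unsafe impl aya::Pod for SourceAddr {}` on the user space side)
// mark it as plain old data that can be sent over a map, and Debug
// lets user space log it.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
pub enum SourceAddr {
    Fizz,
    Buzz,
    FizzBuzz,
}
```

```rust
// eBPF side: a queue map, plus the decision that try_fun_xdp makes
// before pushing onto it.
use aya_bpf::{macros::map, maps::Queue};

#[map]
static mut SOURCE_ADDR_QUEUE: Queue<SourceAddr> = Queue::with_max_entries(1024, 0);

fn fizz_buzz(src_addr: u32) -> Option<SourceAddr> {
    match (src_addr % 3, src_addr % 5) {
        (0, 0) => Some(SourceAddr::FizzBuzz),
        (0, _) => Some(SourceAddr::Fizz),
        (_, 0) => Some(SourceAddr::Buzz),
        _ => None,
    }
}

// In try_fun_xdp, after extracting src_addr, the push is best effort
// and the packet is always passed along:
//
//     if let Some(source_addr) = fizz_buzz(src_addr) {
//         unsafe { let _ = SOURCE_ADDR_QUEUE.push(&source_addr, 0); }
//     }
//     Ok(xdp_action::XDP_PASS)
```

```rust
// User space side, inside spawn_agent: attach to the same map by name
// and pop-and-log in a loop (Queue here is aya::maps::Queue).
use aya::{maps::Queue, Bpf};

fn spawn_agent(bpf: &mut Bpf) -> Result<(), anyhow::Error> {
    let mut source_addr_queue: Queue<_, SourceAddr> =
        Queue::try_from(bpf.map_mut("SOURCE_ADDR_QUEUE")?)?;
    loop {
        if let Ok(source_addr) = source_addr_queue.pop(0) {
            println!("{source_addr:?}");
        }
    }
}
```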
So, one more time through here, we're going to add another seemingly simple change: FizzBuzzFibonacci. We do the same as before: Fizz if divisible by three, Buzz if divisible by five, FizzBuzz if both. Except, if the IPv4 source address divided by 256 is part of the Fibonacci sequence, then we return Fibonacci. So the idea is that we're doing a little more work here. Otherwise, just return XDP_PASS. And so again, we have to update the message that we send over and add Fibonacci as an option. Then on the eBPF side, we have to update our try_fun_xdp function: instead of just FizzBuzz, we do our FizzBuzzFibonacci, and otherwise return pass. And this is the Fibonacci helper function; it just walks the Fibonacci sequence. Nothing to go wrong. And that's kind of it. We're good. Again, we're just logging on the user space side, and everything's great.

And our app works fine, until about three weeks later, when production's on fire. And we're like, what's going on? So we chase our tail all day trying to figure out what's going on, until we trace everything and we end up here. We look in and we investigate, and we're like, oh gosh, I'm calculating the Fibonacci sequence every single time I get a packet. This is so good. So now we have to create a version four of our app. We're going to cross that helper out, and we're just going to hard code it, because there really are not that many numbers below 256 in the Fibonacci sequence that we have to worry about. Now, does anyone notice anything about this slide? Anybody? I asked ChatGPT to give me a list of all the Fibonacci numbers below 256, and I am missing one. Yes: 233 is missing. So our robot overlords have not quite taken care of us yet, but they're coming. So, all right, now it's good. And we're able to go in and put the fire out, right? Save the day.
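As a sketch, that version-four fix might look something like the following. The helper name is_fibonacci is taken from the talk's micro benchmark discussion, and this assumes, as the talk implies, that only values below 256 need to be handled:

```rust
// Version four: no more walking the Fibonacci sequence per packet.
// Every Fibonacci number below 256 is hard coded, including the 233
// that the ChatGPT-generated list left out.
fn is_fibonacci(n: u32) -> bool {
    matches!(n, 0 | 1 | 2 | 3 | 5 | 8 | 13 | 21 | 34 | 55 | 89 | 144 | 233)
}
```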
But why did we have to do this, right? Why did we have to wait all the way until production to catch this? We should try to shift this as far left as we possibly can. So we're going to take a look at doing this in development. We've looked at a basic eBPF program, we've looked at evolving it, and now we're going to look at benchmarking it. You can't improve what you can't measure.

With benchmarks, there are both micro and macro benchmarks. Micro benchmarks I think of like unit tests, and that would be at the level of this is_fibonacci function that we had to go and fix. Macro benchmarks I think of more like integration tests, and that would be at the level of the spawn_agent helper function that we created earlier.

We're going to start out by looking at micro benchmarks. The options for doing micro benchmarks in Rust: there's libtest bench, but it is nightly only; you have to use a crate named bencher (no relation to Bencher) in order to use it on stable, and it's not being actively developed. The reason for that is Criterion. It is available on both stable and nightly, it has become a bit of a de facto standard within the Rust community, and it is much more feature rich. And then there's also Iai, which is likewise available on both stable and nightly; however, it's much more experimental and not as widely used. It is from the same creator as Criterion, and it allows for single-shot benchmarking, which is pretty cool: it uses Cachegrind, so it gives you instruction and cycle counts and things like that. It is definitely something to keep an eye on, and I'm super interested in it. Of these, though, we're going to go with Criterion, because it just tends to be the most popular option in the ecosystem.

So, to do micro benchmarking here, we're going to be working within the shared part of our code, and we're going to be refactoring some things. We add a dev dependency on Criterion, and then we set up our benchmarking in our Cargo.toml: we say we're going to have a source address benchmark, and since we're using Criterion's harness instead of the default, we set harness = false. We add a method called new onto SourceAddr, which does all the work we were doing before: the is_fibonacci check and then the FizzBuzz calculation. This moves that logic out of the eBPF side of things, where Criterion doesn't really play nice, and into the shared code base. Then, within our benchmark itself, this is what a Criterion benchmark looks like: we create a function that iterates and runs a number of times, so that it can warm up and you can see what you get after a few iterations. And all we're going to do is go from 0 up to 255, create a new SourceAddr for each of those, and see how long that takes. Then there's just some setup with some macros to add this source address benchmark to the benchmarks we run. And then if we run cargo bench, we get output there, which is nice: timing, and seeing how long it took. A sketch of this whole setup follows below. So cool, that is our micro benchmarking, all done with that.
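Putting those micro benchmark pieces together, a sketch might look like this. The bench target name, the Criterion version, the shared crate name fun_xdp_common, and the exact signature of SourceAddr::new are all assumptions:

```toml
# Cargo.toml in the shared crate: Criterion as a dev dependency, and a
# bench target with the default libtest harness turned off.
[dev-dependencies]
criterion = "0.5"

[[bench]]
name = "source_addr"
harness = false
```

```rust
// benches/source_addr.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Hypothetical import: the shared crate that now owns SourceAddr::new.
use fun_xdp_common::SourceAddr;

// SourceAddr::new runs the is_fibonacci check plus the FizzBuzz logic
// for a single address, so timing all 256 values exercises the range.
fn bench_source_addr(c: &mut Criterion) {
    c.bench_function("SourceAddr::new", |b| {
        b.iter(|| {
            for n in 0..=255u32 {
                black_box(SourceAddr::new(black_box(n)));
            }
        })
    });
}

criterion_group!(benches, bench_source_addr);
criterion_main!(benches);
```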
And so now, on to macro benchmarking. This is going to be a little more complicated, because we're going to have to get the benchmark times from the kernel. The options there: first, the kernel's bpf_stats_enabled setting, which collects the run time in nanoseconds and the run count of all BPF programs. It was added in kernel version 5.1, and it is off by default. Another option is bpftool prog profile. It collects hardware perf counters, so it's kind of like the Iai tool we looked at before, in that it's counting cycles and not wall clock time. It was added in 5.7 and requires a bpftool built with clang version 10 or greater. And then there's bpftool prog run. It runs a specific eBPF program, lets you provide the input, and gives you the return output and context. This is super cool; unfortunately, though, it only works for the specific eBPF program types listed here. And even though we're doing XDP in our tutorial, we're going to use BPF stats, because it's just more applicable to a lot of other eBPF program types.

To make this happen for the macro benchmarking, we're going to be working on the user space side of things. That main function we had before still parses the command line arguments, but now we create a shutdown bool flag and pass it into a new run helper function that we'll look at in a second. Then we wait until you hit Ctrl-C, at which point we trigger the shutdown flag and exit our program. Looking at that run function: it takes the interface we're expecting, it takes that shutdown flag, and it returns a Process, which holds the PID, the program's file descriptor, and a handle to the running task. We get our PID, we get the file descriptor from our BPF program, then we spawn our task and get the handle, and we'll look at the updated version of spawn_agent in a second. That returns the Process. What this is really going to allow us to do is our integration-level testing, so that we don't have to rely on the main function to get things spun up.

The spawn_agent helper again creates the map we created before, and it still does that popping off the queue and logging, but now it watches for the shutdown flag to be triggered and then exits. So, very similar to what we had before.

And now we're going to look at the custom benchmarking harness we have to create in order to actually benchmark the BPF program. We create a struct which represents a benchmark of a BPF program: it just has a name and the function that we run for the benchmark. We use a nifty crate called inventory that lets us statically collect those at compile time. We also have to have a main. In that main, we have a Vec of results, and for each of the benchmarks in our inventory, we parse the name, run the function we're given, collect those metrics, and push them into our results vector. We then take those results, serialize them to JSON, and save them to a file. And that's kind of it.

So now we're ready to create our first benchmark, our fun_xdp benchmark. We initiate a Tokio runtime, which basically just lets us run async things in Rust. We create that shutdown bool, pass it into the eBPF run helper function we just created, and then we do some network things, which here is just running a network request. That's not the best thing to do, because there's going to be a lot of variance in what you're actually benchmarking, but it's a simple example. Then we go ahead and get the BPF stats, shut down, and return the BPF stats. Looking at what that get_bpf_stats function does: we go into the fdinfo entry under /proc for our process, and we read that file line by line until we get the runtime nanoseconds and the run count. If those both exist, we return the average, and if they don't, we just return zero (a sketch of this helper follows below).

So now we're able to enable the stats, compile our BPF code in release mode, and then go into the user space part of our app and basically just run cargo bench. We have to run it under sudo, because we have to be able to load the eBPF program. And the output of that is the JSON we see serialized there. So cool, we've now benchmarked the user space side of our code at an integration level. And that's the end of the macro benchmarking.
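A hedged sketch of that helper follows. It assumes the stats show up as run_time_ns and run_cnt lines in the program's fdinfo entry, which they do once you run sysctl -w kernel.bpf_stats_enabled=1; the function name and signature are hypothetical:

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

// Read `run_time_ns` and `run_cnt` for a loaded BPF program from
// /proc/<pid>/fdinfo/<prog_fd> and return the average runtime per run
// in nanoseconds, or 0.0 if the stats are missing (for example, when
// kernel.bpf_stats_enabled has not been switched on).
fn get_bpf_stats(pid: u32, prog_fd: i32) -> io::Result<f64> {
    let file = File::open(format!("/proc/{pid}/fdinfo/{prog_fd}"))?;
    let (mut run_time_ns, mut run_cnt) = (None, None);
    for line in BufReader::new(file).lines() {
        let line = line?;
        if let Some(value) = line.strip_prefix("run_time_ns:") {
            run_time_ns = value.trim().parse::<u64>().ok();
        } else if let Some(value) = line.strip_prefix("run_cnt:") {
            run_cnt = value.trim().parse::<u64>().ok();
        }
    }
    Ok(match (run_time_ns, run_cnt) {
        (Some(ns), Some(cnt)) if cnt > 0 => ns as f64 / cnt as f64,
        _ => 0.0,
    })
}
```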
So now that we have benchmarked our eBPF code, we can take a look at continuous benchmarking. Like we were talking about: why are we going to wait all the way until production over here? That's way too late. We were able to benchmark our code, but the problem is that it's local only; it relies on us to run it ourselves. And for the same reason that we don't rely on developers to run their unit tests locally, we run those in CI, the same thing should be done with benchmarks: we should run our benchmarks in CI. That's the idea behind continuous benchmarking. Bencher is an open source tool that I built to accomplish this. It's open source, it's both self-hostable and there's a SaaS offering, it has multi-tenancy and multi-language support out of the box, it has statistical thresholds and alerts, and there's also a nice GitHub Action to make it real easy to integrate with your project.

You can use it to track your benchmarks. As we were going through our FizzBuzzFibonacci example, you probably would not have seen all too much difference between the first and second versions of the app, but by the time we got to the third, that's where we would see the performance regression, and what we want is an alert to let us know that it happened. Detecting performance regressions is a key part of continuous benchmarking. Using statistical thresholds, you can set the bounds of what you think is acceptable: again, those first and second results are going to be pretty tightly clustered, but the third result is going to be way out there, and that's what generates the alert. If you want to get into the statistics, you can use both a z-score and a t-test, and set windows and sample sizes and things like that.

So to do continuous micro benchmarking with the Criterion setup we looked at, in the shared part of the code, we do bencher run, we tell it that the adapter is Rust Criterion, and then we just give it our cargo bench command, and it just works. Now, if we're in CI and we want it to fail our build in case there's an alert, we just add the error flag, and it will fail the build if an alert is generated. And on the macro benchmarking side of things, with that custom harness we built, we again use bencher run, but this time with the JSON adapter, and we read from the file that we saved our results to, and then we run our sudo-based command for running cargo bench. Again, we just pass the error flag if we want it to fail on alerts. A sketch of these commands follows below. So that is the macro benchmarking side of things, and that allows you to catch performance regressions in CI.
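Roughly, those invocations look like the following. The flag spellings and adapter names are from my reading of the Bencher CLI rather than the talk's slides, and results.json is a hypothetical file name, so double check against the Bencher docs:

```bash
# Micro benchmarks: let Bencher parse Criterion's output directly.
bencher run --adapter rust_criterion "cargo bench"

# In CI, fail the build if a threshold alert is generated.
bencher run --adapter rust_criterion --err "cargo bench"

# Macro benchmarks: run the custom harness under sudo so the eBPF
# program can be loaded, then read the JSON results it wrote out.
bencher run --adapter json --file results.json --err "sudo -E cargo bench"
```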
And again, as I mentioned, there's a GitHub Action, so it's really easy to just pull into your project if you want to use it. So, continuous benchmarking: you have to detect in order to prevent, production is too late, development is local only, and continuous benchmarking can save us a lot of pain. I highly suggest that you add continuous benchmarking to your process, especially if you consider performance to be part of what you expect from your application.

So, in review, we've gone over the basics of eBPF programs in Rust, evolving a basic eBPF program, benchmarking those programs, and then adding continuous benchmarking. That has been Run Fast, Catch Performance Regressions in eBPF with Rust. That is a link to the repository; all of the source code is under examples/ebpf there. And if you're too lazy to type that all out, there's also bencher.dev/repo, which will send you straight there. And if you like the project, I do ask that you please give us a star. So, thank you!

Alright, any questions? Yes, okay.

Q: Thank you, good talk. I have a simple question. Some degradation could happen on a different architecture, for example on ARM or maybe RISC-V, while I can only test on x86. Is it ready for different architectures?

A: Yes. There's a concept of three main dimensions: there's branch, testbed, and then the benchmark itself. The concept of a testbed covers that architecture. So if you're running, let's say, on a Windows box versus a Linux box, or, maybe better to say here at this conference, an Ubuntu box versus a Fedora box, those are different testbeds, and you can even use different versions of different testbeds within that. That dimensionality allows you to keep it all straight.

Q: Thank you.

A: Anything else? Alright, thank you all so much. Appreciate it.