All right, let's get started then. So welcome to our session on the past, present, and future of eBPF and observability. I'm joined by Natalie today from New Relic, who is a contributor to the Pixie open source project. I'm Frederick. I founded a company called Polar Signals. And we both happen to work on eBPF in the context of observability.

So, an introduction to eBPF. Let's do a show of hands. How many people know what eBPF is? All right, we see a lot of hands. OK, so who's running some tools that use eBPF? OK, that's probably maybe 50%. Who has written an eBPF program? OK, that's still maybe 20%, 30%. That's actually more than I thought. So hopefully, by the end of this talk, everyone will at the very least know how to write an eBPF program and how to get started with that. Obviously, you won't have written a program yet.

So, just to make sure that we're all on the same page about how eBPF works: eBPF is essentially a virtual machine within the Linux kernel, and we can attach programs to certain triggers. In this case, we're attaching an eBPF program that we've written to one trigger: the execve syscall. So every time this syscall is called, our eBPF program is called first. And we can do whatever we want within that context. We're given a bunch of context about what's being executed, and we can save that and keep a counter of how often this syscall is being called, or emit a log line, or whatever is useful for us. In the context of observability, we would obviously want to export some sort of signal. These hooks can vary very widely. Natalie is going to teach us where all of this originally came from, but today there's a wide variety of hooks that we can attach our eBPF programs to. So that can be kernel probes.
So, kernel probes: attaching our program to a kernel function that's being executed. Uprobes: these can be used to trace user space execution. So the programs that you and I write, maybe written in Go, that run some network service, maybe some web server: you could attach a probe to some function of your Go program and do some tracing with that. Or perf events, which is actually something that I happen to work with on a daily basis. Perf events tend to work in a counter-overflow way: we tell the Linux kernel, hey, call my program every 100 CPU cycles or something like that. That's typically how profilers work.

So when we write eBPF programs, how does that work? Well, actually, it's not all that different from how we typically write programs. If we were to write a C program, we would compile it with some compiler toolchain like Clang, and we would set the target; in this case, I'm building it for an x86 Linux platform. And if we want to write a BPF program, we actually still write C code; the only thing that's different is the target. And what it outputs is eBPF bytecode. This is important, so let's remember that for later. The eBPF bytecode is a generic representation. Maybe people are familiar with the equivalent thing in the Java world: when we compile a Java program, the output we get is Java bytecode, and that, in itself, is not really executable by any machine. We still need the Java virtual machine to compile that Java bytecode into something that the machine can actually understand and execute. And this is precisely what the eBPF just-in-time compiler within the Linux kernel does. First, the kernel verifies that this program is actually safe to run, and we'll talk in a second about what that actually means. And once it's verified that the program is safe to run, it will compile it.
And then you can attach it to any of the hooks that I was talking about previously.

So why do we actually need to verify this? Well, the thing is, with eBPF we're executing code in kernel space. This is almost worse than having root privileges: we could do arbitrary things if we were executing arbitrary code in kernel space. But the eBPF verifier ensures that whatever we're doing is actually going to terminate. Perhaps you're familiar with the halting problem, a classic computer science problem, where we ask ourselves: can we write a program that determines whether any given program is actually going to terminate? Will it exit? And what happens when we feed that same program to itself? The way that eBPF sidesteps this is it just says: you can't have unbounded loops. Everything has to be provably terminating. So essentially, we cannot write arbitrary programs with unpredictable termination; in a way, you could say eBPF is not Turing complete. The kernel also makes sure that we're executing at most, I believe it's a million instructions or something along those lines, and that caps the amount of work we can do. If we can accomplish the task we want within those limits, then great. If not, we're going to have a bad time with the eBPF verifier. But assuming that all goes well, we can then attach our program to our hooks.

And lastly, to complete our understanding of eBPF: how do we, as engineers, actually get that information out of kernel space? I've been saying all along that we execute this code within kernel space, and we can read all this interesting stuff. By the way, with eBPF you cannot modify arbitrary memory; you can only read it. The only things that you can modify are things that you've specifically declared to be writable.
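As a toy model of that pattern, here is a pure-Python sketch, purely illustrative: in real eBPF the writable structure is declared in the C program and read from user space via the bpf() syscall, for example through libbpf or BCC. The "kernel side" bumps a per-process counter on every simulated execve, and the "user side" reads it back and formats a Prometheus-style metric.

```python
# Toy model of the eBPF kernel-writes / user-reads pattern.
# Purely illustrative: a real eBPF map is declared in the C program
# (e.g. BPF_MAP_TYPE_HASH) and accessed from user space via bpf().

syscall_counts = {}  # stands in for a hash map: pid -> execve count

def on_execve(pid: int) -> None:
    """Stand-in for the eBPF program that runs on every execve."""
    syscall_counts[pid] = syscall_counts.get(pid, 0) + 1

def export_prometheus() -> str:
    """Stand-in for the user-space agent that reads the map and exports it."""
    lines = [f'execve_total{{pid="{pid}"}} {n}'
             for pid, n in sorted(syscall_counts.items())]
    return "\n".join(lines)

# Simulate a few hooked syscalls.
for pid in [101, 101, 202]:
    on_execve(pid)

print(export_prometheus())
```

The split mirrors the real architecture: the hook-side function does only cheap bookkeeping, and all formatting and exporting happens on the user-space side.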
And these are called eBPF maps. So prior to loading the program, you declare in your eBPF program: I want to have this map that maps, say, process to number of syscalls executed. I'm just making things up. But this is something that you can then share with user space, and user space can actually read this information. And that's how we write observability tooling using eBPF: we use these maps to communicate the things that we measure in the kernel out to user space. And then we can export it as log lines, as Prometheus metrics, as traces, whatever is useful to us as engineers.

So eBPF has been a bit of a buzzword for some time now, and one of the things that we wanted to do with this talk is resolve a couple of misconceptions. The first one is probably the most classic one, and since you are all here, you may already understand this. But it's not totally unexpected that this is a misconception, because it is part of the history, and Natalie is going to tell us a little bit more about that in a second. The misconception is that eBPF is just this networking thing, right? It's just for networking. Well, this is false. Obviously, we're in a talk about using eBPF for observability. But for a long time, this was the first question we would always get when we talked about eBPF in the setting of observability. We can attach eBPF programs to anything that allows us to attach a hook in the Linux kernel. Networking events are definitely a major case for that, but there are a lot of other use cases.

The next misconception I want to address: whenever people talk about eBPF, they think, ah, this supports all the languages. And again, there's a small amount of truth in this, as Natalie and I were discussing just before the talk. Yes, we can read memory from any language from within eBPF. But is that actually useful, right?
Like in an interpreted language, for example, what does a register actually mean? It's not necessarily the same thing as it is in a natively compiled program. And we'll see more about that later. So the misconception here: wide language support is definitely possible, but just because you're using eBPF, or using an eBPF-based tool, doesn't mean at all that there's wide language support. And then, ironically, there's the opposite misconception, which says interpreted languages can never be made compatible with eBPF. This is also false. Again, we just need to put the work in, and we'll come back to this later. So I've covered the intro, and now Natalie is going to tell us how we got to this point, the past of eBPF, so that we can then talk about the present and what is going on in the ecosystem today.

Thanks for that great introduction. So let's talk about how eBPF came about, and how the past of eBPF can help us understand what the future might look like. Some of you may have heard of the concept of crossing the chasm, from the product side of things. It's basically the idea that every new technology goes through phases of adoption as it matures and becomes mainstream. Early on, you have people playing around with cool stuff. They're doing it for fun, because it's shiny and new, and they're tinkerers. Those people are the earliest phase of adoption of a technology like eBPF. After a while, the cool stuff that the tinkerers are doing catches on with some early adopters, who may be a little bit ahead of the curve in terms of thinking about the latest and greatest, but who are trying to solve a specific problem. And these two groups comprise the people that really like the new and shiny.
And they drove a lot of the progress behind eBPF that we're going to take a look at in just a second. Later on, what you end up with is people who are simply trying to solve a problem. They don't really care about the technology per se; they just have a problem they need to solve, and they're going to use what's considered the best-in-class solution. So when we're talking about the history and the past of eBPF, we're going to look at the first two groups. But, just my personal opinion, I think we're here right now in terms of the story of eBPF. Smart people can disagree and debate and say earlier or later or whatever, but I think we've just reached the point where you see some great whole-product solutions that are focused on a use case rather than a technology. And so I think it's a really exciting time to be doing eBPF work, because it means this approach has been validated. The chasm has been crossed, so to speak.

So let's take a look at a super zoomed-out timeline of how we got here. Obviously, there are major milestones missing. But in 1992, the seminal paper on BPF, which is distinct from eBPF, although you'll hear people refer to eBPF as just BPF sometimes, was published. Twenty years passed, and then the BPF JIT, the just-in-time compiler, which we can talk about in a bit, was added to Linux. So there was a long time, actually, where we were in the innovator phase, where people were tinkering around with this thing. It did see some great adoption in certain use cases, but a long time passed before we saw the mega-developments that have happened within the last 10 years. In 2014, BPF became eBPF, and, like I said, we'll talk a little bit more about that, it was added to Linux. To me, this is the moment of the birth of eBPF as we know it today.
And then in the time after that, you start to see higher-level developments making eBPF applicable to more use cases and easier to write, and you start to see this explosion of adoption. These days, it seems like you can't go to Hacker News without seeing something about eBPF.

So what was BPF? How is it different from eBPF, and why was it created? BPF, and I think actually in the original paper they called it the BSD packet filter, not the Berkeley packet filter. Back in the day, late 80s, early 90s, there were two guys at the Lawrence Berkeley Lab, and they were really concerned with network problems. Network problems are evergreen: we always have had them, and probably we always will. But I think back then it was even more front of mind in terms of computing, because it was just a real pain to operate and troubleshoot a network. We think it's bad now; it must have been ten times worse back then. Basically, their goal was: how can we allow user space programs to define rules to interact with raw, unprocessed packets? I think a lot of times when we think about packet filters, we think about firewalls. And that's a really legitimate use case of a packet filter, because you're trying to say, hey, don't allow these packets, but do allow those. But there's also the use case of just trying to see what the heck is going on in your system, and that was the one they were actually more motivated by. The problem was that it was really hard to do this kind of packet filtering logic in a performant way. Solutions existed for this, theirs wasn't the first, but they had serious performance issues, and that made the day-to-day of troubleshooting network issues a real pain. So what they came up with was a new packet filtering architecture.
It allows user space programs, such as something that monitors all the connections you have, or a firewall, or things along those lines, to define rules that execute in the kernel. And this was a really big deal, because it basically allowed for the mind-melding of kernel space and user space for this particular use case. So what happened in that long period from 1992 to 2011? Well, BPF was considered the state of the art for packet filtering. It was integrated into tcpdump and Wireshark; you can see a little screenshot of Wireshark on the right there. And, like I said, eventually the just-in-time compiler for BPF, which made it significantly more performant to run BPF programs, was added to Linux in 2011. So in some ways, you might have thought back then: this technology solves this problem, it's great, we're done.

But everything changed in 2014. The first version of eBPF, which stands for extended Berkeley Packet Filter, was added to Linux in 2014. And it built upon BPF's mind-melding of kernel space and user space in some seriously important ways. One thing, pretty cool: rather than using a domain-specific language, like BPF had, with a really strict set of commands you could use, eBPF programs are written in C. Now, as Frederick said, it's not arbitrary C, you can't have unbounded loops, but writing in C made it significantly more powerful for a wider variety of tasks and more accessible to people who knew C. eBPF can invoke kernel functions, so the capabilities that you can actually access broaden dramatically. Just think of the power, the difference between saying all you can do is operate on packets, and saying you can call a kernel function. That gave it a lot more juice. The writing part is a little iffy, but eBPF can view the raw kernel memory of what is happening. This was huge. And, as we discussed, the verifier is what made it all possible; the verifier is what made this safe to do.
Because when you see it at first glance, you're like, oh no, no, no. So that was a huge, huge deal. And almost immediately you see cutting-edge new technologies being developed, once people saw that this had been merged into Linux.

So eBPF exists. Now what? What happened? Well, there were still some obstacles. It is hard to write eBPF programs. I mean, it still is to some extent; I'm sure all of you who have done it have run into some struggles. But back then, it was even harder. I think back in 2014 it was still pretty obscure. We were kind of on the verge between the innovators and the early adopters at this point. So people talked about it, but you probably had to be deep in the Linux ecosystem to actually know what it meant. And people still mostly thought of this thing as a packet filter, because that's what the original technology had been for 20-plus years. Then, coming onto the stage: libbpf, making it more portable to write BPF programs across different platforms. And bpftrace and BCC, which made it a whole lot more accessible to write these programs against a higher-level API. This made it a lot easier for people to start messing around with eBPF, getting stuff done, and just playing around with what it could do. Now that it's more accessible to write these programs, more people try it, and people start to realize that this thing can do a lot more than just networking. There was an explosion of adoption. So, like I said, you can't go to Hacker News without hearing about BPF these days. Pretty much every major tech company on the planet uses BPF. And you start to see these higher-level projects that are more use-case specific: things like Parca, Pixie, Falco, Cilium. These projects are helping people solve a problem utilizing BPF, but you don't have to be a BPF expert to use them.

So, a little mini demo. I know we're maybe a little short on time, so I'll keep it quick.
It is true that networking and reading packets and things like that is still an important use case of BPF, and that's one of the capabilities that the Pixie open source project has. With a single command, you can install a bunch of probes on your system with Pixie and see things like: what were the requests in my system? I can see the headers, the response, the body. I can see it for encrypted traffic and for unencrypted traffic. So the visibility this can provide for request-tracing use cases is still a really big deal, although it's far from the only thing; Pixie can do a lot more than this, too. It's still a really powerful thing that people get excited about. I'm going to hand this back over. OK, here's a good note to switch off on.

So, yeah, going from the past to the present is kind of a shock of reality. We've seen all of these things, and we've seen the demos. We've seen request tracing, we've seen network routing, all of these things, and everything seemed so easy. But the harsh reality is that in the present, exactly as Natalie was saying, we're building tools that aren't a technology in search of a problem. We weren't searching for a problem to solve with eBPF; we were already trying to solve problems, and eBPF just happened to be a great tool for it. I happen to work on the Parca open source project, and Natalie happens to work on the Pixie open source project, and I think both are really great examples where this is exactly what ended up happening. We didn't think, ah, eBPF is a really interesting technology, let's try to build something with it. Actually, with the Parca open source project, for the first couple of years of the project we didn't touch eBPF at all.
For those who aren't familiar with it, Parca is an open source continuous profiling project, where we try to profile all of the workloads in your infrastructure, all the time. And the only way we're able to do this at low overhead and across languages, remember from earlier, language support is hard, is with eBPF. It is actually starting to become the reality in the present: we're profiling everything, across languages, across your entire cluster.

But going back to my original point. This is literally the documentation for bpftrace, and it tells us that this is very easy to do: you just run this one command, and you'll find the string to input into bpftrace. Then you can attach this probe down here with this example code, and you can trace how often this readline function is called. And you could do this with any program, is what they're claiming. But the reality, and this is where we find ourselves today, is that symbols are typically not actually in production binaries. They make the binary size very large, and so in most Linux distributions they're just not shipped. So we find ourselves in this situation: all these demos were really cool, but in true reality, where we don't control absolutely every parameter of our deployments, reality is hard. Fortunately, there are some smart minds in the community that have already started solving these problems.

So, specifically for symbols. I have the example of MariaDB here, a fork of MySQL if you're not familiar with it, which is popularly deployed. The production binary that you install from Ubuntu does not have symbols, like we saw on the previous slide. But there's this really awesome tool called debuginfod, where you can just query these servers.
All of the major Linux distributions run a server like this, where you can just say: hey, give me the symbols for this particular binary. As we can see here, the first command that I ran, the file command, actually tells us the build ID, which uniquely identifies the binary. We can use that to then request the symbols for this binary. So we actually have symbols available, if we know how to obtain them. And it turns out this is still a very fresh thing. This is the website of the debuginfod project, and really only three of the observability-type projects out there actually implement this: the Parca project that I happen to work on, perf, the typical profiler used in the Linux ecosystem, and bpftrace. I guess that's somewhat unsurprising, right? The pioneers in this space trying to make sure that the tooling they created works. But this is starting to become the reality in the present: we can actually obtain symbols. So, one problem solved.

Next problem. Let's look at an example program. Here I have a Go program that just calculates the Fibonacci sequence; whatever number in the Fibonacci sequence you give it, it'll calculate. And it's recursive. So when we look at the stack traces of this, we see the main function first, and then however many recursions we happen to need to calculate the Fibonacci number. And if we look at how an operating system actually executes this, we need to look very low-level at how machine code is executed. Maybe you remember this from a computer science class, and even if you don't, I'm about to explain it. The way it works is that we build something called the stack. This is where all of our variables and function parameters live: the stack. It's built last in, first out, right?
And so in this case, we have our main function, and then however many recursions we need to calculate the Fibonacci number. Like Natalie was saying, we can actually call some helpers and kernel functions to do some of these tasks. And in the very good case, this is very easy to do: when something called frame pointers are available. When we're looking at the execution of a program from within eBPF, all we get is a pointer into memory. That's what we see in the RSP register, the stack pointer. So all we're starting with is a memory address, and from there we need to figure out the function call stack. When frame pointers are available, this is not all that hard. We read the RBP register, which is where the frame pointer lives; it tells us where the next frame is. From there we can go to the next frame again, and so on, until we're at the bottom of the stack. That's just walking a linked list, right? So it's very, very easy and very cheap to do: walking a linked list in memory.

But the harsh reality, again, is that this only works in a very controlled environment. Fun fact: Google, Facebook, and Netflix all enforce having frame pointers across their entire infrastructure, because the theory is that debuggable binaries are worth more than a very, very small performance gain. But Linux distributions build binaries without frame pointers. So again, we're back to square one. The demo worked really nicely because we had everything under control, but in reality, we don't. If you ever do have control over this, please, please, please build your binaries with frame pointers on; you're making our lives much, much easier. But we still need to solve it, right? We're still living in reality. And so how do we do that?
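Before getting to that, the happy path just described, following RBP like a linked list, can be sketched as a little simulation. This is pure Python over fake, made-up addresses; a real unwinder reads these exact words out of the target's stack from within eBPF.

```python
# Toy simulation of frame-pointer stack walking.
# "memory" maps an address to the 8-byte word stored there. At each frame
# pointer (rbp) sits the caller's saved rbp; at rbp + 8 sits the return
# address, per the usual x86-64 calling convention.
memory = {
    # fib frame (top of stack)
    0x7000: 0x7100,    # saved rbp -> caller's frame
    0x7008: 0x400123,  # return address (into fib: a recursive call)
    # fib frame
    0x7100: 0x7200,
    0x7108: 0x400123,
    # main frame (bottom of stack)
    0x7200: 0x0,       # saved rbp of 0 marks the bottom
    0x7208: 0x400050,  # return address into main's caller
}

def walk_stack(rbp: int) -> list:
    """Follow the chain of saved frame pointers, collecting return addresses."""
    return_addresses = []
    while rbp != 0:
        return_addresses.append(memory[rbp + 8])
        rbp = memory[rbp]  # hop to the next frame: one linked-list step
    return return_addresses

print([hex(a) for a in walk_stack(0x7000)])
```

Each iteration is two memory reads and a pointer hop, which is why frame-pointer unwinding is so cheap compared to any table-driven scheme.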
There's this look-aside table, defined in the x86-64 ABI, that essentially tells us: if you're in this section of memory, then you need to do this calculation in order to take the next jump. I'm not going to go through all the nitty-gritty details of how that works. But essentially, this was very, very hard to make acceptable to the eBPF verifier, with the termination guarantees and all these things; it's very hard to make it deterministic. It is possible, and we happen to have implemented this in the Parca open source project, but it took us almost an entire year. If you're interested in all the nitty-gritty details of how this unwinding using DWARF unwind tables works, some of my coworkers gave a really amazing presentation just on that topic, so check that out. My point being: now, in the present, we can actually walk all stacks from real native binaries. We've made significant advances in making this type of tooling work in real production environments, where we don't necessarily have all the control.

Now the last part, the last misconception I mentioned: that eBPF can never work with interpreters. And it's not totally wrong that this is hard. What I have here are some stack traces, and if you can't read them, that's fine. My point is that what we are seeing here is the Python virtual machine, the Python interpreter. Because the Python interpreter itself is written in C, what we're seeing are the frames of the C runtime. That's not particularly useful when I want to debug my Python code. So how do we get this to look like something that I actually understand, where I can recognize my Python code? This is in that intersection of the present and the future. We know, and there's some prior art, how to do this.
What we can do is, whenever we walk these frames, there is some well-known memory that we can access to figure out the actual function name in Python. And we can do this from within BPF, because BPF actually allows us to read process memory. This is something that we would otherwise need to do with tools like ptrace, and it's very, very difficult to do correctly; from BPF, it's ensured for us that this is safe to do. Like I said, this is not actually reality yet in the Parca open source project, but there is prior art, and we happen to have hired the person who did exactly this for the Ruby virtual machine, which works very, very similarly to Python. So this is that intersection: we know how to do it, there's prior art, but in real production environments we don't have this available just yet.

Now I just want to give a very quick demo of the Parca open source project. What we're seeing here: I'm going to start querying all the CPU time across this entire cluster. And as we can see, there's a lot of stuff going on in this cluster, so I'm going to trim this down for a second. I'm going to take the example of systemd, because quite obviously I do not have the ability to instrument systemd; I couldn't have changed this piece of code. And, you're going to need to trust me on this one, this binary definitely did not have symbols available in production, like we saw on the previous slide. If we kept symbols on every single host, on every single machine in our production environment, we would be wasting a ton of space, so it does actually make sense to strip them sometimes. But we still see symbols in these stack traces. So we're getting to the point where all of this actually does work, in a very wide array of deployments, without having to do anything.
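To circle back to the interpreter-walking idea for a moment, in miniature it looks like this. This is a pure-Python simulation with made-up addresses and deliberately simplified structs; in CPython the real objects are PyFrameObject and PyCodeObject, and an eBPF unwinder reads them out of the interpreter's memory at offsets taken from the interpreter's headers.

```python
# Toy simulation of symbolizing interpreter frames.
# "memory" stands in for the target process's memory. Each frame object
# stores a pointer to its code object (which holds the function name) and
# a pointer to the calling frame, mirroring CPython's frame chain.
memory = {
    0x9000: {"code": 0xA000, "back": 0x9100},  # innermost frame
    0x9100: {"code": 0xA100, "back": 0x0},     # outermost frame
    0xA000: {"name": "fibonacci"},             # code objects
    0xA100: {"name": "main"},
}

def walk_interpreter_stack(frame_ptr: int) -> list:
    """Follow the frame chain, reading each frame's function name."""
    names = []
    while frame_ptr != 0:
        frame = memory[frame_ptr]
        names.append(memory[frame["code"]]["name"])
        frame_ptr = frame["back"]
    return names

print(walk_interpreter_stack(0x9000))
```

Structurally it is the same linked-list walk as the frame-pointer case; the difference is that the pointers live in interpreter data structures rather than in the machine stack layout.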
The whole point of the Parca open source project is that everything is zero instrumentation. You don't need to add a library to your code, you don't need to change your deployments, you don't need to redeploy anything. The only thing you do is deploy the Parca agent, and you can profile your entire production cluster. So hopefully that convinced you that this actually does work. And with that, Natalie is going to introduce us to what the future holds.

Yeah, I saw the one-minute warning go up, so I guess we'll go real fast. Where are we today? Lots of use cases. We know about networking. We know also about security; there are a lot of great security use cases. And observability, like we've talked about. The breadth and depth of the coverage keeps growing: we are getting better and better support over time for tracing and instrumenting programming languages with BPF. I'm not going to talk about USDTs, but look them up; they're super cool, and they're gaining traction. Cloud providers are also under a lot of pressure to add more BPF support to their platforms. Facebook did a really cool thing with a time-travel debugger, which allows you to replay what happened and step through it like a debugger. Super cool.

What frontiers are coming next? I think that performance is still going to be a big issue, because we're going to want to probe a whole lot of things, and there are performance problems when you probe something that's called all the time. Accessibility: how do I know what to instrument? I know that I want to debug this problem, but how do I turn that into a set of probes that I can use? Analysis: if you've ever used BPF, you'll see that a lot of times you're getting this fire hose of raw data, and it's really hard for a human to interpret.
So what can we do to make it easier to receive this fire hose of very low-level raw data and put the next layer of interpretability on top of it, so that it's human-understandable? So you can go from a bunch of packets and function calls and kernel function calls to something like: hey, your thing is segfaulting because of XYZ. I'm not going to have time for this, but the demo was going to be that I passed ChatGPT a program that I wrote and asked it to generate a bpftrace program for me. This actually worked pretty well, despite the fact that the training data for ChatGPT stopped in 2021. I think we will see increased adoption of LLMs for this type of use case, to make it easier to write these programs and to interpret their output. Thank you all so much. I know we're over time, and we'll be over here to chat after this.