Good morning. Thank you so much for coming to our session this morning, just about still morning, right? I hope everybody's able to find a seat. If you have got a seat next to you, maybe shuffle along and let people in. Yes, so my name's Liz Rice, myself and John here. Do you want to say hi? Hi, my name is John Fastabend. I'm an engineer at Isovalent. I'm excited to talk about BPF today. I've been working on it for almost 10 years now. Hopefully you guys get as excited as I am. Yeah, so, I mean, as John says, he works in the kernel, right? He is a proper expert in eBPF. Let's just quickly recap to make sure everybody understands what we're talking about with eBPF. It allows us to run custom programs in the kernel. So we can change the way the kernel behaves by loading eBPF programs there and attaching them to events. Why is the kernel interesting? Well, the kernel is the part of the operating system that is involved whenever we're doing anything with hardware. So if you're accessing a file or sending or receiving over a network or even allocating memory, the kernel's going to be involved. The kernel's also managing the processes running on a system. So it's looking after things like permissions and privileges. So the kernel is involved in pretty much everything interesting. With eBPF, we can change the way that it behaves. So already, there are lots of really great infrastructure tools built using eBPF. I mean, we're both very involved in Cilium and the Cilium sub-project Tetragon, doing things like networking, security, observability. There are also loads of other great projects and great products out there that are doing things in these kinds of infrastructure tooling spaces. But sometimes you'll hear statements that give the impression that there are limits to what you can do with eBPF. So for example, we've heard people saying, yeah, you can't really implement layer seven parsing. That's just too complex for eBPF. 
Or another thing that you'll quite often hear people saying is, yeah, but there are limits to what you can do with eBPF because it's not Turing complete, right? So do we think those statements are true? Well, a hint here is that these things were not said by people who are working on eBPF. We're going to explore these kinds of statements today. So I've used the term Turing completeness. This is not going to be a computer science class, and it's not going to be super pedantic about terminology. But really what Turing completeness is about is: can we process some arbitrarily complex task? Can we process an arbitrary amount of data? Can we continue processing for an unbounded amount of time? Do we have the ability to store state so that we can keep processing it? So this is, broadly speaking, what Turing completeness is about, right? So I think it's interesting just to talk a little bit about some concrete examples, right? I mean, we could talk about the mathematical formulation of Turing completeness, but that's perhaps not interesting to everyone here. So typically when we think about Turing completeness, people are really talking about your general-purpose languages, C, C++, Java, Python, Go, all the things that we programmers love. But I also want to point out that there are a lot of Turing-complete things that are maybe less interesting for programmers. Nobody really wants to write their application on a one-instruction set computer. That's a totally valid Turing machine. It is Turing complete. But try to write a proxy with just the move instruction, and it might not be quite as fun as you want. Another example is the Game of Life, which we'll come back to. But there are just loads of these things. And the point is, try not to equate, I think, Turing completeness with usefulness as a programming language to developers, right? I also want to note that there are some actual benefits of not being fully Turing complete sometimes. 
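To make the one-instruction set computer point concrete, here is a tiny interpreter (not from the talk, just an illustration) for SUBLEQ, a classic single-instruction machine: each instruction is three cells (a, b, c) meaning "subtract mem[a] from mem[b]; if the result is less than or equal to zero, jump to c". Despite having only that one instruction it is Turing complete, yet writing anything real in it is miserable:

```c
/* SUBLEQ one-instruction set computer interpreter.
 * Memory holds both program and data. A negative program counter
 * halts the machine. Returns the number of steps executed
 * (capped at max_steps, so the interpreter itself is bounded). */
int subleq_run(int *mem, int len, int max_steps)
{
    int pc = 0, steps = 0;
    while (pc >= 0 && pc + 2 < len && steps < max_steps) {
        int a = mem[pc], b = mem[pc + 1], c = mem[pc + 2];
        mem[b] -= mem[a];                 /* the only operation */
        pc = (mem[b] <= 0) ? c : pc + 3;  /* branch if <= 0 */
        steps++;
    }
    return steps;
}
```

For example, a nine-cell memory image where the first instruction subtracts one data cell from another and the second instruction halts runs in exactly two steps.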
You might be interested, for example, in having a parser that is bounded. And why would you want that? You might want that because you don't want to receive a packet and then have your parser loop forever. You really would like the thing to stop at some point. It's the same in the BPF space, where Tetragon will watch syscalls or watch kernel functions, right? We really like to have an upper bound on the time that we're going to impact the application or the operating system. And when you do that, it's really nice to say this will not run longer than a millisecond or 500 nanoseconds or something like this, right? And that really is not Turing complete, because we're putting a bound on that environment. But there are really concrete advantages to that as well. So I just want to lay that out there and give you some examples of Turing-complete things that are both useful and not useful, and an example of maybe why you would want some bounded properties as well. And Turing completeness talks about an infinite tape, right? And there is no infinite tape in the world. So in reality, we're always dealing with just large rather than infinite. Yep, exactly. So that previous slide mentioned Conway's Game of Life. And this is an example of something that's Turing complete. It goes through generations, and at each step, the state of one cell is determined by the previous state of the cells around it. So it's a little bit like an actual living cell. If the cell has no living cells around it, it's gonna die of loneliness. If it has too many cells around it that are alive, it's gonna die of overcrowding. But if there's the right number of cells around an empty space, a cell will actually come to life. So this is an example of something that can kind of go on forever generating these patterns. A very common example of something that's equivalent to a Turing machine, yeah? 
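The rules Liz describes fit in a few lines of C. This is a plain user-space sketch, not the BPF version from the demo; the grid size and the choice of treating cells outside the grid as dead are assumptions:

```c
#define W 8
#define H 8

/* One generation of Conway's Game of Life on a W x H grid.
 * Cells outside the grid are treated as dead (no wrap-around). */
void life_step(const unsigned char old[H][W], unsigned char next[H][W])
{
    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            int alive = 0;
            /* Count the eight neighbours. */
            for (int dy = -1; dy <= 1; dy++)
                for (int dx = -1; dx <= 1; dx++) {
                    if (dx == 0 && dy == 0)
                        continue;
                    int ny = y + dy, nx = x + dx;
                    if (ny >= 0 && ny < H && nx >= 0 && nx < W)
                        alive += old[ny][nx];
                }
            if (old[y][x])
                /* Survives with 2 or 3 neighbours; otherwise dies
                 * of loneliness or overcrowding. */
                next[y][x] = (alive == 2 || alive == 3);
            else
                /* An empty cell with exactly 3 live neighbours
                 * comes to life. */
                next[y][x] = (alive == 3);
        }
    }
}
```

Seeding a vertical "blinker" of three cells and stepping twice brings the grid back to its starting state, which is an easy sanity check for the rules.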
So if this is an example of something that's Turing complete, is it something that we can write in eBPF? And one reason why you might think it can't be is because of this thing called the eBPF verifier. So when you load a program into the kernel, the verifier is gonna run over this program and analyze it and establish whether it's safe to run. And one of the things I would normally say when I talk about whether it's safe to run, I'll say it's not gonna crash the kernel, but I would also say that it's gonna run to completion. We might pause on that later. Do you wanna dive into a little bit more of what it's really doing? Yeah, yeah, so I think users of eBPF, or even people that are higher in the stack, might think about the verifier as this kind of thing that is there stopping you from writing interesting programs, perhaps. As a kernel developer, though, I wanna kind of lay out what we, as developers of the kernel who wanna ensure that the kernel's safe to run and all this kind of stuff, think the verifier is doing, right? And specifically, these are my criteria. The first thing we always wanna do when we load a BPF program into the kernel is make sure that the program can only read memory that is allowed for that program and by the permissions that you have when you load that program. And BPF has a whole series of places you can hook in the kernel. You can hook kernel functions. You can hook the networking stack. You can even hook user space. And depending on where you hook, you'll have different permissions on what kind of memory you can read, right? So we wanna make sure that your program is only reading memory it's supposed to. And we definitely, 100%, wanna make sure that you're only writing to memory that you're allowed to. You wanna make sure that you're not writing to random spots in memory. You're only writing to valid locations in your BPF program. 
The next one is we wanna make sure that the control flow is well formed and valid. We don't want you to jump your BPF program off into the kernel somewhere and run arbitrary code and crash the kernel, right? We wanna make sure that doesn't happen. And I think the fourth point here is really interesting, because we've actually evolved this over time. It's like, what does it mean for eBPF programs to be bounded in the kernel? In the original BPF, that meant the program ran to completion. But over time, we've sort of evolved BPF. Now BPF programs know how to sleep. We know how to do callbacks and iterations. We know how to keep a program running, and we'll go into that in a little bit. And really what we want to verify in the verifier is that we don't get a program stuck on the CPU so long that it looks and feels like the system is hung, right? And for kernel developers, or people that are really systems-level programmers, I think this is a sort of intuitive concept, because you've been writing at this low level. If you're higher up in the stack, maybe an application programmer or kind of a distributed-systems programmer somewhere in there, it may be foreign, because you don't have to worry about this stuff, because the OS is doing all the context switching, right? You never write an application in user space, C++, C, Go, whatever, where you say, at this point, I need to make sure that I release the CPU, right? Because the OS just deals with that for you. Once you get yourself into the kernel and kind of the deep system stuff, you really need to tell the OS when it's okay to let your program off the CPU, right? The reason we do this is really, if you're in an interrupt context, or if you're trying to write some kind of locking and stuff like this, we can't just release it at any arbitrary point. And then the last thing we wanna do with the verifier is make sure locks and references are all accounted for. 
And that's sort of an accounting detail, but it's important if you're gonna reference a socket or reference a file, walk a path. You wanna make sure that those things exist when you're doing that. And that's just kind of a correctness property. Those are the things that I think about when I'm working on the verifier or improving the verifier over time. And I think, as we'll see later, this idea that eBPF continues to evolve and continues to improve over time because of the work that people like John are doing, this is why we're able to perhaps do more with eBPF than you might imagine, because it continues to improve. So, John, can we run Game of Life in eBPF? All right, we're gonna give it a try here, because we like to do demos. So what you see here is basically Tetragon, but we added a new sensor to Tetragon called Game of Life. All right, and so we're gonna kick it off. What it did here, very similar to what Liz just showed you with that web app, is it created a map, filled in some cells with some random pattern, and we could load it from a BPF map or something if we wanted to, but for the demo it's just a random map, and now it's running, and there it goes. All right, so while that's running, let's dig into how you were able to get this to work. So this is just a brief overview, I'm not gonna go into all the details of everything here, but this is basically the core code that is running right now in the background, and we'll get back to it. The important piece I do wanna call out, on the very far left for you guys, is do_game. This is our core game logic for Life here, and you can see kind of three things. We copy the cell map, basically taking the old state and copying it into the new state so that we have a good state for the next iteration. We do the next iteration, which is where all of those rules that Liz was talking about are run. You check to see if your neighbors are alive, or you create a new cell, or you remove a cell. 
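To make John's first two criteria concrete, here's a minimal kernel-side sketch (not from the talk; the map layout and attach point are made up, and it only builds with libbpf's bpf_helpers.h and kernel headers available). The verifier will reject this program unless the NULL check is present, because a map lookup can fail, and writing through an unchecked pointer would mean writing to memory you're not allowed to touch:

```c
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} open_count SEC(".maps");

SEC("kprobe/do_sys_openat2")      /* hypothetical attach point */
int count_opens(void *ctx)
{
    __u32 key = 0;
    __u64 *val = bpf_map_lookup_elem(&open_count, &key);

    /* The verifier tracks that 'val' may be NULL. Remove this
     * check and the program is rejected at load time, because
     * the write below could touch invalid memory. */
    if (!val)
        return 0;

    __sync_fetch_and_add(val, 1);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

The point is that the safety properties are proved statically at load time, not enforced by runtime checks.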
So that's sort of the business logic of this program, and then we send an update. All right, so what the update does is it sends that map state up to user space. Unfortunately, I don't have a BPF graphics library yet, so if somebody's interested in, like, driving a GPU shader from BPF or something, come talk to me, let's make it happen. But I couldn't do it from BPF, so I had to send the map up to user space. Slightly disappointing, but it doesn't really impact the logic of the game. So the user space side is actually taking that BPF map and just drawing it on the screen, which is what you were seeing. And then the last piece is this run_next, which just reruns the business logic, going back to the beginning of do_game and running it again every two seconds, all right? And we'll talk about how that happens. The other slides you can get to in your spare time later if you're interested. So you may have heard that BPF programs are limited to only a small number of instructions. So maybe let's dive into that. Yeah, what's really interesting here is when we first landed BPF in 4.14, back in like 2015, kind of the first iterations, we really started at 4,000 instructions. And why did we pick 4,000 instructions? Because it seemed like a good number. You know, kernel developers like numbers with a K at the end of them; 4096 sounds good, right? That was interesting, but we very quickly realized that, you know, 4096 is kind of small, right? You might want more instructions. And the other thing I think is important, why we had to have a limit, was not primarily because we need the program to terminate in some finite time and we decided 4K instructions was a good amount of time. Really, if you think about it, we have to verify that program, and we really want to make sure that when you load a program into the kernel, it'll take some reasonable amount of time, less than a second, right? 
If you loaded a 20-million-line program, perhaps it would take 20 seconds, and, you know, we kind of thought, well, that's a long time, right? So keep it bounded. But at some point we decided that 4K was just too limiting. People want to write things that are bigger than 4K, and so in 5.4 kernels we went to 1 million and kind of beefed up that allowable limit. Just to kind of follow up, you might wonder, like, that's interesting, John, but give me some kind of reference, you know, like, how many instructions really is a million instructions? And so what I did, and this is kind of a napkin approach, was go look at a few programs I had on my laptop. Of course I had DOOM, right? Got to have DOOM there. So, you know, I took a look; it's less than a hundred thousand instructions for a basic kind of DOOM clone, if you think of the original DOOM. And then I had Envoy on my system because, you know, we work on Cilium and Tetragon, which use Envoy as kind of one option for doing proxies. I took a look: 15 million instructions. And then Clang, you know, compiler stuff: 26 million instructions. So that's kind of where we're at. You know, you might come back and say, okay, DOOM is maybe less than a hundred thousand instructions, but John, it might have some loops. How are you going to deal with that, right? Stay tuned, we'll talk about how to get these kind of much larger cases going forward. So, there's also this idea, and it was true in the early days, that you couldn't do loops in BPF, because you were only able to jump forwards in a program. Yeah, so this was something that we put in, not because folks that were working on BPF thought, hey, loops are somehow bad. It was purely that the verifier needs to somehow ensure that this isn't going to run forever, right? And that was the early days. 
And so, the simplest thing to do is look at your program and ensure that you don't have any jumps to code that you've already been to, right? We want to make sure that the program control flow is a DAG, basically a directed acyclic graph, with no loops in that graph. So, early on, what we did is something called an unroll. Basically, you tell the compiler, I'm not allowed to do loops, please just cut and paste this code, right? It'd be very similar to a developer just cut-and-pasting their loop many, many times. You can do that, but you can see very quickly that you're not going to be able to get things to run very long, right? Especially in 4K instructions, but even with a million instructions, only so much can be done. So clearly not an approximation of forever, right, when you're kind of in this space. The next piece here is, as we were evolving BPF, this became a very obvious limitation. A limitation in two ways, I'd say: one is it makes it hard to write programs, and two, it makes it hard for the compiler to compile things, to optimize your code, basically. So we added loops. This allows the verifier to look at a loop and verify that that loop is going to terminate. And then we actually added a helper call for cases where you want the BPF program to do a loop, but the compiler, the human, and the verifier are having trouble all agreeing on what that loop is going to look like, if you think about the source code versus the object code that actually gets loaded from the compiler. And once you have this, you can write statements like the one at the bottom, where you have a for loop. H here is kind of a #define or a global variable that tells you the max loop size, and the verifier is perfectly happy with this today. So we're allowed to loop, but we're not allowed to loop indefinitely at this point, right? Correct, yep, that's why we have that max there. 
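The two looping styles John describes might look roughly like this in a BPF program. This is a sketch, not the slide's code: MAX_ITERS, the callback, and the attach point are illustrative, the helper is presumably bpf_loop(), and as with any BPF fragment it only builds against libbpf headers:

```c
#define MAX_ITERS 64   /* illustrative bound the verifier can see */

/* Style 2: a callback for the bpf_loop() helper. The helper
 * guarantees the callback runs at most 'nr_loops' times, so the
 * human, compiler, and verifier no longer have to agree on the
 * exact shape of the loop in the object code. */
static long step(__u64 index, void *ctx)
{
    __u64 *counter = ctx;
    *counter += index;
    return 0;               /* 0 = keep looping, 1 = stop early */
}

SEC("kprobe/do_nanosleep")  /* hypothetical attach point */
int run_loops(void *regs)
{
    __u64 counter = 0;
    int i;

    /* Style 1: a plain bounded for loop. The verifier can prove
     * 'i' never exceeds MAX_ITERS, so the loop must terminate. */
    for (i = 0; i < MAX_ITERS; i++)
        counter += i;

    /* Style 2: hand the loop body to the helper instead. */
    bpf_loop(1000, step, &counter, 0);
    return 0;
}
```

Either way, the bound is explicit, which is exactly the "allowed to loop, but not indefinitely" property.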
And so if you just kind of take a layout of what kernels these were added in: in 4.14 and 4.19 we had our kind of basic cut-and-paste model; in 5.4 we added loop support, so the kernel could figure out if these loops are going to terminate; 5.15 made a bunch of technical improvements to that, so the verifier got much better at finding these things; and then in 6.1 we added a call that you can use so that you, as the human, can tell the verifier through the compiler that, hey, this is going to be a loop, please take a look at it and make sure that it terminates, yeah. And I just wanted to call this out because we have people working on this. The BPF side is just evolving continuously, so there's another type of loop coming. We're not going to talk about it here, find me afterwards if you want, but an even more improved version of looping is on the way. Okay, so we've got a limit to how many instructions, but it's quite a big limit. We can loop. Yeah, yeah, yeah. What about these big programs, right? Yeah, what about these big programs? Can we do Envoy or Clang? Could you write, in theory, a 15-million-instruction BPF program? That would be great, right? The thing that is interesting, and I think it's a choice we made very early on in the BPF verifier, is we call these things sub-programs. And we said sub-programs must terminate. A sub-program is really roughly equivalent to a function. And so the limits are actually on the functions, not on the entire program. And as sort of a back-of-the-napkin estimate, you can do like 31 function calls, or 31 sub-programs is what a BPF programmer would say. And really, that gets you up to 15 million without much trouble; 31 million, actually, with 31 calls, right? You can actually do more in the kernel, depending on where you're actually hooking. But as a rough estimate, we can pretty easily get to 31 million instructions, which is a really big program, right? 
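The sub-program idea can be sketched like this (again illustrative, needing libbpf headers to build, with made-up function names and attach point): each function kept out-of-line becomes a BPF-to-BPF call, a sub-program that the verifier checks and bounds on its own, which is how, per John's estimate, overall programs can scale to tens of millions of instructions:

```c
/* Marking helpers noinline keeps them as separate BPF-to-BPF
 * calls (sub-programs) instead of being inlined into one giant
 * function, so each is verified on its own. */
static __attribute__((noinline)) int parse_l3(void *data, void *data_end)
{
    /* ...potentially thousands of instructions of parsing... */
    return 0;
}

static __attribute__((noinline)) int parse_l4(void *data, void *data_end)
{
    /* ...more logic, verified as its own sub-program... */
    return 0;
}

SEC("xdp")                      /* hypothetical attach point */
int parse_packet(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    if (parse_l3(data, data_end) < 0)
        return XDP_PASS;
    if (parse_l4(data, data_end) < 0)
        return XDP_PASS;
    return XDP_PASS;
}
```

Splitting logic into functions is good style anyway; here it is also what makes very large programs loadable.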
You can; all of Clang is not even 31 million instructions, so. Let me give you some words of warning, though. You know, the verifier is getting better and better and better, and people are working on this and making it usable for humans to write code, right? But really the trick right now is: the human's gonna write the code, the compiler's gonna compile it, and the verifier's gonna verify it. And they all need to kind of be in sync and understand what's going on, right, to get something into the kernel. And this is why it's just not as easy as it should be. You know, the team is working on this. I fully predict that in five years, 10 years, something like that, BPF will be easier and easier to write, and that'll kind of open up the number of people that are willing to dive in and really get into the BPF space. And then the last thing: you know, if you do go this way and you maybe have the right mentality, it might be addictive. You might try to write things like Tetris and Doom and Quake and Game of Life in the kernel. You know, the utility of that is kind of great for KubeCon talks, right? I'm not sure, you know, like, Isovalent Enterprise will have a, or, you know. Yeah, we're doing the Isovalent Enterprise launch of Game of Life. Yeah, exactly. You heard it here first. Yeah, yep. So Game of Life, you know, it goes on forever. So what about this idea of not terminating? Yeah, so this is an interesting one, right? Like, if you want something to run forever, and I've told you there's a maximum instruction count, you're like, well, certainly you're gonna run out of those instructions at some point, right? And this really goes back to that carefully worded point in the verifier where we said, well, it's not just that we want the program to terminate, it's that we want to release the CPU that we're on. We don't want to lock a CPU down. And like we said, in early kernel versions, that meant the program terminated. 
There was no exception to that; the program was done. But as we evolved the kernel and BPF and the verifier and the compiler as well, all these pieces moving forward, we developed other ways to release the CPU. One of them was an iterator. This is a program that can, like, run over every file in a directory, right? It doesn't matter how many files are in there; it'll basically ensure that the CPU doesn't get locked up. If it's been iterating for too long, it'll release, let something else run, and the scheduler will come back and say, okay, BPF program, you can run again. We've also allowed programs to sleep now. This is really useful if you're trying to read some memory and you get a page fault. So the memory's not in the page cache, so you can't just read it, because it's not there. And so you have to handle a page fault, which requires sleeping. Anyways, it's in the details, but super important if your BPF program tries to read some memory and you don't want it to fault. We added timer callbacks. This is basically a way to say, hey, I've done something useful, I'm gonna release my program, let the OS do something, and then just call me back at some time. You can even set this to zero, which just says, hey, I'm done, but I'm safe to release; call me as soon as you can again. And I guess the last thing is, can a BPF program allocate some amount of memory? Yeah, so there are really a bunch of different ways we can do this, but the most obvious one is that we have these array maps, and the maps are basically memory blocks that the program can allocate, or user space can allocate and then give to the program. And these can be actually quite large. The basic limit on these is whatever the user space side is allowed to allocate based on its cgroup; if it's not in a cgroup, they can be more or less unbounded and kind of related to system memory and all this kind of stuff. 
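The timer mechanism John describes, which is also how the demo can rerun its game logic every two seconds, might look roughly like this. A sketch based on the bpf_timer helper API; the map, struct, function names, and attach point are all made up, and it needs libbpf headers to build:

```c
struct game_state {
    struct bpf_timer timer;
    /* ...cell map, generation counter, etc... */
};

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, struct game_state);
} state SEC(".maps");

/* Timer callback: do one generation's worth of work, release the
 * CPU, and ask to be called back in two seconds. The program
 * never hogs a CPU, yet effectively runs forever. */
static int tick(void *map, __u32 *key, struct game_state *st)
{
    /* ...run the game logic, push the state to user space... */
    bpf_timer_start(&st->timer, 2ULL * 1000 * 1000 * 1000 /* ns */, 0);
    return 0;
}

SEC("tp/syscalls/sys_enter_nanosleep")  /* some event starts it */
int start_game(void *ctx)
{
    __u32 key = 0;
    struct game_state *st = bpf_map_lookup_elem(&state, &key);

    if (!st)
        return 0;
    bpf_timer_init(&st->timer, &state, CLOCK_MONOTONIC);
    bpf_timer_set_callback(&st->timer, tick);
    bpf_timer_start(&st->timer, 0 /* fire as soon as possible */, 0);
    return 0;
}
```

Note how the callback re-arms its own timer: each invocation is bounded, but the chain of invocations is not.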
We have some limits, but those are tied to how much RAM you have in your system, right? The same as a user space application: it can't allocate more memory than your system has available, right? So we could potentially have a pretty big field of play for our game. Yeah, and we've sort of optimized this case in newer kernels to allow up to four gigabytes of virtual memory. Amazing stuff. So with all of these component pieces put together, we've really avoided the limitations, right? We don't really have an effective limit on how many instructions, and we don't have an effective limit on how many times we can run a loop, effectively, because we can use these timer callbacks. Shall we see if it's still running? What do you think? Is it still running? Yeah, so see how good I am at programming, right? Whether it's still running or still going. So here you go. Looks like it's still up and running. You can see it's still doing its thing. And if you go back to kind of some of the details, I think we can talk for a couple of seconds here. Basically, if you think about that first program that was doing that do_game, what we see here is it does a copy of the last state into the new state, runs the logic, that's what moves all of these cells around on the screen, right? And then it sends it up to user space, and this is the user space just kind of printing the dots: one is on, zero is off, printing the dots on the screen. And you can see it kind of keeps evolving over time. Every two seconds is the interval, just so that you all can see it. So there it goes, and it's still going. You told me a pretty interesting thing about the fact that you can't predict the future state. Yeah, one of the neat things about Game of Life, and a lot of Turing machines in general, is that if you wanna know what the thousandth iteration is, or the 10,000th iteration of this, there's no shortcut to that from a kind of mathematical side. 
You can't say, tell me what the answer is without actually running this thing 10,000 steps; you can't just run it two times and then get the answer. So it's interesting in that sense: you really have to run this thing to figure it out, right? And we could rerun it and see what pattern we get. Sometimes you'll get interesting patterns. This one is still running. Since it's a random pattern, we could have been unlucky, right? The screen could have gone black when we turned this on, but... But it didn't. It worked great. A round of applause for a live demo, please. That's great. Perfect. Yep, go back. Amazing. So we have Game of Life in eBPF, which is really a good demonstration that you can today do arbitrarily complex tasks with eBPF. So those statements about things like, can you do layer seven protocol parsing in eBPF? Absolutely, it's possible. It doesn't necessarily mean that all these things have been implemented yet, but it is possible to do pretty much any processing. Maybe not necessarily the display part yet, but the actual computation part, the processing part. And I would say the key word there is "yet". I mean, I think the really interesting thing, or one takeaway from this talk, is that there are a bunch of folks in the kernel and in the compiler community, and we're working to make BPF better and to improve BPF. So if you have a use case, and if your use case is graphics on a GPU for your KubeCon talk, come talk to me, it would be interesting to know. These things can happen, right? And 10 years ago, when we got started, we had this very small scope, and it just keeps growing and growing. The community's getting bigger. It's kind of its own sub-community within eBPF, but I think definitely, if you have a use case and you think there's value in it, then you can come talk to us and we can figure out what it takes to make it happen. 
And I think when we think about Cilium, Cilium is using eBPF for lots of network-based processing, and our vision is to push more and more of that processing into the kernel. So yeah, there's tons of networking-related processing happening in user space today. I'm not gonna tell you that we're writing it and it's gonna be released next week, that's not what I'm trying to say, but I'm trying to say the vision is that more and more of this processing will be happening in the kernel. Yeah, and I mean, I would also say absolutely the same for Tetragon, which I work on mostly these days. When we started on Tetragon a couple of years back, you know, there were missing features; we added the features, and then you figure out how to bridge the gap in the interim. But if you look forward two, three years, you can see that all the cloud providers will have that kernel, all the distributions will have that kernel, and you can start to use that functionality, right? And that's, I think, a really powerful point of the open source effort here. So I hope that's convinced you that eBPF is, well, probably more powerful than you thought. If you wanna learn more about eBPF, we have some books that you can download from isovalent.com. There are also a ton of really great labs on the website. There is the eBPF.io site, which has loads of information about eBPF itself and lots of projects that are being built using it. We're also doing, well, I'm gonna be signing some books at the Isovalent booth at one o'clock today, so if you wanna come round and pick up a copy of the book, then please say hello. And with that, I hope you have a wonderful wrap-up to the end of your KubeCon. Thank you. Thank you.