Yeah, let's start. Hi, everyone. My name is Quentin. I work in the Linux userspace team at Facebook, and we're going to talk a bit about iptables, and more specifically about what's called bpfilter, which some of you might know. We'll discuss what bpfilter is, how it relates to iptables, and what we can do with it.
OK, so first thing: what is iptables? I'm going to say "iptables" a lot, but it will mean different things. We have Netfilter, which is the filtering framework in the kernel. We have nftables, which is a front end for Netfilter and uses Netlink to communicate with Netfilter in the kernel. We also have iptables-legacy, which is kind of the same thing, a front end for Netfilter, but the difference is that it communicates with the kernel using the getsockopt syscall. And we also have iptables-nft, which is the new version of iptables and uses the Netlink interface, like nftables. Is that clear so far? Because it's a lot of different things that sound the same.
So Netfilter is the standard packet filtering framework in the kernel. If you want to filter something, you go through Netfilter. I guess most people here use BPF to do that, but that's specific to this room; usually it's just Netfilter with iptables or nftables. It's, for example, the default firewall in Fedora. And in most distributions the firewall, like firewalld for example, is using Netfilter. So Netfilter is fine, but it's slower than BPF. On the other hand, it's way easier to use. You don't have to write any code or learn C or whatever. You just write the iptables syntax and it should work.
And there the bpfilter story starts. It started years ago, in 2018, with a patch series from Alexei, with changes from David Miller and Daniel Borkmann. It's named bpfilter, but it's not really filtering. It's just a user mode helper.
So basically, it's a kernel module that, when loaded, starts a userspace program and communicates with it using pipes. And that's about it at this point. Then at some point Dmitrii Banshchikov (I may be butchering that, so sorry if you hear this, Dmitrii) continued the series, submitting a V1 and then a V2, which effectively filters packets. What happened is that all the getsockopt calls coming from iptables, with the filtering rules inside, were caught by bpfilter and translated into a BPF program. So effectively it speeds up the packet filtering coming from iptables by creating BPF programs, and it's much faster. And the good side of that is that you don't need to change iptables. Every change is in the kernel or in the user mode helper, so it's transparent. You don't have to update iptables. You don't have to change anything. You keep your same rules, and it's suddenly faster than it was before. There are benchmarks of that kernel module from the time, in the patch series from Dmitrii and in a Cilium blog post [inaudible]. I don't have many benchmarks right now because they're not relevant anymore: the old ones were based on XDP programs to represent the filtering rules, and bpfilter has evolved a bit. It's now based on TC, and soon to be on the BPF Netfilter hooks.
And that's the point where I joined bpfilter. I've been working on a V3 of the patch series, as Dmitrii stopped working on it. I submitted it in December. But then at some point we found out that we had to find a new home for bpfilter. It's currently still in the kernel tree, at least part of it, and the patch series applies to the kernel tree. But the issue with that is that it's a bit slow to get it reviewed and merged. It's userspace code, and I guess kernel people have better things to do than review userspace code, which makes sense. And the other thing is that some use cases are not possible with bpfilter as a kernel module.
And we have specific internal use cases in which bpfilter as a userspace tool would be more interesting. So at some point bpfilter was moved to a GitHub repository under the Facebook organization, and I've been working on moving it from a kernel module to a userspace daemon. If you go to the repository, it's still a kernel module, an out-of-tree kernel module, so you can build it, use it, whatever. But my fork currently has bpfilter as a userspace daemon. What it can do right now is translate a set of iptables rules into specific BPF programs, and you can have the counters. I also have a proof of concept of iptables which is modified to call bpfilter directly. So from iptables it generates a BPF program, inserts it into the kernel, and attaches it to the right hook.
Let's dive a bit more into it. So that's basically what an iptables table is. That's what is sent from userspace by iptables when you add a rule. If you add just one rule to your rule set, it's going to send the whole set of rules for all the chains in a table, every time, back and forth with the kernel. A table will be, for example, the filter table: if you want to filter packets, it's going to be inside the filter table. Inside of that you have chains, which are the hooks where the rules are attached: input, forward, output, and so on. Inside a chain you have the rules, which are basically used to match packets. Each rule has zero or more matches, and a target, which is the decision whether to drop or accept the packet. So as you can see, it's a lot of layers, a lot of things inside each other. And that's basically what we want to receive from iptables in bpfilter. When you run iptables with your rule behind it, it's going to send the table and all the chains and the rules inside, and bpfilter will receive that and convert it to the right BPF programs.
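To make that nesting concrete, here is a minimal sketch in Python. It is only an illustration: the class and function names are hypothetical, and it mirrors neither the real iptables binary layout nor bpfilter's internal format. It shows a table holding chains, chains holding rules, each rule holding zero or more matches plus a target, and the whole table being handed back whenever a single rule is added.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Match:
    """One criterion of a rule (hypothetical representation)."""
    kind: str   # e.g. "protocol", "src_ip", "dst_port"
    value: str


@dataclass
class Rule:
    """Zero or more matches and a target (the verdict)."""
    matches: List[Match] = field(default_factory=list)
    target: str = "ACCEPT"


@dataclass
class Chain:
    """The rules attached to one hook (INPUT, FORWARD, OUTPUT, ...)."""
    hook: str
    rules: List[Rule] = field(default_factory=list)


@dataclass
class Table:
    """A table such as "filter", holding one chain per hook."""
    name: str
    chains: List[Chain] = field(default_factory=list)


def append_rule(table: Table, hook: str, rule: Rule) -> Table:
    """Add one rule, then return the WHOLE table, since the complete
    rule set is what gets sent on every change."""
    for chain in table.chains:
        if chain.hook == hook:
            chain.rules.append(rule)
            return table
    raise KeyError(f"no chain for hook {hook!r}")


def verdict(chain: Chain, packet: Dict[str, str], policy: str = "ACCEPT") -> str:
    """Walk the rules in order; the first rule whose matches all hold
    decides. A rule with no matches applies to every packet."""
    for rule in chain.rules:
        if all(packet.get(m.kind) == m.value for m in rule.matches):
            return rule.target
    return policy
```

For example, appending a rule with the matches protocol=tcp and dst_port=22 and the target DROP to the INPUT chain makes `verdict` return "DROP" for an SSH packet, while other packets fall through to the default policy.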
From a very high level point of view, bpfilter is the orange part. We have libbpfilter on the left, which is what will be linked to the client, let's say iptables for example. There are different parts inside libbpfilter. There is a generic API, which sends and receives bpfilter requests and responses and which is format agnostic: it's just data you send to and receive from the daemon. And then, depending on the client, you have a client-specific API. For iptables, for example, it's going to be the ipt_something calls. So you're able to use the iptables-specific structures without converting them to anything. You just use those structures inside iptables and send them to bpfilter, which will do the translation, try to understand what you're trying to do, and convert that properly.
On the daemon side, you receive the set of rules from a Unix domain socket, and you have different parts. First the front end, which is kind of coupled to the client. For iptables, you have an iptables front end, which translates the set of rules from the iptables format into a bpfilter-specific format, so that bpfilter can then work on a generic format, whatever the client is. The generator then converts that set of generic rules in the bpfilter format into BPF programs. Depending on what you're trying to do, you're going to have more than one program. If you create a new rule for the input filter in iptables, you will have one program for the input filter. If you add a rule for the forward chain, it's going to be a different program. So all the rules for one iptables hook result in one BPF program. And the third part of the daemon is the management part, which is basically loading, attaching, and managing the lifetime of the BPF programs. It will also be used for counter management. I don't know if that's known or not.
But when you define rules in iptables, you're able to have counters specific to each rule, which are the number of packets processed and the number of bytes processed by this specific rule. And so bpfilter supports the counters.
Sorry, I had a question on your previous slide. So for the Netfilter case, you're basically using the new BPF Netfilter infrastructure?
Not yet.
Oh, OK.
Yeah. So far it's TC, which has its own set of issues. Because if you look at iptables, you have input and forward, for example, and the difference is that packets targeted at a different host go through the forward hook. There is no such distinction in TC. So having the new BPF Netfilter program type is a good point for me, because I don't have to have a FIB lookup inside each program.
I had a couple of questions. One of them is about bpfilter itself. So that's a daemon, so obviously it can restart. Is there some kind of state, or does it push everything to the kernel and then, when it restarts, restore from the kernel? Let's say I add a bunch of rules, then the daemon needs to restart, and then I want to add one more rule. That has well-defined semantics the way you run it with iptables, because everything is persisted in the kernel. So does bpfilter do some kind of restore, or store to disk and restore, or something like that when you restart that agent?
Not really, because there's no need for that from the iptables point of view. Generally, when you start your computer, iptables is going to push all the rules. The fact that the daemon can restart during runtime without iptables-restore being called, for example, is a different issue, which is not yet handled. But eventually that needs to be fixed.
And maybe one other thing I was curious about: the Unix domain socket, and what the API looks like there.
Is it closer to the iptables text-based interface that users may be familiar with, or is there a new definition, like gRPC? What kind of interface is it?
It's just binary data that's sent. For iptables, you have the struct ipt_replace, which contains all the table stuff I've shown, and that's binary inside iptables. So there is a request structure on the bpfilter side, which is filled with that data and received on the other side. bpfilter knows it's coming from iptables, so it knows that the data is a struct ipt_replace, and it can then understand what's inside.
Why do you need a daemon at all? Why don't you use something like the iptables tools, which are stateless? You receive something, you apply something, and you die.
Because you still have the counters to manage, for example. So you need to keep metadata about what's going on and where the programs are loaded and attached.
The counters are in the kernel, right?
No, they're not. I mean, the counters are... So starting from the daemon, everything after that has no relation to iptables or to Netfilter. The counters are a BPF map, which is updated from the BPF programs. If you don't have any daemon, just a shared library, you can translate and attach the BPF program perfectly fine. But when iptables tries to add a new rule, you have no idea which map you are using, which programs are already loaded, and what's going on, basically.
You can probably pin them to bpffs and have some state on the file system to kind of recover. Maybe.
Yeah, probably, but it's way simpler this way.
And from the user's point of view, do you basically have your iptables userspace binary, link it against bpfilter, and it will do everything in the background?
Yeah.
Okay, that's cool.
So the main point was that... So I've been discussing with Florian Westphal, who works on Netfilter and iptables.
And one of the main points was that there is not much to change in iptables, for example, if it's something I want to integrate. All the real processing is done by the daemon. So you don't have to do anything; you just push your client-specific data to it and it's going to handle everything.
So, coming to what I would like to do now. There is the patch series, as you said, from Florian Westphal, which introduces the BPF Netfilter program type, which allows BPF programs to be attached to the original Netfilter and iptables hooks. Only a few, I think a couple of hooks, are supported right now. But that's something I would like to support as well, because it solves a lot of issues on my side. The step to get there is that I need to add support for a new, what I call, BPF flavor. What that means is that I have TC, for example, and TC programs are specific: they receive a specific argument when they start, and they return a specific code for accept or drop or whatever. And TC programs are loaded and attached to the kernel in a specific way. So I have a set of functions to support TC, which I define as a BPF flavor. The BPF Netfilter program type is different. The argument passed to the program is different. The return codes I haven't looked into in depth, but I guess they are not exactly the same. So I need to support that new BPF flavor. And the other thing is that this program type requires dynptr to be used. So far I'm using direct packet access, so I need to go through the dynptr functions inside the BPF bytecode to use the BPF Netfilter program type.
And a few more things, about adding support for nftables. That means basically adding, not this one, sorry, adding a new client type to libbpfilter for nftables, because the data structure is different. I also need to add a new front end for nftables. And when that's done, the rest comes for free.
Once the data is translated from the nftables format into the bpfilter format, everything should work fine, in theory.
And a few more things: the user-defined chains, which are for iptables. So you have the chains, which are the input chain or the forward chain for example, and you have the rules inside. And the user can say: I want to define a new chain which, when we go through it, is going to log the packets. And you can jump into that chain from any other rule in any other chain. Supporting that in the way bpfilter works right now requires a bit more work, because what happens right now is that all the rules are mapped to some BPF bytecode. So literally, you go sequentially through each rule inside the BPF program. Adding access to user-defined chains means we have to jump somewhere to process the chain, and then come back and continue the processing.
And also, yeah, extending the iptables and nftables features. iptables can do a lot of things, and so far I support filtering by source and destination IP, ports, and protocol, which are the basic criteria inside each rule. But there are a lot more matches available that I would like to support as well. And yeah, that's, oh, sorry. That's about it. If you have any more questions.
What is the plan with the user mode bpfilter infrastructure? Did you have any thoughts on that?
I guess it should be removed at some point. I'm not going to use it. If anyone wants to do the same thing but still in the kernel module, feel free to go ahead, but I don't think it's useful anymore.
And long term, do you plan to support everything that iptables has, or just enough for Facebook? What is the long-term story?
That's a good question. We have internal use cases of teams maintaining their own BPF programs for firewalling. Ideally this would be a solution for them, so they don't have to maintain a whole BPF program.
When it comes to iptables, the set of features of iptables is very wide, and the set of features of nftables is even bigger. So I guess it depends on what's actually used, probably more specifically by us at first. If someone wants to use bpfilter and tells me "I want to use that", I might [inaudible], you know. But yeah, so far it's driven by getting something that works fine, and by the use cases we have.
Okay. Because I guess from our point of view, we have mostly removed iptables everywhere from the fleet, except for one place where we actually use nftables. And for me, if the back end of nftables is BPF, it's easier to support, because we have a lot of BPF expertise internally. Right now, I guess, the only option we have is to support nftables, which is something no one understands. So I guess at some point we'll probably check out what you have and try to switch over.
Yeah, I understand. So far the front end is iptables, and eventually nftables, because that's basically how it started. But we have a way to define filtering rules for containers internally, and ideally we would have a new front end for that kind of rule definition. So the purpose is also to support things that are not supported by iptables. I'm thinking of, like, we have sets of hosts inside what we call an SMC tier, which is a naming domain that contains different hosts. And we would also like to be able to filter by SMC tier, so by pool of hosts, which is not possible in iptables unless you write a rule for every IP.
All right, thank you very much.
Thank you.