Hello everyone, I'm Quentin, working at Netronome. I've been working on eBPF for the last three years, maybe, now, so I'm afraid this is yet another presentation about BPF. The angle of this presentation is to focus on the mechanisms we have on Linux for filtering packets, from a general point of view, for mostly simple filters — like: I have ACLs, I just want to drop some particular flows from my incoming traffic. What do I have on Linux to do just that? So the first part is a short refresher on the mechanisms we have for this. Then we'll see what the impact of eBPF, which was added more recently than those mechanisms, is on this filtering landscape. And I would also like to talk about a number of leads for convergence between these different models: how can we maybe use BPF to make things easier with the other filtering mechanisms? What do we have to try and benefit from the best of each solution?

So, first: what do we have on Linux if I want to filter my packets? Say I want to drop HTTP packets. The first thing that comes to mind, usually, is iptables and nftables, which are used for implementing a firewall. On Linux, the firewall itself is netfilter, in the kernel, and you have a variety of front ends in user space. That would be iptables for IPv4 rules; you have the equivalents for Ethernet frames and for ARP frames, ebtables and arptables; you have conntrack, which is used for maintaining stateful connections. But here I'm mostly interested in very simple things, so that would be iptables, to inject some simple rules into the kernel and do my filtering there. That's the first solution.

Actually, we have other ones too. We have, for example, TC, which is a framework for traffic control on Linux that you manage with the tc tool from iproute2. TC works by implementing queueing disciplines, the qdiscs, possibly working with classes, which act more or less like queues. You have a variety of filters used to dispatch packets into the different classes attached to your qdisc. This is mostly for egress traffic, usually, so that you can have scheduling, traffic shaping, that kind of processing for the packets leaving the machine. But you can also apply filters to incoming traffic, so I can drop flows at the TC level too. There is a variety of filters available: I have, for example, the basic filters, which use a syntax called extended matches; I have flower, and flow, which is different; u32. Now I also have BPF filters at this level, but we'll come back to that later. And there are some more specific filters, some of which apply only to egress traffic. I won't go into the details of each filter syntax; the point is that we can also filter traffic at this level, drop flows, and so, for some use cases, do the same thing as iptables.

Another thing available on Linux is ethtool, to set things up on the hardware directly. There is a feature called receive network flow classification, which is basically hardware filters, for the NICs that support it: I can set up some filters directly on the NIC itself. So that's yet another thing I can do. It's simpler, maybe, than netfilter or TC, because NICs usually only have basic facilities at this level — search for a given pattern at a fixed offset, see whether your packet matches or not, and drop it or dispatch it to a queue. But again, there are several things we can do.

The last mechanism on my list is not exactly inside the kernel, and it's not exactly about dropping flows either, but it's yet another tool with its own syntax to filter packets: the pcap filters used, for example, by tcpdump. Typically, with tcpdump, you take a pcap expression and compile it into BPF code — legacy BPF code, the old version that predates eBPF — and you attach this program to your socket; in the kernel, each packet is then processed by this program and filtered. So eventually, when you dump packets with tcpdump, you only get the ones you're interested in, thanks to these pcap filter and BPF capabilities. That's one more way to filter packets. With tcpdump, for example, you can see the BPF program produced by your expression with the -d option.

So we have iptables, we have TC, we have ethtool that can be used for hardware filters. This is just a recap of the different filtering hooks we have at this time in the kernel: the hardware at the lowest level; the old BPF programs, just for the sockets; and in the middle, in the kernel stack, the TC and netfilter ingress or egress hooks. Here is an example use case for each of these different tools — each time for dropping incoming HTTP packets. Well, actually, that's not really the case for tcpdump; there it's more about dumping only the HTTP packets.
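Roughly, it looks like this — the interface name and the exact option choices here are my illustration, but the syntax is the real one for each tool:

    # netfilter, via iptables: drop incoming TCP traffic to port 80
    iptables -A INPUT -p tcp --dport 80 -j DROP

    # TC: u32 filter on the ingress qdisc
    tc qdisc add dev eth0 ingress
    tc filter add dev eth0 parent ffff: protocol ip u32 \
        match ip protocol 6 0xff match ip dport 80 0xffff action drop

    # ethtool: hardware receive flow classification; action -1 means drop
    ethtool -N eth0 flow-type tcp4 dst-port 80 action -1

    # tcpdump: not dropping, just capturing HTTP traffic only
    tcpdump -i eth0 'tcp dst port 80'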
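And since I mentioned attaching the compiled pcap filter to a socket: tcpdump can also print the classic BPF program as a C array, with the -dd option, and attaching it yourself only takes a few lines. Here is a minimal sketch — the one-instruction filter below is just a placeholder that accepts every packet; in practice you would paste the output of tcpdump -dd 'tcp dst port 80' instead:

    /* Attach a classic BPF filter to a raw packet socket. */
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <linux/filter.h>
    #include <linux/if_ether.h>
    #include <sys/socket.h>

    int main(void)
    {
        /* Placeholder filter: a single "return 262144" instruction,
         * i.e. accept (up to 256 KiB of) every packet. Replace with
         * the array printed by tcpdump -dd. */
        struct sock_filter code[] = {
            { 0x06, 0, 0, 0x00040000 },
        };
        struct sock_fprog prog = {
            .len = sizeof(code) / sizeof(code[0]),
            .filter = code,
        };
        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
        if (fd < 0)
            return 1;
        if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &prog, sizeof(prog)))
            return 1;
        /* From here on, reads on fd only see packets the filter accepts. */
        close(fd);
        return 0;
    }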
The point here is to show that we have several tools with some overlap, at least for simple filtering, and each of these tools obviously has its own syntax, its own way of describing the rules. That can be a problem if you have rules for one of the systems that you would like to port to another one — maybe because you want to change hooks, maybe because you want to do something else besides. So maybe we can find ways to improve things; I will come back to that later.

You also have other things beyond the ones I presented, because there are things next to the Linux kernel as well: virtual switches, some of them also partly working inside the kernel. For example, with Open vSwitch you can have the kernel datapath. You have things living beside the kernel, like DPDK, where rte_flow is used for matching packets. You can use P4 to implement a virtual switch and compile it into something dedicated to a specific target. So there are a lot of other solutions too — we really have a lot of things for filtering packets.

And of course, we have also had eBPF for a few years now. So what is BPF? By now you probably know the song, more or less, so I won't do a detailed introduction: it's a generic, efficient, in-kernel virtual machine. There is the verifier, to make sure that the programs injected from user space are safe. You can attach programs to a variety of hooks in the kernel — in particular, for our topic, sockets, TC for traffic control, and XDP at the network driver level. So again, that's something used for processing packets, and possibly filtering them. It has a couple of features that are interesting in comparison with the previous mechanisms: maps, to keep state or statistics or whatever you want in the kernel; tail calls, to call into other programs; and helper functions from the kernel that your programs can call.

So with BPF we have several hooks. We have XDP, here at the lowest level, nearly in the drivers — that's native XDP, when the drivers support it; otherwise you have generic XDP, which Jesper was talking about this morning in the questions. We have the TC hooks. And sockets — this one is a bit different, since that's where the legacy BPF version lived, but we also have eBPF on sockets now.
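To give an idea of what a program at one of these hooks looks like, here is a minimal XDP sketch in C for the same use case as before, dropping incoming HTTP packets. It is deliberately simplified — IPv4 only, no VLAN handling — and it assumes the libbpf headers (bpf_helpers.h, bpf_endian.h) and clang to build it:

    #include <linux/bpf.h>
    #include <linux/if_ether.h>
    #include <linux/in.h>
    #include <linux/ip.h>
    #include <linux/tcp.h>
    #include <bpf/bpf_endian.h>
    #include <bpf/bpf_helpers.h>

    SEC("xdp")
    int drop_http(struct xdp_md *ctx)
    {
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;

        /* Every header access is bounds-checked, or the verifier rejects us. */
        struct ethhdr *eth = data;
        if ((void *)(eth + 1) > data_end)
            return XDP_PASS;
        if (eth->h_proto != bpf_htons(ETH_P_IP))
            return XDP_PASS;

        struct iphdr *iph = (void *)(eth + 1);
        if ((void *)(iph + 1) > data_end)
            return XDP_PASS;
        if (iph->protocol != IPPROTO_TCP)
            return XDP_PASS;

        /* ihl counts 32-bit words, so this also skips IP options. */
        struct tcphdr *tcph = (void *)iph + iph->ihl * 4;
        if ((void *)(tcph + 1) > data_end)
            return XDP_PASS;

        return tcph->dest == bpf_htons(80) ? XDP_DROP : XDP_PASS;
    }

    char _license[] SEC("license") = "GPL";

That's a dozen lines of real logic for a single rule — fast and flexible, but you can see why nobody wants to hand-write hundreds of rules this way, which is exactly the problem I'll come to.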
For hardware that supports it, you can even have BPF at TC or XDP offloaded — at this time that's only for the Netronome SmartNICs, although, to be fair, I think some other NICs can support things like some offloading of TC rules, these kinds of things. So you can also run some of this in the hardware.

The point is that with BPF, in general, we get more speed than with the other mechanisms. There are several reasons for that: we have the just-in-time compiler that turns BPF programs into native instructions; you have very low-level hooks such as XDP; you also have this possibility of offloading to the hardware. So you get something that is really fast. And since it gives you a language to implement programs in the kernel, you also get a lot of flexibility about what you can do; the features you have give you a lot of possibilities, really. Especially in terms of filtering: you can filter on pretty much anything you want — maybe with some issues with loops at this time, but that should hopefully be solved in the future. So BPF is powerful, of course, but you knew that already.

BPF also comes with a number of, maybe not drawbacks, but: if you are a system administrator trying to switch to BPF, you start by getting a lot of headaches and spending long nights rewriting your existing rules. BPF, as Jesper also said, is not a product by itself; it's a building block, so you have to build with it. You have to spend time creating your programs, and maybe optimizing them, so you spend a lot of time at the beginning trying to understand how things work and setting everything up. Maybe that's something we could try to improve somewhere — find ways to generate programs, or to reuse programs, so that we wouldn't have to start everything from scratch each time we need a new one. Keep in mind, also, that BPF is something safe and contained: it's really a virtual machine inside the kernel, well defined, with a reference implementation in the kernel. It's something that holds together, and that makes it a possible, good intermediate representation for things like filtering packets.

So how could we maybe leverage this to find convergence between the different filtering models? The first thing I want to say about that is why we should try to unify things at all. Well, it's pretty much what I just said: if I already have my set of iptables rules and I want to switch to BPF, that's not trivial to do — how can I find a way to turn this set of rules into BPF, or something like that? So, one thing: I want to transparently reuse the existing set of rules I have. I also want to be able, with one set of rules, to benefit from the best of the available solutions: if I have iptables rules, I'm using iptables and netfilter, but maybe netfilter is not the fastest solution available, and I would like to switch to BPF because I've heard that XDP is so much faster — so how do I do that? And on the developer side it's also interesting, because adding an intermediate representation can hide the details of the different mechanisms used to inject rules into the kernel: you would just have to send one kind of intermediate representation down to the back end, to the driver. And you would get a better decoupling between the front ends that translate rules and the back ends that offload them.

That is, for example, what is done in the first work I want to present, from Pablo Neira Ayuso; it's an RFC on the netdev mailing list at the moment. This one doesn't use BPF. It's mostly about trying to make things converge — TC rules, in yellow here, netfilter rules, hardware filter rules. The idea of the proposal is to turn all of these into an intermediate representation called flow rules. So it's not a language in particular; it's more a set of structures inside the kernel that could be used to represent filters in general, and that could then be used for offloads. Instead of sending all those different rules, in different formats, down to the driver, you would have just one representation, in this IR, to send to the driver, and the driver would turn it into instructions for the hardware. That would hopefully make things easier for hardware developers, for driver developers — just one thing to push down, and no details about, say, TC internals at this level, in the driver parts, so that adding new features to TC stays hidden from the driver and gets easier to do.
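I won't reproduce the exact structures from the RFC here, but to give a flavour of the idea, here is a hypothetical sketch of what such an intermediate representation can look like: one generic rule made of match keys with masks, plus a list of actions. The names and fields below are mine, purely illustrative, not the ones from the patch set:

    /* Hypothetical "flow rule" style IR: a generic match (key plus mask)
     * and a list of actions, independent of whether the rule came from
     * TC, netfilter or ethtool. Illustrative only. */
    #include <stdint.h>

    struct flow_match_keys {
        uint8_t  ip_proto;      /* e.g. IPPROTO_TCP */
        uint32_t ipv4_dst;      /* destination address */
        uint16_t tcp_dport;     /* destination port */
    };

    enum flow_action_kind {
        FLOW_ACT_DROP,
        FLOW_ACT_ACCEPT,
        FLOW_ACT_QUEUE,         /* dispatch to a given hardware queue */
    };

    struct flow_action {
        enum flow_action_kind kind;
        uint32_t arg;           /* e.g. queue index for FLOW_ACT_QUEUE */
    };

    struct flow_rule {
        struct flow_match_keys key;
        struct flow_match_keys mask;  /* which bits of the key matter */
        unsigned int num_actions;
        struct flow_action *actions;
    };

Whatever front end produced the rule — TC, iptables, ethtool — the driver then only needs to parse this one representation.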
Another solution that tries to make things converge is bpfilter, which was at the center of many discussions on the kernel mailing list recently. That's a way to turn your iptables rules into BPF programs, but directly inside the kernel. Actually, only the back-end part changes: iptables itself is left unchanged, so you would still use iptables to inject your rules into the kernel. But then, instead of the netfilter back end in the kernel, you would have bpfilter, which translates those rules into a BPF program that gets attached in the kernel. So from the user's point of view not much changes — it changes in terms of performance — but you keep your iptables rules; they are sent here, to the netfilter subsystem, which communicates with this specific bpfilter.ko module. This is a special kind of module that has a component launched in user space, so the rule translation actually happens in user space, in bpfilter_umh — I think that stands for "user mode helper", or something like that. Then you get the rules back as a BPF program in the kernel, and you attach it. So you get the best parts of both worlds here: the iptables rules you already have, and BPF programs, with the performance that comes with them, and so on.

The third thing I want to mention is some work I'm doing right now at Netronome. It's a library that we call libkefir, for kernel filtering rules. The idea is to do something somewhat similar to bpfilter, but in user space: take different formats of rules — ethtool rules, TC rules, iptables rules, possibly pcap expressions — and convert this set of rules into BPF programs. But not just bytecode: I would like to be able to produce C programs. So instead of doing all the work down to sending the program directly to the kernel — which would be supported too — we would also be able to dump a C program, so that administrators can hack on the programs themselves and modify them. The idea is to be able to tell someone who is not really familiar with BPF: here, you have your rules; use the functions in this library to turn this set of rules into a BPF program, and then do whatever you want with it — modify it, possibly, or just inject it into the kernel right away, and hopefully everything works.
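For that last step, the workflow would be the usual eBPF one. Assuming the library dumped, say, a drop_http.c file targeting the XDP hook — file and interface names illustrative — it would be something like:

    # compile the generated (and possibly hand-edited) program to BPF
    clang -O2 -target bpf -c drop_http.c -o drop_http.o

    # attach it at the XDP hook with iproute2
    ip link set dev eth0 xdp obj drop_http.o sec xdp

And the point is precisely that you are free to edit the C file before this step.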
So that's some work in progress. I was initially hoping to have it published on GitHub in time for this talk, but sadly I didn't have time to finish it, so it's not available yet. It's something I hope to be able to publish in the coming weeks, or months maybe.

And that's it for the presentation. Just to wrap up: we have different things in the kernel for filtering packets, and BPF is one of them. It's both really performant and really flexible, so maybe it can be used as an intermediate representation for filtering packets — or maybe we can use some other representations. But anyway, there are a number of convergence models emerging, because we want to reuse what already exists in terms of rules, and because we want to simplify things for development. We also have a number of other leads for the future in terms of convergence. P4 is something that can be used as an abstraction model, maybe, for different switching and filtering models; you can compile P4 programs into BPF, so it's really interesting to see that kind of relationship too. You have BPF used in other places as well: there is an eBPF datapath for OVS, which is in development at the moment, and a BPF implementation inside DPDK too. So it's interesting to see, again, all those things getting unified and working together. There was a presentation by a colleague of mine about eBPF as a heterogeneous processing ABI at the Linux Plumbers Conference — that's about using BPF for offloading to different devices, compiling for different architectures, and so on. And the last one I wanted to mention is bpftrace, which also uses a kind of domain-specific language to produce BPF programs. So it's interesting, again, to see how we can simplify things by having all of this converge.

So, thank you. I'm afraid I don't have time for questions — I spoke too much. If you have questions, just come and talk to me, no problem. Thank you.