Hello, my name is Julia and I work at Facebook in the containers team, mostly on the networking side, hence BPF. In this talk I will cover the BPF subsystem in systemd: what is already there, what we want to put there, and the road we need to take to reach our goals.

But first, two questions for the audience. The first: how many of you know something about BPF, what it is? Just raise your hands. Oh, wow. The second: how many of you know about BPF in systemd, what the applications are? Good, at least 30% or so, that's good. So I'll split this talk into a brief intro about what BPF is, and then I'll focus on BPF and systemd, which is the part where I expect some of you will disconnect, and that's okay.

So let's jump in. Why is everybody so excited about BPF? Because it is a virtual-machine-like technology whose purpose is to safely modify kernel behavior from user space. It combines the kernel and user-space approaches, and "safely" means that the code the user injects into the kernel is guaranteed not to crash, not to hang, not to access invalid memory addresses, and so on. The major applications are networking and tracing.

So when users want to modify kernel behavior, what do they do? They write a program. And what does the life cycle of this program look like, what do you need to do for your program to work? You need to compile it, load it into the kernel, and attach it to some hook point. Then don't forget to detach the program when you no longer need it, and as soon as there are no references to it, the kernel will unload it. So the refcount is essential to keep in mind: the program is alive as long as there is a reference to it. Let's look in detail at the interaction between user space and kernel space.
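The life cycle rule above (a program is alive while anything references it) can be sketched as a toy model. This is not kernel code: `struct toy_prog` and the `toy_*` functions are hypothetical and only mimic the bookkeeping, assuming that loading returns a file descriptor (the first reference) and that attaching to a hook takes an additional reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not kernel code) of BPF program lifetime. */
struct toy_prog {
    int refcnt;  /* number of live references */
    bool loaded; /* becomes false once the last reference is dropped */
};

/* Loading hands back a file descriptor: that is the first reference. */
static void toy_load(struct toy_prog *p)
{
    p->refcnt = 1;
    p->loaded = true;
}

static void toy_get(struct toy_prog *p)
{
    assert(p->loaded);
    p->refcnt++;
}

/* Dropping the last reference unloads the program. */
static void toy_put(struct toy_prog *p)
{
    assert(p->loaded && p->refcnt > 0);
    if (--p->refcnt == 0)
        p->loaded = false;
}

/* Attaching to a hook holds a reference; detaching releases it. */
static void toy_attach(struct toy_prog *p) { toy_get(p); }
static void toy_detach(struct toy_prog *p) { toy_put(p); }

/* Closing the descriptor returned by load releases that reference. */
static void toy_close_fd(struct toy_prog *p) { toy_put(p); }
```

In this model, closing the descriptor right after attaching leaves the program alive (the hook still holds a reference), and only the final detach unloads it.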
So you have a program in C and you want it loaded into the kernel. What are the steps? First, you compile the program into the BPF instruction set. This is the BPF assembly language, and it is modeled on instruction sets from several architectures. Once your program is compiled, the bytecode needs to be loaded into the kernel, and that's where I think the most sophisticated part of the BPF subsystem kicks in: your program needs to be verified to make sure it is actually safe to load. So there is the verifier, and after the verifier has done its job, the JIT compiler translates your program into native code for the current platform. If the platform is not supported, that's okay: there is also an interpreter that can run the program. Once your program is loaded, it is assigned a file descriptor and its refcount is greater than zero. Remember the life cycle: your program stays loaded while you hold a reference to it.

So this is a brief overview of BPF. Any questions before I jump into the systemd part? Okay, you can ask them later.

So what do we have in systemd? In systemd we have two subsystems: the BPF firewall and BPF device control. With the BPF firewall you can whitelist or blacklist IP addresses, all of them if you want. With BPF device control you can specify which devices may be accessed, whether access is read-only, what the policies are, and so on. These are the two use cases present in systemd now.

So what's the problem, and why do we want to touch it further? It's because the current systemd BPF infrastructure is tailored to these use cases, and it's basically frozen in time: the code for the BPF helpers was taken from the kernel UAPI headers, copied and pasted once, and has stayed that way ever since.
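To make the compile-then-verify steps described earlier more concrete, here is a sketch of what compiled BPF bytecode looks like and of a few checks a verifier might apply. The struct mirrors the layout of `struct bpf_insn` from the kernel UAPI headers, and the opcode values (`BPF_JMP` = 0x05, `BPF_EXIT` = 0x90) match those headers. `toy_check` is a hypothetical, drastically simplified stand-in: the real verifier simulates every execution path and tracks register types and bounds, which this sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the layout of struct bpf_insn from <linux/bpf.h>. */
struct insn {
    uint8_t code;        /* opcode: class | operation | source */
    uint8_t dst_reg : 4; /* destination register */
    uint8_t src_reg : 4; /* source register */
    int16_t off;         /* signed offset (jumps, memory access) */
    int32_t imm;         /* signed immediate */
};

/* Constant values as defined in the kernel UAPI headers. */
#define CLASS_MASK 0x07
#define OP_MASK    0xf0
#define CLASS_JMP  0x05 /* BPF_JMP */
#define OP_EXIT    0x90 /* BPF_EXIT */
#define MAX_REGS   11   /* r0..r10 */

/* Hypothetical toy checks: valid registers, forward in-bounds jumps only,
 * and a final exit so execution cannot run off the end of the program. */
static bool toy_check(const struct insn *prog, size_t n)
{
    assert(prog != NULL || n == 0);
    if (n == 0)
        return false;
    for (size_t i = 0; i < n; i++) {
        const struct insn *in = &prog[i];
        if (in->dst_reg >= MAX_REGS || in->src_reg >= MAX_REGS)
            return false;
        bool is_jmp = (in->code & CLASS_MASK) == CLASS_JMP;
        if (is_jmp && (in->code & OP_MASK) != OP_EXIT) {
            /* Jump targets are relative to the next instruction. */
            long target = (long)i + 1 + in->off;
            if (in->off < 0 || target >= (long)n)
                return false;
        }
    }
    return prog[n - 1].code == (CLASS_JMP | OP_EXIT);
}
```

For example, the two-instruction program "r0 = 1; exit" is encoded as opcode 0xb7 (64-bit move of an immediate) followed by 0x95 (exit), and passes these checks; a program containing a backward jump does not.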
Second, and I'm not sure you can see the right-hand side of the slide, that's a shame, this is BPF micro-assembly; if you can't see it, it doesn't matter. BPF micro-assembly is unfortunately the only way you can write BPF programs in systemd at present. It is a good way to learn the BPF technology: for example, you will learn that the BPF instruction set has 11 registers. But it is not very convenient to maintain or to read, and when I first looked into this code I was very thankful to the person who left comments explaining that these are basically counters.

So what's the good news? There is some support for custom BPF programs, added for the firewall use case: you can specify a path in the BPF file system where your program is pinned. Pinned means it's loaded and you can get a valid file descriptor for it from that path. But this supports only one BPF program type, and you can attach only to the egress or ingress hook. Another important part of the BPF infrastructure is attach flags: the current solution supports only the multi flag (BPF_F_ALLOW_MULTI), but there is also an override flag (BPF_F_ALLOW_OVERRIDE), and it would be nice to be able to override BPF programs as well.

So what are the goals of this work, and why do we want to proceed? We want users to be able to define, load and attach custom cgroup programs, and to attach them to various hook points simply and easily. That's the user side. We also want systemd developers not to struggle with maintaining the BPF infrastructure, so you don't need to dig into the BPF instruction set to understand what the hell is going on. And we want to get rid of that copy-pasted code, so that we get new features out of the box. So these are the final goals, but what about milestones?
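As an illustration of why micro-assembly is hard to read, here is a sketch of hand-assembling a tiny program with instruction macros and running it through a minimal interpreter. The `TOY_*` macros are hypothetical, modeled loosely on the kernel's `BPF_MOV64_IMM`-style helpers, and the interpreter understands only four opcodes; the opcode values (0xb7, 0xbf, 0x07, 0x0f, 0x95) match the UAPI encoding. Unlike the real systemd firewall, which updates counters stored in BPF maps, this sketch keeps its "counters" in registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct insn {
    uint8_t code;
    uint8_t dst_reg : 4;
    uint8_t src_reg : 4;
    int16_t off;
    int32_t imm;
};

enum { NREGS = 11 }; /* r0..r10 */

/* Opcode values from the kernel UAPI (class | operation | source). */
#define OP_ADD64_IMM 0x07 /* BPF_ALU64 | BPF_ADD | BPF_K */
#define OP_ADD64_REG 0x0f /* BPF_ALU64 | BPF_ADD | BPF_X */
#define OP_MOV64_IMM 0xb7 /* BPF_ALU64 | BPF_MOV | BPF_K */
#define OP_MOV64_REG 0xbf /* BPF_ALU64 | BPF_MOV | BPF_X */
#define OP_EXIT      0x95 /* BPF_JMP | BPF_EXIT */

/* Hypothetical "micro-assembly" helpers. */
#define TOY_MOV64_IMM(d, v) ((struct insn){ .code = OP_MOV64_IMM, .dst_reg = (d), .imm = (v) })
#define TOY_MOV64_REG(d, s) ((struct insn){ .code = OP_MOV64_REG, .dst_reg = (d), .src_reg = (s) })
#define TOY_ADD64_IMM(d, v) ((struct insn){ .code = OP_ADD64_IMM, .dst_reg = (d), .imm = (v) })
#define TOY_ADD64_REG(d, s) ((struct insn){ .code = OP_ADD64_REG, .dst_reg = (d), .src_reg = (s) })
#define TOY_EXIT()          ((struct insn){ .code = OP_EXIT })

/* Minimal interpreter for the opcodes above; returns r0 at exit. */
static int64_t toy_run(const struct insn *prog, size_t n)
{
    int64_t r[NREGS] = { 0 };
    for (size_t pc = 0; pc < n; pc++) {
        const struct insn *in = &prog[pc];
        assert(in->dst_reg < NREGS && in->src_reg < NREGS);
        switch (in->code) {
        case OP_MOV64_IMM: r[in->dst_reg] = in->imm; break; /* imm is sign-extended */
        case OP_MOV64_REG: r[in->dst_reg] = r[in->src_reg]; break;
        case OP_ADD64_IMM: r[in->dst_reg] += in->imm; break;
        case OP_ADD64_REG: r[in->dst_reg] += r[in->src_reg]; break;
        case OP_EXIT: return r[0];
        default: assert(!"unsupported opcode"); return -1;
        }
    }
    assert(!"program ran off the end");
    return -1;
}
```

A "packets and bytes" counter in this style is a sequence such as `TOY_MOV64_IMM(6, 0)`, `TOY_MOV64_IMM(7, 0)`, `TOY_ADD64_IMM(6, 1)`, `TOY_ADD64_IMM(7, 1500)`, `TOY_MOV64_REG(0, 7)`, `TOY_EXIT()`: correct, but without comments nothing says that r6 counts packets and r7 counts bytes.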
If you remember the slide with the BPF program life cycle: you compile, load and attach programs. Currently we are at milestone number one: we can attach programs. And a small advertisement: I have a pull request that would be great to get reviewed. It basically extends the BPF firewall support to other BPF attach types and flags, so it will close milestone number three. But we actually want more: we want programs not just to be attached, but to be loaded and compiled as well. To do that, we need to be able to feed systemd either object files, bytecode, or programs in restricted C. This is a noble goal, but it's not that simple, because, remember the copy-paste: it won't scale for the last three milestones, and we need help from libbpf.

libbpf is a user-space library; you can think of it as a set of wrappers over the BPF syscalls, so it does all the dirty work for you. We're particularly interested in two wrappers: the ability to load bytecode from a buffer or from a path. libbpf is also a very hot topic in the kernel. For example, a new initiative is CO-RE, which stands for Compile Once, Run Everywhere: you compile a program on your platform, and libbpf provides some guarantees that your program will be able to load on other kernels as well. The cornerstone of the CO-RE initiative is BTF, which stands for BPF Type Format. So libbpf is awesome, and why don't we have it right now?
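Since systemd would need to accept either object files or raw bytecode, one plausible first step is telling the two apart. The sketch below is a hypothetical helper (`classify_bpf_blob` is not a systemd or libbpf function): it treats a buffer starting with the ELF magic and carrying `e_machine` 247 (`EM_BPF`) as a BPF object file, and any other non-empty buffer whose length is a multiple of 8 bytes (one `struct bpf_insn`) as raw instructions. An object file would then typically go to libbpf, for example via `bpf_object__open_mem()` or `bpf_object__open_file()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum blob_kind { BLOB_INVALID, BLOB_BPF_ELF, BLOB_RAW_INSNS };

#define ELF_MACHINE_BPF 247 /* EM_BPF in <elf.h> */

/* Hypothetical helper: decide whether buf holds a BPF ELF object or
 * raw 8-byte BPF instructions. */
static enum blob_kind classify_bpf_blob(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len >= 20 && memcmp(buf, "\x7f" "ELF", 4) == 0) {
        /* EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
         * e_machine is the 16-bit field at offset 18. */
        uint16_t machine;
        if (buf[5] == 1)
            machine = (uint16_t)(buf[18] | (buf[19] << 8));
        else if (buf[5] == 2)
            machine = (uint16_t)((buf[18] << 8) | buf[19]);
        else
            return BLOB_INVALID;
        return machine == ELF_MACHINE_BPF ? BLOB_BPF_ELF : BLOB_INVALID;
    }
    /* Raw bytecode is an array of 8-byte instructions. */
    if (len > 0 && len % 8 == 0)
        return BLOB_RAW_INSNS;
    return BLOB_INVALID;
}
```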
Well, we tried, and it wasn't very successful because of the lack of a package; packaging of libbpf is currently in progress. The first attempt was made in the form of a Git submodule. Basically you have two alternatives: copy-paste, or a submodule, which is better. But systemd has pretty sophisticated tests, and the submodule approach didn't work well because we had to modify the tests. That workaround is too long, which is why a tested package is better.

The second limitation is that libbpf is not tested well enough. At least the submodule approach showed us that we have problems: the first thing that pull request did was crash on some tests, some sanitizer tests, and we had to upstream some patches to fix that, which confirmed there is a problem.

So how do we want to address the testability problem? We need to check that libbpf is compatible with different kernels. Backward compatibility is a major concern, because when a kernel developer adds a new feature, they run the kernel self-tests against it, and this may not be enough, because the feature may not be compatible with older kernels. So we want to use QEMU, which is virtualization infrastructure, to test libbpf against several kernels, and we want to port some of the kernel self-tests into our testing infrastructure. In terms of packages, we want to switch to a libbpf package built from the mirror. Currently there are packages, at least for Debian and Fedora, but they are built from kernel sources, and if we want to do this testing, we want them built from the mirror.

The last thing I want to cover, an issue raised in the pull request with the Git submodule, is the LLVM dependency. I know some people are concerned: "we don't want an LLVM runtime dependency, so let's not build C programs, let's just store bytecode." Well, bytecode is definitely better than
BPF programs pinned in the BPF file system, but there is a problem: you have to store three entities instead of one. You need to store the C program itself and the bytecode in both big- and little-endian, and keep everything in sync. And the point is you don't need to do that, because there is no runtime dependency on LLVM, only a compile-time dependency. And with the CO-RE initiative there are certain guarantees that the code you compile will run on other kernels.

Also, and I think this will be covered at this conference too, BPF support was recently added to GCC, which is awesome. Well, let's not jump into it; LLVM is probably going to be there for a while, but it's something to keep in mind.

So, implementation details: we can write a Meson build rule that compiles the C code into a C string buffer, and with libbpf we can just read the bytecode from that buffer. If you look at the slide, that will be milestone number four: systemd with its BPF programs stored as restricted C and loaded through libbpf.

These are the references if you want to dig deeper. I really like the Cilium guide; it's a great in-depth overview if you really like the technology. And yeah, that's it. Questions?

Audience: Dealing with government customers, they really don't like the idea of having any type of compiler on a system. You mentioned the object code and so on: how close are we to a restricted compiler that would only compile eBPF, and have governments looked into whether they'd be willing to accept that?

Julia: You don't need to compile at runtime. The goal is to compile once: the program is compiled at build time. For example, here it means your Meson build rule dumps the output of clang into a string, so later you can just load the program from
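The build step described above (compile with clang once, embed the result as a C buffer, load it with libbpf at runtime) could rely on a small generator that turns the compiled object into a C array definition, similar in spirit to `xxd -i`. This is a sketch under that assumption: `emit_c_array` and the `<name>_len` naming convention are hypothetical, not part of systemd's build. At runtime, the generated array and length could then be handed to libbpf's `bpf_object__open_mem()`.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BYTES_PER_LINE 12

/* Append formatted text at dst + *pos, advancing *pos by the full length
 * even when the output is truncated, so the caller learns the size needed. */
static void append(char *dst, size_t cap, size_t *pos, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(*pos < cap ? dst + *pos : NULL,
                      *pos < cap ? cap - *pos : 0, fmt, ap);
    va_end(ap);
    assert(n >= 0);
    *pos += (size_t)n;
}

/* Hypothetical generator: render buf as a C array named `name` plus a
 * length constant. Returns the characters needed (excluding the NUL). */
static size_t emit_c_array(char *dst, size_t cap, const char *name,
                           const uint8_t *buf, size_t len)
{
    assert(name != NULL && strlen(name) > 0);
    assert(len > 0); /* C before C23 forbids empty initializer lists */
    size_t pos = 0;
    if (cap > 0)
        dst[0] = '\0';
    append(dst, cap, &pos, "static const unsigned char %s[] = {\n", name);
    for (size_t i = 0; i < len; i++) {
        const char *lead = (i % BYTES_PER_LINE == 0) ? "  " : " ";
        int line_end = (i % BYTES_PER_LINE == BYTES_PER_LINE - 1) || (i == len - 1);
        append(dst, cap, &pos, "%s0x%02x%s", lead, (unsigned)buf[i],
               line_end ? ",\n" : ",");
    }
    append(dst, cap, &pos, "};\nstatic const size_t %s_len = %zu;\n", name, len);
    return pos;
}
```

Keeping the generated buffer in the build output rather than in the source tree avoids storing C, big-endian and little-endian bytecode side by side: only the restricted C source is versioned.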
that string. That means you need the build rule, but you can ship the systemd binary to production machines without LLVM on them, so there is no runtime dependency. You do need a runtime dependency for tracing, but for the systemd use case, which is mostly networking, you don't. But again, we need libbpf for that.

Audience: That might get your pull request approved. One question: as you mentioned, we currently have these two BPF programs, the firewall one and the devices one. In the case of the firewall, all the actual IP addresses are stored in a BPF table, so building the BPF code with LLVM is going to be easy, because the code is the same everywhere: you can precompile it and everything is good. But at least in the current implementation of the devices part, we generate the program specifically for the policy that shall be enforced, right? It doesn't look up the major/minor numbers of the device nodes in an external table; it actually generates them into the code. So what's your strategy there? If we adopt this, and I presume we would, is the assumption that we change all of this to use tables, or what's the story there?

Julia: Yes, I think the idea is to use BPF maps: whatever we want to provide from user space, we pass through maps, such as hash tables.

Audience: Are you going to work on that?

Julia: Yes. I don't have strict deadlines, but the goal is to get rid of the BPF micro-assembly, so yes, to rewrite the firewall and devices programs.

Audience: Okay, that would be excellent. I'm looking forward to it.

Host: Okay, commitments! Any further questions? No? Thank you, Julia.

Julia: Thank you.