Hi everyone, thank you for attending. So we're going to talk about BPF again. Many of you probably know that we can trace many things on Linux with BPF. I want to talk about how to try and understand what may go wrong with those BPF programs themselves. So not tracing other things with BPF, but debugging the BPF programs. I'm Quentin. I've been working with BPF for the last four years or so: first at a company called 6WIND, doing software acceleration for network packet processing, then at Netronome, where I worked on hardware offload of BPF packet processing. So with all of that, you may find that some of the examples are slightly more focused on BPF for network processing than on tracing, but everything I present should be the same for the different use cases. So eBPF is the extended version of BPF, the Berkeley Packet Filter. You can use it to write your own programs, for example in C, then compile them into BPF bytecode and inject that into the kernel, where it is verified to make sure that the programs are safe and that they terminate, in order to avoid security issues or crashing your kernel. These are the kinds of small issues you want to avoid. You have an interpreter in the kernel, and you also have a JIT compiler to get better performance for BPF programs. Once they are loaded in the kernel, you can attach them to one of the existing hooks, for example network processing or tracing hooks. The instructions are 64 bits wide. You have 11 registers and a 512-byte stack. It's not Turing-complete, in the sense that we don't have generic loops; we do have bounded loops now. It also comes with a number of additional features. We have maps, which are shared between BPF programs in the kernel and user space, or between several BPF programs. We have helper functions, provided by the kernel, that the programs can call. And we have BTF, which I'll come back to later. So that's about it for BPF. The following diagram summarizes the workflow.
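To make the "64-bit instructions, 11 registers" description concrete, here is a minimal Python sketch of how one eBPF instruction is laid out, modelled on the kernel's `struct bpf_insn`: an 8-bit opcode, a 4-bit destination register, a 4-bit source register, a signed 16-bit offset and a signed 32-bit immediate. It assumes a little-endian target, where the destination register sits in the low nibble of the second byte, and it ignores the one exception to the 8-byte rule, the 64-bit immediate load, which spans two slots. The names `Insn`, `encode` and `decode` are made up for illustration.

```python
import struct
from typing import NamedTuple

NUM_REGS = 11  # r0..r10; r10 is the read-only frame pointer


class Insn(NamedTuple):
    code: int  # 8-bit opcode
    dst: int   # 4-bit destination register
    src: int   # 4-bit source register
    off: int   # signed 16-bit offset
    imm: int   # signed 32-bit immediate


def encode(insn: Insn) -> bytes:
    """Pack one instruction into its 8-byte (64-bit) little-endian form."""
    for reg in (insn.dst, insn.src):
        if not 0 <= reg < NUM_REGS:
            raise ValueError(f"invalid register r{reg}")
    regs = (insn.src << 4) | insn.dst  # dst in the low nibble, src in the high nibble
    return struct.pack("<BBhi", insn.code, regs, insn.off, insn.imm)


def decode(raw: bytes) -> Insn:
    """Unpack one 8-byte little-endian instruction."""
    code, regs, off, imm = struct.unpack("<BBhi", raw)
    return Insn(code, regs & 0x0F, regs >> 4, off, imm)


# "return 0": r0 = 0 (BPF_ALU64 | BPF_MOV | BPF_K), then exit (BPF_JMP | BPF_EXIT)
prog = [Insn(0xB7, dst=0, src=0, off=0, imm=0), Insn(0x95, dst=0, src=0, off=0, imm=0)]
bytecode = b"".join(encode(insn) for insn in prog)
print(bytecode.hex(" "))
```

Running it prints `b7 00 00 00 00 00 00 00 95 00 00 00 00 00 00 00`, the raw bytes of the same two-instruction "return 0" program that llvm-objdump shows in the first step of the workflow.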
We compile our C program into BPF instructions that we inject into the kernel. It may or may not be JIT-compiled. And at some point we attach it to a hook, where it runs when an event occurs, for example when a function that we want to trace is triggered. The rest of the presentation is organized like this. The reminder on eBPF: done. Then the tools to inspect BPF objects. We have a number of them, and I would like to introduce them in the first part, for the different steps of the diagram I just showed. In the second part, I'd like to give a somewhat deeper introduction to bpftool, which is used a lot to get information about the BPF objects loaded on the system, programs and maps for example. I would also like to give a brief overview of the possible next steps in terms of BPF introspection and debugging: what is being discussed at the moment in the community, and what we could do. I forgot to mention the main use cases for BPF. We have networking and tracing. We also have some use cases for socket filtering and for cgroups. There is work in progress for Linux Security Modules. And we have more use cases, smaller ones that exist today, and probably others that will appear in the future. So let's start with the first step of our workflow. When I compile a C program into eBPF bytecode, at that step of the process, I want to make sure that the eBPF bytecode is consistent with the program I had in mind, with what I wanted to create. So how can I do that? I have this clang command line, because you compile your C into BPF with clang and LLVM. There is also a GCC backend now, but it's not as complete as clang. Once I have my object file, I can use llvm-objdump to dump the bytecode of the program. So that's what we have here: b7.
And together with the zeros that follow, that first instruction puts the value zero into register r0, which holds the return value of the program. Then I have a second instruction, exit, which returns from the program. So it's basically "return 0". I can also get the C source code in that output if I pass the -g flag to clang and the relevant options (such as -S) to llvm-objdump. That's useful to check what I have in my object file. Another option that might be useful: instead of compiling directly from C into BPF bytecode, I can compile into eBPF assembly. That gives me an assembly file, a .s file, which is much easier to edit if I want to change my BPF code without having to write all the code by hand, or if I want a specific sequence of instructions to test some kernel BPF feature. For example, for hardware offload, we wanted to make sure that some instruction sequences were processed in a particular way, so that was useful. And clang can then compile this assembly into regular BPF bytecode. So that's it for the first step. There's not much to debug at this point; it's mostly about inspecting things. Then comes the step where you try to load your program. You have your BPF bytecode, and you're mostly trying to pass the verifier. The verifier acts as a kind of guardian in the kernel, and it often tells you: I don't want this, there is a security risk, you have to change your program. So we want the program to pass the verifier, or at least we want to understand why it is rejected, which is the first step to fixing it. We have a variety of tools that can be used to inject the program into the kernel and possibly get the debug output that the verifier sends back. There is libbpf, which is used in particular by bpftool, a command-line utility.
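When reading llvm-objdump output such as the `b7 ...` / `95 ...` dump described above, it can help to split each opcode byte into its fields. The sketch below assumes the dump prints every instruction as `<index>: <8 hex bytes> <assembly>` (spacing varies between LLVM versions, hence the loose regular expression, and interleaved source lines from `-S` are simply skipped). It decodes the instruction class from the low three bits and, for the arithmetic and jump classes, the operation and source bits, using the constant values from the kernel's `linux/bpf_common.h` and `linux/bpf.h`. The helper names `parse_dump` and `describe_opcode` are hypothetical, and wide 64-bit immediate loads are not handled specially.

```python
import re
from typing import Iterator, Tuple

# Instruction classes (low 3 bits of the opcode)
CLASSES = {0: "LD", 1: "LDX", 2: "ST", 3: "STX", 4: "ALU", 5: "JMP", 6: "JMP32", 7: "ALU64"}
# Operations (high 4 bits) for the ALU and ALU64 classes
ALU_OPS = {0x00: "ADD", 0x10: "SUB", 0x20: "MUL", 0x30: "DIV", 0x40: "OR", 0x50: "AND",
           0x60: "LSH", 0x70: "RSH", 0x80: "NEG", 0x90: "MOD", 0xA0: "XOR", 0xB0: "MOV",
           0xC0: "ARSH", 0xD0: "END"}
# Operations (high 4 bits) for the JMP and JMP32 classes
JMP_OPS = {0x00: "JA", 0x10: "JEQ", 0x20: "JGT", 0x30: "JGE", 0x40: "JSET", 0x50: "JNE",
           0x60: "JSGT", 0x70: "JSGE", 0x80: "CALL", 0x90: "EXIT", 0xA0: "JLT",
           0xB0: "JLE", 0xC0: "JSLT", 0xD0: "JSLE"}

# Assumed line shape: "<index>: <8 hex bytes> <assembly>"
LINE_RE = re.compile(r"^\s*(\d+):\s+((?:[0-9a-f]{2}\s){7}[0-9a-f]{2})\s+(.*)$")


def describe_opcode(code: int) -> str:
    """Split an opcode byte into class, operation and source fields."""
    cls = CLASSES[code & 0x07]
    if cls in ("ALU", "ALU64", "JMP", "JMP32"):
        ops = ALU_OPS if cls.startswith("ALU") else JMP_OPS
        op = ops.get(code & 0xF0, f"0x{code & 0xF0:02x}")
        src = "X" if code & 0x08 else "K"  # X: source register, K: immediate
        return f"BPF_{cls} | BPF_{op} | BPF_{src}"
    # Loads and stores encode a mode and a size in the upper bits instead
    return f"BPF_{cls} (mode/size bits 0x{code & 0xF8:02x})"


def parse_dump(text: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (instruction index, opcode description, assembly) per instruction line."""
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            opcode = int(match.group(2).split()[0], 16)
            yield int(match.group(1)), describe_opcode(opcode), match.group(3).strip()


sample = """0000000000000000 <prog>:
       0:\tb7 00 00 00 00 00 00 00\tr0 = 0
       1:\t95 00 00 00 00 00 00 00\texit
"""
for index, opcode, asm in parse_dump(sample):
    print(index, opcode, asm, sep="  |  ")
```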
For networking we have ip and tc, and there is bcc, but bcc is mostly used for tracing programs. We can load programs with those tools without having to reimplement everything ourselves and hand-code calls to the bpf() system call, and we can do some additional management of BPF programs and maps. Several things can give us information. We have the logs from the verifier. We sometimes have kernel logs for some specific BPF errors. For networking, we can also have netlink extended ack (extack) messages, from ip or tc for example. And there are some places with documentation to understand those errors: typically the filter.txt file in the kernel documentation, or the Cilium guide about BPF, which is really complete. This is a non-exhaustive list of what the verifier tries to check: if you have invalid BPF syntax, the program is rejected; if it has more instructions than the kernel supports, it is rejected; and so on and so forth. Just so that you can have a look at an example error message from the verifier: here I'm trying to inject a program that reads a network packet, but I haven't checked that the offset inside the packet is safe, in the sense that my packet is big enough for me to access that offset. So I have a potential risk of out-of-bounds access, which may never happen: if I'm reading the second byte of the packet, I never go out of bounds, because my packet always has a MAC header. So I have virtually no risk, but the verifier doesn't care: if there is a logical risk of out-of-bounds access, the program is rejected. The message, "invalid bpf_context access", is very useful when you know what it's about, but for newcomers especially, it's very cryptic and very hard to understand what it refers to.
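Verifier logs can be long, and the message that matters usually sits right after the last instruction the verifier managed to analyse. The following Python sketch pulls out that pair. It assumes the log format of recent kernels, where instruction lines look like `1: (71) r0 = *(u8 *)(r2 +1)` and the log may end with a `processed N insns` summary; the sample log is illustrative rather than captured from a real run, the exact error text depends on the program type and kernel version, and `last_failure` is a hypothetical helper. For the packet example above, the usual fix is to compare `data + offset` against `data_end` before the read, so that the verifier can prove the access stays in bounds.

```python
import re
from typing import Optional, Tuple

# "<index>: (<opcode>) <instruction>" lines, as printed by the verifier
INSN_RE = re.compile(r"^\d+: \([0-9a-f]{2}\) ")
# Register-state lines such as "1: R1=ctx(id=0,off=0,imm=0) R10=fp0"
STATE_RE = re.compile(r"^\d+: R\d")


def last_failure(log: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (last instruction analysed, first message after it) from a verifier log."""
    lines = [line.strip() for line in log.splitlines() if line.strip()]
    last = None
    for i, line in enumerate(lines):
        if INSN_RE.match(line):
            last = i
    if last is None:
        # No instruction was printed, e.g. the program was rejected early
        return None, (lines[-1] if lines else None)
    for line in lines[last + 1:]:
        if line.startswith("processed ") or STATE_RE.match(line):
            continue  # skip the summary line and register dumps
        return lines[last], line
    return lines[last], None


# Illustrative log for an unchecked packet read (not a verbatim kernel capture)
sample_log = """
0: (61) r2 = *(u32 *)(r1 +0)
1: (71) r0 = *(u8 *)(r2 +1)
invalid access to packet, off=1 size=1, R2(id=0,off=0,r=0)
processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
"""
insn, message = last_failure(sample_log)
print("failing instruction:", insn)
print("verifier says:", message)
```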
So we would like to have better things in the future, maybe some additional documentation or FAQs that would tell us where to look in that case. But we do want to have those messages, so it's especially important to be able to print them to the console; otherwise you're on your own to understand what's going on. There are debug flags that can be passed to the various tools. If you were to inject your program on your own, with a custom management program calling the bpf() system call yourself, you would have to be careful to pass a buffer to the kernel, so that the kernel can write the logs into it, like the message we saw just before. Because we already have tools such as bpftool and libbpf (which bpftool uses), such low-level buffer management is done for us to some extent. But libbpf itself also has a number of additional debug messages, such as "now I'm trying to load a program into the kernel", or before that, "now I'm trying to perform ELF relocations for a given section of your object file, to put things back in the correct place in the BPF instructions". There is a lot of ELF magic happening before the program is injected, so you may have to tell libbpf to dump this information. With bpftool, for example, you get information from libbpf and from the kernel with the --debug option, so that's something useful to know too. For interpreting this information, we have the same resources I mentioned earlier: the filter.txt documentation and the Cilium guide. There is some additional information now under Documentation/networking or Documentation/bpf, I think, although it's still not perfect. The hardcore solution, if you don't find anything better, is to go and read the kernel code. Yeah, that's not really ideal either. So yeah, maybe in the future we'll
have something a bit more user-friendly; that would be nice to have. So now we have somehow managed to fix our program and to pass the verifier: based on the output, we added the check that was missing for the packet access. The program is now loaded in the kernel. It's not attached yet, though it can be; it just sits in the kernel, referenced by something in user space, typically a file descriptor, so that it doesn't get wiped out. Once the program gets attached, it is referenced in the kernel until we detach it. So we have our program sitting here: what can we do with it? We have a number of options, such as listing the existing programs and maps in the kernel, and dumping the instructions of those programs or the content of those maps. Maps will typically be array maps or hash maps, and there are a number of additional map types for specific use cases. bpftool is the main tool we use on a daily basis to inspect those objects when they are loaded in the kernel. I'll come back to bpftool later, though. I'll move on to something else, which is BTF, the BPF Type Format. Since kernel 4.18, I think, so that's recent, BTF objects can be used to embed debug information about BPF programs and maps. It's similar to the DWARF format used by GDB, for example: information stored in your object file is passed down to the kernel when you inject your program, which means that the kernel can have some information about the BPF programs or maps. For programs, it's just a matter of using a recent enough version of clang and LLVM and passing the debug flag. For maps, we have to do some wrapping in the source code. What I have here is a classic definition of a BPF map, as it would appear in the C source code, and above that I have the additions that provide information, in particular about the type for the
key, which is a pointer to an int, and the type for the value, which is a pointer to that struct. Sadly, more details are mostly in the commit log of that change; for now, the reference for how those things work is the samples and selftests in the kernel repository, and we're still waiting for the documentation to catch up. Here is an example of a program dumped with BTF information. I have the regular BPF bytecode instructions, and in the middle of them I have the C instructions; I hope you can read them, somewhat. Here I have int balancer_ingress, which is the name of the function; here I have if (data + off > data_end); and my program goes on. So these are the C instructions that were used to compile the program into BPF bytecode, but I get this information from the kernel: the kernel now knows this about the program. That's useful to understand what's going on. BTF is also used for some other BPF features, in the sense that BTF objects like this are checked by the kernel, especially for maps, where they are checked for consistency, so that you cannot just load any information you like, but only something that really corresponds to the map being loaded. Some advanced BPF features rely on BTF objects to work, but that's something different; let's just think of it as debugging information available to us for now. So I've been able to inspect my objects in the kernel, and that's good. I have my programs loaded; now maybe I want to run them, to do some actual work with my programs: some processing, some tracing, whatever. So I attach them to one of the hooks in the kernel. For tracing, that would be tracepoints or kprobes; for networking, TC hooks or XDP at the driver level; and there are other hooks for the other use cases. But how can I understand why my program doesn't work as I expected? I mean, I've attached my program to an interface
and it doesn't drop the packets I wanted it to drop, or it doesn't give me the arguments of the function I expected to trace. So what's happening? We have a number of solutions. We don't have a step-by-step debugger yet, so again, it's mostly about introspection. We do have a printf-like thing, or rather a printk-like thing since it's in the kernel: bpf_trace_printk(), a helper that can be called from the BPF programs themselves. It's just like debugging a program with printf everywhere, except that it doesn't print to the console, because it runs in the kernel. The output goes to the tracing file system: there are actually two files that can be used, trace and trace_pipe under /sys/kernel/debug/tracing/, and you get whatever information you feed to the helper. You have to use a constant string as the format, but otherwise it works much like other printf-style functions. In this case I'm printing the first four bytes of the packet, so I pass the format string, the size of the format string, and then the data that I want to use for the %x conversions here. That's good for understanding what's going on, but not good in terms of performance. So we have a different mechanism, a bit more complex to use, perf event arrays, which can stream data to user space more efficiently. That's what you would use, for example, if you were to re-implement tcpdump with BPF programs attached to XDP: you would stream the packet data with perf event arrays. You could also use them to send a flow of data for tracing: each time a system call is made and you want to print its arguments, you can use perf event arrays to notify user space, and user space can print them to the console. So you could have something close to strace, at least in appearance. One of the advantages of this is that it can also be used with hardware offload for network processing. One thing that's interesting, too, is that BPF can be used for tracing,
right? So why don't we reuse BPF for tracing BPF programs? Well, actually, that's possible now, since the latest kernel: 5.5, I think, though I'm not even sure it's out, it may still be in the merge window. We now have the possibility to attach BPF programs at the entry or the exit of other BPF programs, at least networking programs; I'm not sure it works with tracing programs. For networking programs it's supposed to work: you can get the input packet, the data of the packet processed by the program you are debugging, and you can also get the output data, so you can check the difference between the two and see how your program processed the packet. You can also use bcc or bpftrace to inject programs that trace the kernel, if you are hitting some unknown issue, for example at verification time. I've used bpftrace a couple of times to understand which function inside the kernel verifier was rejecting my programs. It can be useful to some extent; it's not the easiest way to understand what's going on, but it's still something to think about if you're stuck otherwise. Then we have a feature that can help us test BPF programs: test runs. Instead of attaching a program to a network interface and waiting for packets, I manually tell the system to run this program not on a real packet but on an input buffer, and to give me back the output data once the program has run. This is not a program type, it's a subcommand of the bpf() system call, and again, it works best with networking programs, because it's a bit more tricky to use with tracing programs. You provide input, you get output, and you can see what's happening. It has some limitations: not all program types support it, and it's difficult to check whether other data structures that the program might modify have changed or not.
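The test-run idea above (feed an input buffer, get back the return value and the output buffer, optionally averaged over several repetitions) can be mimicked in plain Python to reason about what to compare. This is only a user-space analogy: the real mechanism is the `BPF_PROG_TEST_RUN` command of the `bpf()` system call, which runs the program actually loaded in the kernel. Here `simulate_test_run`, `diff_bytes` and the toy `swap_macs` "program" are hypothetical, while the `XDP_*` values are those of the kernel's `enum xdp_action`.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Return codes from the kernel's enum xdp_action
XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX = 0, 1, 2, 3


@dataclass
class RunResult:
    retval: int       # return value of the last run
    data_out: bytes   # packet data after the last run
    duration_ns: int  # average wall-clock time per run


def simulate_test_run(prog: Callable[[bytearray], int], data_in: bytes,
                      repeat: int = 1) -> RunResult:
    """Run `prog` on a fresh copy of `data_in`, `repeat` times."""
    if repeat < 1:
        raise ValueError("repeat must be at least 1")
    start = time.perf_counter_ns()
    for _ in range(repeat):
        packet = bytearray(data_in)  # every run sees the same input
        retval = prog(packet)
    elapsed = time.perf_counter_ns() - start
    return RunResult(retval, bytes(packet), elapsed // repeat)


def diff_bytes(before: bytes, after: bytes) -> List[Tuple[int, int, int]]:
    """List (offset, old, new) for bytes that changed, over the common length."""
    return [(i, a, b) for i, (a, b) in enumerate(zip(before, after)) if a != b]


def swap_macs(packet: bytearray) -> int:
    """Toy 'program': swap the Ethernet destination and source MAC addresses."""
    if len(packet) < 14:  # the bounds check the verifier would insist on
        return XDP_DROP
    packet[0:6], packet[6:12] = packet[6:12], packet[0:6]
    return XDP_TX


data_in = bytes(range(14)) + b"hello"
result = simulate_test_run(swap_macs, data_in, repeat=10)
print(result.retval, result.duration_ns, diff_bytes(data_in, result.data_out)[:2])
```

Diffing `data_in` against `data_out` is the same comparison as the one described above for programs attached at the entry and exit of the program under test; programs that change the packet length would need a separate length check.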
It's a bit more complicated to implement. Some BPF helpers, the functions that you can call from within BPF programs, are very hard to support with that mechanism: for example, when you redirect a packet to a different interface, how can you check that it really worked just by getting an output buffer? That's another issue. And some people at least would like to have this feature available to non-root users, in order to run test suites on BPF programs, but there are security issues again. A number of these limitations have been proposed for discussion at a conference in March, the Netdev conference, so things might evolve and hopefully improve. Another thing that might be useful for understanding what's going on is that you now have statistics for programs. You have to activate them with sysctl, and then you get the total run time of your program and the number of times it has run, so at least you can check that your program has run as many times as you expected. It's not enabled by default, because there is a slight overhead for gathering statistics when the program runs, so you have to activate it. A small number of miscellaneous things to know about debugging at runtime, too: perf has support for annotating JIT-ed BPF programs, so you can run perf top, go to the annotated view, and find the instructions of your program on which the CPU is spending time. So if you need to check what's happening on that side, you're covered. There are a number of user space BPF virtual machines. They are nowhere near as complete as what's in the kernel, but if you wanted to run a BPF program under GDB today, you would have to turn to one of these user space machines: uBPF, which is in C, or rbpf, which is in Rust; otherwise they are pretty equivalent. There is a small debugger too, step-by-step I think, I don't remember exactly, but it's for legacy cBPF, which is much simpler than eBPF,
so it's also simpler to debug, and it's also in the kernel repository. So that's about it for loading and running BPF programs. I just wanted to speak quickly about writing programs in user space to manage your BPF objects. What we want there is the ability to debug such programs, and to improve BPF support in the toolchain in general. We can use the frameworks that already work: bcc, bpftrace, libkefir, a library I've been working on that turns network filtering rules into BPF programs, and of course libbpf, which can be used to implement your programs managing BPF objects. There is a feature in bpftool that dumps the list of BPF-related features on your system, so that you can check what's available: which program types, which map types, even which BPF helpers, and whether the bpf() system call is supported in the first place, that kind of thing. Since it's a debugging tool anyway: we have support for BPF in strace, and BPF support in Valgrind, although for Valgrind I think it might be getting a bit outdated, while strace is mostly up to date, I think. That's really nice to have, especially when you're trying to inject a program into the kernel and you don't get much information, just an error code, and it turns out that your loader has been trying several different things: checking that it can inject a very simple program, then creating a map, then at last injecting your program. So which of those calls failed? I'm not really sure, so I can trace it with strace, and that's very useful. So that's it about introspection and debugging BPF in general. I wanted to talk a bit more about bpftool, because it's really useful for doing all those things; well, not all of them, but most of them. I hope you can read at the back of the room. One of the simplest functions is listing the programs loaded on the system. So I have bpftool
prog show. So this is a list of programs attached to cgroups. The first seven programs were actually loaded by systemd on my system, for firewalling things, and at the end there is one program that I had added, for XDP, so network processing. I can list those programs, and I can also dump their instructions. I can dump what we call the translated instructions: after my program has been loaded, the kernel verifies it and rewrites parts of it, so there may be some small changes, not in terms of the logic of the program, but in terms of small offsets related to the different data structures being used. So I can get the instructions as they are after loading, but I can also dump the JIT-ed instructions: in the second case, the program was JIT-compiled into a native x86 binary, and I can get all those instructions, which is interesting to see what's happening with the JIT compiler. I can load a program, and I can attach it to a variety of hooks; I don't think I can attach it to kprobes or uprobes with bpftool at this time. The related commands are bpftool prog load, bpftool prog attach, bpftool cgroup attach, and for networking, bpftool net attach. I can work with maps too: bpftool map show lists the maps; all those LPM tries were related to the systemd programs loaded on my system. I can look up entries in maps: here is the case of a very simple array map, and I'm dumping the content of the first entries. What we don't have here is BTF. With BTF information, I would have more data than this; I forgot to put the picture. I would have the fields: my entry here comes from a C struct with a number of fields, and I would have the names of the fields with the values beside them, instead of just a blob of hexadecimal values. I can dump the full map at once, too. I can update an entry in the map or delete an entry. It even works for some specific map types, such
as the one used for jumping from one program into another; for that kind of feature, bpftool is really helpful to update the map. I'm sorry, this picture is really small, and I don't know how to zoom on it. The idea is what I presented earlier: it shows the list of BPF features supported on the system. The first one: the bpf() system call for unprivileged users is enabled on that system; the JIT compiler is disabled; and so on and so forth. We also have a number of BPF-related config options that were used when compiling the kernel, so that we know which BPF features are available, and all of that can be reused when you write user space programs working with BPF. This next picture is not too small: here I'm running a test run of a program I've previously loaded into the kernel. I have bpftool prog run, then a handle to my program; I provide the input data with data_in and my input file, data_out with a dash to dump to the console, and I want to run that program 10 times. I get the output data here, "hello twitter", because I wanted to put that in a tweet; the return value of the program is zero; and I have the average duration of those 10 runs. So that's it for test runs. Some additional bpftool features, without screenshots because they would be too small anyway: we can list programs per cgroup and per tracing hook, so we can really inspect what's attached at a given tracing hook. We can load several programs at once, which is especially useful when you have a number of programs with tail calls and you jump from one into another. We can dump the output of bpf_trace_printk() directly to the console. We can dump data from perf event maps. There's generation of a skeleton header file, to help with writing programs that manage BPF objects, but that's pretty new. We have a batch mode, we have JSON support, which is pretty nice, bash completion, and more. So I had made a
Twitter thread, if you want to look into that. For more information, there is of course documentation for bpftool: we have man bpftool, and different man pages for the different subcommands, bpftool prog, bpftool map, and so on. So here you are, ready to use bpftool now. It's being packaged now on Fedora, and on Ubuntu too, on the latest releases; I think it should be added to 18.04 at some point. Otherwise, bpftool is located in the kernel repository. A few words about what's coming next for BPF debugging. We want to add more things to help debugging. Can we have more modular programs, with more modular calls between programs? Something was added very recently to the kernel for that; could it help for debugging? Maybe. There is a discussion about adding more information to sysfs, and also a discussion about improvements for test runs. It would be really, really nice to have a real step-by-step debugger for BPF. We don't have that at the moment, but people are really thinking about it. So how can we do that? Should we... sorry? Why ptrace? Ptrace, where is ptrace... I don't know ptrace well enough to know if that would work; that's an idea, maybe, we should talk about that. So here are some ideas that were proposed, besides ptrace: running a program in a VM and freezing it, is that doable? Extending the BPF test run interface? Attaching kprobes to every single instruction, is that something doable? Those are just leads to explore for now. I would at least like to have updated documentation, because what exists is not always up to date, and I definitely think we would benefit from a real troubleshooting guide, one that would tell you: this error is often related to that thing in your source code, which you should change into that other thing. Again, in particular, the topics of step-by-step debugging and sysfs are being proposed for the next Netdev conference in March. So to conclude: debugging BPF programs is not trivial. We don't have a step-by-step debugger yet, but the
tooling is getting better and better. We have more tools, and more efficient ones. We can dump instructions and map contents at any stage of the workflow. We can print data while the program is running. We can do test runs in the kernel. We can run BPF VMs in user space; that's not the best we could do, but it can help debugging. BPF itself can be used to inspect other programs, which is really nice. And we have BTF, the debug format used to provide more information. So all of this is here, it's already very good, and hopefully we will have better things in the future too. Fingers crossed. And that's it for my presentation, thank you. I don't know if we have time for questions? A quick one? Okay. [Audience question, partly inaudible.] So the question is: if you want to code your own BPF assembler, can you do that, and what is the best API? You can do that. You might hit a number of issues; not on basic programs, it will work for basic programs that don't use advanced features. But if you're trying to use BTF debugging information, for example, or more advanced calls from one program to another, there are a number of things happening, like the ELF relocation stuff I mentioned, and this is getting more and more complex. You could re-implement that, but it would not be trivial, so libbpf is really the reference here. That doesn't prevent you from working on your side with another BPF assembler, but for more complex stuff you will have more work to do and more time to spend on it. Other questions? Yes? Sorry, can you speak louder? Is it possible to handle a hardware interrupt from a BPF program? I don't think so. I'm not sure, but I don't think so; you don't have a hook for that, I think, so I would say no. There was a question in the back? Sorry, can you speak louder? Ah, that's the question: how big is a network BPF program, on average? It used to be limited to 4k instructions; now it's limited to up to 1 million instructions, but
that's not exactly accurate, because the limit is on the number of instructions that the verifier can check, and it checks some instructions several times. As for the average, it's hard to tell, because it depends on what you're running. But the 4k limit was definitely a hard limit for some people, who wanted more than that. That's also why you can jump from one program to another: one motivation was to circumvent that limitation. So I don't have a number in mind, but programs now range from a few dozen instructions to thousands and thousands of instructions; it really depends on the complexity of your programs. You're welcome.