My name is Jiri Olsa. I work for Red Hat, on eBPF and perf related stuff. This presentation is about various features in eBPF land that either happened recently or are about to happen soon; some of them are still on the mailing list in the form of patches. So I put together a list of the features that I was forced to deal with or that I found on the mailing list. Hopefully that might be helpful for somebody. So let's start.

Calling kernel functions. eBPF allows you to add code to the kernel. It doesn't need to be a module; it's just binary instructions that you can insert into the kernel at various places. This code is of course limited in many ways, and new functionality is being added all the time. One of the latest additions is that it allows you to call kernel functions. That was pretty nice when I heard that eBPF can do it. I was thinking that now we can do everything, but of course there are limitations. If you want to call a kernel function from eBPF code, it needs to be hard-coded in the kernel: you need to say which specific function you want to call and for which specific program type. And to get that into the kernel it needs to go through review on the mailing list, so you need a really good justification and business case for it to pass the review.

Once you get the function that you want to call from eBPF into the kernel, then the verifier comes. The verifier is the piece of software that checks every eBPF program that's being loaded into the kernel. In the case of calling kernel functions there are many, many checks, especially for the arguments. At the moment you cannot, for example, call functions with variadic arguments. So I wanted to check more: how hard would it be to call panic from an eBPF program? It turned out it's not that bad.
This is the whole patch that you need to put together and send for review. Basically, it adds the function to the list and provides that list to the verifier, so the verifier knows about it. Then in the eBPF program you can call panic. As I said, functions with variadic arguments are not supported at the moment, so I had to add a wrapper for the panic function, because that's a shining example of a variadic function. So I added the wrapper, but it still works of course. If you put this change into the kernel, you should be able to call panic from an eBPF program. This is not as crazy as it looks. I was asked by one of my colleagues if we can do that, because panic is usually, at least in RHEL, connected with kdump. kdump will store the whole vmcore and you can investigate the kernel image later. So for debugging some race conditions that can be helpful, because you can attach this program to any kprobe, to any place in the kernel, cause the panic, and investigate later whatever you want. So, nice feature.

Timers. We can now do timers in eBPF code. That's not an exciting new feature from a generic point of view, but for eBPF code it was missing. Periodic eBPF code invocation, that's what it does. You specify the interval in nanoseconds, you choose the clock, and there you go: the eBPF code will be invoked periodically. The callback is called in softirq context, so that's something you need to count on, and different invocations can be on different CPUs. That's also something you need to count on if you rely on CPU-specific data. How does it look from the API point of view? It's not that bad. It's this set of API: you just set a callback and start the timer.
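The timer calls mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from the slides: it assumes a BPF toolchain (clang with the BPF target, libbpf headers, a generated vmlinux.h), and the map name, program section, function names and one-second period are all made up for the example. Note that bpf_timer_start arms the timer once, so the sketch gets periodic behavior by re-arming from inside the callback.

```c
// Sketch only: meant to be built as a BPF object (clang -target bpf),
// not as a user-space program. Names below are illustrative.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#define CLOCK_MONOTONIC 1		/* not provided by vmlinux.h */
#define PERIOD_NS 1000000000ULL		/* assumed period: 1 second */

struct elem {
	struct bpf_timer timer;		/* the timer must live in a map value */
};

struct {
	__uint(type, BPF_MAP_TYPE_ARRAY);
	__uint(max_entries, 1);
	__type(key, int);
	__type(value, struct elem);
} timer_map SEC(".maps");

/* Timer callback: runs in softirq context, possibly on another CPU. */
static int timer_cb(void *map, int *key, struct elem *val)
{
	bpf_printk("timer fired");
	/* Re-arm the timer so the callback keeps running periodically. */
	bpf_timer_start(&val->timer, PERIOD_NS, 0);
	return 0;
}

/* Any program type that may use timers works; "tc" is an assumption. */
SEC("tc")
int arm_timer(struct __sk_buff *skb)
{
	int key = 0;
	struct elem *val = bpf_map_lookup_elem(&timer_map, &key);

	if (!val)
		return 0;
	/* Non-zero means init failed, e.g. the timer already exists. */
	if (bpf_timer_init(&val->timer, &timer_map, CLOCK_MONOTONIC))
		return 0;
	bpf_timer_set_callback(&val->timer, timer_cb);
	bpf_timer_start(&val->timer, PERIOD_NS, 0);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

The first packet that hits the program arms the timer; later runs fall out early because the timer in that map element is already initialized.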
One interesting thing is that you need to have this eBPF timer handle, and it needs to be part of an eBPF map. The reason is that the eBPF timer is connected with the map, and the timer lives as long as the element holding the timer is in the map. So that's the limitation: the timer is bound to the map. But you can pin the map to the BPF file system, and such a timer can then run forever. Of course, if you update or delete the element in the map that holds the timer, the timer gets canceled. And of course the callback needs to be in some eBPF program, so this eBPF program needs to be present, needs to be alive, for the timer to exist. So, quite a standard timer API. The exciting thing is that eBPF can now do that.

Moving on to the next feature, BTF kind tag. This moves us to BTF land; BTF stands for BPF Type Format. If I focus only on the kernel, BTF describes every type in the system, every structure, every function, in a very compact way. You can have BTF data for the kernel in, I'm not sure, about two or three megabytes, which is really nice compared, for example, to DWARF. The kind tag basically adds the possibility to tag various pieces of types in the BTF data. For example, in a structure you can tag individual fields, or you can tag variables, or the functions themselves. And later on you can display it with bpftool, so you can see the tags in the BTF data.

So what is it good for? In the end, it's used by the verifier. As for how this data is generated: the support for the tagging is in the Clang compiler, which lets you add the tag as an attribute at the C source level. The tags are then emitted as new elements in the DWARF data.
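Source-level tagging of this kind can be sketched as below. It is only an illustration: the attribute name btf_decl_tag is the one recent Clang releases use for declaration tags, while the structure, the function and all tag strings are hypothetical. The macro expands to nothing on compilers without the attribute, so the file still builds there, just without tags.

```c
#include <assert.h>
#include <stddef.h>

/* Use the Clang attribute when available; expand to nothing otherwise,
 * so the same source still builds with other compilers. */
#if defined(__has_attribute)
# if __has_attribute(btf_decl_tag)
#  define __tag(name) __attribute__((btf_decl_tag(name)))
# endif
#endif
#ifndef __tag
# define __tag(name)
#endif

/* Hypothetical structure and tag names, for illustration only. */
struct conn_stats {
	int id;
	long bytes __tag("counter");	/* tag on a structure field */
} __tag("stats");			/* tag on the structure itself */

/* Tag on a global variable. */
static long total_bytes __tag("global_counter") = 0;

/* Tag on a function and on one of its parameters. */
__tag("accounting")
static long add_bytes(struct conn_stats *s __tag("checked"), long n)
{
	assert(s != NULL);
	s->bytes += n;
	total_bytes += n;
	return s->bytes;
}
```

The tags do not change what the code does; they only travel with the debug information. When such a file is compiled with Clang and debug info, the tags should end up in the type data, where they can be looked at with bpftool's BTF dump once BTF has been generated.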
From the DWARF, the pahole tool takes it and generates the BTF data, and then the verifier can use it. So we can tag various type information. What is it good for? There's a really nice example, which is now in the form of patches on the mailing list. Imagine you want to hook this function, which has pointer arguments, and those pointers point to user-space addresses. Those pointers are already tagged with the __user tag. So at the C source level we know that they point to user space. At the moment that's used by sparse, which does various checking; but with Clang and this BTF tag feature, we can now add this tag for these arguments to the BTF data. When the verifier loads a program that hooks this function, it knows that the arguments point to user space. So the verifier can check, when you read those arguments: are you using a user-space read helper such as bpf_probe_read_user? Because if you are not, that's probably a mistake. This will greatly help users read the arguments correctly, and it will help the verifier do even stricter checking of BPF programs. So, nice feature. It's already on the mailing list, so this should happen soon.

Moving on, libbpf-tools. libbpf-tools is a new RPM under BCC. It's a subpackage, and it contains compiled BCC tools. If you are familiar with BCC tools, you know each one is a Python script that holds the eBPF program. When you run a BCC tool, it needs to take the eBPF program and compile it through the Clang and LLVM libraries. So standard BCC tools have big dependencies on LLVM and Clang, and if you install them, you bring in all those dependencies. That's not the case for libbpf-tools. They are basically compiled BCC tools: they are C binaries, and they contain the eBPF program already in compiled form.
So if you install libbpf-tools, there are no such dependencies; there are just three or four standard libraries that you need to pull in, which are probably already installed. The loading time is also much better, because a standard BCC tool, as I said, needs to go through LLVM and Clang and compile the code, so there can be a small delay at the beginning. That's not the case for libbpf-tools. There are, of course, somewhat fewer features. First, it's a new package. Second, it's written in C, so it's not so easy to port all the features from the standard BCC tools, which are written in Python. But as I said, it's quite young, so you should see new features coming to the tools. All the tools are installed with the bpf- prefix. That's not something we invented in Fedora; it was a cross-distribution effort. We discussed it with the major distributions and agreed on the layout of the package, so that was nice to see.

Moving on, pahole speedup. pahole is now threaded. What is pahole? pahole is the tool that generates the BTF data. If you have ever compiled the Linux kernel with BTF enabled, you could see this BTF line with the object. That's basically the invocation of the pahole utility, which takes the DWARF data and transforms it into BTF data. Depending on your kernel size and on the size of your DWARF type information, it can take some time, especially for a distribution kernel. So there was always an effort to make pahole faster, and finally threads were added to pahole; there's the -j option that enables them. The work was split into two stages. The first one is already done, and it has very nice stats. This is the benchmark of one of the pahole invocations: a really nice speedup, from eight seconds to three. And on the mailing list there's already the second stage of the threading, which makes this even faster.
So this will no longer bite us in the distribution kernel, where it could sometimes take a long time to generate the BTF data.

And finally, new iterators. So what is an iterator? An iterator is a BPF program that lets you go through instances of various kernel structures; the program is invoked on every instance of the structure. For example, if you iterate over tasks, your BPF program gets executed with a pointer to every task, and it can take this pointer, get all the data from that object, and send it to user space or do whatever it wants. So it's a really nice feature, and it keeps growing with the ability to dump or iterate over new kinds of objects in the kernel. These are the three latest ones: now we can iterate over Unix sockets, and over instances of io_uring and epoll. The latter two are not merged yet; they're only on the mailing list, but that will probably happen very soon.

There's a really nice way to try the iterators. If you go to the BPF selftests in the kernel tree and manage to compile them, you can take any iterator object that was compiled in those selftests and use bpftool to pin that object to the BPF file system. That's what I did for the Unix sockets: I took the object and pinned it to the BPF file system, and then you can cat the file, and it will run the iterator and get you the data. So this is the example of how you get a dump of the Unix sockets using the BPF iterator. It's very easy to go to the source code of that bpf_iter_unix object and change it to display whatever you want, or to get any data you want.

As for the two other iterators, you cannot use bpftool to display them, because both io_uring and epoll instances live inside the program that uses them. But just to illustrate what the eBPF program gets on input.
So for epoll, when you register an eBPF program on some epoll instance, the program gets on input the pointer epi, which is a pointer to the epoll item, and you can go and check any field of the structure and print it, so you can show what is inside and how the epoll is configured. It works the same for io_uring. There are two versions of that: you can display the buffers that the io_uring was configured with, or you get the files. For the files, you get the file pointer directly, so you can get any information from that object with regard to the file. That can be quite useful. And with that, that's it for my presentation, if you have any questions.

Thank you, Jirka, for the presentation. Now it's time for questions. Please put your questions into the Q&A tab under this session if you have any. Looks like there isn't any question. So thank you again, Jirka, and thank you everyone for joining us during this... Ah, yeah, sorry. There is one question from Christophe de Dinechin: can you list some limitations of BPF, Jirka?

Limitations. That's quite a generic question. You mean limitations of the programs, or...? So basically, you load the eBPF programs into the kernel and first... okay, "I'm asking about enforced restrictions." So yeah, an eBPF program is restricted in many ways when it's loaded into the kernel. One of them is, for example, the calling of kernel functions; in general it cannot do that. There were many restrictions, like you could not use loops, which is now possible in some way. So yeah, in the past there were many, and it's getting better and better. Of course, all those limitations stem from safety. Basically, whenever the verifier can check something and guarantee that the eBPF program will be safe to run inside the kernel, that limitation goes away. But yeah, there are still some.

So I don't see any more questions. We will see if there are some more.
Otherwise, thank you everyone for joining us and enjoy the rest of DevConf.