My name is Jiri Olsa, I work for Red Hat. This presentation is about eBPF iterators. I assume the audience is a bit familiar with eBPF; in a nutshell, it's a technology in the kernel that allows you to load a custom program into the kernel, and this program will be executed at certain points in the kernel. We have many, many different types of programs that we can load into the kernel, for example for tracepoints, for kprobes, for other probes, and for many networking hooks, and the eBPF iterator is another object that you can load a program for. So it's another program type. As the name suggests, the program allows you to iterate over several kinds of kernel objects. Support for some of them is built into the kernel, so you can iterate, for example, tasks, memory maps, files, and when you iterate over the objects, the eBPF program is executed for each object in the iteration. In this eBPF program you can go to the object and pick whatever field of the object you want. For example, if you are iterating tasks, you have the task_struct, you can go to any field and send it to user space, where user space can do whatever it wants with it. Currently we just display it, but you can choose what you want to do. I have a very sophisticated diagram that shows what I just said. There are several kinds of objects that are supported for iteration through the eBPF iterator: tasks, files of tasks, eBPF programs, eBPF maps, and many networking objects. This is basically how it looks from the C code point of view: let's say you are going to iterate tasks. All the tasks in the system will be iterated, and if you provide a program like this one, it will be executed for every task object.
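The task iterator program described above can be sketched roughly like this, loosely following the kernel's bpf_iter selftests. This is kernel-side BPF code, so it needs clang with the BPF target and the appropriate headers; the context struct shown here is simplified for illustration:

```c
/* Sketch of a task iterator BPF program, modeled on the kernel's
 * bpf_iter selftests. In the selftests the context type comes from
 * bpf_iter.h / vmlinux.h; it is abbreviated here for illustration. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct seq_file;
struct task_struct;
struct bpf_iter_meta {
	struct seq_file *seq;
} __attribute__((preserve_access_index));
struct bpf_iter__task {
	struct bpf_iter_meta *meta;
	struct task_struct *task;	/* the object for this iteration step */
} __attribute__((preserve_access_index));

SEC("iter/task")
int dump_task(struct bpf_iter__task *ctx)
{
	struct task_struct *task = ctx->task;

	if (!task)	/* a NULL object marks the end of the iteration */
		return 0;
	/* here you can reference any task_struct field and send it to
	 * user space with bpf_seq_printf (shown below) */
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```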
There's an extra helper for iterators, bpf_seq_printf, which works basically like the printf helper: you have a format string where you can specify anything you would put in a normal printf call, and you can go to the task object, which is basically a pointer to the task_struct, reference any field under it, and send it to user space. How does it look on the syscall level? When you want to load an eBPF program into the kernel, you first need to load the program, so you execute the BPF_PROG_LOAD syscall command; in this example I'm actually using the libbpf functions, which are wrappers for the syscall. So you load the program and you get a file descriptor for it; the program is verified in the kernel, and the kernel provides you with the file descriptor. Then, with that file descriptor, you don't create the iterator itself yet; you create a link object, which is really helpful, as I will show later on. So you create another file descriptor, the link, where you specify the BPF_TRACE_ITER attach type, and then you can create the iterator itself. There's another syscall command for that, BPF_ITER_CREATE, where you specify the link, and you get back a file descriptor that you can read like a normal file. In the data for this file descriptor you get all the information that was printed in the eBPF program. So in the eBPF program above you see that it's basically printing the task PID, and that's the information you will get: you will read all the PIDs in the system through the syscall, and afterwards you close it like a normal file descriptor. So that's one usage of the iterators. The next usage is something I really like, it's really nice: you can take the link file descriptor and pin it to a file under the BPF file system. You take the file descriptor, you provide a file name, and the file will magically appear under the BPF file system.
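The user-space sequence just described (load, create the link, create the iterator, read, and optionally pin) can be sketched with libbpf roughly as follows. Error handling is trimmed, and the object file name `iter_task.o` is an assumption for illustration:

```c
/* Sketch of the user-space side with libbpf; error handling omitted.
 * Assumes an iterator program compiled into iter_task.o. */
#include <stdio.h>
#include <unistd.h>
#include <bpf/libbpf.h>
#include <bpf/bpf.h>

int main(void)
{
	struct bpf_object *obj = bpf_object__open_file("iter_task.o", NULL);
	struct bpf_program *prog;
	struct bpf_link *link;
	char buf[4096];
	int iter_fd, n;

	bpf_object__load(obj);			/* BPF_PROG_LOAD, verification */
	prog = bpf_object__next_program(obj, NULL);
	link = bpf_program__attach_iter(prog, NULL);	/* link, BPF_TRACE_ITER */
	iter_fd = bpf_iter_create(bpf_link__fd(link));	/* BPF_ITER_CREATE */

	/* reading the fd runs the iterator; the data is whatever the
	 * program produced with bpf_seq_printf */
	while ((n = read(iter_fd, buf, sizeof(buf))) > 0)
		fwrite(buf, 1, n, stdout);
	close(iter_fd);

	/* or, instead of reading it right away, pin the link so the
	 * iterator appears as a file under the BPF file system */
	bpf_link__pin(link, "/sys/fs/bpf/my_task_iter");
	return 0;
}
```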
So if you go to /sys/fs/bpf, or wherever you have the BPF file system mounted, you will actually see the file there, and if you open and read it, for example with cat, the iterator will be created and you will get fresh data from the moment you actually open the file. This is a really nice feature: with the BPF program you can prepare some debug output that you want to see. For example the PIDs, maybe not the best example, but say you are interested in some task property. You prepare it, pin it to a file, and whenever you need it, you just cat the file under the BPF file system and you have fresh data from the system. So this is how it works in a nutshell: it's a feature that allows you to iterate over several kinds of kernel objects. It's quite a fresh feature, so there's not much documentation on it yet. The best documentation you will find is the BPF selftests; they are probably the best source of information on how to use any BPF feature, because any time a new BPF feature goes into the kernel, it needs to have a test as well, and the tests go into the BPF selftests, which are located in the kernel sources under tools/testing/selftests/bpf. If you go there, you can see how BPF is actually used, and if you are interested in iterators, you just list the files under the progs directory that have "iter" in the name. In these files you will find all the usage that is currently available for the iterators. If you open one of those files, you will see code like this: it's basically the C code of the BPF program, and you can see how you can actually get the tasks, get the information from them, and print it back. So that's one thing, you see how it's used and how it's written. Another nice thing is how to actually use these BPF selftests, and this is available through a new bpftool subcommand. bpftool is a helper tool for many BPF tasks; under Fedora there is a specific bpftool RPM.
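Finding the iterator examples in the selftests looks roughly like this; the file names shown in the comment are examples of what you can expect to find there:

```shell
# from the root of a kernel source tree
cd tools/testing/selftests/bpf/progs
ls *iter*
# e.g. bpf_iter_task.c, bpf_iter_task_file.c,
#      bpf_iter_task_stack.c, bpf_iter_udp4.c, ...
```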
So it shouldn't be a problem just to install it. And if you actually manage to compile the BPF selftests, you will end up with many object files for the iterators. Each of these object files you can load with the bpftool iter pin subcommand and pin to some file under /sys/fs/bpf. So let's say you want to see what bpf_iter_task_stack.o is doing: you just run bpftool iter pin with the object file and a file name under /sys/fs/bpf, and the object will be pinned. Then you go to /sys/fs/bpf, cat the file, and you actually get the result of running the iterator. So the combination of going through the BPF selftests and using bpftool iter pin is really nice; you can check every example and see how it's working. This example displays the task stack for every process in the system. Normally you would go to /proc/<pid>/stack, or whatever the name of the file is, for each process, and you would need to do it by hand or in a script; this way you open just one file and you have all the information. I have another example just to show how powerful it actually is: with the printf you can produce any output you want, to the extent that you can mimic some of the proc file system files. Here I'm showing bpf_iter_udp4 from the selftests; it displays UDP sockets, and it displays them in the same way as if you opened the /proc/net/udp file, so you will get basically the same output. And you can do much more with that. So this is probably the lowest-level interface you can get: the direct C interface of the BPF selftests together with the bpftool iter subcommand. You can use it like that and play with it. Let's go to another part of the eBPF ecosystem, which is BCC. BCC is a set of libraries and tools that use eBPF. As of now, or a few days back when I was writing the presentation, there is no tool yet that would use the iterators.
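The bpftool workflow just described looks like this on the command line; the pin path name `task_stack` is an arbitrary choice, and the commands need root and a mounted BPF file system:

```shell
# pin the task-stack iterator from the compiled selftests
bpftool iter pin bpf_iter_task_stack.o /sys/fs/bpf/task_stack

# reading the pinned file runs the iterator with fresh data
cat /sys/fs/bpf/task_stack

# unpinning is just removing the file
rm /sys/fs/bpf/task_stack
```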
So there's not much to play with under BCC yet, but there is actually support in the bpf_prog_load_xattr function, one of the library functions to load BPF programs, and it supports loading iterator programs. So this is good, and there's also C++ support with a really nice example of the task iterator. But from the high-level point of view, there's still no tool you could play with. Let's move on to bpftrace, which is an even higher level than BCC. Again, the support is not there yet, but the good news is that it's in progress, and I will show you how the interface will look. Currently, support is being added for two iterators, iter:task and iter:task_file. The first one is just the task iteration that I showed you in the previous examples: for every task, the program that you specify in bpftrace will be executed. The task_file iterator basically visits every open file in the system together with its owning task. Of course we can add many more, and probably we will if people ask for it and we see the need, but we are starting with those two. I have a few examples; the first one is really easy. You say bpftrace -e with iter:task. The iterators are basically another bpftrace probe type, so they will appear in bpftrace -l like any other probe, and with -e you specify the probe, iter:task, and then the program. This one, for each task that it iterates, displays the comm string, which is the short name of the process, and the PID. So every time you execute this bpftrace program, you will get the list of processes in the system together with their PIDs. Really nice, but not too helpful. The task_file iterator is much nicer in this regard.
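The first bpftrace example looks roughly like this. Since this support was still under review at the time of the talk, treat the exact syntax as provisional:

```shell
# list every task in the system: short process name and PID
# (syntax from the in-progress bpftrace pull request; may change)
bpftrace -e 'iter:task { printf("%s:%d\n", ctx->task->comm, ctx->task->tgid); }'
```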
Basically, the program that you attach to this probe is called for every open file in the system. What we display in this program: first I get the task. All the objects in the iterator under bpftrace are referenced through the context pointer, so when you want to reference the task, you go to the context and reference the task, and then you can reference any of its fields. In this example I'm referencing the comm field again, together with the PID, and as I said, the program also gets every open file in the system together with its task: this task has this file open. The file is basically the file kernel object, so you can reference any field under this object, and we already have support for the path() function: if you take the f_path field of the file object and use it as an argument to path(), it returns the full path of the file. So in this output you will get all the open files in the system together with their tasks, kind of like lsof functionality. I didn't put it in this example, but the file descriptor number is also available on the context, so you can actually see under which file descriptor the file is open. So this output is already useful. And of course bpftrace adds support for pinning, which means that you can prepare the iterator and, instead of using it at the moment you prepare it, store it under the /sys/fs/bpf file system. In this example, if I specify the /sys/fs/bpf file name after another colon, the iterator will be created and pinned under /sys/fs/bpf, so it will not be executed right away; you can see bpftrace reports that the program was pinned to the file. Then you can open the file and display what's needed. So, very nice. As I said, this is an ongoing process; there is a pull request, so if you are interested, and if everything goes right and smoothly, it could be there in a matter of weeks.
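The task_file example and its pinned variant look roughly like this. Again, this mirrors the in-progress pull request, so the probe and pinning syntax are provisional, and the pin path `open_files` is an arbitrary choice:

```shell
# every open file with its owning task, lsof-style; path() resolves
# the file's f_path to a full path name
bpftrace -e 'iter:task_file { printf("%s %d %s\n", ctx->task->comm, ctx->task->tgid, path(ctx->file->f_path)); }'

# same program, but pinned under /sys/fs/bpf instead of run right away
bpftrace -e 'iter:task_file:/sys/fs/bpf/open_files { printf("%s %d %s\n", ctx->task->comm, ctx->task->tgid, path(ctx->file->f_path)); }'
cat /sys/fs/bpf/open_files
```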
Another thing I'd like to mention is that we will most likely use one of the iterators in perf, because one of the iterators iterates every memory map in the system, which is exactly what we do when we start perf for system-wide monitoring. Currently we can do that: the memory map iterator is there, but at the moment we cannot make it parallel, so it's not faster than what we do in perf today. On the other hand it's cheaper, because if you go to the proc file system and read all the data by yourself, every time you open a proc file system entry the kernel allocates extra memory for the inode. So if you actually open /proc/<pid>/maps for tens of thousands of processes, you can really tell the difference. So we hope that going through the iterators will be cheaper, but we still need to make it faster, we need to make it parallel. So this is something which will come in the future. And the last thing I want to mention: there's also a kernel config option, BPF_PRELOAD, which allows you to compile BPF iterators into the kernel. So you can actually provide the BPF program inside the kernel sources and compile it in, and after you boot, the kernel comes up with several iterator files already pinned under /sys/fs/bpf. So that's also useful.
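Enabling the preload feature is a kernel configuration change; a sketch of the relevant fragment, based on the upstream kernel options:

```
CONFIG_BPF_PRELOAD=y
CONFIG_BPF_PRELOAD_UMD=y
```

With these set, a booted kernel pins its built-in iterator files (in mainline, progs.debug and maps.debug) under /sys/fs/bpf, where you can cat them like any other pinned iterator.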