Hi everyone, thank you for coming to my talk. My name is Jerome Marchand, I'm a kernel engineer at Red Hat, here in Brno, and today I'm going to talk to you about bpftrace.

So what is bpftrace? bpftrace is a dynamic tracing language. I assume that you all know what tracing and language mean, but maybe some of you are not familiar with the term dynamic tracing. That basically means it's tracing that you can enable or disable on the fly. You don't need to modify your application to do it.

Let's see a bit what's under the hood. bpftrace is based on eBPF, the extended Berkeley Packet Filter. That's been a pretty hot topic in the kernel world lately. We don't need to get into the details of what it is now; you just need to know that it's a small virtual machine that can run some user code in the kernel. Because of safety and security concerns, you have some limitations, such as size limitations, and notably you cannot do loops in there, because you only want to run programs that you know terminate. Besides that, bpftrace also reuses existing tracing capabilities of the Linux kernel that are already used by other tools: for instance kprobes, uprobes and tracepoints. So if you have already used a tool like perf or ftrace before, you're probably already familiar with them. And if you haven't, don't worry, we will talk a little bit about them later.

So why would you choose to use bpftrace? Well, first, because you have some tracing needs: maybe you have an application that is not behaving correctly, or maybe there's some performance issue. But okay, there are plenty of tracing tools out there. So why could bpftrace be something that's suitable for you? What kind of characteristics does it have that you may need? I listed a few points. First, it's dynamic. So unlike, say, adding printf to your program, or GDB, you don't have to do anything to your program to trace it. It's already there.
And you can also use it, for instance, on a third-party application that you don't even have the code of.

Then, it's a scripted language. So unlike, say, perf or ftrace, you can easily script it. It's more flexible; you can tailor it to your needs.

It's safe. That's mostly in comparison to SystemTap. Like SystemTap, you also run stuff in the kernel, but there are strong limitations on what you can run, and normally it shouldn't crash your kernel.

And last, it's a pretty simple and concise language. For instance, anything you can do in bpftrace you could also do with BCC; as a matter of fact, bpftrace is based on BCC. But if you've ever seen a BCC program, it's long, it's complicated, it mixes different things. bpftrace is much easier to use, much more user-friendly.

So let's first see a little example. That's going to be the classic Hello World. It's pretty simple, but you can already see a couple of features of the language. First you have the BEGIN and END statements. They prefix blocks of code that are going to run, as you probably guessed, at the beginning and at the end of your tracing session. You can also see that bpftrace has a classic C-style printf function. One more thing to notice: this is one way to invoke bpftrace, on the command line, with the program directly after the -e option. There's also another way to do it. If you have a bigger script, you can just write it in a file and pass that file as an argument to bpftrace. That's the example here. And if you want to, you could also use a shebang at the top and make your file executable.

This next example I chose because I wanted an easy example of a probe. As you can see, the first line is the name of the probe. It's a tracepoint on a syscall; precisely, the syscall is the read syscall. And even more precisely, we're going to trace when the syscall exits.
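A minimal sketch of the two examples described so far, combined into one script file for brevity. The slides are not part of this transcript, so the message strings and printf formats are assumptions; `tracepoint:syscalls:sys_exit_read` is the tracepoint for the exit of the read syscall. The `args->ret` field access is the older syntax; recent bpftrace versions prefer `args.ret`.

```bpftrace
#!/usr/bin/env bpftrace

// Runs once, when the tracing session starts.
BEGIN
{
	printf("Hello, World! Tracing read() exits, hit Ctrl-C to stop.\n");
}

// Fires each time the read syscall exits, for every process on the system.
tracepoint:syscalls:sys_exit_read
{
	// comm: name of the current command.
	// args->ret: return value of read() (bytes read, or a negative error code).
	printf("%s: read returned %d\n", comm, args->ret);
}

// Runs once, when the tracing session ends.
END
{
	printf("Bye!\n");
}
```

Saved under a hypothetical name such as hello.bt, this could be run with `bpftrace hello.bt`, or directly as `./hello.bt` after `chmod +x hello.bt`. A short program can instead be passed inline, for example `bpftrace -e 'BEGIN { printf("Hello, World!\n"); }'`. Root privileges are normally required.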
After that you have the action block, which is the instructions that are going to be executed each time the probe fires; in this case, each time the read syscall exits. It's a pretty simple example: again, you print something. What we print here is two things. First is the command name: comm is a builtin variable in bpftrace that contains the command name. Next is the return value of the syscall, which is the size of the read. If you look at the output, you can see we just print a line every time something is read. So that's pretty verbose. It's possible to have a more synthetic output; we're going to see that later.

But first we're going to talk about the kinds of probes you can use with bpftrace. First you have kprobe and kretprobe. These are probes set at the entry and exit of kernel functions. There you can also access the arguments of the function, or the return value of the function, specifically on kretprobe. Then you have uprobe and uretprobe, which are very much the same thing but for user space. There are also tracepoints: a number of static tracepoints have been inserted at strategic places in the kernel, and they also typically have arguments and return values that you can read. There are a few other types of probes, such as profile, which allows you to do time-based sampling. I'm going to skip the other ones, because we don't have too much time and it's not that important; just so you know, other kinds of probes exist. Something that's pretty useful: you can list the probes, because there are really plenty of them, and you can search by probe name, and you can also use a wildcard.

Now we're going to have a look at how variables work in bpftrace. There are local variables and global variables. For global variables you have two types: scalar variables and, what is used very much in bpftrace, associative arrays.
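As a sketch of the probe types listed above, the following script attaches a kprobe and a kretprobe to the same kernel function. The choice of vfs_read and the printf formats are assumptions for illustration; `arg0`, `arg1`, ... and `retval` are the bpftrace builtins for function arguments and return value.

```bpftrace
// kprobe: fires on entry to the kernel function.
// vfs_read(file, buf, count, pos): arg2 is the requested size.
kprobe:vfs_read
{
	printf("%s: vfs_read enter, count=%d\n", comm, arg2);
}

// kretprobe: fires when the function returns; retval holds the return value.
kretprobe:vfs_read
{
	printf("%s: vfs_read returned %d\n", comm, retval);
}
```

Available probes can be listed with `bpftrace -l`, optionally with a wildcard pattern, for example `bpftrace -l 'tracepoint:syscalls:sys_enter_*'`.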
We also call them maps. They are just arrays that associate some value with a key. And to be honest, scalars are actually implemented as associative arrays too; we just happen to always access them by the zero key. But for practical purposes I guess we can just consider that they are scalars.

So now we have an example here. In this example we have just one variable, count. It's a scalar, and what we put in there is the result of the count function. It's a builtin function in bpftrace, and it just returns the number of times it has been called. So here, at the end, you get the number of times the syscall has been called. When you look at the output you can see it actually prints that variable. I didn't do anything: bpftrace always prints all the global variables at the end, so you don't have to worry about that.

Now it's the same example, I just put a key after the variable. I use the command name as the key. So now we have an example with an associative array, and if you look at the output, it's now broken down by key, in this case the command name.

There are a few builtin variables in bpftrace. We have already seen the process name, comm. There are a few others that are used very much, usually with self-explanatory names: pid, tid, we use those all the time; comm, as we've already seen; cpu too. We often use them as a key for an associative array, to break it down by CPU, TID, whatever we need. A few other variables are usually used in a different way but are very useful: the nanosecond timestamp, we use that all the time; the arguments, obviously; the return value, we use that a lot; and there are a few more. No need to get into detail here. Besides the builtin variables, bpftrace also provides a few very useful functions.
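The two counting examples just described might look like the sketch below, merged into one action block. The slides are not in the transcript, so the probe (entry of the read syscall) and the map names `@count` and `@by_comm` are assumptions; global variables in bpftrace are written with a leading `@`.

```bpftrace
tracepoint:syscalls:sys_enter_read
{
	// Scalar global variable: total number of times this probe fired.
	@count = count();

	// Associative array (map) keyed by command name: one counter per command.
	@by_comm[comm] = count();
}
```

Both variables are printed automatically when the session ends, for example after Ctrl-C.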
Here's an example of one of them: the hist function, which returns a logarithmic histogram of all the values that have been passed to it. To be clear, I mean all the values that have been passed to it across all the calls made during the tracing session, because obviously it takes only one argument per call. So in this example we get a system-wide histogram of all the read sizes. As you can see, it doesn't take much code: with just a couple of lines you already have some nice analysis.

There are a few other functions like that provided by bpftrace. We have already seen count and hist. There's also sum, which adds up all the values it has received as arguments. avg does the same but calculates the average. min and max remember the minimum and maximum values. stats, which may also be useful, is basically count, sum and avg together. There's also a linear histogram, if you prefer; sometimes it's more useful. And there are a few more ancillary functions to delete, clear, zero or print the maps.

Of course we can combine the features we have seen before. If we take the previous example, we can also differentiate the output depending on the command name. So now we have a histogram for each program that's running. Again, that output is going to be pretty long; I had to cut it short. But imagine you're only interested in one program, say GCC. Well, you can filter on it. You could of course put an if statement in your action block, but there's actually a better way to do that: filters, also known as predicates. You just add the filter after the probe name and you're good to go. The probe is still going to be active, but the action block is not going to run if the predicate is false. So in that example we now filter on GCC, and we only collect data on the reads that GCC has done.
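A sketch of the filtered histogram described above. The map name `@bytes` is an assumption, and the command name "gcc" is taken from the talk; note that the gcc driver spawns helper processes with other command names (such as cc1), so the predicate may need adjusting in practice.

```bpftrace
// The predicate between slashes is evaluated each time the probe fires;
// the action block only runs when it is true.
tracepoint:syscalls:sys_exit_read
/comm == "gcc"/
{
	// Power-of-two histogram of read() return values
	// (bytes read, or negative error codes).
	@bytes = hist(args->ret);
}
```

Dropping the predicate and writing `@bytes[comm] = hist(args->ret);` instead would give the per-command histograms mentioned earlier.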
Something else that's pretty useful, and that we do a lot with bpftrace, is measuring time, measuring delays, stuff like that. For the example here, let's say you want to see how much time the reads take in GCC. What we're going to do is take a snapshot of the timestamp and put it in a per-TID variable. You do that on the syscall enter. When it exits, you just take the difference between the current time and the snapshot you made earlier. And voilà, that's it. There are a few extra things, you can clean up, delete stuff, but basically that's it: a few lines and you have a histogram of the time it takes for a read to complete.

Structure members. Sometimes you need to go a bit deeper and look at what's in the arguments of a function, and often these functions take C structures. So you need to navigate them, and bpftrace allows you to do that. In this case you can see that on the first line I had to include a header. If you have a kernel with BTF support, you can actually even skip that step. What we're going to do here is look into the vfs_read function. Say we have a problem with that function and we want to look at what access rights you have on the file you opened. So you look at the first argument of that function, arg0. It happens to be a struct file, and there's an indirection to the member you need, here f_mode. In that example there is only one level of indirection, but there could be several: you can have structures that point to structures that point to the structure that finally holds the member you're interested in. And, well, you can see that GCC seems pretty consistent: it only reads files that it has opened without write rights. Nothing really important here, it's just to give an example.
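The read-latency measurement described above can be sketched as follows. The map names `@start` and `@read_ns` are assumptions; `nsecs` is the builtin nanosecond timestamp and `tid` the builtin thread ID.

```bpftrace
// On entry: remember when this thread started its read().
tracepoint:syscalls:sys_enter_read
/comm == "gcc"/
{
	@start[tid] = nsecs;
}

// On exit: only proceed if the matching entry was recorded for this thread.
tracepoint:syscalls:sys_exit_read
/@start[tid]/
{
	// Elapsed time in nanoseconds, accumulated into a histogram.
	@read_ns = hist(nsecs - @start[tid]);
	delete(@start[tid]);
}

// Drop leftover timestamps so they are not printed at exit.
END
{
	clear(@start);
}
```

The structure-member example might look like the sketch below. The header `linux/fs.h` is where struct file is declared; the map name `@f_mode` is an assumption. On a kernel with BTF support the include line can usually be omitted.

```bpftrace
#include <linux/fs.h>

kprobe:vfs_read
/comm == "gcc"/
{
	// arg0 is the struct file * passed to vfs_read();
	// count how often each f_mode value is seen.
	@f_mode[((struct file *)arg0)->f_mode] = count();
}
```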
There are a few other functions available in bpftrace. We have already seen printf; I'm not going to go through all of them. There are a few functions to resolve an address into a symbol or, the opposite, a symbol into an address. There's also stuff like executing a shell command. And, oh yeah, the stacks: that might be useful, to get the call stack for where the probe fired, stuff like that.

This presentation was pretty short, I guess too short to really see everything that's possible in bpftrace, so for the people who are interested in knowing a bit more about it, I just wanted to point you to two resources that I find useful. The first one is the bpftrace one-liner tutorial. It's a pretty short tutorial, just a few one-liners one after another, and it gives you hands-on experience with bpftrace. And if you want to go further, there's the bpftrace reference guide. It's actually not that long, it's pretty short for a reference guide, and everything you don't find in the tutorial is going to be there.

So, do you have any questions?

Any questions for Jérôme? No?

Oh, I must have explained every single thing.

Do you need the debug info, the kernel debug info, for that to work?

No. I don't remember exactly. No, not the debug info; I think you need devel and headers.

Okay, thank you.

Any more questions? We still have time.

I realize I'm going to sound stupid, but it took me a long time to understand. When you ran those scripts, I was asking myself: what are you tracing? I finally understood that when you run a bpftrace script you are tracing everything. For future incarnations of this talk you might want to make that clearer at the beginning, for newbies like me. Thank you.

Okay, sorry about that.
Related to that question, maybe: where does the printf output go? Because it's printed from inside the kernel, so does it go to the controlling terminal of the bpftrace process?

Sorry?

Where does it go? The messages printed by the printf function.

Yeah.

Where do they go? Do they go to the controlling terminal, or where do they go?

On the terminal you are on, you know.

The messages are printed from inside the kernel, no?

No, no, no. All the printf you use, it's not in the kernel or whatever, it's on the terminal. It's a common library function.

I think I can help with this question. The bpf_printk, or whatever it's called, goes to a ring buffer that is in debugfs, I think. And then bpftrace can read from that and print it on the terminal.

What kernel version do you need to have these features available?

Sorry, what kind of?

What kernel version, what is the minimum kernel version you need to use bpftrace?

Kernel version? Oh, I don't remember, it's 4-something.

Sorry?

I think I've heard 4.8.

So the answer is 4.8. Any more questions?

Yeah, continuing from the first question about tracing everybody, sorry, everything in the system: is it possible to attach it to a specific container or a cgroup or something, and restrict it to that?

Well, you can trace anything that's in your kernel, at least for kprobes and whatnot, but I don't know anything about containers. I know there are some functions about cgroups. But I don't know what containers really are.

You had that example where you were filtering on the command name.

Yeah.

I just tried that and, for some reason, I can't use this comm variable there anymore. Is that on purpose? Do you know?

Yeah, I don't know. It's one of the basic builtin variables. Not sure what version you have, or...

5.4?

5.4? bpftrace 5.4? So, yeah, it's... it's pretty old, I guess, really.

bpftrace --version tells me bpftrace.

Oh, maybe it's old.
So, yeah, I don't know, normally it should work. I did all the examples, you know, I just... It's relatively recent, so sometimes stuff doesn't work.

Oh, it breaks. Okay.

I'm sorry, we're all out of time. Thank you for your questions. Thank you, Jérôme.

All right.