All right, hopefully you can all hear me okay, and I want to thank everybody for coming; it seems like this is a popular session among the Linux Foundation monitoring, I mean, tutorials. I have quite a bit of material to cover, so I'll get started and check in to see if you're following along. Sorry, I was getting distracted seeing people post things; I'd appreciate it if questions didn't go on this chat, or I'll get distracted. Thank you. All right, so let's get started. The talk will cover an introduction to Linux tracing and its core concepts. The story of tracing is a long story, and the first thing that was really created and used on Linux is the famous ptrace system call. It is used normally by debuggers to control the programs that you want to debug, and it's also used by strace. It provides a way of controlling the execution of a program, in a way that allows the program to be constrained in what it can do. Some of the actions that are allowed with ptrace are: you can start a process under the control of ptrace, so the debugger can interact with it; you can attach to a program that is already running; you can step the program, because when you start it, it just waits for you to tell it what to do; and you can interact with the program by reading and writing memory, and reading and writing registers. So that's the beginning of the history. It's not really tracing yet, but that's what we had. And here I have an example you probably all know about: strace. Here is one that shows you the system calls that happen during the execution. So here I have an execution of a simple command, `echo hello`, and we are monitoring the open and write system calls.
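The two strace invocations being described might look roughly like this (a sketch, assuming strace is installed; note that on current glibc the open calls typically appear as openat):

```shell
# Show only the open/openat and write syscalls made by `echo hello`
strace -e trace=openat,write sh -c 'echo hello'

# Per-syscall summary statistics (call counts and time) for the whole run
strace -c sh -c 'echo hello'
```

The first form prints one line per matching syscall as it happens; the second suppresses the per-call lines and prints a summary table at exit.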
So here is the output. You can have a much bigger output depending on what you're monitoring; this is just a small example of the system calls that were made. You have three calls to open and one call to write, so that's the minimal, easiest way of getting an idea of what the program is actually doing. Another way is to get some sort of statistics about the whole program: strace will tell you the system calls made and the time each took to execute. So that's one example. There are many, many options in strace, but this is the idea: it's a very lightweight way to get an idea of what the behavior of your program is. Another piece of history, and I'm going through this to give you some background, because I will refer to some of these concepts throughout the whole presentation: the usual way of interacting during a debugging session basically follows this pattern. You run your program, obviously under ptrace, and execute until a certain point in your code. Then the program stops, and at that point you can inspect it. You can figure out what's going on in your program by printing information, and the information that you normally print, the easiest, is some variables. It could be global variables, local variables, or arguments that are visible in the scope at the point where you stopped. And besides the current values of these variables, a backtrace is also able to tell you: how did I get to this particular spot? Pretty simple. You can also modify information, and you can test an alternative execution path, where you actually modify the values of some of the variables. This is usually done when you try to figure out: why is this value, say X, equal to five? I think it should be two. Let's set it to two and then continue the execution.
So this is kind of what you do in a normal debugging session. It's all done interactively, under this controlled environment that ptrace allows. It's an artificial situation, I would say, because it's not the natural, quote unquote, flow of your program; you have perturbed the situation. It runs under ptrace, and you are stopping and starting it, so the interaction with the OS and other things is a little different. There's another important concept that we'll keep using, in a sense, which is the breakpoint. So what is a breakpoint? It's a way to stop the execution of a program at a certain instruction of your choosing: you decide, I want to stop the program at this spot. And that's how debuggers do their work. Basically, the way it works is that the instruction at which you want to stop gets exchanged with an illegal instruction, and the type of this illegal instruction depends on the architecture you're working with. It could be just an illegal instruction, or some architectures have a specific breakpoint instruction designed and used by debuggers, etc. So when you reach this particular point during execution, you generate an exception, and then the debugger takes control, because ptrace is involved: the program traps and basically gets taken under control by the OS. It's all done through ptrace. This is a pretty general concept that you will see over and over again, in various forms, used by tracing too; not exactly the same, but the idea is the same. Another idea you have probably seen, besides debugging, is profiling. Profiling is another way of getting an idea about what the program you're interested in is actually doing. The difference is that profiling is usually done in a statistical way, using samples that are taken at a certain frequency.
And usually the events that are recorded are fixed; they are often defined targeting PMU events, the performance monitoring units of the various processors, so each architecture defines different ones. Some are, I don't know, the number of branch instructions executed, cache misses, etc.; there are very many available. So that also gives you a generic idea, so you can kind of zoom in on your issues. You can try strace, you can try debugging, you can do a little bit of profiling. And then, when you have a bit of a clue about what your program is doing (assuming you don't already know the problem), you can discover how the program actually works using backtraces and things like that. Then, finally, we can get to the actual topic of this talk: what you want to do with tracing, in general. The idea of tracing is that you want to collect information about your program, what it does and where it goes wrong, or maybe even when it goes well, to see what it actually does. And the main goal is that tracing should not make your program run in an awkward way, adding overhead and modifying the flow.

[Elena: Could you raise your volume a bit?] Sure. Okay, how is that, better? [Yes.] Okay. So, as I was saying, running unperturbed is the main goal of anything that gets added as part of tracing. Again, the idea is to collect information at certain points during the execution of the program. Then, once you have this information, you can manipulate it before presenting it to the user, for example filtering or aggregating the information, presenting it in a certain way. And then you can display the information collected.
Usually the first part, collecting the information and doing some sort of filtering and manipulation, is done in the kernel. Then the information gets propagated to user space, where the users can actually see it. So it's a similar concept to debugging, but not quite the same: this happens in real time, and it is not interactive. You just run your program, and you try not to modify what your program would do without observation. And one other point, as opposed to profiling: normally you can add the points where you collect information dynamically. Some points are already defined for you; some points you define on the fly, depending on where you actually want to take a look and spy on your program. So let's look at what I used to call a brief history of tracing, which is now actually becoming a very long history of tracing. It's more or less 20 years ago, at the beginning of the 2000s, 2004-2005 and onward, that things started happening. There was a bit of resistance, I would say, or difficulty, in getting this stuff integrated, especially in the kernel. At the beginning it was like: well, why do you need this? We can do this using strace and kgdb and other things, especially for debugging the kernel. So there was a bit of that sentiment: it's not what we want to do. But there were also some legitimate issues, which were, as I was saying before, that adding this tracing instrumentation, and being able to collect the information, would add a lot of overhead and slow things down, especially in the kernel. You have to be fast; there are issues with real time and a lot of other things like scheduling that you don't want to mess up because you've added a tracepoint or something like that.
Also, another issue was that the developers and the maintainers were worried that once you add points to collect information, and these points are added into the kernel code at specific locations, say you want to trace one particular part of the scheduler and print, I don't know, parameters of a certain function, then on top of that you get scripts and other constructs and tools that use this. They rely on this information: they use, say, function foo with parameters X, Y and Z. The developers then feel compelled to maintain those, but maybe that function needs to be changed for whatever reason, to have different parameters, or a different name, or different parameter values and variables. So they were very worried about this problem, that the tracepoints become frozen. Eventually it was decided to add these special static tracepoints, in a way that is not frozen in time in the kernel: we try to add them without too many parameters, very carefully reviewed, very prudently. Okay, so that's also why the development of dynamic tracing became important, because static tracing is somewhat limited. Nevertheless, a lot of static tracepoints have been added, in many subsystems in the kernel. And then we're talking about different tools, different things that have happened. One of the first ones was LTT, the Linux Trace Toolkit, in the late 1990s. And then we got kprobes. I have some pointers here to the first signs of life of these tools: kprobes and SystemTap for Linux in 2005, LTTng, which was a rewrite of the original LTT, in 2006, ftrace as a tool at a bit of a higher level in 2008, and perf later, also in 2008.
We have DTrace for Linux, which started around 2011, and then BPF as another type of infrastructure, in 2013 for tracing specifically, even though BPF itself predates that particular date.

[Elena: I have a question that might be better to answer now. Can you quickly compare the pros and cons of the following: kprobes, SystemTap, LTTng, ftrace and perf?] Yes, I will cover all of these throughout the presentation. So if we can postpone this, and then if there are any other questions towards the end, we can revisit that question, if that's okay. [That sounds good.]

All right, so now let's talk about the infrastructure, because that's where we're going. As I said, we need infrastructure that allows you to specify the points of interest during the execution of your program. You need to be able to specify what information you want to collect at these particular points. You need to process such information: aggregate it, collect it in different ways, create various ways of representing it, and then pass the results to the user somehow. So here come the probes. Basically, again, this is a concept very similar to breakpoints, in the sense that the goal with probes is to associate actions that are performed at specific addresses when they are reached during the execution of the program. If you're not touching the particular place where you have the probe, the probe doesn't trigger; nothing happens. If it's disabled, nothing happens either. But you can insert it at a certain spot, and when the spot is reached, and if the probe is active, then it does whatever you have associated with that particular probe. And as I said, the action is generally collecting information and processing that information.
There are different types of probes that have been added to the kernel and the infrastructure. We have kprobes and uprobes. Kprobes are kernel probes, probes that are added to the kernel. Uprobes are probes that are added to user space, so they are slightly different: the concept is the same, but they are implemented in different ways. And then we have the return-probe variants, kretprobes, which I will cover, and uretprobes. For tracing the Linux kernel, they are available for use. As I said, this is dynamic: you can put them anywhere, well, within some limits, but you, the user, the person that wants to know what's going on, can say: I want to have a kprobe at this spot doing certain things. However, it's not always available. You have to configure the kernel with CONFIG_KPROBES=y; there are a lot of config flags that you can set. At the beginning there were just kprobes, and then there were many variants, but this remains one of the main ones. You can surely look at those later; I'll show you where you can actually look at them. As I said, the main concept is similar to breakpoints: when you hit that kprobe location, an exception is raised, the kernel takes control through the execution of the exception handler, and it performs the actions that have been specified for that particular probe. It's a building block that is used by all the tracing tools that we have in Linux; they more or less all support kprobes, how to specify them and how to read the results, etc. For uprobes it's similar, but this is for user space programs. However, the execution of the probe happens in the kernel, because it's faster: the trap is taken in the kernel, you do the actions and produce the results, passing them to user space, and then you resume the user space program again.
So this also requires a configuration setting in the kernel config, CONFIG_UPROBES=y; actually that's not the only thing you need to set to y. A uprobe is described, abstractly speaking, as a location: an inode, which identifies the file, plus an offset in the file, which is the location, plus a list of actions, which are what you want to do at that particular point. So it's similar to a breakpoint, where you say: break at foo.c line 25. This is similar: file and location, plus the actions. The probes are stored in a red-black tree, and you can add many probes at the same location, each with different actions. It also allows conditional execution. As I was saying before, there is a predicate of some sort, which different tools represent in different ways, but it's basically a filter: yes, I hit the probe, but hey, I really don't care, because some requirement that I care about has not been satisfied, so I'm not going to count that particular hit and add it to the statistics, or whatever. As for return probes: if I install a probe at a certain function, it will be at the beginning of the function. If instead you want to look at the return at the end of the function, and see what the function returns, then you use this type of probe called a return probe, and you have a version for the kernel and a version for user space. This one is a bit more complex; it's a two-step process, if you want to call it that. You stop at the beginning of function foo; at that point the exception handler puts a probe at the end of the function, at the exit, and then runs to there. When the function exits, the action associated with the return probe is executed, and then the results are presented to the user.
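As a concrete sketch of how an entry probe and a return probe pair can be registered through ftrace's dynamic-events file (this requires root and a kernel with CONFIG_KPROBE_EVENTS; the probe names and the choice of do_sys_open as a target are illustrative, not from the talk - check /proc/kallsyms for a symbol present on your kernel):

```shell
# Entry probe (p:) and return probe (r:) on the same kernel function.
echo 'p:myprobe_entry do_sys_open' >> /sys/kernel/tracing/kprobe_events
echo 'r:myprobe_return do_sys_open ret=$retval' >> /sys/kernel/tracing/kprobe_events
# Enable all kprobe-based events and watch hits stream by:
echo 1 > /sys/kernel/tracing/events/kprobes/enable
cat /sys/kernel/tracing/trace_pipe
```

The `$retval` fetch argument is what makes the return probe useful: the action records the function's return value at exit rather than its arguments at entry.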
So this is an overview of the types of probes, and I'll show you later how to actually use this stuff. But first, I want to talk about another mechanism, which is the static tracepoints, the statically defined tracing. These are probe points in the kernel code. As I was saying, these are maintained and added by the kernel subsystem maintainers. There are many, in many different subsystems, with more being added, etc. The syntax is uniform across all the tracepoints in the kernel, so that many tools can just use them, because the syntax is the same. As for how they're defined: there is an include file, tracepoint.h, behind the machinery, but before a tracepoint can be used you actually have to define what it does, so there are two components. There is the action that needs to be executed, and you define that under the events directory, include/trace/events/, where there are a bunch of different files, each for a different subsystem. You can define one single trace event using the TRACE_EVENT macro, or you can define a class of events, using DECLARE_EVENT_CLASS and DEFINE_EVENT, to group events together. Once you have defined these, of course, they're not active; they've been defined, but you have to put them at a spot where they trigger. The way you do this as a kernel maintainer (I don't know if you are doing this; as users you probably won't see this, but if you want to look at the code it could be useful for you) is that the tracepoint fires where there is a call to a function named trace_ followed by the name of the actual tracepoint. So here is an example from alarmtimer.h. Here is a class, and there are more events, but I have two here: one is called alarmtimer_fired and the other is alarmtimer_start.
As you see, they have the same structure. They define a class called alarm_class, and you see the first parameter of DEFINE_EVENT is alarm_class, which means alarmtimer_fired belongs to alarm_class, and similarly for the other one. And here you have a structure; well, not a structure exactly, it's abstract, but it tells you the prototype of these functions, the names of the arguments, the record structure, and then a TP_fast_assign, which is how the fields get filled into the structure, i.e. which items will be printed by the TP_printk with its various formatting. Okay, so all of this is automated: the assignment and the printing is what's going to happen when this tracepoint is hit. As for the usage of these tracepoints: in the alarmtimer.c file you find these two particular calls. One is trace_alarmtimer_fired, inserted (and I haven't printed everything, just enough to show you) in the alarmtimer_fired path, and similarly in the alarm start function you see the tracepoint call trace_alarmtimer_start, which, going back to the previous slide, matches its tracepoint definition, along with the stuff that gets printed. So that's how the kernel does this; there are many, many, many of those available. Now, another part of the infrastructure, another building block: tracefs. This is a pseudo file system, mounted at /sys/kernel/tracing, or actually also at /sys/kernel/debug/tracing. It is mounted automatically if any of the ftrace config options are set, for instance the easy one, CONFIG_FTRACE=y. You can look at this on Fedora, which is what I use (Debian is probably a little different), in /boot/config-&lt;kernel version&gt;; you'll see what's been configured, the =y and =n options, so you can take a look there.
And there are a lot of files in tracefs that are used to control ftrace. First, let's look here at the trace events. What you see here is a lot of files, and the blue entries are directories. In brief: available_events lists the events that are actually there, that you can use. available_tracers is more related to ftrace's own behavior, what ftrace can collect, and shows what's been compiled in; current_tracer is the one that will be used in a particular run. Then there is trace, which is the file where the output is collected, so it's a buffer, the trace buffer, and you can read this trace file. Similarly, you can read trace_pipe: reading trace consumes everything at once, while trace_pipe is an incremental read. And I'll show you how the rest of these files are used. So, the next item: what do you do with all this? This is the infrastructure that's there, and the tools are built on top of it; the tools support this infrastructure. Ftrace is a kernel tracer, and it monitors different areas and activities in the kernel. It uses this /sys/kernel/debug/tracing or /sys/kernel/tracing directory for control and for the output. There is documentation in the kernel tree. Among the files: tracing_on tells you whether tracing is enabled or not; trace and trace_pipe, as I said, are the output buffers; available_events and available_tracers are what's available for me to use. Then kprobe_events and uprobe_events contain the list of the probes that are there; they could be empty if you haven't set any, because these are dynamic events. And then there is a myriad of other files for different, much more subtle controls of ftrace. So here, for instance, if you look at available_events, you can grep for alarmtimer; these are existing static tracepoints. I see that there are four events for alarm timers available, which some maintainers already put in there.
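A quick exploration session over the files just listed might look like this (paths assume the /sys/kernel/tracing mount point and typically require root; the grep pattern follows the alarmtimer example from the talk):

```shell
# Top-level control files of ftrace
ls /sys/kernel/tracing
cat /sys/kernel/tracing/current_tracer      # often "nop" by default
cat /sys/kernel/tracing/available_tracers
# Static tracepoints exported by the alarmtimer subsystem
grep alarmtimer /sys/kernel/tracing/available_events
```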
There is an alarmtimer directory, which corresponds to the class, and inside there is a subdirectory for each of the tracepoints belonging to that particular class. Each of them, alarmtimer_fired, alarmtimer_start, and so on, has a number of little files which control how the tracing is done: there is filter; format, which specifies the output that gets printed; enable, whether it's on or off. And these are cumulative, so the enable in the alarmtimer directory works for all of them at once, and then you can also do them one at a time. A simple example: I can clean the trace buffer by echoing nothing (or zero) into trace. In current_tracer I put nop: I don't want any of the tracers built into ftrace, but I want to enable the events that are already defined statically in the kernel for mkdir and for fork, on entering those system calls. Then I start tracing by echoing 1 into tracing_on, run a mkdir to create a directory, then stop the tracing, and then look at the trace buffer. You will see that the tracer is nop, because I said don't give me any tracer, I just really want to do these two simple trace events. And the output shows that two things fired: one is the fork, because bash forks to run mkdir as its own process, and then the mkdir event, from creating the directory. As you can see here, the child is 61385, and 61385 is the one that triggers the mkdir. So that's one simple example. And now some use of ftrace with the tracers that already exist there. You say: give me a trace of all the function calls. So here you have to clean the trace buffer, start the tracing and stop the tracing with a `sleep 2` in between, and, I forgot to put it on the slide, but you have to echo "function" into the current_tracer file.
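Put together, that function-tracer session might look like this (a sketch; requires root, and paths assume the /sys/kernel/tracing mount):

```shell
cd /sys/kernel/tracing
echo > trace                     # clean the trace buffer
echo function > current_tracer   # the built-in function tracer
echo 1 > tracing_on              # start tracing
sleep 2                          # let some activity happen
echo 0 > tracing_on              # stop tracing
cat trace                        # inspect the collected calls
```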
And then this is your output: you get a list of function calls and who called whom, so mutex_unlock called by rb_simple_write, etc.; it's a list. A better view of this: same story, but use the function_graph tracer instead of the function tracer; it's another built-in tracer. Again, start the tracing and stop the tracing with a sleep in between, then show me the trace file. And here is a more elaborate display of how your functions get called. Now, this output is much longer, so this is just the beginning: it shows which CPU it is on, how long each call takes, etc. This is just a little bit of the file; there is much more in the output. Again, I showed earlier how to enable various events, like the fork and mkdir; there are different forms of the mkdir system call in the kernel. So that was a little example. Beyond trace events, you can actually modify the files kprobe_events and uprobe_events: you can add these probes by hand, quote unquote. So here's a bit of an example; I will show you better here. This is a kprobe. Here I created a probe called myprobe_with_args and one called myprobe_no_args; they are at the same spot, the do_mkdirat function. Even though there is already a static tracepoint, I'm just creating a dynamic one. In the first one I print the pathname and the mode, and in the second I don't print anything. I put them into kprobe_events by echoing into it, with appending. Then I can look under the kprobes event subdirectory, and I can see these probes have appeared in there: myprobe_no_args with all the little files underneath which control this probe, on or off, and the same for myprobe_with_args. Then I enabled each of them by echoing 1 into each of their enable files. Then I cleaned the trace buffer, started tracing, created the foo/bar directory, then turned tracing off, and then showed the trace; the tracer is still nop.
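A sketch of those two hand-added kprobes (root required; the fetch-arg expressions for the pathname and the mode are my guesses at what the slide used, and the right expressions depend on the kernel version's do_mkdirat signature):

```shell
cd /sys/kernel/tracing
# Two probes at the entry of do_mkdirat: one with arguments, one without.
echo 'p:myprobe_with_args do_mkdirat pathname=+0($arg2):string mode=$arg3' >> kprobe_events
echo 'p:myprobe_no_args do_mkdirat' >> kprobe_events
echo 1 > events/kprobes/myprobe_with_args/enable
echo 1 > events/kprobes/myprobe_no_args/enable
echo > trace                  # clean the buffer
echo 1 > tracing_on
mkdir -p /tmp/foo/bar         # trigger both probes
echo 0 > tracing_on
cat trace
```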
I just want to see the tracing that I had before. You see the fork that I had before, you see the mkdir triggering, and now I also see my new probes, myprobe_no_args and myprobe_with_args, and the latter prints the pathname and the mode. You see they're at the same location, do_mkdirat+0, so the beginning of the function. So that's basically what ftrace can do; there is more you can do, but it's all done directly through tracefs. I will mention that there are other ways of driving ftrace: through an interface called trace-cmd, which is a command line tool, and through a graphical interface called KernelShark. I'm not going to talk about those, but there will be another tutorial by Steven Rostedt dealing with ftrace, so he will go over all of this, in the next few weeks or months, I think. So, perf is another tool, which is now in the kernel tree; it's a user space tool, but it's stored in the kernel tree. It was started in 2008 as a performance counter interface. It was initially called perf counters, modeled on perfmon; I don't know if you're familiar with that, something that was created by an engineer, Stéphane Eranian, at HP, and perf counters was made for Linux in that spirit. Then it grew and grew: it started incorporating the statistics, but also probe support, just like ftrace, but at a little bit higher level, so the syntax is a little easier. Again, there are different ways of displaying the output. The documentation is in the kernel tree, under tools/perf/Documentation. If you use it, you want to install the kernel debuginfo RPM, which is a separate kernel RPM; here's the command on a Fedora system: `yum --enablerepo=updates-debuginfo install kernel-debuginfo`. As for the subcommands of perf: perf stat gives the traditional statistics, like perfmon, the usual profiling-type stuff; perf record runs a command and stores the output in the perf.data file.
perf report reads that data; perf script is another one that reads the data; perf diff, perf top, perf probe, a zillion of those. I'm not going to cover them all here; there are many, many different subcommands that do different things, but I am going to talk about the probes, because that's what this talk is about. So, let's try to see how I can look at probes here. perf probe -F, plus an expression, lets you figure out where you can put probes. I really don't know this part of the kernel very well, so I search for something like writepages. There are a gazillion functions that match writepages, but let's focus on do_writepages: it is listed as one at which I can put a probe; it's not blacklisted, so let's put a probe in there. But before I put in a probe, I'm curious to see the code, the do_writepages code. I can see this because I installed the debuginfo for the kernel. So you see the body of the function, and then you see the line numbers. At the lines where there is a number, you can actually put a probe; without the number, no. Why? Because the code has been optimized by the compiler, so it has been scattered around, reorganized, and sometimes it's a bit hard to put something in there. So let's see what you can do. Once you know where you want to put a probe in this function, you also want to see: what can I print, what variables can I use here? perf probe -V tells you what you can print, and in this case you can print mapping and wbc; those are the arguments. So let me try putting a probe at a line number, and oops, that's not one of the line numbers where I'm actually allowed to put the probe: going back to the previous slide, you see line seven is there, but no can do. So let's just do it at the beginning of do_writepages: `perf probe 'do_writepages wbc'`, where wbc is what I want printed at that particular spot. So there you go.
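The exploration steps just walked through might look like this (a sketch; assumes root, perf, and matching kernel debuginfo installed, and the exact output varies by kernel):

```shell
# Which functions matching "writepages" can take a probe?
perf probe --funcs --filter '*writepages*'
# Show the source of do_writepages with the probe-able line numbers
perf probe -L do_writepages
# Which variables are visible at the probe point?
perf probe -V do_writepages
# Add a probe at function entry, recording the wbc argument
perf probe 'do_writepages wbc'
```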
Finally, I'm able to set a probe. Now, this is just for your information, not something you have to do, but I wanted to show you how this higher-level setting interacts with all the low-level tracefs files. First of all, you can see the probes with `perf probe -l`. These are the ones that I set up earlier; they're still there, and this is the new one, do_writepages. It tells you that wbc is what will be printed, and for the other two, the other parameters. Now, do we see this at the low level? Oh yes: I can look at the tracefs kprobe_events file, and I see those entries. They're still there, the kprobes with myprobe_no_args and myprobe_with_args, and also, in the probe subdirectory, I can actually see this new one, do_writepages, which was added by perf. So you see, at the low level, the stuff appears. Are we sure? Another way to check: `perf list` tells you all the events that are there; it's a very big output, but let's look for do_writepages. There it is: it's a probe, and I added it. And also, are there any other probes, now that I think about it? There are, because if you look at the type, you have the kprobes that we added before and the probe that we added later, and these all show up as tracepoint-type events. And the last way is to look at the kprobe_events file, which lists the type, where the probes are, and what will be printed. So that's another way to confirm that what you've done and what you set up is proper. Let's try it. I have a command file that echoes 1 into a file, calls sync, and makes a directory. Then let's run that under perf record, and let's enable only one type of probe, the probe:* events, not the kprobes. Here -a means all CPUs, -g means also give me backtraces, and -R is raw output.
I execute all of this via an sh command, and yes, I got some data. Then let's look at the data; again, this is only a small part of it. perf script shows you the data. Here do_writepages was hit by sh when we started executing this series of commands, and again by sync, and then mkdir triggered the probe while making the directory. And here you see it prints the arguments I asked for: the variables wbc, dfd, pathname, and so on. Then I can record only the kprobes instead; I just say "kprobes:*", and that's the only change to the command. You'll see that the other events do not trigger, but the kprobes I added before, my probe without arguments and my probe with arguments, both trigger on mkdir: one does not print anything and one prints the arguments. So this all connects these events together, and you can record all of them by enabling both the kprobes and the probes. There's a lot of flexibility here around the commands, and perf script will show you the same outputs, all together. Also, the tracefs kprobe_profile file tells you how many times these probes have triggered. So this gives you the bigger picture of all these little pieces and how they work together. Now, what else? Whoops, kind of mistyped there. What perf does best, though, is really perf stat. There are a lot of events that perf monitors without adding any probes: hardware events, hardware cache events, software events, PMU (performance monitoring unit) events, which are very much hardware dependent, and tracepoint events, the ones that are set statically in the kernel. So for statistics you use perf stat and execute the same command, and it gives you a summary of what is going on.
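The kprobe_profile file mentioned here is also plain text: one line per probe, with the probe name, the hit count, and the missed-event count. A sketch of reading it; the sample content is invented to match the probes in this example:

```python
def parse_kprobe_profile(text):
    # tracefs kprobe_profile lines: EVENT  HIT-COUNT  MISSED-COUNT
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            stats[fields[0]] = (int(fields[1]), int(fields[2]))
    return stats

sample = """  myprobe        3  0
  myprobe_args   3  0
  do_writepages  7  0"""
for name, (hits, missed) in parse_kprobe_profile(sample).items():
    print("%-15s hits=%d missed=%d" % (name, hits, missed))
```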
Then you can zoom in: say I want to look at branch instructions, branch misses, and cycles; it gives you output for just those few things. You can also run the same command with -r 3, which means the command is repeated three times, so you get statistics in terms of variation and deviation across the runs, to get a better idea. That is what perf was created for originally, and the rest of the stuff was added on top of it. So those are the two tools that are widely used, depending on what you want to do. Now, the next infrastructure piece. Before I switch, the moderator asks whether I'd like to field some questions specific to kprobes. Yes, if I can answer, sure. One question is whether the addresses in probes are addresses of functions in memory, addresses of variables in memory, or something else. Well, you can just use the variable name and everything is resolved for you; you can print x, say, without having to worry about where the variable itself is actually stored. For the function, again, you can use the name of the function, and, as you saw, you can specify a line within that particular function if you want, or you can specify an offset. It's really flexible; you can do all sorts of things with locations, except for the static tracepoints: those are already there, each has a name, and that is what triggers if you enable it. Otherwise, there is a lot of flexibility around how to set these points: at the beginning by default, at the end with return probes, in the middle by line number or by offset; you can use predicates; it's really, really super flexible.
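To make the repeat option concrete: what perf stat -r 3 reports per counter is essentially the mean over the repeated runs plus the spread as a percentage of that mean. A sketch of the computation, with made-up cycle counts:

```python
from statistics import mean, stdev

def summarize_runs(samples):
    # perf stat -r N prints, per counter, the mean over the N runs and
    # the deviation as a percentage of that mean ( +- x.xx% ).
    m = mean(samples)
    s = stdev(samples) if len(samples) > 1 else 0.0
    return m, (100.0 * s / m if m else 0.0)

cycles = [1_000_000, 1_030_000, 970_000]  # invented counts for 3 repeats
avg, pct = summarize_runs(cycles)
print("%.0f cycles ( +- %.2f%% )" % (avg, pct))  # 1000000 cycles ( +- 3.00% )
```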
You can specify instances, like which CPU to monitor; there is a lot of stuff you can do that I'm not covering here, because it would take a week of presentations to cover everything, but yes, it's really flexible. The questioner confirms that answers the question; thank you. And if people want to know more detailed things, you can send me an email after the talk; I'm available to answer questions there. Alright, I'll take one more: "If kprobes and uprobes are similar to breakpoints, am I correct in assuming that they modify the instructions of my program, or the libraries loaded by my program? If so, at what time does this modification happen?" Well, when you set a probe and you enable it, a breakpoint instruction is placed there, and the execution of that instruction creates a trap; that trap is where the collection of the data and whatnot is handled. When you disable the probe, it goes away; you're not going to hit it. If you have a predicate, you still hit the breakpoint, but it only records your information when the predicate matches. It can also be done in a more sophisticated way, using a jump instruction instead of an exception, which saves a little bit of time, but that's the main concept behind kprobes. Okay, let's move on, because I still have a lot more to cover; sorry, folks. So: BPF. What is BPF? Berkeley Packet Filter, which you have probably heard of; this is the latest thing, the center of attention, everybody's talking about it. It's an infrastructure that allows user-defined programs to execute in kernel space. The program I'm showing here is written in C, because that's the easiest way for humans to write it.
It does not necessarily have to be written in C; it can be written directly in the BPF instruction set, with its instructions and registers and so on, but nowadays people write it in a higher-level language that gets translated into BPF. We now have a way to do that in GCC, which my team at Oracle contributed; originally this existed only in Clang/LLVM, but GCC also translates C and generates BPF programs now. Since you are loading programs into the kernel, you might think this is not very safe, but no: there is a verifier in the kernel that checks the programs, that they don't have infinite loops, that they don't make weird calls; it's all very, very constrained. The kernel has just-in-time compilers for several architectures, arm64, x86-64, I think s390; there are a ton of them supported now. BPF has been around for a while, even though it was originally used only in the networking space, to filter packets; that's why it was called Berkeley Packet Filter. You may see references to "classic BPF" versus "extended BPF"; now it's all just BPF, more or less the same thing. In order to actually run this, you need some sort of helper, housekeeping program, because it's really complicated to compile the BPF program, load it, enable things, pass information around. I'm not going to get into too much detail here, as this is another talk in itself, but programs have types, many program types, and they are associated with what they are used for. There is a socket filter type, still connected to networking; and the kprobe, tracepoint, and perf event types are connected to tracing. The kernel runs a program via the BPF_PROG_RUN macro, passing the program and a context; the context is a bunch of information that will be used by the program, and the type of context changes depending on the program type.
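To give a flavor of what the verifier does, here is a deliberately toy version of just one of its historical rules: rejecting backward jumps, which could form a loop. The real verifier does vastly more (it tracks register and pointer state, and has allowed bounded loops since kernel 5.3), and the instruction encoding below is invented purely for the sketch:

```python
def has_back_edge(program):
    # Toy check: a relative jump with offset <= 0 targets itself or an
    # earlier instruction, which could form a loop. Each instruction
    # here is a pair (opcode, jump_offset_or_None).
    for op, off in program:
        if op == "jmp" and off is not None and off <= 0:
            return True
    return False

straight = [("mov", None), ("jmp", 2), ("add", None), ("exit", None)]
loopy = [("mov", None), ("add", None), ("jmp", -1), ("exit", None)]
print(has_back_edge(straight))  # False: all jumps go forward
print(has_back_edge(loopy))     # True: would be rejected
```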
Again, this is not something you have to deal with directly as a user, but just so you know: each BPF program runs within a certain context, and the context can be used to pass arguments to helper functions as well as to provide the data on which the program operates. For kprobes, and probes in general, the context is the register set; for a tracepoint it is the format of that tracepoint, meaning what will be printed and how; for networking filters it is a buffer holding the packet you are filtering. Then there are helper functions that can be called by a BPF program. These are not random functions; they are functions defined by BPF itself, so they are known and enumerated, and they perform different things in different areas: on maps, tracing, networking, etc., and this list keeps growing. And then there are maps. You might hear this term; it's a bit of a complicated concept, but in general we can say that maps are key-value storage, used for transferring information from BPF programs to user space and from the kernel to BPF programs, and for sharing data among BPF programs. They are used through a file descriptor: the bpf() system call returns a file descriptor when you create a map, and you dispose of the map with a close() on that file descriptor at a later time. So you can pass this file descriptor around to different parts of BPF and to helper functions. The attributes of a map are the number of elements that can be stored, the type and size of the key, and the size of the value; and again there are different map types, some for tracing, a stack-trace map, one for perf events, and so on; as I said, this list keeps growing and growing. Now, the bpf() syscall, really briefly: it is one syscall that is a huge multiplexer; it can do everything.
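The shape of this API, one multiplexing entry point plus maps addressed by small file descriptors, can be mimicked in a few lines. This is only a toy model of the map-related commands of bpf(2), not a binding to the real syscall:

```python
import itertools

class ToyBpf:
    # Toy model of the map commands of the bpf(2) multiplexer: creating
    # a map hands back a small integer "fd"; other commands take that fd.
    def __init__(self):
        self._next_fd = itertools.count(3)  # pretend 0-2 are stdio
        self._maps = {}

    def map_create(self, max_entries):
        fd = next(self._next_fd)
        self._maps[fd] = {"data": {}, "max": max_entries}
        return fd

    def map_update(self, fd, key, value):
        m = self._maps[fd]
        if key not in m["data"] and len(m["data"]) >= m["max"]:
            raise OSError("map full (the real call fails with E2BIG)")
        m["data"][key] = value

    def map_lookup(self, fd, key):
        return self._maps[fd]["data"].get(key)

    def close(self, fd):
        # Like close(2) on the map fd: the map goes away.
        del self._maps[fd]

bpf = ToyBpf()
fd = bpf.map_create(max_entries=2)
bpf.map_update(fd, b"\x01", 100)
print(bpf.map_lookup(fd, b"\x01"))  # 100
bpf.close(fd)
```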
It's one syscall with many different variants depending on the parameters you pass: it loads a BPF program, creates a map, and so on; most of the BPF work is done through it. Now, how do we use this stuff? It's all very complicated; I have a talk that I gave at LinuxCon several years ago, when this stuff came out, and it is really gnarly, as they say; there are a lot of details, and you're like: oh, come on, that's crazy. So, there are tools that have been written on top of BPF. I want to mention BCC, the BPF Compiler Collection: a set of many little programs that perform common tracing and performance analysis tasks. They are written in Python, and they use LLVM, not GCC. They are not specific to tracing, and they also have an API to express common operations you want to do with BPF. Normally, I assume, people look at an example that's similar to what they want and then modify it, with a few attempts, into their own script. BCC comes with a tools RPM that you can install, and on Fedora the scripts land under /usr/share/bcc. One script, execsnoop, shows you the exec calls. I ran it on my Fedora 34 machine; I got a bunch of warnings, but in the end I did get output, and then I hit Ctrl-C, obviously, to stop it, since it runs continuously. It tells you what happened: which PID ran, its parent PID, the number of arguments, the arguments themselves. So you can run it as a really easy one-liner, but if you look at the script, it is 307 lines of code, so, well, okay: that is unfortunately the nature of the game here. So this is the BCC layer, the first layer on top of the BPF infrastructure; it may evolve into something else, but for now that's what we have. Now let me talk about DTrace, which is available on Linux.
It's something my team has worked on since 2011, when we ported it to Linux from Solaris. It's a very well-known tool, and very well documented. It has been ported to different OSes: FreeBSD; even Windows, a few months ago there was an announcement by Microsoft people; Linux, which is us; etc. It is very high level and very powerful. It is easy enough for very basic, simple tracing, but it also allows complex tracing with many probes in many places, plus higher-level manipulation of the data. It is stable as an interface, and it's also stable enough to run on your system always-on, to catch problems as they happen: files that shouldn't be opened being opened, say. As I said, the first version appeared in 2011, and we've been working on it since, adding features, reaching parity with Solaris. Now, in the last year and a half, we have been reimplementing it without the kernel patches that the first version used. The first version of DTrace had more or less the same architecture as DTrace on Solaris, with a lot of support in the kernel itself, because at that point, in 2011, there wasn't as much tracing infrastructure in the kernel as there is today. Now we can deviate from the original architecture, which required a bunch of kernel patches, and implement DTrace as a user-space facility with far fewer kernel patches. We have a few patches, a couple of features, that we are actually proposing upstream to become part of the kernel; not a lot. They had been discussed previously, and we thought we should use them, and now we actually want to have them in the kernel. But essentially everything else is done in user space, so DTrace is a single thing to install; not a lot of packages, just dtrace-utils.
So today DTrace leverages BPF and other kernel facilities: the tracefs filesystem, kprobes, uprobes, and so on. You can see it on GitHub, there's a mailing list, and it's available as RPMs on Oracle Linux; we hope it will become available on other Linux flavors going forward. We actually have ways of running it on Ubuntu and Fedora without a lot of changes, but it's not distributed with them at the moment. Anyway, the DTrace architecture. As I said, we do a lot of the work in user space. The kernel already provides probing mechanisms, which it didn't in 2010-2011 when we started, and BPF gives us an execution engine for the programs that are associated with probes. We attach these programs to the probes, and the programs are created from the clauses specified in the D script. The output is written to the same perf event buffer that perf and ftrace read, and dtrace reads it, manipulates it, and presents it. The D clauses you write, either in a script or on the command line, are each associated with a BPF program. To connect BPF with the actual work that dtrace does, we use a trampoline: BPF executes the trampoline at the probe site, and the trampoline calls the BPF functions for the probe clauses. BPF itself is a little restricted in what it can do, so the rest of the work is done after the trampoline has executed. Here's a quick diagram of what happens in kernel space and what happens in user space. In user space you start from a D script, at the bottom right, which is several clauses put together; the clauses get compiled into BPF programs, and at the same time the probes are defined using the parameters given in the script.
So for each of these clauses you create or use kprobes, uprobes, or tracepoints, depending on what your script asks for. The creation of a kprobe or a uprobe, or the use of a tracepoint, is reflected in modifications to tracefs, as I showed you before. In addition, each is associated with a perf event that has a BPF program attached to it, compiled from the clause into BPF; we use GCC here. Now, a simple example. Here's the script: every second a tick probe fires, and in its clause you print i and increment i, nothing more; and at the END, you print the final value of i. Pretty simple. Here you see it in action: you run dtrace with the script tick.d, it prints 1, 2, 3, and so on, and then prints the total. Nothing exceptional, but the reason I show this is that it can run in the background, so while it is running, in another window, let's look at what actually happens in tracefs. And there you have it: the probes that have been added by dtrace while executing this program. In uprobe_events you also see the BEGIN and END probes, which dtrace always creates, for the beginning and the end of the tracing session; so you can run actions at the start and at the end, as we saw. You can also use a tool called bpftool if you want to check: did dtrace actually create the BPF programs? Oh yes, it did; there they are: one for END, one for BEGIN, and one to actually do the work I asked for. bpftool is part of the BPF infrastructure, and it shows you the programs that are loaded. But that's not all: the best way of seeing what dtrace is doing is to enable the disassembly on the command line, and level 8 is the most detailed disassembly.
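The structure of the script, one compiled program per clause with BEGIN and END always present, can be sketched like this. This is a toy model of the clause dispatch, not dtrace's actual compilation; the tick-1s name follows the probe used in the example:

```python
def make_programs():
    # One callable per clause, sharing state, the way each D clause is
    # compiled to its own BPF program. BEGIN and END are always created.
    state = {"i": 0}
    def begin():
        return "tracing started"
    def tick():               # the tick-1s clause: print i, then i++
        line = str(state["i"])
        state["i"] += 1
        return line
    def end():                # the END clause: print the final i
        return "i = %d" % state["i"]
    return {"BEGIN": begin, "tick-1s": tick, "END": end}

progs = make_programs()
out = [progs["BEGIN"]()]
out += [progs["tick-1s"]() for _ in range(3)]   # three simulated ticks
out.append(progs["END"]())
print("\n".join(out))
```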
At its most detailed level, 8, the disassembly shows you the actual final instructions of each of the BPF programs created. I truncated the output here because it is pretty big, and then I realized the first few instructions of each program are the same anyway; but you can see the BEGIN, the END, and the tick-1s programs, corresponding to the three we saw before; to see everything you'd have to look at the whole output. A more complicated D file shows what else you can do with scripting in dtrace; you can do a lot of things. You run it as a script, and it produces a histogram of how much time you spend in each system call, and then prints some statistics at the end: the number of calls, how much time was spent in each call, and the standard deviation of the time spent in each call, sorted (not grouped, sorry) by errno, since errno is one of the parameters you're looking at; and then the same information sorted by function but presented as a histogram. So this short command gives you a long output; I show parts of it across three slides. The count output just gives you the names of the functions and the number of calls: most were called once, close was the highest at 24,000 times; and look at connect, called 25 times, and at how long was spent inside each: for connect, this much time. Then, in the rest of the output, again trimmed to just connect, you get this kind of per-errno breakdown for each of the functions printed before. So connect returned errno 111 in one case...
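The histogram in this output is the classic power-of-two latency distribution that DTrace's quantize() aggregation produces. A sketch of that bucketing, with invented latencies:

```python
def quantize(values):
    # Power-of-two buckets, in the spirit of DTrace's quantize():
    # a value v lands in the bucket whose lower bound is the largest
    # power of two <= v (v == 0 goes into bucket 0).
    buckets = {}
    for v in values:
        lo = 0 if v == 0 else 1 << (v.bit_length() - 1)
        buckets[lo] = buckets.get(lo, 0) + 1
    return buckets

latencies = [3, 5, 6, 17, 130, 140, 150, 9000]  # invented microseconds
hist = quantize(latencies)
for lo in sorted(hist):
    print("%6d | %s %d" % (lo, "@" * hist[lo], hist[lo]))
```

A single outlier, like the 9000 here, jumps out of the @-bar picture at a glance, which is exactly the point made about spotting odd calls in the histogram view.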
Connect returned errno 2 in other cases, and errno 0 in others, and you see how much time was spent and the standard deviation (actually, this column is not the raw time but the standard deviation). You can see that errno 111 occurred only once, so the deviation is zero, because it was only hit one time; and this is the distribution for the instances where the other values were returned. (Sorry: okay, please mute yourself; a question; okay.) And here, again, the returns with errno 0, and how many hits each latency bucket got; so instead of reading off the numbers you can look at the histograms and say: oh, this one was a real outlier, maybe something was going on there, and then you can zoom in and see. Another part of the output, as I said, is the number of bytes written; it's basically just an example. So, another little example with dtrace: on each syscall entry, aggregate on execname, the name of the executable, and count how many times each one fired. These are the probes that were entered, not distinct system calls; so in this run you see counts like 319 for one executable, while dtrace itself triggered 979 syscall-entry probes and NetworkManager triggered 28; just another way of showing things. So that is DTrace. We're almost done, folks, and I know it's taking a long time, but the last recent contribution I want to mention is bpftrace, which is an attempt at doing DTrace on Linux, although I would say it's not exactly there yet. It is not a single tool by itself; it's a wrapper, with a collection of scripts, that uses BCC, the BPF Compiler Collection from before, and it provides a somewhat higher-level syntax on top of it, an attempt to be similar to DTrace. And of course it uses BPF.
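The aggregation described here is, in D, essentially the classic one-liner syscall:::entry { @[execname] = count(); }. A toy version of what it computes, feeding in a made-up firing stream shaped like the counts on the slide:

```python
from collections import Counter

def count_by_execname(firings):
    # What `@[execname] = count()` accumulates: one tally per exec name.
    return Counter(firings)

# Invented stream of execname values, one per syscall-entry firing:
firings = ["dtrace"] * 979 + ["NetworkManager"] * 28 + ["sshd"] * 3
for name, n in count_by_execname(firings).most_common():
    print("%16s %d" % (name, n))
```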
bpftrace also ships with a set of ready-made scripts that already do useful things, so you have something to start with without writing it yourself. An example (this should have said "read", but okay): it's the same thing as the previous example. Here you attach to the tracepoint raw_syscalls:sys_enter and count, and it gives you who made the calls and how many. If you compare with the previous example, the dtrace syntax was a little simpler: you monitor syscall entry for all the system calls, because you're not specifying a name, you key an array by the executable name of the process, and you count. Here it requires a little more knowledge of the underlying machinery; so it's not exactly equivalent, but almost. So, last slide: other tools. trace-cmd, as I said, is a user-space front end for ftrace; it is maintained by Steven Rostedt. It works with KernelShark, which is written on top of it, a GUI that will show you all this information in a, I won't say less painful, but easier way. And then we have SystemTap, which I want to mention briefly even though I don't have slides on it; it's a neat tool that was started in 2005, and it's a little ironic in the context of the history of tracing. With SystemTap you write a script in a language similar to D and similar to C, a middle-of-the-road language; it gets compiled into a C program, with a lot of constraints, which is built as a kernel module that gets loaded. That's the way SystemTap started; I think now it has changed a little bit.
People thought that was a bad idea, terrible; but in the end, what does BPF do? Something very similar, right? You compile something and then execute it in the kernel; this was basically the same idea, ten years earlier. It is kind of funny how things go full circle, with a little bit of variation, but here we are. SystemTap is still available, and I believe it is now also implemented on top of BPF. One day BPF won't be good enough anymore, and we will have to reimplement all of these tools again on top of whatever shows up next; but it's good to see that people are thinking about new things and new ways of doing things, and making them stable, and so on. Then there is LTTng; you can see LTTng.org; it's also still very widely used, as a statistical analysis tool and so on. That is it, and I'm sorry it took a long time. Any questions? The moderator says: yes, I will ask them in an order that makes sense. Elena, does the rearchitected DTrace use breakpoints? It uses kprobes, right? And kprobes use a breakpoint or an illegal instruction, so yes, but not directly: we set a kprobe, or a uprobe, which is then implemented by setting a trap, or a jump if the probe is optimized, or whatever. It's all the same thing: you want to get to a point in your program and inspect what the heck is going on there. But with a debugger, at a breakpoint, everything is stopped, and it's up to you when you take a look; here, no, we don't want to stop, we don't want to perturb; we just want to dip in, grab the data, and continue. But the idea is similar. Next question: what is the advantage of DTrace over SystemTap, ignoring support for OSes other than Linux? Well, the syntax of DTrace, I think, is very simple and flexible; you can do a lot of manipulation of the data without having to reinvent the wheel; it's all there already, number one.
You can do histograms, you can do, I don't know, a lot of other things; there are examples; there's actually a blog entry of ours that shows some complicated things you can do really easily. So the advantage of DTrace is really the flexibility of the syntax, and the fact that a lot of the work is done behind the scenes: all the machinery I showed you, you don't really have to know about it, while with other, lower-level tools you have to operate all of this yourself. Okay, next: how much granularity can these tracing tools, especially eBPF, provide? Again, a lot, depending on how low you go and how many of the various parameters you use in specifying the programs. I didn't go into much BPF detail, but you can write the programs directly if you want, or you can use BCC or the examples. As an infrastructure it is very flexible. I won't say limited, but it is constrained, not in a bad sense, by the verifier: there are some things you can do and some things you cannot. And BPF itself is much bigger than tracing: you can do things in the kernel, you can do tracing, you can do networking; if you look at it now, it's used in many different spots. You can now, instead of compiling your program every time, use "compile once, run everywhere", CO-RE. You can do this with LLVM; the problem, from my point of view, and I'm a GNU-toolchain person, is that LLVM and Clang are very complicated, huge libraries. So now, again, my team is doing work to put all the BPF support into GCC, so we have a BPF backend in GCC.
We also have CO-RE, C-O-R-E, compile once, run everywhere, which we're working on and proposing patches for right now; we're trying to simplify things by providing equivalent behavior in GCC. The other part I didn't mention is debugging information. There is the regular debugging information, DWARF, which is used by debuggers all over the place, and which, as I said, is loaded from the debuginfo RPM. But there are other, more compact forms of debugging info; one is CTF. CTF was used on Solaris, and DTrace there used CTF; and again, we are putting that into GCC. We hope that people can ship CTF in production instead of DWARF: DWARF will stay in the debuginfo RPM, but CTF is much smaller, so you can get at least a little bit of debugging info by adding it to your executable and keeping it in the production RPM. We're getting there: the binutils support has been contributed, and the GCC side is under review right now; we're writing the new version. Then there is BTF, which is a derivative of CTF, used by some of these tools, especially LLVM; it's more or less a subset of CTF, and we are supporting that as well in the GCC toolchain. So there is a lot, a lot going on these days, and it's coming from all over the place: we're working on the GNU toolchain side; Facebook, with Alexei Starovoitov, has groups working on the BPF infrastructure; Cilium is doing networking on top of it; there's a lot of activity. So while kprobes and the rest are stable by now, the BPF side keeps expanding and changing a little, so maybe in a few months we'll have something different in that area. The moderator notes we're coming up on time: I think I will ask one last question and then we can go from there. The same tracepoints and associated info are available via a variety of user-space tools; any advice about best practices for choosing tools to run monitors in production?
Yes. If you want to do performance monitoring and figure out things like that, use perf, which is the equivalent of OProfile and the like; perf stat covers that area. For more ad hoc probing, then yes, use the other tools; in the end they all use the same mechanism, so it's a little hard to say. Of course, if you just output and control things by echoing into the little tracefs files, that's the least machinery you need, but it's more cumbersome. If you want multiple clauses and predicates, then something like dtrace is good, because it allows you to specify more things from a higher level. It all depends. A lot of these tools have to be run as root; that's the other thing I didn't mention, but you saw it in the examples: they are run as root, so that might be another constraint for you. I don't think there is a single, simple answer as to what to use; it's what you prefer, what you're more familiar with; you start learning a language, and a dialect comes out of using a certain subset of programs and commands; it's very personal. The moderator notes there are several remaining questions and asks whether people can email them. Absolutely; there are two email addresses, the Gmail one or the Oracle one, up to you, whichever you want to use. Okay, wonderful. As a reminder, those email addresses will be on Elena's slides, which will be posted to the Linux Foundation website later today, along with a recording of this presentation. Thank you so much to Elena for her time today, and thank you to all of the participants who joined us. We hope that you will join us for future mentorship sessions. Thank you so much and have a wonderful day. Thank you.