Welcome, everybody, to this session about debugging and profiling tools in the Linux kernel. So why this session? It is in fact a consequence of the many questions we get. We are a company that does BSP adaptation to various hardware and BSP making, driver development, optimization, boot time, power management, things like that. And we have a lot of questions from our customers that encounter problems, saying: oh, okay, we have that problem, what can we use to debug the stuff, to profile it? Okay, we have fast-boot demands, what can we use to really profile the boot times and things like that? So for all that, we took a little time to do a listing, in fact, of the available kernel features for debugging, profiling and tracing, and finally a listing of the tools that exist in the community and that use those features.

I think all of you know what tracing, debugging and profiling are about, but what is important, in fact, is to remember that when you do that, you will have an impact on what you are tracing, debugging or profiling. It's quite the Heisenberg principle: when you observe the stuff, you are not really seeing what it is doing in the normal, unobserved world. Tracing is a specialized use of logging for a debugging purpose, and you modify your source code, so you have an impact on your runtime. Interactive debugging also has an impact, but through binary modifications, and you will need special software. And when you profile, you are sampling your system, so at each sample you have an impact on it.

For the features... oops, sorry. Well, we cannot speak about tracing without speaking about printk. I think it's probably the most commonly used function in the kernel for tracing. Its first purpose is logging, but it is used a lot for debugging. It logs to a circular log buffer, and it can be called from any context, interrupt context included. To activate it in the configuration, just use CONFIG_PRINTK. You will get its output on the console, the current console of the system generally, but you can configure that in the proc filesystem, and you can use the dmesg tool to see all the circular log buffer entries. Its format is like printf, in fact; you just don't have the floats, because there are no floats in the kernel, but you have specialized formats. Something you can do is read Documentation/printk-formats.txt in the kernel sources: you will see there are a lot more formats than the commonly used ones; you can print buffers, things like that.

Well, now printk is done. Debugfs. It's a great feature of the kernel, quite recent. In fact, like procfs or sysfs, it's a specialized filesystem, RAM-based, and it is dedicated to kernel debugging. With proc, normally you have rules: you only put process information in the filesystem. With sysfs, you have all the kernel objects and the information related to these kernel objects, and you have rules for developers, like only one value per file, things like that. In debugfs, you can do whatever you want: it is just an interface for userland to get kernel debugging information. To activate it in the configuration, you just have to search for DEBUG_FS and enable it. You can read more about it in Documentation/filesystems/debugfs.txt in the kernel sources.
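To make printk and debugfs a bit more concrete, here is a minimal sketch of a module that uses both. This is my own illustration, not code from the talk, and all the "demo" names are made up:

    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/debugfs.h>

    static u32 hits;            /* value we want to expose for debugging */
    static struct dentry *dir;

    static int __init demo_init(void)
    {
            u8 mac[6] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };

            /* printf-like, no floats; the KERN_* prefix is the log level */
            printk(KERN_INFO "demo: loaded, answer=%d\n", 42);

            /* specialized kernel format: %pM prints a MAC address */
            printk(KERN_DEBUG "demo: mac=%pM\n", mac);

            /* pr_debug() is the printk variant we will meet again with
             * dynamic debug below: compiled out or off by default */
            pr_debug("demo: debug-only message\n");

            /* expose 'hits' as a debugfs file, readable and writable */
            dir = debugfs_create_dir("demo", NULL);
            debugfs_create_u32("hits", 0644, dir, &hits);
            return 0;
    }

    static void __exit demo_exit(void)
    {
            debugfs_remove_recursive(dir);
            printk(KERN_INFO "demo: unloaded\n");
    }

    module_init(demo_init);
    module_exit(demo_exit);
    MODULE_LICENSE("GPL");

Once loaded, the printk output lands in the circular buffer that dmesg reads, and the debugfs file shows up under the debugfs mount point described next.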
And on most systems, it is mounted in the sys filesystem, under /sys/kernel/debug. Debugfs is important because it is used by a lot of other kernel debugging features and by a lot of tools, and it acts as an interface to configure traces and things like that.

So we had printk, but printk has a high overhead if you don't protect it with a branch or things like that. So it was common to protect it with debugging macros and stuff like that. Now you have a feature, dynamic debug, that is able to activate or deactivate kernel trace statements at runtime. The mechanism is generic, but currently it is essentially pr_debug() and dev_dbg() that are wired to dynamic debug. And in fact, there is in the debugfs filesystem a specific file, dynamic_debug/control. This interface lets you list all the dynamic debug statements that you can activate or deactivate: you can just cat this control file and you will get lines in a fixed format, activatable per module, per function, per file. And you echo strings into it, based on a simple query language; see Documentation/dynamic-debug-howto.txt for the whole language. For example, you echo the name of the module plus "+p" and it will activate all the trace statements corresponding to that module, and with "-p" it will deactivate the traces. So you can activate and deactivate specific traces in specific parts of the kernel. Great feature.

But this is mostly about tracing. Dynamic debug is a good enhancement over raw printk, but there are also more specialized things to do probing. The problem is: okay, what value does this variable have, where am I in the kernel, things like that. And emitting events: okay, I am in this code flow, so I want an event corresponding to that. There are two kinds of probes, static ones and dynamic ones. The static ones are more tracing oriented, and, well, printk is part of the static probes; it's the, how to say this, the least featured static probe. Generally they use branching, with the __builtin_expect() built-in of GCC, so with branch prediction: static probes, when compiled in, are unlikely branches. And they always have a low overhead, because there is always the if on some variable: even when not activated, the on/off flag gets tested. Dynamic probes are more debug oriented. They are more like breakpoints: you break into your original code to do other things. So when it's off, you have your original code, so no overhead. But when you are on, you actually break and do extra things, so you will have a potentially higher overhead. And to install them, you will need debug symbols and knowledge of your kernel mapping.

But this is really general. To be more specific about static probes in the kernel: there were the kernel markers, which were replaced by tracepoints and trace events. So what were they? Well, kernel markers embedded the tracing code into the kernel code directly. So it had quite a high overhead, and it was difficult to keep track of the instrumentation code. Tracepoints were added to replace kernel markers, in order to separate instrumentation code from kernel code. What does that mean? When you want to use a tracepoint, you declare it, you give it a name. With a specific macro, you give it the prototype of the probe that will be associated with the tracepoint, and you give the argument names.
You have a macro to define it in the implementation. And in the code that hosts the tracepoint, the code that wants to trace things, you just call trace_<name>(), with the name you gave to the tracepoint. So when the code reaches the trace_<name>() function, it will emit tracing events, when it is on. And how do you activate or deactivate these tracepoints? You associate a probe with the tracepoint, by using another module. The common use of that is: you make a little module whose initialization registers a probe, the callback, with the specific tracepoint. At that time, the tracepoint is activated, and so each time the code hits trace_<name>(), it emits an event and calls the callback. In the callback, you can printk, you can do whatever you want, associated with the tracepoint event. You unregister the probe when unloading the module, which turns the tracepoint off again, and trace_<name>() goes back to having almost no overhead. To activate it in the configuration: CONFIG_TRACEPOINTS. And you will find more information in Documentation/trace/tracepoints.txt.

About the tracepoints also: before 2.6.37, there was for each tracepoint a branch, a branch on a global variable generally. When activated, it called the callback; when not activated, you just had the problem of cache misses, because of all those global variables. Essentially that problem, plus the branching itself, caused a little bit of overhead. After 2.6.37, to optimize the thing, there is a more sophisticated way of doing it: when you compile your code, with specific linker tricks, the branch is replaced by no-ops. So virtually you have no overhead on your tracepoints when they are off. And when you register and unregister probes on your tracepoints, there is a table with the locations of the tracepoints, and the no-operations are dynamically replaced by the actual call to your callback. So when off, a tracepoint has no overhead, and when on, you have your trace.

But there was still something to do: the register/unregister and the second-module business, making a little module to register probes on the tracepoints at loading and unregister them at unloading. So it was not easy to use. Trace events have been added for that. It's quite an event system for tracepoints, adding a ring buffer and a debugfs interface. Now, when modules declare tracepoints with the TRACE_EVENT macro, they show up through debugfs, under /sys/kernel/debug/tracing, and there you have a lot of files where you can see all the available trace events. You can activate them, you can activate them even per event. So directly from the console, you can manage your tracepoints and activate or deactivate them in the kernel. When a tracepoint is hit, it commits a binary record corresponding to the trace event into a ring buffer, and tools like ftrace, LTTng, perf will see them; the debugfs interface is able to retrieve the entries from the ring buffer, and they are then formatted as human-readable strings. So with tracepoints and trace events, you really have a separation between the real kernel code and the code that is used to instrument, to format, to actually run the callbacks and things like that.
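As an illustration of the TRACE_EVENT macro, here is a minimal sketch. The "demo" names are mine, and the surrounding build plumbing follows the samples/trace_events example shipped in the kernel sources rather than anything shown on the slides:

    /* demo_trace.h */
    #undef TRACE_SYSTEM
    #define TRACE_SYSTEM demo

    #if !defined(_DEMO_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
    #define _DEMO_TRACE_H

    #include <linux/tracepoint.h>

    /* declares trace_demo_work(value) plus the code that stores the
     * event in the ring buffer and formats it for the debugfs files */
    TRACE_EVENT(demo_work,
            TP_PROTO(int value),
            TP_ARGS(value),
            TP_STRUCT__entry(__field(int, value)),
            TP_fast_assign(__entry->value = value;),
            TP_printk("value=%d", __entry->value)
    );

    #endif /* _DEMO_TRACE_H */

    /* this part must stay outside the include guard */
    #define TRACE_INCLUDE_PATH .
    #define TRACE_INCLUDE_FILE demo_trace
    #include <trace/define_trace.h>

In exactly one C file of the module you #define CREATE_TRACE_POINTS before including this header, and the instrumented code then simply calls trace_demo_work(42). The event appears under /sys/kernel/debug/tracing/events/demo/demo_work/, where echoing 1 into the "enable" file switches it on.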
So, instrumentation code and kernel code separated: much easier to maintain and things like that.

Kprobes. Great stuff, but these are for the second part, so the dynamic probes. These are not traces you add to your source code; these are events you can add to the binary code at runtime. It's not a modification of the source code, it's a modification of the binary image. So for that, it will use software breakpoints; we will see specifically how the breakpoints work on the next slide. You can activate it with CONFIG_KPROBES, and there is more documentation on kprobes in the Documentation part of the kernel. In fact, when you want to register or unregister a kprobe, it's like tracepoints, not trace events, tracepoints: you generally need to make a little module that at its initialization registers the kprobe and at its exit unregisters the kprobe. The kprobe is just a structure that says: okay, I want to get an event on that symbol name of the kernel, potentially with an offset into the symbol, so you have a function name with an offset into this function. And when this point is hit, the pre-handler function you put in the structure will be called, then the actual original instruction will be executed, then the post-handler will be called after that, and it will return to the original place, just after the breakpoint. Something interesting is that the actual implementation of the breakpoints will depend on the architecture, of course. On x86, you will use int3 and things like that. On ARM, you will use a specific instruction that is guaranteed by ARM to be an undefined instruction. So in fact, you will take a fault, and the fault is handled as a breakpoint: save the context, do your pre-handler, the original instruction business, the post-handler, then restore the context and come back just after the breakpoint.

Now, more stuff, more for profiling purposes. There are the perf events in the kernel, and it's mainly an API that abstracts all you can find on modern CPUs. So you have PMUs; on ARM it's on the coprocessor, CP15. So you have special registers, performance counters and performance registers, with which you can manage to record events on the system: how much you missed your cache, how much you hit branches, things like that, the cycles of the CPUs. Generally on the CPUs, you have a limited number of those counter registers. But you can ask the kernel for more: say you have four hardware counters, you can still ask for ten perf events. It will map each perf event onto some counter among the limited number you have, and it will count, in fact, on the hardware counters in a round-robin fashion. So say you want cycles and cache misses and a lot of other things: you will have sometimes cycles, some other time cache misses, actually on the hardware registers, and the rest of the time it is statistical, it interpolates between two real measurements on the registers. Okay, perf also provides software events. These do not use specific hardware registers for that; they are just markers in the code.
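Here is the kind of little kprobe module just described, as a minimal sketch, assuming CONFIG_KPROBES; do_fork is only an example symbol name from kernels of that era:

    #include <linux/module.h>
    #include <linux/kprobes.h>

    /* called just before the probed instruction runs */
    static int demo_pre(struct kprobe *p, struct pt_regs *regs)
    {
            pr_info("kpdemo: hit %s\n", p->symbol_name);
            return 0;
    }

    /* called just after the original instruction has been executed */
    static void demo_post(struct kprobe *p, struct pt_regs *regs,
                          unsigned long flags)
    {
    }

    static struct kprobe kp = {
            .symbol_name  = "do_fork", /* function name; .offset defaults to 0 */
            .pre_handler  = demo_pre,
            .post_handler = demo_post,
    };

    static int __init kpdemo_init(void)
    {
            return register_kprobe(&kp);    /* arms the breakpoint */
    }

    static void __exit kpdemo_exit(void)
    {
            unregister_kprobe(&kp);         /* restores the original code */
    }

    module_init(kpdemo_init);
    module_exit(kpdemo_exit);
    MODULE_LICENSE("GPL");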
Okay, so all these features are great, but we saw that you have to load modules, and some of them use an interface in debugfs; it's not that easy to use every day. So what are the tools that make them a little bit easier to use? Well, we can start with ftrace.

So ftrace is also something in the kernel that uses features from the kernel. But it proposes a lot of various tracers; a lot of tracers are available. Time sources: it can use the local CPU clocks, so clocks that are not guaranteed to be coherent across all the CPUs, but that are quite accurate on one CPU. It can use the global monotonic clock, which is coherent but has a lot more overhead. Of course, if you are not interested in timings but only in the ordering of events, it can also use atomic counters, the least overhead you can have; there are no locks.

So with ftrace, you can set a lot of filters. You can filter on function names, generally, and on events. It can use kprobes also. And you can do boot tracing using the command line, the kernel command line: you just put ftrace=<the name of your tracer> on the command line, and the tracer will start as soon as possible, as soon as the tracing framework is in fact active. So basically, ftrace will insert some stuff so that when you hit a function, not really a breakpoint, but you will do an extra thing that logs this entry, and there will be an extra thing at the exit that logs the exit of the function. And it will take timings of each function execution. So you have some examples here of tracers with ftrace. The interface to ftrace is only based on debugfs; you don't have to have other tools with it, and you can even drive it from a little program, like the sketch further below. Again, in debugfs, under tracing, in the files there you have the available tracers, the current tracer, the filters that you can activate; you have, tracer per tracer, a lot of information, and you can activate and deactivate them when you want. So: the function tracer traces each function; the function_graph tracer gives you, for each function, the call graph; the branch tracer is for the if()s and things like that, sorry; and the stack tracer is for the stack of the calls. irqsoff is something interesting: it keeps the stack of the code path that spent the biggest time with IRQs deactivated, for example. So you can get a lot of information there. And because, again, ftrace is console based, and when you have a lot of tracers, events, things like that, it's quite difficult to organize them and to analyze them, you have some useful tools like trace-cmd and KernelShark. You can get them from git; there is a git repo for that. They will take your traces and try to analyze them, and give you statistics and a graphical UI to use them.

LTTng. It's a little bit like ftrace, but with more UI and more functionality. It can use tracepoints, kprobes and so on, like ftrace. It is only available as modules, out of the mainline tree. And something important with it: you can debug and trace in the kernel, but you can also do it in userland. It provides libraries for you to be able to put tracepoints in your own programs and things like that, and have complete traces and probes and debugging from your application down deep into the kernel. So here, an example of LTTng with the Eclipse plugin. In Eclipse now, you can download a lot of stuff, from a KGDB plugin to an LTTng plugin to tracepoint plugins. So you can have a lot of different UIs, especially for Linux debugging and tracing.
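Here is the little ftrace-driving sketch mentioned above: a userland C program that does with fopen() and fputs() what you would normally do with echo and cat. The paths assume debugfs mounted at /sys/kernel/debug, and the filtered function name is only an example:

    #include <stdio.h>
    #include <unistd.h>

    #define TRACING "/sys/kernel/debug/tracing/"

    /* echo a string into one of the ftrace control files */
    static int echo_file(const char *file, const char *s)
    {
            char path[256];
            FILE *f;

            snprintf(path, sizeof(path), TRACING "%s", file);
            f = fopen(path, "w");
            if (!f)
                    return -1;
            fputs(s, f);
            fclose(f);
            return 0;
    }

    int main(void)
    {
            char line[512];
            FILE *t;

            echo_file("current_tracer", "function_graph"); /* pick a tracer */
            echo_file("set_ftrace_filter", "do_sys_open"); /* one function only */
            echo_file("tracing_on", "1");                  /* start recording */
            sleep(1);
            echo_file("tracing_on", "0");                  /* stop recording */

            /* read back the formatted ring buffer, like `cat trace` */
            t = fopen(TRACING "trace", "r");
            while (t && fgets(line, sizeof(line), t))
                    fputs(line, stdout);
            if (t)
                    fclose(t);
            return 0;
    }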
Now for the debuggers, talking a little bit about KGDB. It's in fact an implementation of the GDB remote protocol in the kernel. So you can use your toolchain's GDB; generally you have a GDB associated with your toolchain when you are cross compiling and things like that. You can use it against the KGDB stub of the kernel, because it is in fact, like gdbserver in userland, an implementation of the GDB remote protocol in the kernel. Breakpoints, watchpoints, all the step-by-step: it is all quite architecture dependent. So there is a really big framework for KGDB in the kernel, and it will use, when it can, hardware breakpoints, hardware watchpoints, things like that. It depends also on the SoC you are using. For example, Cortex-A15 is well supported: all the watchpoint business, the coprocessors, the PMU and the CP14 debug registers for watchpoints and breakpoints, are well supported. On Cortex-A8, the last time I looked, there was no support. On Cortex-A9, you have only one hardware watchpoint, for example. So it depends, and it evolves, of course, also; you have to keep yourself informed about the architecture evolutions. So KGDB, like LTTng, has an Eclipse plugin, and you can do early debugging with it also, using kgdboc and friends, which are command line arguments. kgdbwait, for example, says to the kernel: okay, we will do a GDB session, so wait for the GDB connection before continuing, after having set up the KGDB framework in the kernel.

KGTP is a little bit like KGDB the debugger, but it does not use the same framework: it uses kprobes to do its stuff. And it has also tracepoint support, and serial and Ethernet transports. I think, the last time I checked, there is also an Eclipse plugin for it. So it's like KGDB, but not quite like KGDB. And it's not in mainline for now.

For the profilers, well, there is perf. So perf, we saw that it is an API, but it's also tools. There are two parts to perf, and they are colocated in the kernel sources, under tools/perf; you can cross compile there. In fact, when you do make there with perf activated, it builds the tools too. So like ftrace has a lot of tracers, perf has a lot of profilers. And you can sample based on events: when an event fires, it takes a sample. Like ftrace, it has a debugfs interface, and you use the perf tool at the command line to activate events. So the best is to go into the kernel, look under tools/perf, and you see all those tools. And with the perf tool, you can say: okay, I want a profile of the cache misses of that program. So you have your user program, you launch it, and you want to know how many cache misses you took, how many cycles you burned on the CPUs, things like that: just perf with -e, the profiling you want, the events you want, and the command you want to profile. A word on OProfile: on x86 it is a different thing, but on ARM it uses perf as its backend, and it's also a profiling tool. So here, an example with the perf tools. Here, a perf top: you can see how much you entered cpu_idle, for example, a lot of the time. perf report: your report of where the cycles went, function by function. Quite obvious, in fact, all these tools. perf timechart, to get an idea of the scheduler and the scheduler's work with your program: here is where your CPU is running, there is a dependency sleeping. So this is the scheduler's image of what is going on in your system.
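And since perf is first of all an API, here is what that side looks like from userland: a minimal sketch that counts CPU cycles for a busy loop through the perf_event_open() system call. There is no glibc wrapper for it, hence the raw syscall():

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/syscall.h>
    #include <linux/perf_event.h>

    int main(void)
    {
            struct perf_event_attr attr;
            long long count;
            volatile int i;
            int fd;

            memset(&attr, 0, sizeof(attr));
            attr.size = sizeof(attr);
            attr.type = PERF_TYPE_HARDWARE;         /* a PMU counter... */
            attr.config = PERF_COUNT_HW_CPU_CYCLES; /* ...counting cycles */
            attr.disabled = 1;                      /* start disabled */

            /* this process, any CPU; the kernel multiplexes the hardware
             * counters round-robin if you ask for more than it has */
            fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
            if (fd < 0) {
                    perror("perf_event_open");
                    return 1;
            }

            ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
            for (i = 0; i < 1000000; i++)
                    ;                               /* the work we measure */
            ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

            read(fd, &count, sizeof(count));        /* fetch the counter */
            printf("cycles: %lld\n", count);
            close(fd);
            return 0;
    }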
There are little tools around that, for example gprof2dot, that can take perf samples, or gprof samples, a lot of different sample sources in fact, and use Graphviz dot to make a graph of what is going on, a profiling graph of what is going on in your programs, from userland down to the kernel. There is function_graph too, more for the kernel side: you can see the time taken by each function, with the call stack, in fact.

Another thing, for memory leak profiling and memory monitoring. In the kernel, we have only spoken about some of the features, the most important ones. When you go into the 'Kernel hacking' menu, you will see there is lots and lots of stuff. There are kprobes and perf, there are tracepoints, there is all this and all that. But there is also slabinfo, for example, to get information on the use of your slabs, so on the memory. You have code to debug spinlocks, to debug locks. You have debug features for all the lockless algorithms, like RCU, read-copy-update algorithms, things like that. Really, there are, I would say, dozens of profiling and debugging things in the kernel. And kmemleak, for example, builds on some of that slab instrumentation and things like that. It is also in the 'Kernel hacking' part of the kernel, and it is like a valgrind for the kernel, valgrind when you use it with memcheck: it will record all the allocation calls in the kernel and things like that, and when you kfree, it will remove those records and log what is left at the end, telling you: okay, you have done that kmalloc, that vmalloc, things like that, and you haven't freed it. So here, for example, a kmemleak report: you see the stack that led to the allocation of your object, of the memory that you didn't free, and you have stack information, GFP flags information, things like that. Like valgrind, right in the kernel. (There is a little example of the kind of leak it catches at the end of this part.)

Some tools, quite important also: SystemTap. SystemTap is quite important; it has support for all the debugging features of the kernel. Kprobes: it can do live debugging with kprobes. It has support for tracepoints and for perf. So it's quite the all-in-one tool that will use all the kernel features to try to give you information and debugging. When you want to compare it to something: on Solaris you have the DTrace functionality, where you can write scripts and you will hook whatever you want at whatever point in the kernel you want, a function call, things like that. It's the same with SystemTap, using kprobes, tracepoints, perf events, things like that. On your host, you will write a script, something like this, for example; so here you have the script syntax. And here, in this example, we change the MTU at runtime when we reach a given function in a network driver module. So there, when the function is hit, it will call the handler we declared here and change the MTU at runtime. This code will be inserted into your kernel by, in fact, a module that is built from your script: stap will parse your script, generate a source file, build it for your kernel, and when you load your module, you will have the information. What it is useful for is not having to have a deep knowledge of kprobes, tracepoints, things like that. Of course, you can do your module yourself, using register_kprobe(), tracepoint registration, things like that; SystemTap, from just a script, will do that for you. And that way you can manage to build up a script base that you can reuse afterwards for debugging and profiling.
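Going back to kmemleak for a second, here is the little example of a catchable leak promised above, a minimal sketch assuming CONFIG_DEBUG_KMEMLEAK; the module allocates at init and never frees:

    #include <linux/module.h>
    #include <linux/slab.h>

    static int __init leakdemo_init(void)
    {
            /* tracked by kmemleak; no kfree() will ever match it, and the
             * pointer is dropped, so a scan reports it as unreferenced */
            void *p = kmalloc(64, GFP_KERNEL);

            pr_info("leakdemo: leaking %p\n", p);
            return 0;
    }

    static void __exit leakdemo_exit(void)
    {
    }

    module_init(leakdemo_init);
    module_exit(leakdemo_exit);
    MODULE_LICENSE("GPL");

After loading it, echoing "scan" into /sys/kernel/debug/kmemleak and reading that same file back gives the kind of report described above, with the allocation stack and the GFP flags.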
Some words on kexec. It's another functionality, quite different from the others. kexec is, in fact, a second kernel: it's a specific system call, and you can set things up so that when your first kernel crashes, or has an oops, or panics, it will reboot automatically into a new kernel and give you access to the old kernel's memory, in ELF format, through /proc/vmcore. So on your new kernel, you will be able to debug the old kernel that crashed, panicked, things like that. And this is better done with the crash utility, a utility that simplifies, in fact, reading the core and things like that.

Okay. So we've seen that there are a lot of features, a lot of tools, in fact, around the kernel. What is lacking today is more something like simplicity. Because you have a lot of tools, each specialized for certain things: you have the debugfs interfaces, you have UIs, you have a lot of features you drive from modules and things like that. But there is no real tool, perhaps SystemTap, but even with SystemTap you have to write scripts and things like that, there is really no tool that does all the stuff and simply does what you want: doing the averaging, the correlation between events, connecting events to each other, the correct timelines, things like that. Sorry. Okay. Thank you, I think it's time. Thank you for your attention, and if you have any questions...

[Audience question about which mechanism has the least overhead.] The least overhead, I think, is breakpoints and kprobes, because you don't have any when they are not activated. And with kprobes, when you associate a probe, when you associate a probe with a tracepoint, or when you load your module and register your kprobe, for example, that is where you will have the overhead, because you associate and you activate either the breakpoint, or the tracepoint, or things like that. So basically, when nothing is associated with your tracepoints, your kprobes, whatever, generally everything is done to make the overhead as small as possible. And kprobes, for example, have no overhead at all when they are not activated. When a probe function is associated with the kprobe, the tracepoint, all those things, there you will have the overhead of the branching or the breakpoint, plus the overhead of the call to your function. So it depends also on the probe, on the callback. If you just want to print: with printk, you will always have its overhead. The cheapest one, for a printk, I would say, is perhaps a tracepoint, because you have a low overhead for the branch and you don't have the overhead of the context saving, context restoring and things like that; it's just a callback.

[Audience question about debugging hangs and panics.] Well, hangs, hangs are more difficult. For a panic that reboots, I think it's more KGDB oriented, more a debugger thing. If it's a crash, it's obviously a debugger or things like that; well, maybe just seeing what the stack printed, but a debugger for crashes, for oopses, for panics. It depends a little bit on what makes it panic. If you want a complete image of your kernel at the time of the panic, it's quite difficult. You can try with the debugger, perhaps with watchpoints, if possible, if you have an idea of where it is going. But you have timing problems in there.
[Audience question about getting the dmesg buffer back after a crash.] You don't really have the dmesg buffer directly; it's a good point. [Exchange partly inaudible.] Yeah, exactly. You want to do that on the disk? Yeah. But this is how it's done.