Today I'm going to show you how to trace in Linux with ftrace. I'm Derek. I work for Deepin; Deepin is a Debian-based Linux distribution. It's elegant and, I think, user-friendly. Maybe you should try it. Here's my agenda: I'll introduce what ftrace can do for you, and then we have a case study.

ftrace is a tracing framework in the Linux kernel. It was developed by Steven Rostedt and released in 2008, so it's quite mature now. It has a mascot, designed by... sorry, I forgot the name. The picture is taken from Brendan Gregg's blog.

You can treat ftrace as a logger: it can log almost everything that occurs in a Linux system, from the kernel to user space. [Audience] Is it in the mainline? Yes, and it's very mature and stable; you can use it in your production environment without stability issues or crashing your kernel. It's the tracing mechanism I prefer to use. If you use something like SystemTap, you write a script, it generates a native kernel module, and that module is injected into your kernel, so you have to ensure the generated binary matches your kernel, or the generated code may crash it. ftrace is very efficient because the logging happens in the kernel; everything stays in the kernel, with no user space involved. It logs every event into a ring buffer, so if you run out of space, the oldest log lines are dropped. And how do you read the log? You can just cat it to a file, or use trace-cmd, a tool also developed by Steven Rostedt, to parse it, and KernelShark to visualize it.

The events come from four kinds of mechanisms. The first is predefined static tracepoints, which are defined by the kernel developers in every subsystem and grouped under the subsystem's name. The second is mcount: if you compile your code with GCC's -pg option, the compiler inserts a call to the mcount function at the entry point of every function.
That's how you can profile a C or C++ program, and the kernel developers use the same mechanism so that every kernel function can be traced. Kprobes and uprobes are more advanced, dynamic probing mechanisms: a kprobe can insert a probe point almost anywhere in the kernel, while uprobes are for user-space applications. Once the CPU runs through the probe point, an event is emitted, and you can collect parameters, global variables, values from registers, and so on. It's very powerful, and I think these mechanisms are mature enough. One caveat about uprobes: if you insert a probe point into an application at an address that is not aligned to an instruction boundary, you'll get an infinite loop or a segfault.

This is the information we can collect while an event occurs: for example, the time of a function call and of its exit, at every entry and exit. [Audience] Which functions? Every function, in the kernel or in user space? With mcount in the kernel, that's almost every function in the kernel. So if you enable all of these tracepoints, the ring buffer will be flooded, because a lot of events go through. [Audience] And for a user-land library, can I use this? Yes, you can, with uprobes. In a lot of situations the problem is that you have to trace from user space deep into the kernel to figure out which part of the kernel is, for example, lowering your throughput or increasing your latency. You can observe that with ftrace.

This is what a log line looks like: the task name and PID, the CPU the event occurred on, the timestamp, and which function was called. I've extracted only one line, but there are a lot. You can check your kernel config with grep to see whether ftrace is enabled, but nowadays almost every distribution enables it, so you don't have to build your own kernel. [Audience] So how do I know whether ftrace is enabled on my system?
On the system I use, Deepin, the kernel image is installed in /boot, and the System.map and config are in the same directory. So even in a production environment, not a development one, you can find the kernel config. And in a production environment you don't have to install extra tools to work with ftrace; using it from a shell script is enough.

ftrace exposes its interface through debugfs. First you cd into the kernel debug tracing directory, echo the name of an event into the set_event file, and echo 1 into tracing_on to enable tracing; the kernel does the job for you, with no user space involved. In the same directory you can find all the events listed in the available_events file, each prefixed with its group name followed by the event name, so you can grep for the event you want and then echo it to set_event. If you have multiple events you want to trace, just append them to this file, and you can cat it to check which events you are tracing. If you want to trace all the system calls, just echo the group name, syscalls, to this file: you can echo an event name or a group name, and echoing a group name adds all the events in that group. That gives you tracing much like strace, but more efficient. Note that ftrace won't resolve parameter types for you: if a parameter is a string, it just prints the address, not the string's contents. So we enable tracing and cat the trace file to see all the trace log; echo 0 to turn tracing off, echo nothing to trace to clear the ring buffer, and echo nothing to set_event to remove all the traced events.

Now, here is the case study. There was a user process which blocked systemd's shutdown procedure until systemd timed out and killed it. Since the system is in the shutdown stage, it's hard to execute another program to monitor the system.
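The workflow just described, as a shell sketch (run as root; paths assume debugfs is mounted at /sys/kernel/debug, and on newer systems the same files also appear under /sys/kernel/tracing):

```shell
# Check that ftrace is compiled into the running kernel.
grep CONFIG_FTRACE=y "/boot/config-$(uname -r)"

cd /sys/kernel/debug/tracing

# Events are listed as group:event; grep for the one you want.
grep '^signal:' available_events

# '>' replaces the traced-event list, '>>' appends to it.
echo 'signal:signal_deliver'     > set_event
echo 'sched:sched_process_exit' >> set_event
# Or enable a whole group, e.g. every syscall event:
#   echo 'syscalls:*' > set_event

echo 1 > tracing_on    # start tracing
cat trace | head -20   # read the ring buffer
echo 0 > tracing_on    # stop tracing
echo > trace           # clear the ring buffer
echo > set_event       # stop tracing all events
```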
Any such monitoring process would be killed too. And because a user session is composed of many user processes, you can't just eyeball the process list to see which one is blocking the system. So either you write a plugin for systemd to log which process timed out, or you trace it with ftrace, because the kernel already does that work for you.

We need to collect some information to identify the process that blocks the shutdown: signals, who sends each signal and when, who receives it, and when each process exits. I think that information is enough to identify why the user session timed out and which process is blocking the shutdown.

So we create a script named debug.sh in /lib/systemd/system-shutdown. This script is invoked by systemd just before it powers off or reboots your system; in it we remount the root filesystem and cat the ring buffer to a shutdown log in the root directory, and we chmod the script executable so systemd can run it.

Then we enable ftrace. It's simple: you echo the related events to set_event. This one, signal_deliver, is emitted when the kernel delivers a signal to a particular process, and this one from the scheduler, sched_process_exit, is emitted when a process exits. These few kinds of events tell us how to identify the blocking process.

Then we change the trace clock. If you have multiple CPUs, the default clock, which is CPU-local, is not synchronized across them, because keeping it synchronized would be less efficient. Since we want timestamps that are consistent and sortable across CPUs, we use the global trace clock. Then we enlarge the trace buffer to 40 megabytes so it can store a lot of trace lines. And then we set the tracer.
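A sketch of the setup from the case study (the exact event names, buffer size, and paths are my reconstruction, not a verbatim copy of the talk's script):

```shell
#!/bin/sh
# Run once before shutting down: trace who sends/receives signals
# and when each process exits.
cd /sys/kernel/debug/tracing
echo 'signal:signal_generate'    > set_event
echo 'signal:signal_deliver'    >> set_event
echo 'sched:sched_process_exit' >> set_event
echo global > trace_clock    # timestamps consistent across all CPUs
echo 40960  > buffer_size_kb # 40 MB ring buffer (note: per CPU)
echo nop    > current_tracer
echo > trace                 # clear the ring buffer
echo 1 > tracing_on

# /lib/systemd/system-shutdown/debug.sh (chmod a+x) then saves the
# buffer just before poweroff/reboot:
#   mount -o remount,rw /
#   cat /sys/kernel/debug/tracing/trace > /shutdown-trace.log
#   mount -o remount,ro /
```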
The nop tracer is the simplest one. There are a lot of tracers in ftrace that you can use to generate different kinds of events and information; I didn't mention them earlier, but we can talk about them later. Then we clear the ring buffer and start tracing, just before we shut the system down. After rebooting, we can check our log. The raw log is not very human-friendly, so I post-process it with a simple shell script; then you can sort it and find which process was blocking the system. You can see here that the timestamp gap is about 5 seconds, and then this process got signal number 9, which means SIGKILL. So this is the process that made the shutdown time out and got killed by systemd.

For boot issues, if you want to reduce your boot time, you can use ftrace as well, because you can enable ftrace on the kernel command line. You enable it and trace, for example, which process opens which file and how long it spends on it, or how many processes fork and exec, then collect and visualize that information to identify why your boot time is so long.

So I do think ftrace is amazing, although it's not so efficient to process all these logs in user space. Here are some references you may be interested in. Any questions?

[Audience] Let's say some problem causes the kernel or the system to lock up: a system freeze, or maybe a machine-check error in the hardware that leads to a lockup. In such a situation, is it possible to use ftrace? Yes, you can pass a kernel command-line parameter (ftrace_dump_on_oops) to tell the kernel to use ftrace to dump the call frames, and with ftrace those frames can be very clear, with human-readable names rather than raw addresses. [Audience] But the system has basically crashed; it's a hard lockup. How are you supposed to dump anything? Because, as I understand it, everything is in a ring buffer.
[Audience] Which means it's in memory, and the system has stopped. It dumps to your screen, or you can try netconsole. [Audience] If the system has locked up, can it still... oh, it dumps live? Yes. [Audience] But say you're in a situation where you don't know when it's going to crash. That's why we tell the kernel to use ftrace to dump it: you don't know when the system will panic, for example. [Audience] OK, let's put it this way: to read it, you need to dump it somewhere, right? If the system is fine, you dump the log from the trace file; if the system panics, it dumps to the console. [Audience] And if there's no panic? With a panic, the programmer knows there's a problem. But with a real problem the whole thing might just freeze; there's no chance of even reaching a panic, the whole system just locks hard. Normally for this situation you'd take a kernel crash dump, so I was wondering whether you can use this without doing one. If your kernel supports it, there's the magic SysRq key: Alt plus SysRq (the Print Screen key) and a command key can make the kernel dump the log. [Audience] But at that speed it will just scroll past, so you can't read it, right? Yes, that's the problem; I think netconsole may be more convenient for this. [Audience] What do you mean by netconsole? A console that doesn't run on your local VGA or serial port: its output goes out through the network. [Audience] Over the serial port? Not the serial port or a UART: over Ethernet. [Audience] Ethernet? So that's a hardware thing, right, basically the BMC? Is netconsole provided by the BMC? No, netconsole is supported by the Linux kernel itself. [Audience] It's in Linux? I never used that; I didn't even know such a thing existed. OK, I'll show you. [Audience] But if the machine is locked up, netconsole will be as dead as the kernel.
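For reference, a minimal netconsole setup looks roughly like this (the interface name and addresses below are placeholders):

```shell
# At runtime: forward kernel console messages over UDP from eth0
# to the default port 6666 on 192.168.1.10.
modprobe netconsole netconsole=@/eth0,@192.168.1.10/

# The same string can go on the kernel command line so it is
# active from early boot:
#   netconsole=@/eth0,@192.168.1.10/

# On the receiving machine, just listen for the UDP stream:
#   nc -u -l 6666
```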
So it's a chance to get the dump, but... [Audience] You're saying that you're logging through the network, so you can do a continuous dump to a remote system? Yes, that's a simple option, though I think it will only work while your system is not completely dead.

And if you're interested: with kprobes, for example, you can insert probe points dynamically. Say you want to check why a MySQL query takes so long: you can insert a probe point at the start of the query function and extract, for example, the statement from a parameter and print it out, then insert another probe point to collect the exit time, so you can measure the average execution time of every statement. That's a very interesting use. And you don't need to install anything else: in a production environment you keep things clean and small to make sure security is not compromised. So if you want to trace your system and don't want to install another tool, I think ftrace is great for you.

[Audience] Will ftrace slow down the system? I think it depends. It's highly efficient, but if you enable a massive number of events, it will slow your system down; for example, trying to log every single system call will slow it down. I'll show you. We can use the function tracer to trace every single function call in the kernel. The available tracers are listed in the available_tracers file: there's a block-device tracer, a memory-mapped-I/O tracer, the function_graph tracer, the function tracer, and the nop tracer. So we echo function to current_tracer, echo 1 to tracing_on, and see what we get. [Audience] How many tracers can you enable at any one time? Just the entry of a function call, or the return from it. [Audience] No, I mean the echo of function into current_tracer: can you enable more than just function? Just one, only one at any time, because the tracer determines the output format. I'll show you: the output is all function calls.
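The demo being run here is roughly the following (as root, in the tracing directory):

```shell
cd /sys/kernel/debug/tracing
cat available_tracers    # e.g.: blk mmiotrace function_graph function nop

# Trace every kernel function call (heavy!).
echo function > current_tracer
echo 1 > tracing_on
head -20 trace
echo 0 > tracing_on

# function_graph shows the call hierarchy plus per-call duration.
echo function_graph > current_tracer

# Filter to scheduler functions only, to keep the overhead down.
echo 'sched*' > set_ftrace_filter
echo 1 > tracing_on
head -20 trace
echo 0 > tracing_on

# Reset: clear the filter and go back to the nop tracer.
echo > set_ftrace_filter
echo nop > current_tracer
```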
In the output, the first name is the function being called and the second is its caller. You can also cat trace_pipe: the content of trace_pipe and trace is the same, but trace_pipe blocks once everything has been read out, and when new events are added to the ring buffer, the cat continues printing them. My CPU is very busy now, because every single function call is being logged in this situation; that's not good for a production environment, since it will reduce your system's throughput. So I stop the tracing.

Now we change current_tracer to function_graph, enable tracing, and see what's different from the function tracer. You can see it's more human-friendly: the call hierarchy, and the time spent in each function call. You can use this to trace a lot of things, but if you don't want to slow your system down, you can add a filter to ftrace. For example, if you want to trace only the scheduler part, just echo 'sched*' to set_ftrace_filter. Now you see only the functions starting with "sched", and the CPU is not as busy as before. The numbers are the time the system spent in each function call, in microseconds; it's very precise. So you can use this to measure your kernel, for example, to find out why it slows down your boot time. And with uprobes you can trace your user-space applications. Are you interested in this? If not, you can read the documentation on the internet. Do you want me to show that? Thank you.