Hello, my name is Hiroyuki Ishii, and today I will talk about dynamic tracing tools on the ARM/AArch64 platform. I work in Panasonic's automotive section in Japan as a Linux engineer, and recently I've been working on Linux debugging tools. This talk aims to promote dynamic tracing tools in embedded. I will start with an introduction to the dynamic tracing capabilities and tools, including recent updates, and then I will introduce an example of porting the dynamic tracing tools to the AArch64 platform.

Okay, so I work at Panasonic, developing IVI (in-vehicle infotainment) products with Linux. Panasonic is a tier-one IVI supplier for various OEMs and is also the largest IVI supplier in the world. Recently we have been using Linux on ARM CPUs in several IVI products.

Current IVI systems are getting much larger and much more complex. Nowadays IVI systems have really massive feature sets, such as navigation, voice recognition, multimedia streaming and ripping, databases, network connectivity, and so on. To meet those wide requirements, we are trying to use the latest hardware, such as the ARMv8 64-bit architecture, also known as AArch64, and a newer version of Linux like 4.x, which has the latest drivers for AArch64. Neither of them is in a product yet, but they will be in the near future. We are also trying to use hundreds of various open source components.

As a result, the software grows to tens of millions of lines of unfamiliar source code. It consists of open source and in-house code, and sometimes the in-house code is much harder to understand than the open source. In such a case, one of the big issues is debugging. Tons of unknown issues occur when we run such a big system, and debugging becomes very complex. In other words, we have to find the cause of an issue among hundreds of components or tens of millions of lines of code. So we need an advanced approach to this problem.
How can we debug tens of millions of lines of unfamiliar source code? The answer is dynamic tracing.

So what is dynamic tracing? In short, it's a way of watching every function at any time. There are a few exceptions, such as inlined functions, but we can watch almost every function. And it works not only in the kernel but also in user space.

Some more details. The word "dynamic" means the capability to attach to a live system. It also means that no preparation is required, such as editing the source code in advance. So it gives us a flexible and ad hoc way of analysis. And the word "tracing" means watching how the system or programs behave. For example, you can monitor a particular function you are interested in, such as a system call, the scheduler, virtual memory, a file system, networking, some user process or library, or whatever. You can also do some profiling, such as interval sampling of the system stats. And you can record the function call history, for the sake of roughly testing functions or exploring unfamiliar source code.

And why dynamic tracing in embedded now? The biggest reason is the powerful tracing capabilities that Linux 4.x has. We finally got the key features for dynamic tracing in the mainline kernel, such as kprobes for AArch64 and BPF. They are already mainline, and uprobes for AArch64 is almost there too. Another reason is tools. The dynamic tracing tools have improved remarkably in the past two or three years by supporting the latest technologies such as BPF.

And what are the advantages? First, you gain advanced observability. You can break down issues quickly and clearly, even complicated or unknown issues. Second is the low installation cost. There is no need to change the product code for dynamic tracing. That is really efficient, because we have to analyze tons of code.

So let me briefly introduce what we can do with dynamic tracing. On a Linux system, there are many observability tools.
Most of them are traditional tools, such as top, ps, vmstat, free, strace, lsof, and many others. And most of these tools are designed for a single purpose, in other words, for analyzing a particular subsystem in the kernel. By contrast, dynamic tracing tools can trace and observe the whole system, not only in the kernel but also in user space.

Those dynamic tracing tools work thanks to the various tracing frameworks in Linux, such as tracepoints, kprobes, uprobes, ftrace, perf_events, and BPF. All of them other than BPF are not so new. But BPF is pretty new and is becoming the most important feature in dynamic tracing, because it reduces the overhead. BPF provides a way of processing trace data inside the kernel, so we can summarize the trace data before dumping it to user space. This is really efficient, because the data transfer can be a large overhead. By reducing the overhead, BPF makes many things practical that were previously impossible.

I also have to explain that there are several types of dynamic tracing tools. First, programmable multi-tools. Second, non-programmable multi-tools. And third, single-purpose tools. The first type provides a programmable framework for tracing, so we can write our own tools with it, and it also comes with some bundled scripts as individual analysis tools. The second type is a kind of all-in-one tracing tool. It can do a lot of tracing things depending on the command-line arguments. And the third type is just a group of individual analysis tools.

So let's see some examples of dynamic tracing tools. This is a single-purpose tool set. It contains about a dozen commands. That's not so many, but it covers the minimum subsystems for performance analysis. And this is an example of BCC. It's a programmable multi-tool. It contains a great many commands and can reach many subsystems to achieve advanced analysis.
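To make the in-kernel summarizing idea more concrete, here is a minimal sketch written against BCC's Python front-end. It is not one of the bundled scripts: the traced function `vfs_read`, the map name `counts`, and the helper names `top_callers` and `count_calls` are illustrative assumptions. The BPF program counts calls per process inside the kernel, and user space reads only the finished table instead of receiving every single event.

```python
import time

# BPF program in restricted C; BCC compiles it with LLVM/Clang at load time.
BPF_PROGRAM = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(counts, u32, u64);   // pid -> number of calls, kept in the kernel

int count_call(struct pt_regs *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    counts.increment(pid);    // summarize in the kernel, no per-event output
    return 0;
}
"""


def top_callers(counts, top=10):
    """Return (pid, count) pairs sorted by descending count."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:top]


def count_calls(event="vfs_read", seconds=5, top=10):
    """Attach a kprobe, wait, then read the summary table once.

    Needs root and an installed BCC; 'vfs_read' is only an example target.
    """
    from bcc import BPF  # imported lazily so the helpers work without BCC

    bpf = BPF(text=BPF_PROGRAM)
    bpf.attach_kprobe(event=event, fn_name="count_call")
    time.sleep(seconds)
    counts = {k.value: v.value for k, v in bpf["counts"].items()}
    return top_callers(counts, top)
```

Calling `count_calls()` as root on a machine with BCC installed should return the busiest process IDs for the sampling window. The point of the design is that only one small table crosses from the kernel to user space, however many times the probed function fires.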
So let's see some practical examples of dynamic tracing usage with some commands from BCC and perf-tools.

opensnoop traces the open system calls and prints all the file-open events in the system in real time. In this example, a build process is running on the other terminal, and opensnoop is tracing the file opens by gcc, configure, and many other commands. This tool is useful for analyzing file-related issues, such as issues with config files or dynamic libraries.

execsnoop traces new processes via the exec system calls. It's similar to opensnoop, and this example also shows a build process. This tool is useful for analyzing short-lived processes like these, so you can easily find out which command execution failed.

biosnoop traces block device I/O with PID and latency, plus some more information like sector location and size. This tool is useful for analyzing storage issues, such as performance issues or driver issues.

funcgraph prints the kernel function call graph and durations. In this example, it's tracing the child function calls of do_sys_open, limiting the max depth to 3. It's really useful not only for troubleshooting but also for understanding how a kernel function works.

funccount counts kernel and user function calls. In this example, it's tracing the functions of bash whose names start with "set". You can easily find which function is called most frequently, and that's useful for reducing CPU consumption.

trace traces arbitrary functions with filters. The example on the upper side is tracing the sys_read function in the kernel and filtering the result by its argument value. The example on the lower side is tracing the readline function in bash and printing the return value. This tool is useful for any kind of ad hoc analysis.

And a flame graph shows the system-wide function stack history.
You can easily find out which function is time-consuming, and you can roughly check the sanity of functions. There are many other useful tools built on dynamic tracing. If you are interested, I recommend looking at the BCC project page.

So next, I will introduce a summary and recent updates of the dynamic tracing tools.

The first one is perf-tools. It's a set of single-purpose observability tools. Actually, this project seems to be closed, and most of the tools have been ported to BCC by the author. But it is still useful because of its few dependencies. If you can't get the other dynamic tracing tools working because of their dependencies, I think it's worth trying perf-tools. But there are cons. First, each script has a direct dependency on the Intel architecture, so you have to fix the scripts one by one to port them. And also, the scripts are less customizable.

This is an example of a perf-tools script. It's the source code of opensnoop. It's a bash script and it's a bit long, so it's truncated here. What this script does is create a probe point on the sys_open function by using the kprobes interface, print the arguments, and filter the result with an awk script.

The second one is SystemTap. SystemTap is a really powerful programmable multi-tool with a nice language. Its language is high-level, less arch-dependent, and easy to understand. And SystemTap is basically ready for AArch64. But there are several cons. First, it takes a long time before a script starts running, because it runs scripts by building them into a loadable kernel module. And also, SystemTap is relatively unsafe. I mean, SystemTap sometimes causes a kernel panic. That is also because SystemTap runs the user script as a kernel module. But some people are working on BPF support for SystemTap, and both cons might be fixed by it. So I'm really looking forward to the release.

And this is an example of a SystemTap script. It's also the source code of opensnoop.
It's really simple, but it can do almost the same thing as the perf-tools one.

The next is BCC. BCC is a promising BPF front-end tool and is under really active development. Its biggest pros are low overhead and extra capabilities thanks to BPF. It also supports various front-ends such as Python, Lua, Go, and C++. And it contains a great many useful scripts. But the con of BCC is its lower-level language. To write a BCC script, you have to write the BPF program in restricted C, so scripts get a bit long. And also, BCC scripts have a direct dependency on the Intel architecture. This is the same problem as with perf-tools.

This is an example of a BCC script. It counts function calls in the kernel. The left side is the BPF program, which does the in-kernel summarizing, such as counting up. The other side is the Python script, which creates a probe point, loads the BPF program, does some exception handling, and prints the results.

And next is ply. ply is an upcoming BPF front-end tool. So far, it's the only tool with both a high-level language and BPF support. But ply is still in beta and is relatively unstable. Its development also seems relatively slow, because there is a single developer. And AArch64 is not supported yet. So it seems fairly far from practical use, especially on the AArch64 platform. But still, it's an interesting tool.

This is an example of a ply script and its output. It does system-wide counting of all system calls. It's a really modern language and seems easy to write.

The last one is perf. perf is a performance analysis tool and lives in the Linux source tree. Its pros: first, it's reliable and ready to use on many architectures, because it's in-tree and heavily maintained. It also has advanced capabilities, such as CPU statistics from the PMU. And perf already has BPF support, although it's pretty tricky to use. And the cons: first, perf is not very programmable.
And second, perf requires relatively more keystrokes than the other tools. So overall, I feel that perf is relatively tricky to master. But it depends on your preference.

And this is a summary of the tools. Please note that this is just my personal opinion and implies no special recommendation. Also note that there are many other useful dynamic tracing tools besides these. So far, BCC seems to be a good choice, because it has a programmable interface and supports BPF completely. But the other tools have other advantages, so you can choose tools according to your use case.

So the last section is porting. In this section, I will introduce an example of porting BCC to AArch64. Actually, I'm also trying to port several tools other than BCC, and currently perf-tools is partially available. All the patches are on my GitHub, and I'm going to keep updating them.

So this is my environment for porting. The hardware I used is the Salvator-X board, which is the reference board for the Renesas R-Car Gen3 SoC with AArch64 cores. I also used the kernel maintained by Renesas, and its version is 4.9. We need some extra kernel patches, because uprobes for AArch64 is not mainline yet. Those patches are developed by Pratyush Anand at Red Hat.

And these are all the kernel configs I added for porting. Please note that I used common configs for everything related to dynamic tracing, so there might be configs that are unnecessary for BCC.

And the rest of the environment is this. The only thing to note is the multilib environment. In the userland I used, which is based on Yocto 2.x, the libraries are installed into lib64 directories, such as /lib64 or /usr/lib64. This is a bit unusual for open source software, so it might cause problems during building.

And this is the version of BCC and its necessary dependencies. BCC uses LLVM and Clang for compiling the BPF bytecode. Python is optional but strongly recommended.

These are the porting steps.
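Before going through the steps, the kernel configuration mentioned above can be double-checked with a small script. This is only a sketch: the option list below is an assumption based on what kprobes, uprobes, and BPF tracing generally need, not the exact list from the slide, and the helper names are hypothetical.

```python
# Assumed set of options; adjust it to match the config list actually used.
REQUIRED_OPTIONS = (
    "CONFIG_BPF",
    "CONFIG_BPF_SYSCALL",
    "CONFIG_BPF_EVENTS",
    "CONFIG_KPROBES",
    "CONFIG_UPROBES",
)


def enabled_options(config_text):
    """Collect option names set to 'y' or 'm' in kernel .config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blanks and comments such as '# CONFIG_FOO is not set'.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if value in ("y", "m"):
            enabled.add(name)
    return enabled


def missing_options(config_text, required=REQUIRED_OPTIONS):
    """Return the required options that are not enabled, in list order."""
    enabled = enabled_options(config_text)
    return [opt for opt in required if opt not in enabled]
```

The config text can come from the `.config` file in the kernel build directory. On a target whose kernel exposes `/proc/config.gz`, `gzip.open("/proc/config.gz", "rt").read()` should work as well.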
Please note that there are some temporary workarounds so far, and I'm working hard to fix them properly. And these are the versions of LLVM and Clang.

The first step is cross-compiling LLVM and Clang. These are the checkout instructions. Please note that the dollar sign means commands on the host PC and the sharp sign means commands on the target board. And these are the build instructions. Here we use CMake for the build configuration, and there are many options for cross-compiling, such as the sysroot path, the lib64 directory, GCC as the compiler, and so on. There is also a small tip. If you are building LLVM and Clang for the first time, I recommend not using build parallelization, because it might run out of memory. The linking phase consumes a really huge amount of memory.

The next step is cross-compiling BCC. This is almost the same as for LLVM and Clang, but here I hit the first problem during BCC's cross-compilation. It says that an error occurred while finding libraries, and the path doesn't point to the lib64 directory. So this is a lack of multilib support in BCC's CMake file. This is the patch for the problem. You also need to add this option to enable the property setting.

Now BCC's cross-compilation succeeded, but we had another problem when running BCC's hello world. It says "Interpreter has not been linked in." This is caused by the lack of the BPF module in Clang. To fix it, we need to fix two points. First, the module was not built during the LLVM and Clang compilation. And second, BCC doesn't try to link it even if it exists.

The fix on the LLVM and Clang side is removing the LLVM_TARGET_ARCH and LLVM_TARGETS_TO_BUILD options. They seem to be really necessary, but in this case they get in the way, because these options prevent the BPF module from being built. And cross-compilation still works well because of this other option. The fix on the BCC side is a bit difficult.
Something about the module detection seems to be wrong in BCC's CMake file, but so far I couldn't fix it correctly. So this patch is a quick hack that forces the BCC binaries to link all LLVM modules. As a result, the BCC binaries become needlessly huge, so I'm working to fix it soon.

We had another problem when running scripts. It says that inline assembly is not supported. This occurs while building the BPF programs from the C code. It's not a bug, it's a kind of specification. So we need to remove the inline assembly code. But where is the inline assembly code? I couldn't find it. Fortunately, there were other people suffering from the same problem, and I found that the inline assembly code exists in a kernel header. This is the header, and the assembly code is here. It is only the definition of a macro containing assembly code, and it's never used when compiling BPF bytecode. So it can be removed with a compilation switch. It's a really ugly hack, but it's effective. To be correct, it should be fixed in BCC, or maybe in Clang, by filtering out the assembly code before building the BPF bytecode. So I will try that.

And now BCC works fine. The hello world program and several scripts worked. But unfortunately, the opensnoop script doesn't work well. Some scripts require an extra fix in each script, as I mentioned before, so we need to fix some arch dependencies. In the case of opensnoop, we need to fix a probe point mismatch, because the AArch64 kernel uses the sys_openat function instead of sys_open. So this is the patch. It's slightly long but simple. It detects the architecture with uname -m, and if the arch name starts with aarch64, it switches the target function to sys_openat. So finally, opensnoop works fine. And I'm trying to port the other tools in the same way as opensnoop.

And here is the summary of my talk.
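One note before the summary: the architecture check in the opensnoop patch described above can be sketched in a few lines of Python. This illustrates the idea and is not the patch itself. The function name `open_probe_target` is hypothetical, `platform.machine()` stands in for `uname -m`, and the two symbol names are the ones mentioned in the talk for the 4.9-era kernel.

```python
import platform


def open_probe_target(machine=None):
    """Pick the kernel function to probe for file opens.

    'machine' defaults to the running system's architecture string,
    which matches the output of 'uname -m' on Linux.
    """
    if machine is None:
        machine = platform.machine()
    # On AArch64, opens go through sys_openat rather than sys_open.
    if machine.startswith("aarch64"):
        return "sys_openat"
    return "sys_open"
```

A script would then pass the returned name as the kprobe event instead of hard-coding `sys_open`. Keeping the choice in one helper means other architectures could be handled later without touching the rest of the script.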
I really hope that you will try some dynamic tracing tools after this talk. There is a good selection of tools, such as BCC, SystemTap, perf-tools, and many others. And I'm really looking forward to BPF front-end tools with a high-level language. It might be BCC, or perf's extension, or SystemTap's BPF support, or ply, or some other tool.

As for future work, I will fix those workarounds properly, and also try to port many more scripts and do testing. I'm also preparing contributions. And maybe I will try to port the tools to Linux 3.x and ARMv7.

And these are the references. I want to say many thanks to Brendan Gregg at Netflix. He's a leading engineer in dynamic tracing and performance analysis.

So that's my talk. Thank you. Any questions?

Where? How can we stay up to date?

Oh, yeah. I'm just thinking about it. Please see this account. Please follow this account, yeah. And I'm going to join some mailing lists, like BCC's or something. Not yet, not yet, yeah.

Have you seen anything in your travels that speaks to detecting memory leaks inside the kernel, for example? Do any of these technologies or techniques allow you to detect memory leaks inside the kernel?

Oh, yeah, yeah, yeah. I might have an example. No, no, no. There is a tool named memleak in the BCC project. It traces all the malloc calls and free calls and detects memory leaks. So you can try it. Actually, I haven't tried it yet.

You have this list of tools. Some are secure and some are not secure. What kind of security are you talking about? As far as I understand, with BPF you load code into the kernel. So I guess it's inherently kind of insecure when you do that.

Yes, yes, yes. You're right. It's future work for me. I'm not so concerned about security so far. So it's future work.

Any other questions? So thank you very much.