My name is P3, working at Hyundai Motor Company. In this talk, I'm going to introduce Guider, a unified runtime performance analyzer, and explain how to trace our Linux systems using it. This is the summary of my talk: first, performance analysis and optimization in general; next, Guider and its useful features; next, how to trace a Linux system using Guider; finally, a demo of how to use Guider.

Okay, let's talk about performance factors. First of all, the major performance factor is CPU. There are many reasons that make your system slow, such as CPU-intensive jobs, frequent context switching, busy-wait tasks, and so on. If your system is slowing down, the first thing to do is to watch total CPU usage and which tasks are using which CPU cores.

Memory is also important. Frequent memory allocation and deallocation will consume more CPU than expected. Inefficient allocation and missing deallocation, such as leaks, can cause out-of-memory situations; the system then slows down seriously, and in the worst case it will finally restart. To free up memory, the Linux kernel tries to flush file caches and swap pages out; this is called reclaim. Once reclaim starts, the system will slowly start to slow down.

The next one is I/O. Generally, the block device is the slowest part of our system, so optimizations such as caching, preloading, compression, and workload tuning are required. Especially, unnecessary I/O operations should be removed and contiguous operations must be merged.

The last one is about communication. Locks are very important to prevent corruption of data shared between multiple tasks, but they can have a huge impact on performance. Excessive lock contention increases CPU usage and also response time. Moreover, performance can get worse depending on lock attributes, such as the priority inheritance protocol and wake-up storms on futexes. As for IPC, in modern systems all services make remote procedure calls through a system bus such as D-Bus to cooperate in complex relationships with each other. But this IPC can cause system overhead such as context switching, serialization, and memory copies. In particular, broadcast calls greatly increase system load and response time.

The next one is IRQ, interrupts; especially software interrupts, called bottom halves, can affect system response time. Network drivers are a typical case, so they have many performance tuning options that trade off CPU usage against response time.

In addition to the factors I described, there are many other performance factors. Most importantly, we need to be able to recognize each performance factor and measure its actual impact. Yes, we need to think about how to measure them. The narrowing-down approach is very effective for root cause analysis when debugging and profiling a system. From the system level down to the instruction level, the following four steps are repeated at each level until the root cause is found: clarifying the problem, measuring factors, modifying arguments, and verifying results. Finally, we should be able to pinpoint the root cause of the performance problem and go home. But it's a very time-consuming job, and sometimes it seems to have no end in sight. So how can we do this easily?

Logging and using tools are the most effective ways to analyze performance. Logging is very useful for recording specific information, but understanding a log requires domain-specific knowledge, so system-level engineers or new team members usually have difficulty understanding the logs.
In addition, adding new logs requires source code and a toolchain for rebuilding. It's very boring and time-consuming. It's also difficult to record and analyze too many logs because of the limitations of memory and our time. So we prefer to use performance tools. They are very comfortable and effective for analyzing performance at the system level. Some nice tools don't even require rebuilding the target program, installing themselves with dependent packages, or restarting target tasks. But sometimes too many tools confuse us; determining the right tool for a variety of performance issues is not easy.

So I introduce Guider, a unified runtime performance analyzer. It can monitor, profile, trace, and visualize various performance factors. Monitoring features provide continuous performance stats every interval in real time. Profiling features provide a static overview of collected data over a specific interval. Tracing features provide specific data on the execution of a task in the form of logs. Guider is a kind of CLI tool, so it offers a lot of features through combinations of commands and options. But in this talk, I'll try to explain only some useful features and the tracing features because of the time limitation.

It's an open source program written in Python. It doesn't require installation, but pip and an OE/Yocto recipe are also supported for your convenience. Actually, just executing the guider.py file is enough. Guider never uses external binaries such as executable programs, libraries, or Python packages, except for the matplotlib library for some visualization features. Most of Guider's features are implemented directly using standard libraries such as libc. That's the reason why Guider doesn't require rebuilding, installation, or configuration. In addition, it can be deployed with only one megabyte of storage space. These characteristics are very attractive in embedded systems. All features of Guider are supported on Linux and Android, but it also provides some limited features on macOS and Windows.

From now on, let me introduce some key features of Guider. The first one is monitoring system resources in real time. This feature works by periodically updating stats for system resources and events. The system resources here are CPU, memory, swap, block, network, and storage. As shown in the picture, in the first part, system resource information is shown on the top line, such as the number of cores, RAM, and swap. Additional system information such as context switching, interrupts, running tasks, memory zones, and performance stats using the PMU is also printed. In the second part, important system-level resources and events are printed. System stats such as CPU usage, available memory, swap usage, memory reclaim, block I/O, and network I/O are the most precious information for performance analysis. In addition, per-core usage is also printed, although not shown in the picture; governor, clock, and temperature for each core can be shown together using specific options. In the third part, storage information about busy workload and available space is shown for each device. A heavy storage workload can cause serious performance degradation; that's the reason why we check those stats. In the upper part of the picture, network information about inbound and outbound traffic is shown for each device.

In the lower part of the picture, not only system resources but also task resources are shown with their attributes in real time. It's a little bit similar to the top command. Usage for CPU, virtual, physical, and shared memory, swap, block I/O, and memory details are printed as well. The shown tasks are sorted by CPU usage by default, but you can change the sort order using a specific option. The task filter is also available to show only specific tasks.
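For reference, here is roughly how this real-time monitoring is started; installing via pip and running the top command both appear in the demo at the end of this talk, while the task filter spelling is my assumption borrowed from the filter option described later, so treat it as a sketch and check the built-in help:

  $ pip install guider        # install via pip, or just run the guider.py file directly
  $ guider top                # real-time monitoring of system and task resources
  $ guider top -g yes         # task filter (the -g spelling is assumed here; see "guider top -h")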
All or specific function calls can be monitored for specific tasks in real time. In addition, stats about the function calls are printed, such as average, minimum, and maximum time. In this picture, all function calls are shown with backtraces. The usage shown is not CPU usage; it's each call's proportion of the total function calls. So this feature is useful for finding frequent calls and counting calls to a specific function, including backtraces. Of course, there is another function monitoring feature that measures CPU-intensive function calls by sampling techniques. The task filter and function filter are also supported.

All syscalls, including backtraces, can also be monitored for specific tasks in real time. In addition, syscall stats such as elapsed time and error returns are shown together. This feature is very useful for finding syscalls that take a long time, measuring specific syscall counts, and checking syscall error returns. The task filter and syscall filter are also supported.

All open files, sockets, and pipes are monitored for each process in real time. Files are printed with position and open flags. TCP and UDP sockets are printed with binding and connection status. UNIX domain sockets are also printed with their file paths like this. This kind of information is very precious when debugging issues or tracing performance. The process filter and the file filter are also available. By using the file filter, it is possible to monitor all processes that open specific files or bind specific sockets.

The previous monitoring features are for checking the current status, but if someone wants to see a summary of system changes over a long time, the profiling features can be a good solution. As shown in the picture, the top table shows changes in system resources and events for each interval. CPU usage, available memory, block I/O, memory reclaim size, running tasks, and network usage are summarized in one line per interval. Because of the screen length limitation, some fields were truncated. The middle table shows changes in storage usage, displayed with a total summary; there was no storage operation during the profiling time. Busy workload size and available space are summarized for each interval for each device. In the bottom table, changes in network usage are printed with a total summary in the red box. Workload for inbound and outbound traffic is summarized for each time interval for each device, in a similar form.

Next are the profiling features for tasks. In the first table, changes in per-process CPU usage are shown with task attributes and a total summary for each interval. The total summary information in the red box represents CPU usage such as minimum, average, maximum, and total for each task and for the whole system. In the second table, changes in per-process virtual memory usage are also printed with task attributes and a total summary for each interval. The overall format is similar to the CPU table above. Although not shown in this picture, various other tables are reported together, such as scheduling delay, physical memory, block I/O, and cgroup usage. By using these features, measuring and comparing resource usage is possible for various test cases.
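In the demo later, this kind of interval profiling is done by running the monitor for a fixed period and reading the report from a guider.out file; a rough sketch of that flow could look like the following, where the duration and output option spellings are my assumptions and the exact forms should be taken from the built-in help:

  $ guider top -o . -R 10     # assumed options: -o for the output directory, -R for the duration
  $ vi guider.out             # per-interval tables for system and task resource usage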
Text-based analysis is precise but less readable. That's why Guider provides visualization features in SVG format. Opening the SVG output in your web browser gives an easy-to-view and responsive interface. The first visualization feature is the resource graph. As shown in the picture, the top box shows graphs of CPU usage for processes. The box on the right side is the label list for the CPU graphs. The middle box shows graphs of block and network I/O for the whole system. The bottom box shows graphs of memory for the whole system. Of course, per-process graphs for block, network, and memory resources are also available. In addition, a filter option for all of them is also supported. As you can see, this visualization feature makes it easy to understand big data collected over a long time. It also helps to understand trends in resource usage over a long time. This is also good for communication with other people.

The next slide is about scheduling. The scheduling data is very large and very difficult to analyze one by one. Therefore, as shown in the picture, scheduling data such as time slices, preemption, and blocking should be visualized prior to detailed analysis. Opening the SVG format output in your web browser, you can view the details of the time slices. It's very effective for analyzing multi-threaded programs, interactive services, real-time tasks, and core utilization. This feature also works not only for scheduling events but also for other custom events that have timestamps for start and end.

The last visualization feature is about call stacks. Analyzing only the last-called functions without full call stacks is difficult, because standard functions such as read and write can be called from any other function. Above all, in most cases the last-called functions don't cause all the problems; the problem lies with the other functions that called those last functions. Therefore, to analyze performance problems at the function level, we need to be able to see each full call stack. In this case, the flame graph feature is very useful for analyzing call-stack-based profiling results for CPU usage, blocking status, memory leaks, syscalls, and function calls. As shown in the picture, the last-called functions are at the bottom of each stack bar, so we need to analyze the upper functions that contain them. I guess modifying those functions will actually improve your application or service performance. Opening the SVG output in your web browser, highlighting, zooming into, and searching specific functions or stacks are also available using the mouse and keyboard.

Okay, so far I have introduced some useful features of Guider. From now on, I would like to explain the tracing features. Because of the time limitation, I'm going to explain only function tracing, signal tracing, and I/O tracing. The targets of function tracing are divided into three kinds: first, native calls such as C, Rust, and Go; second, Python calls through the interpreter; last, system calls. Signal tracing is about signals delivered to the target. I/O tracing is about I/O operations at various levels such as device, task, and file. The tracing target is divided into programs and tasks. A program is a binary not yet executed from storage, so Guider can execute the target program itself, in which case tracing begins from the loader. A task is already running; Guider does not require a restart of the running task for tracing, instead it attaches to the running task directly. The tracing commands are various; if you want to see detailed commands and options, please refer to Guider's built-in help.
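To make the program-versus-task distinction concrete, a tracing session can either let Guider launch the target or attach to a task that is already running; the sketch below only shows the attach form with the task filter mentioned in this talk, and the exact option for launching a program should be taken from the per-command help:

  # attach to already running tasks whose name matches "yes", without restarting them
  $ guider strace -g yes
  # launching a program under Guider (so tracing starts from the loader) is also possible;
  # the option for passing the command line is listed in "guider strace -h"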
The first tracing feature is for native functions in languages such as C, C++, Rust, and Go. Native function tracing is started by the btrace command. The command is implemented using breakpoints, that is, traps. Breakpoints for the target functions are injected into the target tasks by Guider itself using ptrace. Guider can detect function call and function return events from the target task through ptrace. Guider can even read and manipulate registers and memory of the target task on a function call or return.

As shown in the picture, call stacks are shown with various depths for a Go program in real time. The arguments and binary name for each function are also printed together on each line. The -g option in the command line is the task filter: all tasks whose name includes "go" will be targets for function tracing. The -H option means printing backtraces; if there were no -H option in the command line, all function calls would just be printed without depth. A function filter is also available with the -c option to trace only the target functions. The -c option supports special characters such as the asterisk for wildcard matching and the circumflex for exclusion. Using the -H option, the full backtrace is also printed every time a target function is called.

As I already mentioned, Guider can read and write registers and memory of the target task at the time of each event, function call or return. In addition, various features such as task control, injection of Python code or external binaries, and remote calls are also available using call commands. As shown in the picture on the right side, many call commands are supported to handle specific function call events. Let me explain some call commands: exec executes an external command when the function is called; filter prints the context only if specific conditions are met; getarg prints a specific argument value; setarg manipulates a specific argument value; getret prints the return value when the function returns; pyfile executes a specific Python script remotely; rdmem prints a specific memory value; wrmem manipulates a specific memory value; sleep waits for a specific number of seconds; syscall calls a specific syscall remotely; usercall calls a specific user-level function remotely. These call commands are very useful when analyzing more deeply.

This is how to use call commands. They are appended to the function filter with a vertical bar in the -c option. The command line in the picture means: first, start tracing only the write function from the `yes abcd` command. As you know, yes is just a little command that prints the given string, here "abcd", repeatedly, and its print function is implemented internally with the write function. Next, when the write function is called, print the memory value pointed to by the first argument, together with the backtrace. That argument of the write function is a specific memory address that points to the value "abcd" to be written. I guess the yes command is implemented using buffered writes, because multiple "abcd" strings are written at once by a single write call.
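Putting those pieces together, the kind of command line described above might look roughly like the sketch below. The talk launches `yes abcd` under Guider, but since the exact launch option is not shown here, this sketch attaches to a running yes task instead; the option spellings and the rdmem syntax are assumptions on my side, so please check `guider btrace -h`:

  # trace only the write function of the yes task, print backtraces (-H),
  # and run the rdmem call command on every call to dump the pointed-to memory
  $ guider btrace -g yes -c "write|rdmem" -H    # rdmem may need an address or size argument; see the help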
The next tracing feature is for Python calls. Python function tracing is started by the pytrace command. The command will print all Python method calls. As shown in the picture, Python call stacks are printed in real time as well. As you can see, the Python calls are indented depending on the stack frame. The file path and line number for each function are also printed together. The target was the iotop program, which is written in Python and prints I/O stats in real time.

The next tracing feature is for syscalls. Syscall tracing is started by the strace command, similar to the original strace tool. The command will print all syscalls and their arguments converted into an easy-to-understand format. As shown in the picture, syscalls are printed in real time. The call commands used in the previous function tracing are also available for this feature.

The next tracing feature is for signals. Signal tracing is started by the sigtrace command. The command will print all signals received by the target task. The signals are printed in real time, and information about the sender and the point of signal generation can also be printed when the signals are received. As shown in the picture, received signals are printed for the target stress tasks in real time, and those stress tasks were then terminated. Backtraces are also available for this feature; using a backtrace, you can analyze which function is being executed when the target task receives a signal. This feature is useful when monitoring multiple tasks to analyze abnormal terminations.

The last tracing feature in this talk is for I/O. I/O tracing means analyzing which tasks performed which operations, on which files, on which devices, and with what size, and not only for specific tasks but for all tasks. So it must be possible to collect all the operations, including various metadata such as task, device, inode, and workload information. I/O tracing consists of three steps: first, recording all system I/O events; second, processing the recorded data, including conversion; last, summarizing and reporting the result. In the command line, the I/O record command records system I/O events into a specific file, and the report command processes the data and reports it to a specific output file.

As shown in the picture, the first report information is about task workload. In the red box on the right side of the picture, block workloads are shown with elapsed time, write size in megabytes, and operation count fields for each task. Not only the workload but also the elapsed time is printed; it's very useful for analyzing delays caused by I/O system-wide. Cached I/O is excluded from these stats because they are about actual block operations, so some operations served from page caches are not measurable here.

The next report information in I/O tracing is about device workload by size. The red box in the picture shows the workload of each device for the read operations of all tasks and the proportion of sequential operations. The read operations consist of 4K workloads. In the blue box in the picture, about 55% of the operations were sequential, while about 43% were random. This information is useful when optimizing device workload and kernel read-ahead. Of course, not only the total workload but also the per-task workload is shown at the bottom of the picture.

The next report information in I/O tracing is about file workload. The red box in the picture shows the workload of each file for the write operations of all tasks. Write operations of about 100 MB were actually performed on the test file; it's the workload generated by Guider itself using its iotest command. This information is useful when tracing or analyzing the I/O of specific tasks or files.

The next report information in I/O tracing is about file operations. As shown in the picture, all read file operations are displayed including time, task, offset, size, and path. Actually, these are page fault operations, so the total file size is also appended to the end of the path in each line. But analyzing at this level is a little bit difficult.
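As a rough sketch of that three-step flow, the record-then-report usage could look like the following; only the report command is named in the talk, the record command name and the options here are my assumptions, so please confirm the exact spellings with Guider's help output:

  # step 1: record system-wide I/O events into a data file
  $ guider iorecord -o iotrace.dat               # command and option names assumed
  # steps 2 and 3: process the recorded data and summarize it into a report file
  $ guider report iotrace.dat -o ioreport.out    # "report" is named in the talk; the -o option is assumed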
So, okay, let me show the demo finally. First of all, let's install Guider using pip, then check the version. Okay, it's installed successfully. Let's check the commands using the help command like this. There are many commands supported by Guider. If you use the -h option with any command like this, the options and examples for each command will be shown like this. These are the examples; sometimes I refer to them myself.

Next, let's execute the yes program. It just prints the input string, and we execute it in the background with redirection to the null device. Then let's monitor system resources with the top command. The yes process is using a lot of CPU and it's running on this core now. There are many other stats shown like this in real time.

Next, let's profile the system and tasks for 10 seconds using this command. Okay, it finished. The report is saved into the guider.out file, so let's open the output file using vi. There is system information at the top of the file: general information and resource usage like this. System resource usage in each interval over the 10 seconds is shown: CPU usage, available memory, block I/O, swap, and network I/O. CPU usage for the whole system and for specific processes, including yes, is shown like this. These are the summarized stats and the per-interval stats. Next, virtual memory, physical memory, and memory details are shown like this; these are the detailed stats for each interval.

Let's monitor syscalls with call stacks. Only the write syscall is being used, and this is the user-level call stack calling the write syscall. These are real-time stats. Next, let's monitor the functions using the CPU by sampling techniques. It looks similar to the previous syscall call stack: the write function in libc is called through this backtrace.

Okay, let's trace syscalls with arguments and call stacks like this. This is the write syscall with its arguments and return value; this is the backtrace, return value, and elapsed time for each write call. Next, let's trace functions and call stacks. This is the write function call with its call stack and the memory value pointed to by the first argument of the write function: this is the address, and the value is shown like this.

Okay, the next feature is visualization. This is the performance graph showing system resource usage for CPU, memory, and I/O. This part shows CPU usage for the whole system and for processes; this is the total CPU and the CPU usage of the other processes, and these are the labels for each graph. This part is about I/O, but there is no graph because the profiling options were not enough. This part shows memory usage for the whole system like this. The next one is the timeline chart showing scheduling time slices for all tasks. Each number shown on the left is the CPU core number, and the time slices of the running tasks on each core are shown. If you move the mouse pointer over a specific time slice, information about the task, event, and time is printed; some time slices are here and there. The last one is the flame graph for call stacks. If you click a specific function box, it zooms in like this. Searching is also supported, with regular expressions, like this.

So far I have introduced Guider and explained some of its useful features. There are more useful features besides the ones I described, but I couldn't explain all of them because of the time limitation. For specific details, please refer to the README file in the Guider repository. If you have any questions, please send them by email or on GitHub. Thank you for listening.