Hi, my name is Peace Lee, working at Hyundai Motor Company. In this talk, I'm going to introduce Guider, a unified runtime performance analyzer, and explain how to analyze automotive infotainment systems using it. This is a summary of my talk. First, I'll talk about performance analysis and optimization. Next, I'll introduce Guider and its useful features. Then I'll explain how to analyze an automotive infotainment system using it. Finally, I'll show a demo of how to use Guider.

Software development eventually leads to code changes. Changed code introduces new bugs, makes working code meaningless, causes unnecessary communication between tasks, and requires new optimizations. For new platform development and continuous updates, we constantly create, change, and delete code, but we are not very sensitive to the performance degradation caused by those changes. And as these changes accumulate, the code ends up being modified very significantly to recover the deteriorated performance. It's a vicious cycle that repeats. Therefore, we always need to manage, analyze, and improve performance more systematically.

Let's talk about performance factors. First of all, the major performance factor is CPU. There are many reasons that make a system slow, such as CPU-intensive jobs, frequent context switching, busy-waiting tasks, and so on. If your system is slowing down, the first thing to do is watch total CPU usage and which tasks are using the CPU cores.

Memory is also important. Frequent memory allocation and deallocation will consume more CPU than expected. And inefficient allocation and missing deallocation, such as leaks, can cause out-of-memory situations; then the system will slow down seriously, and in the worst case it will finally restart. To free memory, the Linux kernel tries to flush file caches and swap pages out; this is called reclaim. Once reclaim starts, the system will slowly start to slow down.

The next one is I/O. Generally, the block device is the slowest part of our system, so optimizations such as caching, read-ahead, compression, and workload tuning are required. Especially, unnecessary I/O operations should be removed and contiguous operations should be merged.

The last one is communication. Locks are very important to protect data shared between multiple tasks, but they can have a huge impact on performance. Excessive lock contention increases CPU usage and also response time. Moreover, performance can get worse depending on lock attributes such as the priority inheritance protocol or wakeup ordering between tasks. As for IPC, in modern systems, all services make remote procedure calls using a system bus such as D-Bus to avoid complex relationships with each other. But these RPCs may cause system overheads such as context switching, serialization, and memory copies. In particular, broadcast calls can significantly increase system load and response time. As for IRQs, software interrupts, called bottom halves, can especially affect system response time. Network drivers are typical; they have many performance tuning options trading off CPU usage against response time.

In addition to the factors I described, there are many other performance factors. Most importantly, we need to be able to recognize each performance factor and measure its actual impact. Yes, we need to think about how to measure them. Logging and using tools are the most effective ways to analyze performance.
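For example, even before reaching for a dedicated tool, a factor like memory reclaim can be measured directly from kernel counters. Here is a minimal sketch using standard Linux interfaces:

  $ grep -E 'pgscan|pgsteal' /proc/vmstat   # reclaim scan/steal counters; rising values mean reclaim is active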
Logging is valuable for recording specific information, but understanding logs requires domain-specific knowledge, so system-level engineers or new members usually find them difficult to understand. In addition, adding new logs requires source code and a toolchain for rebuilding; it's very boring and time-consuming. It's also difficult to record and analyze too many logs because of the limitations of memory and our time. So we prefer to use performance tools. They are very comfortable and effective for analyzing performance at the system level. Some nice tools don't even require rebuilding the target program, installing themselves with dependent packages, or restarting target tasks. But sometimes too many tools confuse us; determining the right tool for a variety of performance issues is not easy.

So I introduce Guider, a unified runtime performance analyzer. It can monitor, profile, trace, and visualize various performance factors. Monitoring features provide continuous performance stats every interval in real time. Profiling features provide a statistical overview of the data collected during a specific interval. Tracing features provide specific data on the execution of a task in the form of logs. Guider is a kind of command-line interface tool, so it offers a lot of features through combinations of commands and options. But in this talk, I will try to explain only some useful features and tracing features because of the time limitation.

It's an open-source program written in Python. It doesn't require installation, but pip and OE/Yocto recipes are also supported for your convenience; actually, just executing the guider.py file is enough. Guider never uses external binaries such as executable programs or packages, except for the matplotlib library for some visualization features. Most of Guider's features are implemented directly using standard libraries such as libc. That's the reason why Guider doesn't require rebuilds or install configuration. In addition, it can be deployed with only one megabyte of storage space. These characteristics are very attractive, especially in embedded systems. All features of Guider are supported on Linux and Android, and it also provides some limited features on macOS and Windows.

So from now, let me introduce some key features of Guider. The first one is monitoring system resources in real time. This feature works by periodically updating stats for system resources and events. System resources cover CPU, memory, swap, block I/O, network, and storage. As shown in the picture, in the first part, system resource information is shown on the top line, such as the number of cores, RAM, and swap. Additional system information such as context switching, interrupts, running tasks, memory zones, and performance stats using the PMU is also displayed. In the second part, important system-level resources and events are printed: system stats such as CPU usage, available memory, swap usage, memory reclaim, and block I/O, as well as other information for performance analysis. In addition, per-core usage is also printed. Although not shown in this picture, the governor, clock, and temperature for each core can be shown together using specific options. In the third part, storage information about busy, workload, and available space is shown for each device. Heavy storage workload can cause serious performance degradation; that's the reason why we track those stats. In the upper part of the picture, network information about inbound and outbound traffic is shown for each device.
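By the way, all of this comes from a single command. If you want to follow along, a minimal invocation looks roughly like this (a sketch; the task filter option for the top command is an assumption based on the filter options shown later in this talk, so check guider -h top for the exact spelling):

  $ pip install guider     # or just run guider.py from the repository
  $ guider top             # monitor system and task resources every interval
  $ guider top -g yes      # filter the output to tasks whose names match "yes"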
In the lower part of the picture, not only system resources but also task resources are shown with their attributes in real time. It's a little bit similar to the Linux top command. Usage for CPU, virtual/physical/shared memory, swap, block I/O, and memory details are printed. The shown tasks are sorted by CPU usage by default, but you can change the sort order using a specific option. A task filter is also available to show only specific tasks.

All or specific function calls can be monitored for a specific task in real time. In addition, stats about the function calls are also printed, such as average, minimum, and maximum time. In this picture, all function calls are shown with backtraces. The usage shown is not about CPU; it's the proportion of the total function calls. So this picture is useful when finding frequent calls or measuring specific function call counts, including backtraces. Of course, there is another function monitoring feature to measure CPU-intensive function calls by sampling techniques. Task filters and function filters are also supported.

All system calls, including backtraces, can also be monitored for a specific task in real time. In addition, syscall stats are shown, such as elapsed time and error returns. This feature is valuable for finding syscalls that take a long time, measuring specific syscall counts, and checking syscall error returns. Task filters and syscall filters are also supported.

All of the files, sockets, and pipes are monitored for each process in real time. Files are printed with position and open flags. TCP and UDP sockets are printed with binding and connection status. Unix domain sockets are also printed with their file paths. This kind of information is very precious when debugging issues or tracing performance. A process filter and a file filter are also available; by using the file filter, it's possible to monitor all processes that open specific files or bind specific sockets.

The previous monitoring features are for checking current status, but if someone wants to see a summary of system changes over a long time, the profiling features can be a good solution. As shown in the picture, in the top table, changes in system resources and events are printed for each interval. CPU usage, available memory, block I/O, swap usage, memory reclaim size, running tasks, and network usage are summarized in each line for each interval. Because of screen length limitations, some fields were truncated. In the middle table, changes in storage usage are displayed with a total summary; there was no storage operation during this profiling time. Busy, workload size, and available space are summarized for each time interval for each device. In the bottom table, changes in network usage are printed with a total summary in the red box. Workload for inbound and outbound traffic is summarized for each time interval for each device.

The next profiling features are for tasks. In the first table, changes in per-process CPU usage are shown with task attributes and a total summary for each interval. The total summary information in the red box reflects CPU usage such as minimum, average, maximum, and total for each task and for the whole system. In the second table, changes in per-process virtual memory usage are printed with task attributes and a total summary for each interval. The overall format is similar to the CPU table above. Although not shown in this picture, various other types of tables are reported together, such as scheduling delay, physical memory, block I/O, and signal usage.
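As a rough sketch of how such an interval report can be produced (based on the demo later in this talk; the exact option letters are assumptions, so check guider -h top):

  $ guider top -o guider.out   # profile system and tasks, writing the report to guider.out
  $ less guider.out            # read the per-interval summary tables after stopping the profiler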
By using these features, measuring and comparing resource usage is possible for various test cases. The next feature is for comparing performance between different versions of software. This feature makes it easy to analyze changes in resources such as CPU, GPU, and memory due to version changes. Each resource is largely divided into system-level and task-level usage, and resource usage statistics are provided as min, average, max, and total.

Text-based analysis is specific but less readable. That's why Guider provides visualization features in SVG format. Opening the SVG output in your web browser gives you an easy-to-view, responsive interface.

The first visualization feature is the resource graph. As shown in the picture, the top box shows graphs of CPU usage for processes. The box on the right side is the label list for the CPU graphs. The middle box shows graphs of block and network I/O for the whole system. The bottom box shows graphs of memory for the whole system. Of course, per-process graphs for block, network, and memory resources are also available. In addition, a filter option for all of them is also supported. As you can see, this visualization feature makes it easy to understand big data collected over a long time, and it also helps to understand trends in resource usage. This is also good for communication with other people.

The next visualization feature is about scheduling. Scheduling data is very large and very difficult to analyze one by one. Therefore, as shown in the picture, scheduling data such as time slices, preemption, and blocking should be visualized prior to a detailed analysis. Opening the SVG output in your web browser, you can view details of time slices. It's very effective for analyzing multi-threaded programs, interactive services, delayed tasks, and core utilization. In addition, this feature supports not only scheduling events but also other custom events that have timestamps for start and end.

The last visualization feature is about call stacks. Analyzing only the last called functions without the whole call stack is difficult, because standard functions such as read and write can be called by any other functions. Above all, in most cases, the last called functions will not be causing the problems; the problem is likely some other function that called those last functions. Therefore, to analyze performance problems at the function level, we need to be able to see the whole call stack. In this case, the flame graph feature is valuable for analyzing call-stack-based profiling results for CPU usage, blocking status, memory leaks, syscall triggers, and function calls. As shown in the picture, the last functions at the bottom of each stack are various, so we need to analyze the upper functions that contain them. I guess modifying those functions will actually improve your application or service performance. Opening the SVG output in your web browser, highlighting, zooming, or searching specific functions or stacks is also available using the mouse and keyboard.
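To give an idea of the workflow, producing such graphs is roughly a record-then-draw sequence (a sketch; the subcommand names and the default data file name are assumptions, so verify with guider -h):

  $ guider record               # record system-wide events into a data file (e.g. guider.dat)
  $ guider draw guider.dat      # render the recorded data as an SVG graph for the browser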
Okay, so far I have introduced some useful features of Guider. From now on, I would like to explain tracing features. Because of the time limitation, I'm going to explain only function tracing, signal tracing, and I/O tracing. The target of function tracing is divided into three kinds: first, native calls from languages such as C, Rust, and Go; second, Python calls through the interpreter; and last, system calls. Signal tracing covers signals delivered to the target. I/O tracing is about I/O operations at various levels, such as device, task, and file.

The tracing target is divided into programs and tasks. A program is a binary on storage that has not been executed yet, so Guider can execute the target program itself, in which case tracing begins from the loader. A task is a running thread; Guider does not require restarting a running task for tracing. Instead, it attaches to the running target thread directly. There are various tracing commands, so if you want to see detailed commands and options, please refer to Guider's help.

The first tracing feature is for native functions in languages such as C, C++, Rust, and Go. Native function tracing is started by the btrace command. The command is implemented using breakpoint traps: breakpoints for all symbol addresses from the ELF and DWARF sections are injected into the target task's virtual memory by Guider itself, so Guider can detect function call and function return events from the target task. Guider can even read and manipulate the registers and memory of the target task when function events occur.

As shown in the picture, call stacks are shown with various depths for a Go program in real time. The arguments and binary name for each function are also printed together in a line. The -g option in the command line is the task filter; that means all tasks whose names include "go" will be targets for function tracing. The -H option means printing backtraces; if there were no -H option in the command line, all function calls would just be printed without depth. A function filter is also available with the -c option to trace only specific functions. The -c option supports special characters, such as an asterisk for inclusion, or more complex patterns for exclusion. Using the -H option, all backtraces are also printed when the target function is called.

As I already mentioned, Guider can read and manipulate the registers and memory of the target task at the time of each event. In addition, various features such as task control, injection of Python code and external binaries, and remote calls are also available using call commands. As shown in the picture on the right side, many call commands are supported to handle specific function call events. Let me explain some call commands: exec executes an external command when the function is called; filter prints the context only if specific conditions are met; getarg prints a specific argument value; setarg manipulates a specific argument value; getret prints the return value and elapsed time when the function returns; pyfile executes a specific Python script remotely; rdmem prints a specific memory value; wrmem manipulates a specific memory value; sleep waits for specific seconds; syscall calls a specific syscall remotely; usercall calls a specific user-level function remotely. These call commands are valuable for deeper analysis.

This is about how to use call commands. Call commands are appended to the function filter with a vertical bar in the -c option. The command line in the picture means: first, start tracing only the write function from the "yes abcd" command. As you know, yes is just a Linux command that prints argument values infinitely, so "yes abcd" will print the abcd string repeatedly, and its print function is implemented internally by the write function in libc. Next, when the write function is called, print the memory value pointed to by the first argument, with backtraces. Argument numbers start from zero in Guider; therefore the traced argument of the write function is the specific memory address that points to the value abcd to be written.
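The command line in that picture likely has roughly this shape (a sketch, not the exact line from the slide; how the target program is passed and the rdmem operand syntax are assumptions, so check guider -h btrace):

  $ guider btrace "yes abcd" -c "write|rdmem" -H
  # -c "write|..."  trace only the write function, running the rdmem call command on each call
  # -H              print backtraces for each event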
I guess the yes program is implemented using buffered writes, because multiple abcd strings are written at once by the write syscall, like this.

The previous function tracing features were for native functions, but what about programs written in JIT-compiled languages such as JavaScript? It's impossible to get symbol mapping information about them using only the ELF tables or DWARF. But if the target task provides Guider with the JIT-compiled symbol-address mapping information, and the JIT-compiled function calls follow the procedure call convention, it's possible to trace them. Java and Node.js can export their symbol-address mapping tables to an external file at runtime; then Guider can trace their JIT-compiled function calls after importing the mapping table. Yeah, like this. This is a tracing example for Node.js. As shown in the picture, there are JIT symbols in the red boxes and native symbols outside. Function filtering for JIT symbols is also available.

The next tracing feature is for Python functions. Python function tracing is started by the pytrace command. The command will print all Python method calls. As shown in the picture, Python call stacks are printed in real time at various depths depending on the stack frame. The file path and line number for each function are also printed together. The target was the iotop program, which is written in Python and prints I/O stats in real time. The call commands used in the previous native function tracing are also available, as in this picture.

The next tracing feature is for syscalls. Syscall tracing is started by the strace command, similar to the original Linux command. The command will print all syscalls and their arguments, converted into an easy-to-understand format. As shown in the picture, syscalls are printed with backtraces, return values, and elapsed times in real time. The call commands used in the previous native function tracing are also available for this feature.

The next feature is for signals. Signal tracing is started by the sigtrace command. The command will print all signals received by the target task. In addition, the call stacks of the signal generation and the sender can also be printed when receiving segmentation fault or child signals. As shown in the picture, received signals are printed for the target stress tasks in real time, and those stress tasks were terminated because of a segmentation fault caused by wrong memory access. A backtrace option and a signal filter option are also available for this feature. Using backtraces, you can analyze which function was being executed when the target task received signals. This feature is useful when monitoring multiple threads to analyze abnormal terminations such as segmentation faults.

The last tracing feature in this talk is for I/O. I/O tracing means analyzing which task performed which operations on which files on which devices, and at what size. And it's not only for specific tasks, but also for all tasks, the whole system. So it must be possible to collect all system I/O events, including various metadata such as task, device, and inode information. I/O tracing consists of three steps: first, recording all system I/O events; second, processing the recorded data, including conversion; and last, summarizing and reporting the results. In the command line, the iorecord command is for recording system I/O events into a specific file, and the report command is for processing the data and reporting to a specific output file.
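Those two steps look roughly like this (a sketch; the subcommand names follow the talk, but the option shapes and file names are assumptions, so check guider -h iorecord and guider -h report):

  $ guider iorecord -o io.dat   # record system-wide I/O events into io.dat
  $ guider report io.dat        # process the recorded data and report the summary tables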
As shown in the picture, the first report information is about task workload, in the red box on the right side of the picture. Block workloads are shown with elapsed time, read and write sizes in megabytes, and operation count fields for each task. Not only the workload but also the elapsed time is printed; it's very useful for analyzing delays caused by I/O system-wide. Cached I/O is excluded from these stats because this is about actual block operations, so some operations served by page caches are not measurable.

The next report information in I/O tracing is about device workload by size. The red box on the picture shows the workload of each device through the read operations of all tasks, and the proportion of sequential operations. Most read operations consist of 4K workloads, in the blue box on the picture. About 57% of the operations were sequential; in other words, about 43% of the operations were random I/O. This information is useful when optimizing device workload, including kernel readahead. Of course, not only the total workload but also the per-task workload is shown at the bottom of the picture.

The next report information in I/O tracing is about file workload. The red box on the picture shows the workload of each file for the write operations of all tasks. Most write operations, about 100 megabytes, were performed on the test file. Actually, it was all from the Guider thread using the iotest command. This information is useful when tracing or analyzing task workloads or file workloads. Yeah, this is very nice.

The next report information in I/O tracing is about file operations. As shown in the picture, all read file operations are displayed, including time, task, offset, size, and path. The total file size is also appended to the end of the path in each line. It's detailed information, but analyzing it is a little bit difficult.

Yeah, so let me show the demo finally. First of all, let's install Guider using pip, check the version, and check the commands using the help command. Yeah, there are many commands supported on various OSes. If you use the -h option with any command, the options and examples for that command will be shown. It's very detailed.

So let's execute the yes program; it just prints the input string repetitively. Then, yeah, like this. And then execute it in the background with redirection, like this. Okay, then let's monitor system resources with the top command. Yeah. As you can see, the yes process is using much CPU and it is running on this core now. Yeah. There are many other stats shown in real time.

And next, let's profile the system and tasks for 10 seconds with the top command. Yeah. And next, open the output file, guider.out. There is system information at the top of the file, like this, and system resource usage in each interval for 10 seconds, and CPU usage for the whole system and specific processes, including yes, is shown like this. In addition, virtual memory, physical memory, and memory details are also shown. This is about detailed stats for each interval, like this, detailed statistics.

Let's monitor syscalls with backtraces. Only the write syscall is being used in the yes program. Look at this. And this is the user-level backtrace calling the write syscall. So, yeah. Next, let's monitor functions using CPU by sampling techniques with the utop command. It looks similar to the previous syscall backtraces, but it's about all user-level function calls. Now, the next one is about syscall tracing with arguments and backtraces. Yeah, this is the write syscall with its arguments, return value, and elapsed time.
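For reference, that syscall tracing step is roughly this invocation (a sketch assuming the -g task filter shown earlier; check guider -h strace for the exact options):

  $ guider strace -g yes        # trace syscalls of tasks named "yes", with arguments and return values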
And next, let's trace functions with arguments and backtraces, reading the memory value pointed to by the first argument of the write function. Let's check. Yeah, like this. This is the address, and the value is shown like this for the memory value pointed to by the first argument of the write function call. This is about call commands.

Yeah, the next feature is visualization. This is a performance graph showing system resource usage for CPU, memory, and I/O. This part shows CPU usage for the whole system and processes. This is about total CPU, and CPU usage for other processes. Yeah, these are the labels for each graph. This part is about I/O, but there is no graph because the profiling options were not enough. And this part shows memory usage for the whole system. Yeah, that's about it.

The next one is a timestamp chart showing scheduling time slices for all tasks. Each number shown on the left is the CPU core number, and the time slices of the running tasks on each core are also shown. If you move the mouse pointer over a specific time slice, information about the task, event, and time is printed. Some time slices are here.

The last one is the flame graph for the call stacks. If you click on a specific function box, it zooms in, and by moving the mouse to specific slots, you can check detailed information. Yeah, searching is also supported with regular expressions.

So this is my talk. Thank you for listening. And if you have any questions, please let me know. This is the channel for communication with me, so don't hesitate to contact me. Thank you.