Hi, my name is Peace Lee and I work at Hyundai Motor Company. In this talk, I'm going to introduce Guider, a new performance monitoring tool, and explain how to use it. Here is a summary of my talk: first, performance analysis and optimization; next, Guider and its various features; and finally, how to monitor the performance of your system automatically using it.

OK, let's talk about performance factors. First of all, the major performance factor is CPU. There are many reasons a system gets slow, such as CPU-intensive jobs, frequent context switching, busy-waiting tasks, and so on. If your system is slowing down, the first thing to do is watch the total CPU usage and which tasks are using the CPU cores.

Memory is also important. Frequent memory allocation and deallocation jobs consume more CPU than expected. Inefficient allocation and missing deallocations, such as leaks, can cause out-of-memory conditions. Then the system slows down seriously, and in the worst case it finally restarts. To free up memory, the kernel tries to flush file caches or page things out; this is called reclaim. Once reclaim starts, the system gradually begins to slow down.

Next is I/O. Generally, block devices are the slowest parts of a system, so optimizations such as caching, preloading, compression, and workload tuning are required. In particular, unnecessary I/O operations should be removed and contiguous operations must be merged.

The last one is communication. First, locks. They are very important to prevent corruption of data shared between multiple tasks, but they can have a huge impact on performance. Excessive lock contention increases CPU usage and also response time. Moreover, performance can get worse depending on lock attributes such as the priority inheritance protocol, busy-waiting, and wakeup strategy. Next, IPC. In modern systems, all services make remote procedure calls over a system bus such as D-Bus to operate in complex relationships with each other. But this IPC can cause system overheads such as context switching, serialization, and memory copies. In particular, broadcast calls greatly increase system load and response time; IPC is a very heavy form of communication. And the next one is IRQ. In particular, software interrupts, called bottom halves, can affect system response time; network drivers are a typical example, so they have many tuning options trading off CPU usage against response time. It's about trade-offs.

In addition to the factors I described, there are many other performance factors. Most importantly, we need to be able to recognize such performance factors and measure their actual impact. So we need to think about how to measure them. Logging and using tools are the most effective ways to analyze performance. Logging is very useful for recording specific information, but understanding logs requires domain-specific knowledge, so they are usually hard for system-level engineers or new team members to understand. In addition, adding new logs requires the source code and a toolchain for rebuilding, which is very boring and time-consuming. It's also difficult to record and analyze too many logs because of the limits on memory and I/O time. So we prefer to use performance tools. They are very comfortable and effective for analyzing performance at the system level. Some nice tools don't even require rebuilding the target program, installing themselves with dependent packages, or restarting target tasks. But sometimes too many tools confuse us: picking the right tool for a given performance issue is not easy.
So let me introduce Guider, a unified runtime performance analyzer. It can monitor, profile, trace, and visualize various performance factors. Monitoring features provide continuous performance stats at every interval in real time. Profiling features provide a statistical overview of the data collected during a specific interval. Tracing features provide specific data on the execution of tasks in the form of logs. Guider is a command-line interface tool, so it offers a lot of features through combinations of commands and options. But in this talk, I will try to explain only some useful monitoring and tracing features because of the time limitation.

It's an open-source program written in Python. It doesn't require installation, but pip and an OpenEmbedded recipe are supported for your convenience; actually, just executing the guider.py file is enough. Guider never uses external binaries such as executable programs, libraries, or Python packages, except for the matplotlib package for some of the visualization features. Most of Guider's features are implemented directly on standard libraries such as libc. That's the reason Guider doesn't require rebuilding, installation, or configuration. In addition, it can be deployed with only one megabyte of storage space. These characteristics are very attractive for embedded systems. All features of Guider are supported on Linux and Android, and it also provides some limited features on macOS and Windows.

From now on, let me introduce some killer features of Guider. The first one is monitoring system resources in real time. This feature works by periodically updating stats for system resources and events. System resources here mean CPU, memory, swap, block, network, and storage. As shown in the picture, in the first part, system resource information such as the number of cores, RAM, and swap is shown on the top lines. Additional system information such as context switching, interrupts, running tasks, memory zones, and performance stats using the PMU is also displayed. In the second part, important system-wide resources and events are printed. System stats such as CPU usage, available memory, swap usage, memory reclaim, block I/O, and network I/O are the most precious information for performance analysis. In addition, per-core usage is also printed, although it is not shown in the picture; the governor, clock, and temperature for each core can be shown together using specific options of Guider. In the third part, storage information such as busy rate, workload, and remaining space is shown for each device. A heavy storage workload can cause serious performance degradation; that's the reason we check those stats. In the upper part of the picture, network information about inbound and outbound traffic is shown for each device.

In the lower part of the picture, not only system resources but also task resources are shown with their attributes in real time. It's a little bit similar to the Linux top command. CPU usage, virtual, physical, and shared memory, swap, block I/O, and memory details are printed as well. The shown tasks are sorted by CPU usage by default, but you can change the sort order using a specific option. A task filter is also available to show only specific tasks.

All or specific function calls can be monitored for specific tasks in real time. In addition, stats of the function calls are also printed, such as the average, minimum, and maximum time. In this picture, all function calls are shown with backtraces. The percentage you see is not about CPU; it's the proportion of the total function calls.
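For the resource monitoring described above, launching Guider could look like the following. This is a minimal sketch: it assumes guider.py sits in the current directory and uses the -g task filter option mentioned later in this talk; "myapp" is a hypothetical task name, and details may differ by version.

    # monitor system-wide and per-task resources in real time
    $ python3 guider.py top

    # show only tasks whose names include "myapp"
    $ python3 guider.py top -g myapp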
So this function call monitoring feature is useful for finding frequent calls or for measuring specific function call counts, including backtraces. Of course, there is another function monitoring feature that measures CPU-intensive function calls using sampling techniques. The task filter and function filter are also supported.

All syscalls, including backtraces, can also be monitored for a specific task in real time. In addition, syscall stats such as elapsed time and return values are shown together. This feature is very useful for finding syscalls that take a long time, measuring specific syscall counts, and checking syscall error returns. The task filter and syscall filter are also supported.

All opened files, sockets, and pipes can be monitored for each process in real time. Files are printed with their position and open flags. TCP and UDP sockets are printed with their binding and connection status. UNIX domain sockets are also printed with their file paths. This kind of information is very precious when debugging issues or tracing performance. A process filter and a file filter are also available; by using the file filter, it is possible to monitor all processes that open specific files or bind specific sockets.

The first tracing feature is for native functions in languages such as C, C++, Rust, and Go. Native function tracing is started by the btrace command. The command is implemented using ptrace and breakpoints, that is, trap instructions. Breakpoints for all the symbols parsed from the ELF and DWARF sections are injected into the target task's virtual memory by Guider itself. So Guider can detect function call and function return events from the target task through ptrace. Guider can even read and manipulate the registers and memory of the target task when function events occur. As shown in the picture, call stacks are shown with various depths for a Go program in real time. The arguments and binary name for each function are also printed together on each line. The -g option in the command line is the task filter; it means that all tasks whose names include "go" become targets for function tracing. The -H option means printing backtraces, so if there were no -H option in the command line, all function calls would be printed without depth.

The next tracing feature is for Python functions. Python function tracing is started by the pytrace command. The command prints all Python method calls. As shown in the picture, Python call stacks are printed in real time at various depths depending on the stack frame. The file path and line number for each function are also printed together. The target here was an iotop program written in Python that prints I/O stats in real time. The options used in the previous native function tracing are also available for this feature.

The next tracing feature is for syscalls. Syscall tracing is started by the strace command, similar to the original Linux command. The command prints all syscalls and their arguments, converted into an easy-to-understand format. As shown in the picture, syscalls are printed with backtraces, return values, and elapsed times in real time. The options used in the previous native function tracing are also available for this feature.

The next tracing feature is for signals. Signal tracing is started by the sigtrace command. The command prints all the signals received by the target task. In addition, the call stacks of the signal generation and the sender can also be printed when receiving signals such as segmentation faults or child signals. As shown in the picture, received signals are printed for the target threads in real time.
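To make these tracing commands concrete, the invocations could look like the following. This is a sketch based only on the command names and the -g/-H options described above; "myapp" is a hypothetical task name, and exact option spellings may differ by version.

    # trace native function calls of tasks whose names include "go", with backtraces
    $ python3 guider.py btrace -g go -H

    # trace Python method calls of the iotop example above
    $ python3 guider.py pytrace -g iotop

    # trace syscalls of a task, similar to the original strace
    $ python3 guider.py strace -g myapp

    # trace signals received by a task
    $ python3 guider.py sigtrace -g myapp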
In that signal tracing picture, the target threads were terminated because of a segmentation fault caused by a wrong memory access.

Text-based analysis is precise but less readable. That's why Guider provides visualization features in SVG format. Opening the SVG output in your web browser provides an easy-to-view and responsive interface.

The first visualization feature is the resource graph. As shown in the picture, the top box shows graphs of CPU usage for processes. The box on the right side is the label list of the CPU graphs. The middle box shows graphs of block and network I/O for the whole system. The bottom box shows graphs of memory for the whole system. Of course, per-process graphs for block, network, and memory resources are also available. In addition, a filter option for all of them is also supported. As you can see, this visualization feature makes it easy to understand big data collected over a long time. It also helps you understand the trends in resource usage, and it is good for communication with other people.

The next visualization feature is about scheduling. Scheduling data is very large and very difficult to analyze item by item. Therefore, as shown in the picture, scheduling data such as time slices, preemption, and blocking should be visualized prior to detailed analysis. Opening the SVG output in your web browser, you can view the details of each time slice. It's very effective for analyzing multi-threaded programs, interactive services, and core utilization. In addition, this feature can visualize not only scheduling events but also other custom events that have start and end timestamps.

The last visualization feature is about call stacks. Analyzing only the last called functions without the full call stacks is difficult, because standard functions such as read and write can be called by any other function. Above all, in most cases the last called functions are not the cause of the problem; the problem is likely some other function that called those last functions. Therefore, to analyze performance problems at the function level, we need to be able to see the whole picture, including each call stack. In this case, the flame graph feature is valuable for analyzing call-stack-based profiling results for CPU usage, blocking status, memory leaks, and syscall-triggering function calls. As shown in the picture, the last functions at the bottom of each stack vary, so we need to analyze the other functions that contain them. I guess modifying those functions will actually improve your application or service performance. Opening the SVG output in your web browser, highlighting, zooming into, and searching for specific functions or stacks are also possible using the mouse and keyboard.

However, how do you analyze a problem that is not well reproduced? Sudden stuttering, screen freezing, system resets, audio chopping, slow app launching or switching: these kinds of problems are difficult to analyze using simple logs and often require system-level information. Even if it is a temporary problem that occurs suddenly and disappears, there is no time to analyze anything by connecting a terminal. Sometimes there are even potential problems that are not visible to the naked eye. How can you analyze those problems easily, quickly, and accurately with minimal effort? For this purpose, Guider runs as a system daemon at all times and monitors system status and activities. Based on thresholds for predefined resources or events, specific commands are executed automatically to handle them. The specific operation is shown in the figure.
The Guider daemon loads the config file during initialization and registers the resource thresholds for each defined event and the commands to handle them. After that, the target resource usage is periodically saved and checked, and if the resource usage meets the conditions of a specific event, the event is generated. When an event occurs, Guider sequentially executes the commands registered for the event. In addition, a separate report file is created and stored in storage, summarizing the saved system information, including resource usage. This allows you to automatically handle problems with predefined commands when a specific problem occurs, or to start problem analysis with a generated report. These functions, which work all the time, usually consume only a small amount of resources, about 1% to 5% of a single CPU core per second. Yeah, it's very light.

In the daemon, the occurrence condition of each event is defined in advance through the config file as a threshold value for each resource. Events are largely classified into system, task, and device units. System-type attributes are defined for system-wide resources. Task-type attributes are defined for all or specific tasks. Device-type attributes are defined for specific storage or network devices. As for resources, besides hardware devices such as CPU, GPU, RAM, storage, and network, logical resources such as system load, file descriptors, sockets, and files in storage are also included. Additionally, various kinds of logs, functions, and IPC messages can also be monitored.

OK, the commands to be executed when an event occurs can also be defined through the config file. If you look at the config in JSON format on the right side of the screen, system and task events are defined for the CPU resource. The attributes of each event include threshold values, such as total, and a list of commands to be executed. Through the system part defined at the top, if the average system CPU usage exceeds 95%, the defined command is executed and Guider creates a report file based on the information it has stored in its buffer. Through the task part defined at the bottom, when the average CPU usage of a specific task exceeds 95%, the CMD_TTOP_PROC command is executed and Guider starts concrete monitoring of all the threads of the target process and saves the output to a file. In this way, when each event occurs, the commands defined in the command list are executed sequentially or in parallel. These commands may be predefined in the config file, or they can be user-defined commands that the user could execute directly in the shell.

The left side of the screen shows the commands provided by default. The top command provides system-wide monitoring, the ftop command monitors open files, and the mtop command provides detailed memory monitoring of the system and specific tasks. The disktop command provides detailed I/O monitoring of the system and tasks, and the nettop command provides detailed network monitoring of the system. The funcwatch command monitors functions for all threads in the system. The ttop_proc command monitors all the threads of the target process, and the utop command monitors the functions executed by specific target threads. Finally, the leak command performs function tracing to find the memory leak points of a specific process. In addition, the various functions provided by Guider can easily be combined and used as event-handling commands.
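Putting the config description above together, it could look roughly like the following. This is only an illustrative sketch of the idea: the field layout and the command spellings are reconstructions from the talk, not the exact schema, and CMD_SAVE_REPORT is a hypothetical placeholder for the report-saving command.

    {
        "cpu": {
            "system": {
                "total": 95,
                "command": ["CMD_SAVE_REPORT"]
            },
            "task": {
                "total": 95,
                "command": ["CMD_TTOP_PROC"]
            }
        }
    }

With a config like this, the daemon would create a report when the average system CPU usage exceeds 95%, and start thread monitoring for any task whose average CPU usage exceeds 95%.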
All of this is done by Guider itself, without installing any separate packages or relying on any particular system. As shown at the bottom of the screen, the names of the report files that are automatically generated when an event occurs have a regular format. All the files start with the execution order and the report number, followed by the resource, the threshold value, and the time information of the event that generated the report file. By default, each report is just a text file, but it can be compressed according to Guider's execution options. The reason the compression function is supported is that report files are created automatically in the background, so storage can be used excessively when many files are created. In addition, if you use the option that limits the maximum size of the directory where report files are created, the daemon tries to maintain the specified capacity by erasing the oldest files whenever a new report file would exceed the specified size. This is very important for mass products in the embedded world.

From now on, I will explain the report file that is automatically generated when an event occurs. At the top of the report file, information like that in the picture is displayed: execution options, version, runtime, load, number of tasks, and kernel command line. This information helps you understand each system and its execution options through report files generated in various environments. The following part is information about system resources. Information about CPU, memory, storage, and network is displayed, along with other resources. It shows the system resources at the time the report was generated, as well as the resources at the time the daemon was first launched. Using this, you can see the approximate resource changes and easily determine the resources available at the time of event creation.

While the previous page showed a snapshot at the time of the event, the resource tables from here on are based on the information stored in the buffer before the event occurred; they show the amount of resource change in each interval over a long period. For example, the top summary table shows the system resource usage line by line for each interval. It displays various resources such as CPU, memory, block, swap, reclaim, faults, context switching, interrupts, number of tasks, and network, so you can see the resource changes at a glance.

Next is the CPU table. From here on, the CPU usage of processes or threads, depending on the options, is displayed in interval units rather than system-wide. It also displays the maximum, average, minimum, and total stats for each process's CPU usage. This is very useful for detecting steady CPU usage or sudden bursts of CPU usage by a specific task. These per-process resource tables cover not only CPU but also VSS, RSS, delay, I/O, cgroup, and other things, so when analyzing a report, they can be useful from several perspectives.

Finally, there is the part that shows the most specific information. It shows the detailed events and resource information that occurred in each interval. Based on the previously summarized information, it can be referenced when you need detailed information about a specific interval. In particular, the information of each process is specified, and it is possible to check how much of each resource is used in which area. Lastly, there is a graph based on the previous text report.
In the picture, we can check the trends of resources, even for big data collected through long-term profiling.

There are cases where it is necessary to control the daemon under special circumstances. Since the daemon operates in the background, commands cannot be sent to its standard input, so Guider provides a separate event command to control the background daemon. This command is originally used to generate a specific event, but if a string starting with CMD is used, it is interpreted as a command to be passed to the daemon. The commands that can be delivered to the daemon are as shown in the figure: buffer size changes, buffer initialization, stopping monitoring, activation and deactivation of monitoring for specific resources, change of the monitoring interval, config reload, forced report generation, and daemon restart.

So far, I have explained some useful features of Guider, including tracing. There are more useful features besides the ones I described, but I couldn't explain them all because of the time limitation. For specific details, please refer to the README file in the Guider repository, and if you have any questions, please contact me by email or on GitHub. Thank you for listening.