Hi, I'm Ai Weihe, an open source engineer from Huawei. Today, I'd like to share with you the topic "Prometheus Enabled AI Deep Observability Based on eBPF". I'm going to cover four aspects. First, let's talk about the current status of AI observability. Next, let's look at the open source projects for AI observability and how these projects collaborate to achieve it. Finally, I will show you a simple example.

Now, let's get to the point. Currently, all deep learning systems share one problem: the AI training process is invisible. When running an AI task, we don't know how it is scheduled, which compute card it runs on, which kernel functions are called, or how execution jumps between them. Once a bottleneck occurs, developers often use common monitoring tools to analyze the problem. However, these monitoring tools often have blind spots. For example, they can obtain information about long-lived processes, but short-lived processes cannot be captured, which results in information loss even though a large number of these processes consume resources. In addition, these tools are inflexible and have low performance: the collected data is often passed and copied from kernel space to user space several times.

Next, let's look at the major open source projects that AI observability uses. First, we need an AI framework to run training tasks. MindSpore is such a training and inference framework for deep learning. MindSpore provides users with a simple development experience: it offers unified APIs for developers and implements parallel training using serial expressions, which simplifies the development process. It also provides flexible debugging modes: developers can switch between the static graph mode and the dynamic debugging mode by changing only one line of code, to quickly locate faults. In addition, MindSpore fully leverages the performance of the Ascend processor and supports rapid deployment in all device, edge, and cloud scenarios. To learn more about the technical details, please visit the following two websites to download the MindSpore code.
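As a hedged illustration of that one-line switch, here is a minimal sketch. It assumes a recent MindSpore 2.x API; the toy network is added for illustration and is not from the talk.

```python
# Minimal sketch: the same MindSpore code runs in static graph mode or
# dynamic (PyNative) mode; only the set_context line changes.
import mindspore as ms
import mindspore.nn as nn

# Switch this single line between ms.PYNATIVE_MODE (eager, easy to debug)
# and ms.GRAPH_MODE (static, compiled for deployment performance).
ms.set_context(mode=ms.PYNATIVE_MODE)

class TinyNet(nn.Cell):
    """A toy network used only to show that the code is identical in both modes."""
    def __init__(self):
        super().__init__()
        self.dense = nn.Dense(4, 2)

    def construct(self, x):
        return self.dense(x)

net = TinyNet()
out = net(ms.ops.ones((1, 4), ms.float32))
print(out.shape)  # (1, 2) in either mode
```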
If MindSpore is used to execute AI tasks, a new kernel monitoring tool may need to be developed to collect AI-specific information. Such a tool would need to be recompiled into the kernel, which is rigid and cumbersome, and its performance would be poor: data would still be copied from the kernel to user space multiple times. This is something we don't want to see. We need a technology that can flexibly and dynamically collect the kernel information we need without recompiling the kernel. That technology is eBPF.

Before we talk about eBPF, let's take a look at its predecessor, BPF. BPF is short for Berkeley Packet Filter. It was originally developed in 1992 to solve the low efficiency of the existing packet filtering mechanisms. BPF has two core features. One is filtering: it filters out data packets that do not meet requirements, based on the input rules. The other is copying: the packets that meet the filter conditions are copied from kernel space to user space. Look at the figure on the right to see the BPF workflow. When a packet arrives at the network card, the driver at the data link layer forwards the packet upward to the protocol stack. If BPF is listening on the network adapter, the driver first calls BPF to send packets to the filters of the different programs, which filter the packets and copy those that match the rules to the corresponding program buffers. Compared with copying all packets from kernel space to user space and filtering afterwards, BPF filters packets before copying them, which reduces resource consumption and improves efficiency.

Take tcpdump as an example. It captures data packets that pass through the local host. Run the tcpdump command with the -d option, which converts the entered expression into pseudo machine code that can be read by humans, as shown in the following figure. Here "lo" indicates the local loopback interface, and "tcp and dst port 7070" is the filtering rule entered by the user. tcpdump depends on the libpcap library; many core functions of tcpdump are implemented by libpcap. As shown in the figure on the left, tcpdump accepts the packet filtering rule, and through libpcap the rule is converted into BPF pseudo machine code. After this code executes, packets that meet the rule are copied to user space through libpcap.

Now let's walk through how this works. Capture packets passing through the local loopback address and check whether a packet is an IPv6 packet: if yes, jump to instruction 002; if no, jump to instruction 006. Then check whether the packet is a TCP packet, and finally check whether the destination port is 7070. If yes, proceed to instruction 014 and accept the packet that meets the requirement; if no, proceed to instruction 015 and discard the packet that does not. These rules, like if-else judgments and control flows, filter packets simply and effectively.

Because BPF filtered packets so efficiently, in 1997 BPF was introduced into Linux kernel version 2.1.75. At first, BPF existed only as a single file in the network module directory of the kernel. After BPF was introduced into Linux, there was little activity for a long time, apart from minor performance adjustments, until version 3.17, when BPF files were added to the kernel/bpf directory instead of living as a single feature in the network module directory. This is the new technology we are going to talk about today.

Compared with traditional BPF, eBPF brings revolutionary changes. On the one hand, eBPF extends the application scenarios from network packet filtering to kernel tracing, application performance optimization, monitoring, and traffic control. In the past, the feature was used only in packet filtering scenarios and had not been expanded; nowadays, eBPF is widely known and used as a new kernel technology. On the other hand, the program instruction set became more and more complex as the scenarios extended, and the traditional pure assembly development mode became unsustainable. Therefore, in terms of interface design and usability, eBPF has also been greatly improved: developers can write BPF code in the C language. As shown in the figure on the right, BPF was once used only on network devices to filter packets; now eBPF can be applied to storage scenarios and can be hooked to any part of the kernel software stack for performance optimization and monitoring.

Now, let's look at the eBPF architecture. eBPF can dynamically inject code customized by developers in user space into the kernel for execution, without recompiling the kernel. The dynamic injection process is as follows. First, write the eBPF program in C in user mode. Second, LLVM compiles and parses the C code to generate BPF bytecode. Third, inject the BPF bytecode into the kernel and perform two rounds of security checks on the code. Fourth, execute it after JIT compilation. Fifth, create a map shared between user space and kernel space and save the execution results of the BPF bytecode to the shared map. This process does not require recompiling the kernel source code, which makes it flexible, efficient, secure, and lightweight.
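As a hedged illustration of this five-step flow, here is a minimal sketch using the BCC library, which is introduced next. It assumes the bcc package is installed and the script runs as root on a Linux kernel with eBPF enabled; the counter program is a toy added for illustration, not from the talk.

```python
#!/usr/bin/env python3
# Minimal sketch of the injection flow: C source written in user space,
# compiled and verified on load, results read back through a shared map.
from bcc import BPF
from time import sleep

# Step 1: the eBPF program, written as C in user space.
prog = r"""
BPF_HASH(counts, u32, u64);          // Step 5: a map shared with user space

int count_clone(struct pt_regs *ctx) {
    u32 key = 0;
    counts.increment(key);           // count how many times the probe fires
    return 0;
}
"""

# Steps 2-4: BCC drives Clang/LLVM to emit BPF bytecode, the kernel verifier
# checks it, and the JIT compiles it on attach; no kernel rebuild is needed.
b = BPF(text=prog)
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="count_clone")

sleep(2)
for _key, val in b["counts"].items():   # Step 5: read the shared map
    print("sys_clone calls in 2s:", val.value)
```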
Although eBPF programs can be written in C, what the compiler produces is still an ELF file. Developers would need to manually parse the ELF and then inject it into the kernel, which is a complex task. To solve this problem, the open source BCC project, the BPF Compiler Collection, emerged. It's an open source Python library that implements functions such as map creation and code compilation, parsing, and injection. Developers only need to focus on developing the specific eBPF code in the C language, without considering how it is compiled, parsed, and injected into the kernel, which brings great convenience. For example, to inject eBPF code into a kernel system call function, we can call the get_syscall_fnname function provided by BCC to obtain the name of the attachable system call function, and then use the attach_kprobe function to attach the eBPF code to the corresponding kernel function.

The following figure shows two diagrams. The one on the left is a performance monitoring tool that is not developed with eBPF, and the one on the right is a performance monitoring tool developed with eBPF. In the left diagram, after the data is copied from kernel space to user space, it needs to be parsed again to obtain the histogram. In the right diagram, eBPF saves the collected data in histogram format in kernel space, and the user space and the kernel space share the map, reducing data copies and improving execution efficiency. In addition, developers can dynamically inject custom eBPF code into the kernel for execution, which is flexible and lightweight.

During AI tasks, we want the monitoring metrics to be displayed to developers so that they can visually understand the running status of each layer of an AI task. The Prometheus monitoring and management system is used here. Prometheus is an important member of the Cloud Native Computing Foundation, and its ecosystem activity is second only to Kubernetes. Prometheus consists of components such as the Prometheus server and exporters and provides visualization functions. Generally, an exporter is provided for each target object to collect the required monitoring metrics. The Prometheus server obtains the monitoring metrics from the exporters in a unified format and displays them to developers on a web page.

I have introduced the MindSpore deep learning training and inference framework, the eBPF kernel technology, and the Prometheus monitoring and management system. How can we make them collaborate to achieve AI observability? Next, let's talk about the collaboration solution. First, run a LeNet task using MindSpore and dynamically observe changes in the kernel: dynamically attach eBPF code to system call functions or kernel functions. Once these kernel functions are called, the eBPF code is triggered to collect predefined metrics. Then aggregate these metrics during AI inference and training and upload them to the Prometheus monitoring system to improve the observability of the AI kernel.

Let's see some simple examples. This is a simple eBPF example. The eBPF code segment is attached to the blk_account_io_start and blk_account_io_done kernel functions. Once blk_account_io_start and blk_account_io_done are triggered, the eBPF code executes to record the IO start and done timestamps, calculate the IO latency, and display the latency as a histogram on the command terminal, showing the number of IO requests in different latency ranges.
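The talk does not show the code itself, so here is a hedged sketch modeled on BCC's biolatency tool. It assumes root privileges, the bcc package, and a kernel that still exposes blk_account_io_start and blk_account_io_done as kprobe targets (newer kernels inline or rename these functions).

```python
#!/usr/bin/env python3
# Sketch of the described example: timestamp block IO at start, compute the
# delta on completion, and keep a log2 histogram entirely in kernel space.
from bcc import BPF
from time import sleep

prog = r"""
#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

BPF_HASH(start, struct request *);   // request -> IO start timestamp
BPF_HISTOGRAM(dist);                 // latency histogram kept in the kernel

int trace_start(struct pt_regs *ctx, struct request *req) {
    u64 ts = bpf_ktime_get_ns();
    start.update(&req, &ts);         // record the IO start time
    return 0;
}

int trace_done(struct pt_regs *ctx, struct request *req) {
    u64 *tsp = start.lookup(&req);
    if (tsp) {
        u64 delta = bpf_ktime_get_ns() - *tsp;
        dist.increment(bpf_log2l(delta / 1000));  // bucket by log2(usecs)
        start.delete(&req);
    }
    return 0;
}
"""

b = BPF(text=prog)
b.attach_kprobe(event="blk_account_io_start", fn_name="trace_start")
b.attach_kprobe(event="blk_account_io_done", fn_name="trace_done")

sleep(10)
b["dist"].print_log2_hist("usecs")   # IO request counts per latency range
```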
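Kernel-side histograms like the one just described stay on the command terminal. As a minimal, hypothetical sketch of the next step, such metrics can be exposed for Prometheus to scrape using the prometheus_client library; the metric name and buckets below are illustrative, not from the talk.

```python
#!/usr/bin/env python3
# Sketch: expose an IO latency histogram on a /metrics endpoint that the
# Prometheus server can scrape and render on a web page.
from prometheus_client import Histogram, start_http_server
from time import sleep

# Buckets are illustrative; a real AI eBPF exporter would mirror the log2
# buckets collected in the kernel map.
IO_LATENCY = Histogram("ai_task_block_io_latency_usecs",
                       "Block IO latency observed during the AI task",
                       buckets=[1, 2, 4, 8, 16, 32, 64, 128, 256])

start_http_server(8000)              # Prometheus scrapes :8000/metrics
while True:
    # A real exporter would read the eBPF shared map (e.g. the "dist"
    # histogram above) and observe each sample; a constant stands in here.
    IO_LATENCY.observe(42)
    sleep(1)
```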
Such terminal output is useful, but developers want to display it on web pages. Inspired by Cloudflare's ebpf_exporter open source project, we can develop an AI eBPF exporter to collect customized AI metrics and display them on a web page through Prometheus. Finally, we use the MindSpore framework to execute the LeNet task and inject a hello-world eBPF code segment into the blk_account_io_done kernel function. Once the MindSpore LeNet task invokes blk_account_io_done, "Hello World" is displayed; otherwise, no information is printed.

Currently, this solution is in the early stages of experimentation. In the future, most importantly, we should analyze what needs to be done in AI scenarios: what can be used and what can be collected from the thousands of available kernel events. Then we can cooperate with other open source communities to develop an eBPF-based AI observability tool, enabling MindSpore to support eBPF-based AI observation and to work with Prometheus and eBPF exporters to visualize AI kernel metrics. We hope that more developers can participate and discuss how to implement and improve AI observability. Thank you.