Hello, everyone. Thank you all for joining me today. I'm very honored to share how to develop eBPF tools with libbpf and BPF CO-RE at KubeCon. First of all, let me introduce myself a little bit. My name is Leng Bojiao. I come from PingCAP, the company behind TiDB, and I'm an active contributor to libbpf-tools. I was fortunate to get a lot of help from Andrii Nakryiko, the libbpf and BPF CO-RE project leader, while I was contributing to libbpf-tools. I learned a lot from him and the community, and I enjoyed communicating with them. Now let's get down to the topic. I will share my experience writing BPF applications with libbpf. I hope this topic is helpful to those who are interested in libbpf and that it inspires them to further develop and improve BPF applications with libbpf.

So let's first see what BPF CO-RE is. Before talking about this concept, let's think about what BPF portability is. We know that the kernel changes very rapidly. Although backward compatibility is guaranteed at the system call layer, changes within kernel subsystems don't guarantee this compatibility. When we use BPF for system tracing, we often need to get function parameters, read fields in a struct, and so on. Obviously, BPF programs need to face this compatibility problem. Therefore, we need to clarify what BPF portability is. BPF portability is the ability to write a BPF program that will successfully compile, pass kernel verification, and work correctly across different kernel versions, without the need to recompile it for each particular kernel. The solution to BPF portability is BPF CO-RE (Compile Once, Run Everywhere).

Some people may have doubts about the necessity of BPF CO-RE. When we use BCC, it seems like we don't encounter any compatibility issues, so why do we need this new thing? The answer is that we do need it. The reason is that although the emergence of BCC was a major improvement in the BPF development experience, it has some notable shortcomings.
BCC uses a Clang front-end to modify the BPF programs you write. When a problem occurs, it's difficult to find the cause and figure out the solution. You have to remember naming conventions and automatically generated tracepoint structures. Because the BCC library embeds huge LLVM and Clang libraries, you might encounter issues with their use. When a tool starts, it takes a lot of CPU and memory resources to compile the BPF program, and if it runs on a server that lacks system resources, it might cause problems. BCC depends on kernel header packages, which you have to install on each target host. If you need a type that isn't exported in kernel headers, you have to manually copy and paste the type definition into your BPF code. And because BPF programs are compiled at runtime, many simple compilation errors can only be detected at runtime, which hurts the development experience.

By contrast, BPF CO-RE has these advantages. When you use BPF CO-RE, you can directly use the libbpf library provided by the kernel BPF developers to develop BPF programs. The development method is the same as writing an ordinary user-mode C program: the compilation generates a small binary file, and libbpf acts as a BPF program loader that relocates, loads, and verifies the BPF program. BPF developers only need to focus on their BPF program's correctness and performance. This approach minimizes overhead and removes huge dependencies, which makes the overall development process smoother. This is why we need BPF CO-RE and why it's the future of BPF.

BPF CO-RE consists of four parts: BTF type information, the compiler, the BPF loader, and the kernel. I will walk you through them one by one. BTF type information, which allows capturing crucial pieces of information about the kernel and about BPF program types and code, enables all the other parts of the BPF CO-RE puzzle. What's exciting is that in the latest kernel version, 5.11, the kernel has also implemented support for module BTF.
The compiler (Clang) provides the means for a BPF program's C code to express its intent and record relocation information. The BPF loader (libbpf) ties the BTF from the kernel and the BPF program together to adjust compiled BPF code to the specific kernel on the target host. The kernel, while staying completely BPF CO-RE-agnostic, provides advanced BPF features to enable some of the more advanced scenarios.

Next is your hello world with libbpf. As mentioned before, a BPF application is divided into two parts: the BPF program that needs to be loaded into the kernel, and the user-space control part. On the left of the figure is a very simple BPF tracing program. It is triggered when the write() system call is invoked: it gets the process ID that triggered the write request and prints it out. The part on the right is the standard workflow. Let's talk more about it.

A BPF application typically goes through the following phases. The open phase: the BPF object file is parsed; BPF maps, BPF programs, and global variables are discovered, but not yet created. After the BPF app is opened, it's possible to make any additional adjustments, such as setting BPF program types if necessary, pre-setting initial values for global variables, and so on. The load phase: BPF maps are created, various relocations are resolved, and BPF programs are loaded into the kernel and verified. At this point, all the parts of the BPF application exist in the kernel, but no BPF program is executed yet. After the load phase, it's possible to set up initial BPF map state without racing with BPF program execution. The attach phase: this is the phase at which BPF programs get attached to various BPF hook points, such as tracepoints, kprobes, cgroup hooks, the network packet processing pipeline, and so on. This is when BPF starts performing useful work and reading or updating BPF maps and global variables. And finally, the teardown phase.
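Coming back to the hello-world for a moment: the kernel side described above (triggered on write(), printing the PID) might look roughly like this. This is a sketch under my own naming, not the exact code from the slide:

```c
/* hello.bpf.c, the kernel side of the hello-world (illustrative sketch).
 * Build roughly like: clang -g -O2 -target bpf -c hello.bpf.c -o hello.bpf.o */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Triggered every time the write() syscall is entered. */
SEC("tracepoint/syscalls/sys_enter_write")
int handle_write(void *ctx)
{
	/* The upper 32 bits hold the TGID, i.e. the user-visible PID. */
	int pid = bpf_get_current_pid_tgid() >> 32;

	bpf_printk("write() entered by PID %d", pid);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

The bpf_printk() output can be observed in /sys/kernel/debug/tracing/trace_pipe.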
In the teardown phase, BPF programs are detached and unloaded from the kernel, BPF maps are destroyed, and all the resources used by the BPF app are freed.

The generated BPF skeleton has corresponding functions to trigger each phase. name__open() creates and opens the BPF application. name__load() instantiates, loads, and verifies the BPF application parts. name__attach() attaches all auto-attachable BPF programs; it's optional, and you can have more control by using libbpf APIs directly. name__destroy() detaches all the BPF programs and frees up all used resources. If we replace name with minimal, it looks like the picture on the right.

In addition to these typical phases, there are a few small tips about the phases to share with you. The first is to combine the open and load phases. We mentioned earlier that some setup work can be done after the open phase, but if there are no such requirements, we can combine the open and load phases into a single name__open_and_load() call. The example in the figure comes from the libbpf-tools project.

Selective attach. By default, the BPF skeleton will automatically attach all BPF programs, but sometimes we don't want that. We'd rather selectively attach certain BPF programs based on command-line parameters, so we can attach them manually as shown in the figure. This is a tool from the libbpf-tools project that measures the latency of the block I/O layer.

Selective load. Earlier, we mentioned that programs can be attached manually, but this has two obvious shortcomings: first, we still load unnecessary BPF programs; second, we can't use the standard phases as-is, and the code looks unclear. So the community added a new API, bpf_program__set_autoload(), to control whether to load a BPF program automatically. The example is a tool from the libbpf-tools project that shows the latency distribution of some operations in ext4. If the running kernel supports module BTF, we choose to load the fentry BPF program; otherwise, we choose to load the kprobe program.

Custom load and attach.
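Before we get to the custom case, the typical phases and the selective-load/attach tips above fit together roughly like this on the user-space side. This is a sketch; the skeleton name and program names are illustrative, not from the slides:

```c
/* main.c, user-space control part (illustrative sketch). The skeleton
 * header would be generated with: bpftool gen skeleton minimal.bpf.o */
#include <stdio.h>
#include <bpf/libbpf.h>
#include "minimal.skel.h"

int main(void)
{
	struct minimal_bpf *skel;
	struct bpf_link *link = NULL;
	int err;

	/* Open phase: parse the object, discover maps and programs. */
	skel = minimal_bpf__open();
	if (!skel)
		return 1;

	/* Selective load: don't load a program we won't use
	 * (hypothetical program name). */
	bpf_program__set_autoload(skel->progs.handle_write_kprobe, false);

	/* Load phase: create maps, resolve relocations, verify programs. */
	err = minimal_bpf__load(skel);
	if (err)
		goto cleanup;

	/* Selective attach: attach one program manually instead of
	 * calling minimal_bpf__attach(skel), which attaches everything. */
	link = bpf_program__attach(skel->progs.handle_write);
	if (libbpf_get_error(link)) {
		link = NULL;
		err = 1;
		goto cleanup;
	}

	printf("Tracing... hit Ctrl-C to stop\n");
	/* ... poll maps / read global variables here ... */

cleanup:
	/* Teardown phase: detach, unload, destroy maps, free resources. */
	bpf_link__destroy(link);
	minimal_bpf__destroy(skel);
	return err != 0;
}
```

Combining the first two calls into minimal_bpf__open_and_load() is the "combine open and load" tip; it only works when nothing needs adjusting between the two phases.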
The skeleton is suitable for almost all scenarios, but there is a special case: perf events. In this case, instead of using the links array from struct name_bpf, you need to define your own array of struct bpf_link pointers. The reason is that a perf event needs to be opened separately on each CPU. After that, you open and attach the perf event yourself. Finally, during the teardown phase, remember to destroy each link in the links array, and then free the array itself.

Multiple handlers for the same event. Starting in version 0.2, libbpf supports multiple entry-point BPF programs within the same ELF section. Therefore, you can attach multiple BPF programs to the same event, such as a tracepoint or kprobe, without worrying about ELF section name clashes. For details, see the libbpf v0.2 release notes. Now you can naturally define multiple handlers for an event, as shown on the right. Before that, you needed to define different sections for the same event.

Reading kernel structs' fields. You may have noticed that so far we have only seen libbpf API topics and nothing that seems related to BPF portability. So now we will introduce some programming content related to BPF CO-RE. Let's first look at how to achieve BPF portability when reading structure members. First, the BCC way. BCC will conveniently rewrite an expression like task->pid into a call to bpf_probe_read(), which is great, though it sometimes might not work, depending on the complexity of the expression used. With libbpf, you don't have BCC's code-rewriting magic at your disposal, but there are a few ways to achieve the same result.

If you are using BTF-typed tracing BPF programs (such as tp_btf or fentry/fexit), then you have the smartness of the BPF verifier on your side: it understands and tracks BTF types natively, and it allows you to follow pointers and read kernel memory directly and simply, avoiding bpf_probe_read() calls. So you don't need compiler-rewriting magic to get the same nice and familiar syntax.
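For instance, with a BTF-typed fentry program, a pointer chain can be dereferenced directly. A sketch, assuming the commonly traced do_unlinkat() kernel function (this hook is my illustration, not from the slides):

```c
/* Illustrative sketch: direct kernel memory reads in a BTF-typed
 * fentry program; no bpf_probe_read() needed. Requires a generated
 * vmlinux.h for kernel type definitions. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("fentry/do_unlinkat")
int BPF_PROG(trace_unlink, int dfd, struct filename *name)
{
	/* The verifier tracks BTF types, so this pointer chain is a
	 * plain C dereference. */
	const char *fname = name->name;

	bpf_printk("unlinkat: %s", fname);
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```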
You can also use other tracing BPF program types, such as tracepoints and kprobes. Pairing them with BPF CO-RE to support portable, that is relocatable, field reads requires enclosing the access in the __builtin_preserve_access_index compiler built-in. With that, it will work as you expect and will be portable between different kernel versions. But given the bleeding-edge nature of BTF-typed tracing programs, you might not have the luxury of using them yet, so you may have to use bpf_probe_read() explicitly, in a non-CO-RE fashion like the BCC way. With CO-RE and libbpf, there are two ways to do this. One is directly replacing bpf_probe_read() with bpf_core_read(). bpf_core_read() is a simple macro that passes all its arguments directly to bpf_probe_read(), but it also makes Clang record a field offset relocation for the third (source) argument by passing it through __builtin_preserve_access_index(). So the BPF_CORE_READ() example is actually just this under the hood.

You may have a question: if the field has been removed from the structure, is there a way to deal with this situation? The answer is yes. You can use the bpf_core_field_exists() macro to do this. Here is an example: starting from a certain kernel version, a field we track changed, so we can use this macro to find out whether it exists.

Another scenario that causes BPF portability issues is kernel API changes. For this case, BPF CO-RE provides two complementary solutions: libbpf-provided extern Kconfig variables, and struct flavors. Libbpf's extern variables are a simple idea. A BPF program can declare an extern variable with a well-known name, such as LINUX_KERNEL_VERSION, to extract the running kernel's version, or a name that matches one of the Kconfig keys, such as CONFIG_HZ, to get the value of HZ that the kernel was built with. And libbpf will do its magic to set everything up in such a way that your BPF program can use such extern variables like any other global variable.
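Declared on the BPF side, such externs might look like this. A sketch; the version check and hook are illustrative, not from the slides:

```c
/* Illustrative sketch: libbpf extern variables in a BPF program. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))
#endif

extern int LINUX_KERNEL_VERSION __kconfig;  /* running kernel's version */
extern int CONFIG_HZ __kconfig;             /* HZ the kernel was built with */

SEC("kprobe/do_nanosleep")
int trace_sleep(void *ctx)
{
	/* The verifier sees these externs as known constants, so the
	 * untaken branch is removed at load time. */
	if (LINUX_KERNEL_VERSION >= KERNEL_VERSION(5, 9, 0)) {
		/* logic for newer kernels */
	}
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```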
These variables will contain values matching the actual kernel your BPF program is executed on. Additionally, the BPF verifier will treat those variables as known constants and will be able to use them for advanced control flow analysis and dead code elimination. The slide shows an example of dealing with this kind of compatibility issue.

After introducing how to deal with BPF compatibility issues, let's take a look at how to pass control information to BPF programs. BCC's approach is achieved through string replacement, because to BCC the BPF program is a string that can be modified at will. If BCC is not used, the traditional approach is to write configuration information into a BPF map. But map lookups are not efficient, and even though the information written into the map is effectively static, the BPF verifier can't recognize that, so some optimizations can't be made. The solution to such a use case is to use read-only global data. It is set by the control application before the BPF program is loaded into the kernel. From the BPF program's side, this looks like a normal global variable access; there won't be any BPF map lookup overhead, because global variables are implemented as direct memory accesses. On the control application's side, it sets the initial configuration values before the BPF program is loaded, so by the time the BPF verifier gets to validating the program, the configuration variables are known and read-only. This allows the verifier to treat them as known constants and use them in advanced control flow analysis. So, in this example, we can naturally pass some filter conditions to the BPF program, such as the PID that we want to filter on.

Next, let's talk about data-storage-related content. Beginning in kernel 4.6, BPF hash maps perform memory pre-allocation by default, and the BPF_F_NO_PREALLOC flag was introduced. The motivation for doing so was to avoid deadlocks when kprobes with BPF programs hit the memory allocator.
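For reference, a hash map that opts out of pre-allocation sets that flag in its definition. A sketch in BTF-style map syntax; the map name and key/value types are mine:

```c
/* Illustrative sketch: a BPF hash map keeping the old on-demand
 * allocation behavior via BPF_F_NO_PREALLOC. */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 10240);
	__type(key, u32);
	__type(value, u64);
	__uint(map_flags, BPF_F_NO_PREALLOC);  /* opt out of pre-allocation */
} starts SEC(".maps");
```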
In fact, the BPF developers had tried other solutions, but in the end, pre-allocating all map elements was the simplest one, and it didn't affect user-space-visible behavior. When full map pre-allocation is too memory-expensive, a map can set the BPF_F_NO_PREALLOC flag to keep the old behavior. For details, see the "bpf: map pre-alloc" patch series. When the map size is not large, such as a map with 256 entries, this flag is not necessary, and BPF_F_NO_PREALLOC is slower.

One of the advantages of BPF tools is that they're portable, so the maximum space required for a map may be different on different machines. In this case, you can define the map without hard-coding its size and resize it before load: for example, define the map in the .bpf.c file, and after the open phase call bpf_map__resize().

Not only can you use global variables to control your BPF program's logic, you can also use them instead of maps to make your programs simpler and more efficient. Global arrays work too; you just need the global variables to have a fixed size, or at least a bounded maximum size, if you don't mind wasting some memory. Because the number of softirq types is fixed, you can define global arrays for the counts and histograms of softirqs in the .bpf.c file. Then you can read them directly in user space.

If you want to know more details, you can read the following articles. The first two articles were written by Andrii Nakryiko; they are very valuable and convey a lot of knowledge about the underlying principles. The last one was written by me, and is mainly about actual practice. If you want to try it out, you can start with these two projects.

About us: PingCAP is a software service provider committed to delivering a one-stop solution with various great database products. TiDB is an open-source, distributed NewSQL database for elastic scale and real-time analytics. That's all. Thank you.