Hi everyone. Today we will talk about tracing the page table in this session. I'm Yixiang, a daily Linux user, and my co-speaker Chien-Ling will present afterward.

In this session, we will first introduce what the page table is and how it works. Then we will talk about the current tracing tools and frameworks in the Linux kernel, like kprobe, tracepoint, ftrace, and so on. Next, we will introduce our tracing implementation, virtual memory profiling. And finally, we will look at a recent patch on the page table.

Each process has its own virtual memory area in the Linux kernel. The process control block is represented as task_struct, and its memory layout is shown in the image below. The mm_struct pointer in task_struct indicates the virtual memory area of the process, and there is a vm_area_struct pointer in mm_struct that describes the memory mapping information. Look at the right side. As we can see, there are pgd and mmap in mm_struct. The pgd is the top-level page table, and the mmap pointer refers to the process's virtual memory areas, which can be any section like stack, heap, BSS, or data, as in the image at the bottom.

First, let's look at the image. We can see that each process has its virtual memory area, and the memory management unit, the MMU, helps translate virtual addresses to physical addresses. We can regard physical memory as a set of many pages. A single page is called a frame, and each of them has the same size. The page table is used for managing these pages, starting from the pgd, the top-level page table we mentioned on the previous slide. Each page table level holds the starting address of the next-level page table, just like in this image. With fast-growing memory sizes and the emergence of 64-bit CPU architectures, the page table needs more levels to cover more address space.
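To make the level structure concrete, here is a minimal user-space sketch of how a virtual address splits into per-level indices. It assumes the common x86-64 layout with 4 KiB pages and 9 index bits per level; the function and constant names are made up for this illustration and are not kernel code.

```python
PAGE_SHIFT = 12      # assumed 4 KiB pages: the low 12 bits are the byte offset
BITS_PER_LEVEL = 9   # assumed 512 entries per table, as on x86-64
LEVELS_4 = ("pgd", "pud", "pmd", "pte")
LEVELS_5 = ("pgd", "p4d", "pud", "pmd", "pte")


def address_bits(levels):
    """Number of virtual address bits covered by the given levels."""
    return PAGE_SHIFT + BITS_PER_LEVEL * len(levels)


def split_virtual_address(vaddr, levels=LEVELS_4):
    """Return the table index used at each level plus the in-page offset."""
    index_mask = (1 << BITS_PER_LEVEL) - 1
    parts = {"offset": vaddr & ((1 << PAGE_SHIFT) - 1)}
    shift = PAGE_SHIFT
    # Work upward from the lowest level (pte) to the top level (pgd).
    for name in reversed(levels):
        parts[name] = (vaddr >> shift) & index_mask
        shift += BITS_PER_LEVEL
    return parts


if __name__ == "__main__":
    # Build an address from known indices, then split it again.
    vaddr = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
    print(split_virtual_address(vaddr))
    print(address_bits(LEVELS_4), address_bits(LEVELS_5))
```

Under these assumptions, four levels cover 48 bits of virtual address and five levels cover 57 bits, which is why an extra level buys more address space.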
The four-level page table first appeared in Linux kernel 2.6.11, and kernel version 4.11-rc2 merged the five-level page table. Both of them are indicative of where the computer industry is going.

So how is the physical page found from a virtual address in the Linux kernel? First, it calls the pgd_offset function to get the PGD entry, then computes the next-level entry with the corresponding offset function, and so on. Finally, it arrives at the PTE entry and does the same thing to get the physical page, its struct page, and then applies the remaining offset. It is noteworthy that a page with a bigger size is called a huge page. It can be found at the PMD or PUD level.

A page table entry in the x86 Linux kernel is represented as below. Even though it is often just an unsigned integer, it is defined as a struct for two reasons. The first is for type protection, so that the entries will not be used inappropriately. The second is for features like PAE on x86, where an additional 4 bits are used for addressing more than 4 GiB of memory. The bits shown on the left side are the page table entry protection bits defined by the x86 Linux kernel, which describe the properties of the current page table entry. For example, if a page table entry has _PAGE_BIT_RW set to zero but we try to write through this entry, it will cause a page fault.

When the page fault happens, the function call trace is as below. Continuing the example from the previous slide, if we try to write to a page that is not writable on the x86 architecture, it first calls the page fault entry function, and so on. The find_vma function finds the memory area that contains the address where the page fault occurred, and the do_wp_page function is called to handle the page fault.

After talking about page table concepts, it's time to talk about tracing tools. The image below shows the mainstream Linux tracing tools. The most common ones are kprobe and tracepoint, which trace the Linux kernel dynamically and statically, respectively. Both ftrace and eBPF are based on them.
In the kernel, there is already a dump_pagetables file under arch/x86/mm, like on the right side. If you turn on the corresponding config, you will see files like current_kernel, current_user, efi, and kernel under the /sys/kernel/debug/page_tables directory. We can get some page table information this way, for example with cat current_kernel. This, after all, is one way of doing it.

The function tracer, ftrace, can be used to find out what is going on inside the kernel. It can be used for debugging or for analyzing latencies and performance issues that take place outside of user space. The ftrace framework reads control information from the trace file system and modifies the kernel dynamically to enable or disable a specific trace event. All the trace information is written to the kernel ring buffer and can be read from the trace file system in user space. It is easy to use because it only needs simple command-line tools like cat and echo to communicate with the debug file system. So you can use it on an embedded system, even if it only has BusyBox. For convenience, it also has a command-line front-end tool, trace-cmd, which makes ftrace easier to use and more user-friendly.

eBPF is an in-kernel virtual machine, which can be used for tracing kernel events. There are multiple front ends for eBPF that generate eBPF bytecode, such as C, perf, BCC, bpftrace, or ply. After the front end generates the eBPF bytecode, the eBPF verifier checks the safety of the code, and then the code is loaded into the kernel and handed to the eBPF virtual machine. User space can use eBPF maps or perf events to get real-time tracing information from eBPF.

Hi, my name is Chen Lin. I'm going to talk about the trace tool that we propose, virtual memory profiling.
Unlike the previous trace tools, virtual memory profiling lets you customize the data you want to store and the probe points you want to insert. Using it feels just like using tracepoints. The major feature we add is that the tracer can store sequential events and immediately interact with the data we record. So we can process the information and compare events with each other to decide which target we are interested in next.

Virtual memory profiling ties the events to a process through task_struct: a pointer to the virtual memory events group is added to task_struct, and we use the filter of the event trace to select the process we want to test.

To add an event to our trace tool, we first need to define our own data structure based on the VMP event structure, together with a VMP-prefixed enum value. Afterward, we also need to define our own record function and the tracepoints that report the information to user space. Each sequential event is stored at one index of the sequential event array, which is an array of VMP events and a member of the VMP events group. The index is assigned by a ticket, which is an atomic type, so we can guarantee that no other event is stored at the same index. There is also a generic event that we can use to store the global data we want to save.

Here is a sample. First we define our own enum value, the VMP type for the page table. Then we declare the event, whose name is pgtable, and the sequential event size. Moreover, we define the prototype of the record function and the argument names of the record function. Then we need to define our own data structures with the VMP prefix. So we define two data structures. The first is to store the generic data.
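The ticket idea described above can be sketched in user space as follows. This is only a model: the class and method names are hypothetical, a lock-protected counter stands in for the kernel's atomic type, and dropping events once the array is full is an assumption of this sketch, not something the talk states.

```python
import threading


class EventsGroup:
    """Toy model of a fixed-size sequential event array indexed by tickets."""

    def __init__(self, size):
        self.events = [None] * size
        self._next_ticket = 0
        self._lock = threading.Lock()  # stands in for an atomic fetch-and-add

    def _take_ticket(self):
        # Each caller gets a distinct, monotonically increasing ticket.
        with self._lock:
            ticket = self._next_ticket
            self._next_ticket += 1
            return ticket

    def record(self, data):
        """Store `data` at its own index; return the index, or None if full."""
        index = self._take_ticket()
        if index >= len(self.events):
            return None  # assumption: events beyond the array size are dropped
        self.events[index] = data
        return index


if __name__ == "__main__":
    group = EventsGroup(4)
    for name in ("enter", "record", "exit"):
        print(name, "->", group.record({"event": name}))
```

Because every recorder takes its ticket before writing, two concurrent recorders never write to the same slot, which is the guarantee the speaker describes.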
In the generic data, we are interested in how many times the PTE lock, the PMD lock, and the mmap lock are taken. Besides these, we are also interested in some functions like pte_get_many, which we will discuss later, and pte_put_many, pte_get_unless_zero, or the function that frees the user PTE table.

For the sequential events, we define the VMP page table structure, which stores the information for each call to the record function, like the page table bytes, the pinned VM, the swap, or the copy-on-write pages. And since we can customize our record function, we can define our own page table walker to walk through all the page tables and collect all the data we want to record. So here we have defined all the information that we are interested in, like the PMD, PUD, or P4D information.

When we use virtual memory profiling, we need to put the enter, record, and exit functions in the corresponding places. Here we use debugfs to control the enter and exit, and we insert the record function into the functions that we are interested in. For example, in the functions we mentioned earlier, we can record the number of calls.

So the functions we define look like this. In the enter function, we first initialize the data and the events we will store, and here we also record the information we want as the initial state. In the exit function, we record the last event and then report to user space through printk or the tracepoint interface. We also provide a for-each macro over the VMP events to traverse all the sequential events. Since the record function can be called recursively by itself, we use a static variable to prevent recording multiple times in the recursive calls, so that we record only once at a time. And here is the information we can get.
Here is what we recorded on the normal, default kernel, to see what the page table statistics look like. As you can see, we attached our trace tool successfully to PID 777. The task name is test, and the sequential event size we defined is 496. Here we can see that, at exit, the PTE lock was taken a lot, and the PMD lock, the mmap lock, and the page table lock were also taken many times. And here is the event trace we can see. To simplify, we record just two events in our virtual memory profiling here: the enter and the exit. You can see that the page table bytes at enter is smaller than at exit, when the test tool exits.

Here is another case study, which comes from another developer's patch. The patch series comes from the discovery that some malloc libraries usually allocate a large amount of virtual memory and do not unmap the unused virtual memory. They use madvise to free the physical memory, but that call does not free the page table memory. So in that patch series, they add a PTE reference count to the structure of the page table to track how many users the PTE page table has. When the reference count reaches zero, the PTE table is freed.

And here is the information we get when we trace the patched kernel. As you can see, in the last event the page table bytes is less than what we traced previously on the normal, default kernel. The benefit is that it frees the page table bytes. But as you can see, the reference count of the PTE tables is touched a lot, and that adds some overhead from the atomic operations, since the reference count is an atomic type. So there is a trade-off when you use this patch series to free the unused page tables.

So that is my part, and that is how we use our trace tools.
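The trade-off in this case study can be illustrated with a toy user-space model. It is not the patch series itself: the classes, the 512-entries-per-table figure, and the way operations are counted are all assumptions chosen for the sketch, and only PTE-level tables are modeled.

```python
PAGE_SIZE = 4096
ENTRIES_PER_TABLE = 512  # assumed number of PTEs per table


class ToyMm:
    """Toy address space that tracks PTE tables and refcount operations."""

    def __init__(self, refcounted):
        self.refcounted = refcounted  # True models the patched behavior
        self.tables = {}              # table index -> set of mapped slots
        self.refcount_ops = 0         # stand-in for atomic get/put operations

    @property
    def pgtables_bytes(self):
        return len(self.tables) * PAGE_SIZE

    def map_page(self, vpn):
        index, slot = divmod(vpn, ENTRIES_PER_TABLE)
        slots = self.tables.setdefault(index, set())
        if slot not in slots:
            slots.add(slot)
            if self.refcounted:
                self.refcount_ops += 1  # take a reference on the PTE table

    def zap_page(self, vpn):
        """Release one page, as a madvise-style free would."""
        index, slot = divmod(vpn, ENTRIES_PER_TABLE)
        slots = self.tables.get(index)
        if slots is None or slot not in slots:
            return
        slots.remove(slot)
        if self.refcounted:
            self.refcount_ops += 1      # drop the reference
            if not slots:
                del self.tables[index]  # last user gone: free the table


if __name__ == "__main__":
    for refcounted in (False, True):
        mm = ToyMm(refcounted)
        for vpn in range(1024):
            mm.map_page(vpn)
        for vpn in range(1024):
            mm.zap_page(vpn)
        print(refcounted, mm.pgtables_bytes, mm.refcount_ops)
```

In this model, the default behavior keeps the now-empty tables around at no extra bookkeeping cost, while the refcounted behavior frees them but pays one counter operation per map and per unmap, which mirrors the trade-off the speaker describes.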
Thanks.