Hello, everyone. Welcome to Open Source Summit 2023. Today I'm going to talk about Linux kernel tracing using eBPF. Before we look at what eBPF is, let me give you an idea of what we're going to cover in this session: what eBPF is, how it can be used for kernel instrumentation, and some of the tools that can be used for tracing kernel functionality.

So what is eBPF? It is a revolutionary kernel technology that allows developers to write custom code that can be loaded into the kernel dynamically, thereby changing the way the kernel behaves. It was originally created to speed up the filtering of network packets, and that original technology is known as BPF, the Berkeley Packet Filter. In 2014, Alexei Starovoitov redesigned BPF into what we now know as extended BPF. This turned BPF into a more general-purpose execution engine that can be used for a variety of things beyond packet filtering, such as profiling, debugging, and security. This technology was added to the kernel in 2014, in the 3.18 release. With it came the bpf() system call to load eBPF programs into the kernel, and support for various program types: not just packet filtering, but also tracing, profiling, and security. That's how BPF evolved into what we call extended BPF.

Now let's try to understand the overall components that are part of eBPF. It is a flexible and efficient technology composed of an instruction set, storage objects, and helper functions, and it can be considered a virtual machine running instructions in an isolated environment. Since eBPF programs are executed inside the kernel, it is mandatory to ensure that they are safe to run and cannot crash the system. For that we have the eBPF verifier.
The eBPF verifier in the kernel ensures that a program is safe to run and cannot crash the kernel. A program has to run to completion, which means it must not contain any loops that would run forever. Another component is the JIT compiler, which translates the generic bytecode of the program into the machine-specific instruction set to optimize the execution speed of the program. And there is another construct, called maps, that provides the ability to share the collected data between the kernel and user space. That can be done through various types of maps: hash tables, arrays, ring buffers, stack traces, and so on.

So why is eBPF important? To answer that, let's understand the difference between user space and kernel space. The kernel is the software layer between your applications and the hardware, and applications run in an unprivileged layer called user space, which cannot access the hardware directly. An application has to make a request through the system call interface to ask the kernel to act on its behalf. Accessing the hardware involves things like reading from and writing to storage devices, sending or receiving packets over the network through the network controller, accessing memory, and many other such activities. If you want to see how the system calls work, you can make use of the strace utility: when an application runs under it, strace records the system calls made by that application.

So the main purpose of the Linux kernel is to abstract the hardware (or virtual hardware) and provide a consistent API, in the form of system calls, allowing applications to run and share those resources. This is achieved through the various subsystems that are part of the kernel, which handle these responsibilities. Each subsystem typically allows some level of configuration to account for the different needs of users.
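As a concrete illustration of the system call interface just described, here is a hedged sketch of using strace to watch the calls an ordinary command makes (the exact output varies by distribution and strace version, and strace needs ptrace permission):

```shell
# Trace only the file-open system calls made by `cat`;
# strace prints each call with its arguments and return value to stderr.
strace -e trace=openat cat /etc/hostname

# Typical lines look like:
#   openat(AT_FDCWD, "/etc/hostname", O_RDONLY) = 3
```

This is exactly the kind of visibility eBPF generalizes: instead of tracing one process from the outside, eBPF programs sit inside the kernel and see every caller at once.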
And if some particular behavior cannot be configured, then a change has to be made in the kernel. That can be done either by adding the change to the kernel itself, as native support, or by adding it dynamically with the help of kernel modules. If the change is to be added natively, the developer has to modify the kernel and then convince the kernel community why that change is really important and needed, and it might take several years for the feature to land in new kernel versions. The other way is to implement the feature dynamically using kernel modules. But that has its own risks: the code in a kernel module might not run safely, which can lead to kernel crashes and thereby halt the system.

So how can we get rid of these limitations? With eBPF, a new option is available that allows reprogramming the behavior of the kernel without actually making changes to the kernel and without loading kernel modules. And as far as safety is concerned, the eBPF verifier ensures that the programs being loaded are safe to run.

So let's look at how the loading of an eBPF program happens. eBPF programs are event-driven: they run whenever an application or the kernel passes a certain hook point. There are multiple predefined hook points, at the level of system calls, function entry or exit, kernel tracepoints, network events, and several other events inside the kernel. For example, if we want to trace the system calls related to opening files, the eBPF program can be attached to the open system call, and it will be triggered whenever an application tries to open a file. Similarly, an eBPF program can be attached to the exec system call.
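To make the hook-point idea concrete, a bpftrace one-liner (run as root; tracepoint and field names can differ slightly between bpftrace versions) can attach a small program to the open path and fire on every file open in the system:

```shell
# Attach to the openat syscall tracepoint and print which process
# opens which file; Ctrl-C detaches the program again.
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat
                  { printf("%s -> %s\n", comm, str(args->filename)); }'
```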
Then, whenever a binary is executed through the exec system call, that eBPF program will be invoked. This leads to one of the great strengths of tracing tools built on eBPF: they instantly get visibility over everything that's happening on the system. And in this presentation, we're going to look at how eBPF can be used for tracing kernel behavior.

To use the eBPF functionality, we can write eBPF programs ourselves, but in a lot of scenarios eBPF is not used directly; instead it is used through projects like Cilium, BCC, or bpftrace. These tools provide an abstraction on top of the eBPF framework, so that developers don't need to go and write raw eBPF programs. If developers do want to write eBPF programs, there are different ways in which they can do it, for example in Python, Lua, C, or Go. The programs then have to be compiled; that is done by the LLVM/Clang compiler, which compiles the program into eBPF bytecode. Now this bytecode has to be loaded into the kernel and executed, and the set of tools that deal with eBPF programs in this way are called frontends.

Here we are basically going to look at two frontends: BCC, the BPF Compiler Collection, and bpftrace. Both are part of the IO Visor project on GitHub. BCC provides a framework that enables users to write Python programs with the eBPF program embedded inside them; whenever you run the Python program, it generates the bytecode and loads it into the kernel. So when you are trying to develop complex tools and applications, the BCC programming framework becomes handy. And the other option is bpftrace. bpftrace is a high-level tracing language, and it is useful for writing short scripts or one-liners.
And it uses LLVM as a backend to compile the scripts into bytecode, and BCC for interfacing with the eBPF subsystem and loading the generated bytecode into the kernel.

Here is a snapshot of the BCC tools. There are many different tools in the BCC toolkit that give you tracing information about the various subsystems of the kernel. If you look here, on the storage side there are tools for the virtual file system, the file systems, the volume manager, and the block layer; as far as networking is concerned, you get information at the socket level, the TCP/IP level, the Ethernet level, and even at the device driver level. Other subsystems are covered too: the scheduler, the memory manager, the system call interface, and the applications on top. Information about the whole system can be collected with the help of these BCC tools, and we're going to look at some of them to extract information from the system.

Okay. So let's see how tracing can be done with the help of this eBPF framework. Tracing gives us visibility across the full software stack and allows us to collect data for profiling and debugging. Tracing can be used for debugging by developers and for troubleshooting by administrators: whenever the system is not behaving as expected, or there is a missing configuration or missing files, administrators can use these eBPF tools to troubleshoot the system, and developers can use them for debugging. There are traditional tools available, like ftrace and strace, that provide information about the system, but the eBPF framework lets us add more logic for analyzing the data that is collected. Okay. So eBPF tracing supports multiple sources of events to provide visibility into the entire software stack.
And here we are going to look at the instrumentation mechanisms that let us look deeper into the system, covering both dynamic instrumentation and static instrumentation.

Dynamic instrumentation is the ability to insert instrumentation points into live software in production, and the instrumentation can be done at the kernel level as well as the user level. In the kernel, dynamic instrumentation is provided by the kprobes functionality, and in user space with the help of the uprobes functionality. They can be used to instrument the entry or exit of kernel functions or user application functions. With static instrumentation, the instrumentation points are already encoded in the software and maintained by the developers. Static instrumentation is done by making use of tracepoints in the kernel, and in user-space applications by using USDT, user statically defined tracepoints. Because static instrumentation points have been deliberately developed and put into the software, they are considered stable, whereas dynamic instrumentation with kprobes is unstable, because the probed functions might change across kernel versions.

So let's look at kernel instrumentation with kprobes and tracepoints. kprobes allow you to set hooks in almost every kernel function, with minimal overhead, and there are two categories: kprobes and kretprobes. A kprobe instruments a kernel function at its beginning, or at some offset into it, while a kretprobe instruments the function's return. The probing can be done through the BCC interface or the bpftrace interface, in both cases for kprobes as well as kretprobes.
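A minimal sketch of the attachment styles just described, as bpftrace one-liners. The kernel function name do_sys_openat2 is an assumption: it exists on recent kernels (older ones use do_sys_open), which is exactly why kprobes are considered unstable while tracepoints are stable:

```shell
# Dynamic: a kprobe fires at function entry, a kretprobe at function return.
sudo bpftrace -e 'kprobe:do_sys_openat2    { @calls[comm] = count(); }'
sudo bpftrace -e 'kretprobe:do_sys_openat2 { printf("ret=%d\n", retval); }'

# Static: a tracepoint with a stable name and stable arguments.
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_openat { @calls[comm] = count(); }'
```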
As far as tracepoints are concerned, they are inserted in the kernel code, and you can take a look at the tracepoints that are part of the compiled kernel by looking at /sys/kernel/debug/tracing/available_events, which gives you the list of events that are available on the system you have compiled and booted. On my system, the kernel has more than 1,500 tracepoints defined.

Okay, so what we are going to do now is use some of the BCC tools to collect information about the system. These are high-level tools, basically written in Python, which embed the eBPF code that goes and attaches to different places in the kernel. First, let's look at opensnoop, the tool that traces the open system calls and shows which processes are attempting to open which files; here the open system call is traced by the eBPF program. Similarly, if you want to trace the new processes that are created when the exec system call is executed, execsnoop is the tool that can be used. And then we have another tool called statsnoop, which traces the stat system call; stat basically provides information about files, so statsnoop will show you which processes attempted to read information about which files.

So let's look at what kind of information we get from these. Let's start with opensnoop. We are tracing the open system call interface, and the tool captures information whenever an application executes the open system call. Now have a look at the output here.
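To reproduce the tracepoint count mentioned above on your own machine, something like the following should work (on newer systems the tracefs path may be /sys/kernel/tracing instead):

```shell
# List and count the static tracepoints compiled into the running kernel.
sudo wc -l /sys/kernel/debug/tracing/available_events
sudo grep '^syscalls:' /sys/kernel/debug/tracing/available_events | head

# bpftrace can enumerate the same probes (plus kprobes and others).
sudo bpftrace -l 'tracepoint:*' | wc -l
```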
The output shows the process ID, which process is trying to open files, and the name of the file with its complete path. Here, irqbalance is the daemon that does the work of balancing IRQs, so it goes and opens the /proc files related to interrupt handling. The numbers you see are the interrupt numbers, and irqbalance reads them while doing its interrupt balancing. So from this we get to see which processes are trying to open which files.

Another tool we will try is execsnoop. Here the eBPF program is attached to the exec system call, and we will see all the processes that call it. To generate some load, we'll execute a few commands in another window, and you'll see that they have been captured by the eBPF program. In this case it's printing the command, here the ps command, along with its PID and the arguments that were passed. ll is an alias for ls with the color option and -alF, and that's what we see captured here. So using these different commands we can get to know the status of the system: who is executing what, and which files they are touching.

Now let's look at statsnoop. statsnoop will tell you which process is trying to read information about which file. Here the ps command has generated the events: you can see ps going and reading a lot of files from the /proc directory, and there are other housekeeping processes reading files as well, for example under the home directory. So running these various commands gives you the status and the health of the system.
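The three demos above can be reproduced with the stock BCC tools. The binary names below are an assumption, since packaging varies by distribution (plain opensnoop when built from the iovisor/bcc repo, opensnoop-bpfcc on Debian and Ubuntu):

```shell
# Which processes open which files (open/openat syscalls).
sudo opensnoop-bpfcc

# New processes created via exec, with their arguments.
sudo execsnoop-bpfcc

# Which processes stat() which files.
sudo statsnoop-bpfcc
```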
Okay. So that was simply making use of the BCC tools to collect information about the system. Now let me show you a demo using the bpftrace functionality. bpftrace is basically useful for writing one-liners to get information from the system. In this example, what we're going to do is trace the files that are being opened, similar to opensnoop; that basically helps you look at which configuration files or log files are being touched. For this instrumentation we specify kprobe as the probe type, which is dynamic instrumentation, and what we are going to instrument is the open system call; we then print which process made the call and the argument of the open call that holds the file name.

So what should we expect from this? Let me just take the command from this file and execute it. Here we are attaching the eBPF program to the open system call, and you'll see the system showing the processes that are trying to open different files. And again we see that irqbalance is running in the background, going and opening files in the /proc/irq directory. So this is with the help of the bpftrace interface.

Similarly, where we used a dynamic probe here, we can instead use a static tracepoint. In this example we make use of the sys_enter_read tracepoint, at the entry of the read system call, and we are particularly going to trace the reads made by the sshd process. The count argument of the read call gives you the number of bytes to be read.
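The two one-liners from this part of the demo look roughly like this. The kprobe target do_sys_openat2 and its argument position are assumptions that depend on the kernel version (older kernels expose do_sys_open with a different signature):

```shell
# Dynamic kprobe: print which process opens which file
# (arg1 is the filename pointer in do_sys_openat2 on recent kernels).
sudo bpftrace -e 'kprobe:do_sys_openat2 { printf("%s: %s\n", comm, str(arg1)); }'

# Static tracepoint: reads issued by sshd, with the requested byte count.
sudo bpftrace -e 'tracepoint:syscalls:sys_enter_read /comm == "sshd"/
                  { printf("sshd read, %d bytes requested\n", args->count); }'
```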
So this will trace all the read system calls corresponding to the sshd process. When we do some activity on the SSH session, you'll see that reads are happening, and it tells us how many bytes of data have been read. Okay, so this is another way, through bpftrace, to extract information from the various system call entry points. And system calls are not the only place to put hooks; there are other places as well, as we'll see.

Let's look at another example, using a map. A map is the data structure that is used to store and summarize data. For this example, what we're going to do is count, per process, how many system calls each running process has made. That information is stored in a map, represented by the special @ variable, which stores and summarizes the data, and count() is the map function that counts the number of times it has been called. What is being probed is the system call entry point, and the data is keyed by comm, the process name.

Let's run this for a while, and then when we press Ctrl-C it shows the processes that have executed system calls and the count of how many system calls each has executed so far. If you look, sshd has executed 18 system calls, bpftrace itself has executed 71, and cron has executed seven. So this gives you a real idea of the load that is being generated by the processes.

Okay, so now let's look at some other examples for getting information at the process level.
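The per-process syscall counter described above is the classic bpftrace map one-liner; a minimal sketch:

```shell
# @ is a map keyed by process name; count() increments on every
# system call entry. Ctrl-C stops tracing and prints the whole map.
sudo bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'

# On exit it prints entries of the form:
#   @[sshd]: 18
#   @[bpftrace]: 71
```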
So in this example, what we're going to do is count the process-level events that are happening, and to get that information at the process level we're going to use the sched probe category. That basically collects information about process events such as fork, exec, and the context switching that is happening, which gives us a general view of what is going on in the background when multiple processes are running. What we will do is generate some load, and here we're giving an interval of five seconds: within those five seconds it will capture the events that are happening on the system.

Okay, so here bpftrace has exited and printed out the scheduler events that occurred, particularly the forking of processes, context switches, and exec events. Now, in this other window we executed the ls command; that is one of the events that triggered this. You can see the fork, and then the exec: when you run the ls command, the shell has to fork a child process and then exec the ls binary, and all of those events were captured by this bpftrace program, where we told the sched probe to capture all the events related to the scheduler. Okay, so this helps us to get a picture of what is happening in that five-second interval.

Now let's look at some examples for getting information about the network traffic.
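The scheduler demo can be sketched as a one-liner over the sched tracepoint category, with a timer probe to stop the trace after five seconds:

```shell
# Count every sched:* tracepoint hit (process fork, exec, wakeup,
# context switch, ...) for five seconds, then exit and print the
# per-probe totals keyed by the built-in `probe` variable.
sudo bpftrace -e 'tracepoint:sched:sched* { @[probe] = count(); }
                  interval:s:5 { exit(); }'
```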
In this case, first we'll look at the tcpconnect tool, a BCC tool that traces TCP connections. At the back end it uses the connect system call: the eBPF program is attached to the connect path, so whenever a TCP connection is initiated through the connect system call, it will be captured. So what we're going to do is run tcpconnect, and in another window we'll just send a curl request to google.com. That generates TCP connection load, and here you can see the source IP, the destination IP, and the port on which the traffic is going. Okay, so this way we can get information about all the live connections that are happening, which is also useful for troubleshooting, to see which connections are initiated by the local server. And at the back end it makes use of the kernel functions tcp_v4_connect and tcp_v6_connect for its tracing.

Similarly, function-level tools can be used for debugging kernel functionality. Suppose we take the example of an ICMP workload, and we want to see which kernel functions are involved, and with what values, when the ping utility generates ICMP traffic. Okay, so here we'll make use of funccount, a tool that counts how many times functions matching a pattern execute and also prints the function names, and then we will debug those functions in much greater detail. So what we will do is ping the Google server by IP address, and then just run funccount for the functions corresponding to ICMP.
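A hedged sketch of the two steps just described, using the BCC tools (Debian/Ubuntu binary names assumed, and 8.8.8.8 used only as an example Google-operated address):

```shell
# Terminal 1: trace new outbound TCP connections
# (hooks tcp_v4_connect / tcp_v6_connect under the hood).
sudo tcpconnect-bpfcc

# Terminal 2: generate a connection to observe.
curl -s https://google.com > /dev/null

# Count the kernel functions matching icmp_* while ping runs.
sudo funccount-bpfcc 'icmp_*'
ping -c 3 8.8.8.8
```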
So in this case, you see the functions that get called when the ICMP traffic is generated: icmp_rcv and icmp_out_count are the two functions that are being executed when the ping tool runs. So we can debug these functions some more using the trace command. trace is another BCC tool that probes the function you specify and displays trace messages, and you can use printf-style argument formatting to print values. In this case we'll look at the icmp_out_count function and the parameters it takes. It takes two parameters: a net structure, which represents the network namespace, and type, which is used to identify the ICMP packet type; in this case it would be either echo request or echo reply. So we can figure that out by passing those parameters to the trace command: here we'll just print the type, the second parameter of icmp_out_count, and then generate the traffic and see what values we get.

In the output you can see it prints the PID, the timing information, and the command that triggered the function, along with the value that was passed to it. As we have seen, icmp_out_count is on the path that sends the echo request packet to the destination server. And so this way you can go and debug the values that are passed into kernel functions.
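The trace invocations for this demo look roughly like the following. The signature matches icmp_out_count(struct net *net, unsigned char type) in recent kernels, but treat the signature, the header path, and the net->ifindex field as assumptions to verify against your kernel's headers:

```shell
# Print the ICMP type passed on each call (8 = echo request, 0 = echo reply).
sudo trace-bpfcc \
    'icmp_out_count(struct net *net, unsigned char type) "type=%d", type'

# Dereference a struct member as well: include the header that declares
# struct net, then print its ifindex field alongside the type.
sudo trace-bpfcc -I 'net/net_namespace.h' \
    'icmp_out_count(struct net *net, unsigned char type)
     "ifindex=%d type=%d", net->ifindex, type'
```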
Another thing we can do is look deeper into the other data structures. Coming back to this function, suppose we want to look further into the net structure and pull out some of its elements: one component of this structure is an interface index, which identifies a network interface, so we can dereference that member and collect it as well. Let's see the example I have here. In this case, we specify the header file path where the declaration needed for icmp_out_count lives, and we give the complete function declaration. Then we print the index, through the pointer to the net structure, along with the type. We again generate the load and see what we get: what we are getting is the index that corresponds to the network interface, and the type, showing that it is sending the echo request packet to the destination server. So this tool can be used for debugging kernel functionality, whenever there is a panic or some issue or a crash, using the different probes and hook points. This gives an idea of how the networking interfaces can be traced.

Another major area of the kernel is memory management. A lot of times we suspect that some application is leaking memory whenever a particular kind of workload is happening, and we want to understand who is doing that. In that case, the memleak tool becomes handy: what it does is trace and match memory allocations and deallocations, and collect the stack traces for the allocations.
And it prints a summary of which call stacks performed allocations and which of them have not yet freed the memory. The memory-leak instrumentation can be done either at the process level or at the whole-system level. If it is done at the process level, the instrumentation is applied to libc, tracing the memory allocation functions malloc, calloc, realloc, and free. If it is done at the system level, the instrumentation is applied to the kmalloc, cache allocation, and page allocation APIs. So let's try to run the tool and see what kind of information it gives. Here it shows the allocations being made from the call stacks in the output. This helps you see which applications are doing how many allocations and have not yet freed the memory, which will potentially help you understand memory leaks and resolve the issues.

With this, we have demonstrated the use of the different BCC and bpftrace tools to get information and do debugging at the process level, and in the memory, networking, and file system layers. So this is how the eBPF functionality can be used extensively for tracing kernel functionality. This last slide just lists the terminology that is used extensively around eBPF.

That concludes my session on Linux kernel tracing using eBPF. Thank you.