Hello everyone. Thank you for the opportunity to present at the Open Source Summit. My name is Inhyuk Zhang, and I am the presenter of the zero-configuration runtime software component detection session. I have researched network and system security for over 10 years. Recently I have been interested in technologies for improving cloud and container security, and I am currently studying security monitoring using eBPF. After hearing about the Log4j vulnerability in December last year, I thought that by using eBPF I might be able to detect whether Log4j is running on a system. Today I'm going to present the early results of realizing this idea.

Here's what we are trying to figure out in this presentation. First, learn how to check whether software of interest, such as Log4j or the Spring Framework, is running on your Linux system. Second, check the feasibility of detecting runtime software components with eBPF. Third, try to detect the name and version of the software of interest running in a container through a program implemented using eBPF.

Before implementing with eBPF, let's see what methods can be used to detect software components. The first thing that comes to mind in the Log4j case is a Log4j scanner. Among such tools, a file scanner is one of the most reliable ways to determine whether certain software is present on your system. However, it takes a lot of time and effort to scan the entire file system. Scanning a live system sometimes requires a slow scan because resource usage must be limited. Also, a file scan does not tell you whether the software present on the system is actually running.

Next come software composition analysis and SBOM generation tools. They usually check the software composition before its actual execution; specifically, pre-deployment analysis is performed in a CI/CD pipeline. This is a great way to inspect software that is distributed through a well-configured DevOps system. However, some software may run without going through this process.
You can also take advantage of the powerful features of application tracing. As it analyzes the running program, it reflects the state of the actual running system. However, these tools usually require prior settings, such as launch options.

Now let's talk about runtime software component detection using eBPF. eBPF allows code to be injected and executed at certain points in the kernel, such as when system calls or kernel functions are executed. So the first things that come to mind for monitoring JAR execution are the system calls and functions that load JAR files. Since JAR execution is accompanied by the process of opening and reading files, relevant information can be obtained in the kernel when the Java process reads them. At first, I thought that it was possible to get the file name just by tracking file opens, so that I could get information about the software being used, but that was not enough. I needed a way to further analyze the JARs being run.

Looking at the idea sketch I wrote in December of last year, I thought that when the read function returns, the data can be analyzed by checking the file descriptor, buffer pointer, and read size. If you look at the memo written a few days later, you can see a sketch of how the data analyzed from the read results is stored in an eBPF map, along with the data to be transferred to user space. This was sketched after checking the structure of the JAR file.

Let's look at the structure of the JAR file, then. A JAR file is a Java archive: a format that distributes Java class files, metadata, and related resources in one file. Since JAR files are compressed in the ZIP format, you must first be able to parse the ZIP file format. Analyzing a ZIP file in the kernel may seem absurd at first glance, but you can get the information by observing the application's parsing process with eBPF rather than parsing the file directly.
Although eBPF does not directly read and parse the data, analyzing the data the application reads for parsing allows us to obtain the information in the kernel. There are three main types of data in the ZIP file format: the end of central directory record (EOCD), the central directory, and the local file headers. As can be seen in the figure, a local file header exists for each file entry, and the central directory is located at the back of the ZIP file. The EOCD is located at the end of the ZIP file and contains the starting location and size of the central directory, so you can read the EOCD and then read the central directory.

As shown in the figure, the EOCD starts with a four-byte signature and includes information such as the number of entries in the ZIP file, the offset of the central directory, and the size of the central directory. After reading the EOCD, the application starts reading the central directory, which contains a central directory header and file name for each file entry. From the central directory header, you can get the offset of the local file header where each file entry starts. As shown in the figure, the central directory header also starts with a four-byte signature and contains detailed information about each file entry, including the file name.

A local file header is a header that is prefixed to the actual file data and includes the uncompressed file size, the compressed data size, and compression information. The local file header is followed by the actual file data. As shown in the figure, it starts with a four-byte signature and contains less information than the central directory header.

If you compose a program to process data based on this JAR file structure, you can divide it into a main part that traces reads, and subcomponents that process each header, file name, and payload. The subprograms are written as eBPF tail calls, because eBPF programs have limitations on stack size and the number of instructions; separate programs handle the complex tasks.
A tail call is one of the mechanisms provided by eBPF; unlike a normal function call, it does not return to the calling location. The following figure is a sketch of the data to be sent to user space based on the results obtained by tracing JAR file processing in the kernel, and of the results to be analyzed and extracted in user space: process information, JAR file information, extracted payload information, manifest information, and so on.

A manifest suddenly appears here. A MANIFEST.MF file contains metadata about a JAR file. This includes package-related information, the main class for execution, etc., and it appears at a specified location in the archive. We read the package information from the manifest file to start extracting component information. Of course, there are JAR files without package information in MANIFEST.MF, so you need to use other information additionally. But reading this file and extracting the package information is where we start. The following are the contents of MANIFEST.MF in log4j-core; you can see the entries where you can check the title and version. We prefer to use the specification version, which has a consistent format.

Based on what we've talked about so far, here's a summary of the runtime JAR software component detection method. We detect software component names and versions by tracing kernel functions called when JAR files are executed. When a JAR is executed on the system, the headers and payloads are read according to the structure of the JAR file. When a header is read, we find the offset of the MANIFEST.MF file containing information about the JAR. When the payload is read, the payload of the metadata is extracted and delivered to user space. We then inflate the payload delivered to user space and extract the title and version information.

Here's the sketch of this flow: when Java reads a JAR file from the file system and processes it, we obtain the processing data through tracing with eBPF. Revising the previous sketch, it looks like this.
After acquiring the processing data by tracking the reading of end of central directory records, central directory records, local file headers, and payloads, the information needed for analysis is stored in an internal map, and a perf buffer or ring buffer is used to send information to the user-space program. At this point, the detected MANIFEST.MF payload is delivered in its compressed state. The user-space program decompresses the compressed MANIFEST.MF and analyzes the package information contained in the JAR.

In the implementation of the eBPF program, each header of the JAR is obtained using the following structures. The data structures passed to the user-space program are as follows: there is a structure that delivers the process and JAR file information to user space, and a structure that delivers the extracted payload. Payloads are transmitted iteratively in chunks based on their size.

The main part, handled by a kretprobe, keeps track of reads and invokes a tail call to process the data according to the four-byte signature. The program that processes the central directory reads all entries repetitively. If the central directory is large, its data cannot be read in one read and is truncated; in this case, a tail call program that recovers the truncated portion is called. Tail call programs that send the file name and payload to the user-space program are also called according to the conditions. Information stored internally in the map is used for the JAR parsing.

In the user-space program, we set up the program array for tail calls like this. We use compress/flate to inflate the deflate-compressed payload, which has no header. Using the implemented result, we searched for components related to Log4Shell and Spring4Shell in public images on Docker Hub. The experiment was performed on 2,500 public images with the default tag, expected to be the latest. For reference, this experiment was conducted in May 2022.
We mainly look for Apache log4j-core versions below 2.16, where the Log4Shell vulnerability exists. We also look for spring-core versions 5.3.0 to 5.3.17 and 5.2.0 to 5.2.19, and older versions; if these versions are used in conjunction with JDK 9 or later, they may be vulnerable, so I will mainly look for those versions.

With the runtime software component detection tool implemented with eBPF, the results shown in the figure below can be obtained: process information, container information, detected JAR file information, detected component title, component version, and so on.

I searched for log4j-core and found that some images contain vulnerable versions of Log4j. Recent images are automatically scanned by Docker Hub to deal with Log4Shell; I will talk later about how this result came about. Packages related to Spring4Shell were also found, in versions 5.3.10, 5.3.11, 5.3.14, and 5.2.5. Since it was a relatively recent vulnerability at the time of the search, it seems they had not been patched yet.

One case where a vulnerable version of Log4j was detected was when log4j-core 2.16.0 was included. I knew that the Log4j versions without vulnerabilities start from 2.17.1, but this image carried a "Log4Shell CVE: not detected" badge, which was strange. This happens because the badge means there is no Log4Shell, which is CVE-2021-44228, not that there are no Log4j-related vulnerabilities at all. In the case of this image, it is displayed as "not detected" even though log4j-core 2.16.0 was included. 2.16.0 is a re-patched version, because 2.15.0, which was patched immediately after CVE-2021-44228 was released, was incomplete. This version has additional vulnerabilities, such as CVE-2021-45105 and CVE-2021-44832, but not the original Log4Shell.

The next thing is about the latest tag. For one image using a very old version of Log4j, the image with the latest tag was pushed 7 months ago, but there was a main image pushed 3 days ago.
This repository simply does not tag the latest version as latest, but tags it as something else. If you google it, there are stories about the latest tag going back to 2015: "latest" is just a tag. It may not be up to date unless the image publisher sets it on the newest image. However, since it is used as the default tag for docker pull, it is still misleading from the user's point of view. It seems you should always check the tags to avoid unintentionally using a vulnerable version of an image.

Another case concerns deprecated projects. This image, with over 10 million pulls, is a project that ended in December 2021. Fortunately, the last version responded to Log4Shell by removing the JndiLookup class, but later vulnerabilities still exist.

This concludes the presentation on the detection of runtime software components using eBPF. Thank you for your attention.