Welcome to the 12 o'clock talk, "All Roads Lead to GKE's Host". Without further ado, Billy, are you starting off?

All right. Hi, I'm Billy, and thanks for coming. Here is our title: All Roads Lead to GKE's (Google Kubernetes Engine's) Host. "Host" refers to the GKE node. Four ways to escape: we will use four different kernel vulnerabilities, including 1-days and 0-days we found, to escape from the container and take over the node in GKE. Today's talk will be presented by me, Billy, and my partner, Ramdhan.

Before going into the topic, let us introduce ourselves. My name is Billy, a security researcher at STAR Labs. My partner, Ramdhan, also works at STAR Labs. We usually work together on research into the Linux kernel and hypervisors, like VirtualBox, QEMU and Parallels, for bug bounties and exploitation. We successfully pwned Ubuntu at Pwn2Own 2021 and 2022, and VirtualBox at Pwn2Own 2022. We have also found various vulnerabilities, such as CVE-2021-3491, a Linux kernel privilege escalation vulnerability in the io_uring subsystem. By chaining CVE-2021-2221 with CVE-2021-2250, we achieved a VM escape from VirtualBox. We also achieved a Parallels VM escape with CVE-2021-3454. These CVEs show that we have some experience with the Linux kernel and hypervisors.

Let's start with today's outline. We will give you some background information: introduce a bit about what Google Container-Optimized OS is, and explain the kCTF VRP challenge, a Google bug bounty program focused on container escape exploitation. Then we will go into the details of our four submissions and the exploitation techniques we used to turn a limited primitive into a container escape. Finally, we will conclude.

Google Container-Optimized OS is basically a Linux OS optimized for running Docker containers, and it is the default node OS in Google Kubernetes Engine. It has no package manager, you cannot install third-party kernel modules, and it has a read-only root file system.
Every time you boot, it verifies the file system checksum to make sure the file system is not malformed or backdoored. Usually, Linux configuration is stored in the /etc folder. In Google Container-Optimized OS, this folder is writable but stateless, which means that every time you boot, it starts from a clean slate, and configuration is set up at runtime. For an attacker, installing a persistent backdoor is hard.

If we want to escape from the container, there are some limits. There are obstacles when we explore the attack surface and try common exploit techniques in the container. It does not allow unprivileged users to use eBPF, and it does not implement the userfaultfd system call. It does not allow access to internal networking, so we cannot reach the Kubernetes API server. Access to device nodes is also limited, so we cannot spray tty_struct, and we cannot use FUSE.

OK, there is still some attack surface we can use in the container. First, we can create a new user namespace to explore more attack surface, like cgroup (a Linux kernel feature that limits, accounts for and isolates resource usage), the file system context functionality, packet sockets, or traffic control configuration. Then there is io_uring, a Linux kernel system call interface for asynchronous I/O operations. There have been some CVEs in this subsystem before, and it is reachable from a container, which makes it a great attack surface.

Let's see what the kCTF VRP environment architecture looks like. Basically, it is a GKE cluster with multiple nodes. You can think of a node as a VM. Each node has multiple pods, and you can think of a pod as a container. Inside a pod, an nsjail challenge is launched. kCTF has two challenges, and they are in different pods. Your goal is to get two flags. One flag is called the kCTF flag. It is placed in the same pod you connect to for the challenge, and this challenge gives you an interactive shell. The other flag is called the full-chain flag. It is placed in the other pod.
The node OS and the Kubernetes engine are upgraded automatically. But there is a time gap between a fix landing and the upgrade being rolled out, so 1-day vulnerabilities are allowed. As we said before, all nodes run Container-Optimized OS, and public network access is allowed, so you can download the exploit from the internet and launch it.

There are two attack scenarios. One is breaking out of the nsjail sandbox to get the flag in the same pod. The other is breaking the isolation that Kubernetes provides to access the flag of the other challenge. But with a kernel vulnerability, we can directly escalate privilege and take over the whole node, so you can get all the flags from a single kernel vulnerability. Next, Ramdhan will present our first two submissions.

Hello, everyone. I'm Ramdhan. I will talk about our first submission. Our first submission uses CVE-2021-4154 to exploit the kCTF container. This bug is a use-after-free in the cgroup1_parse_param function in the cgroup kernel subsystem. Basically, the bug is caused by a missing check in this function. The source key always needs a value of string type, but before the patch, there was no check whether the value is a string or not, so we can pass a value of another type. Using the fsconfig syscall, we pass a value of file type, that is, we pass a file descriptor as the value. Then param->string contains the struct file address, and it is stored in fc->source.

We can reach this bug by calling the fsconfig syscall with the key set to "source" and the value set to a file descriptor using FSCONFIG_SET_FD. If we then close the fs context file descriptor, put_fs_context is called and frees fc->source, which holds the struct file. But the file descriptor associated with the freed struct file still exists in the process. So we have a use-after-free on the struct file object.
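To make the type confusion concrete, here is a minimal sketch in Python, not kernel code. It assumes a simplified two-member version of the value union inside the kernel's `struct fs_parameter`; the class name, the helper functions and the address are hypothetical and only illustrate why reading the string member returns the file pointer.

```python
import ctypes


class FsParamValue(ctypes.Union):
    """Simplified stand-in for the value union in struct fs_parameter."""
    _fields_ = [
        ("string", ctypes.c_uint64),  # char *string: what the "source" key expects
        ("file", ctypes.c_uint64),    # struct file *: what FSCONFIG_SET_FD supplies
    ]


def parse_source_unpatched(value):
    # No type check: whatever sits in the union is treated as the string pointer.
    return value.string


def parse_source_patched(value, is_string):
    # Sketch of the added check: reject a non-string value for "source".
    if not is_string:
        raise ValueError("source parameter must be a string")
    return value.string


FAKE_FILE_ADDR = 0xFFFF888012345600  # hypothetical struct file address

value = FsParamValue()
value.file = FAKE_FILE_ADDR          # the value arrives as a file, not a string
fc_source = parse_source_unpatched(value)
print(hex(fc_source))                # fc->source now aliases the struct file
```

In this model, freeing `fc_source` would free the object at `FAKE_FILE_ADDR` while the file descriptor still refers to it, which is the use-after-free described above.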
Using this bug, we can create a bunch of fs_context objects pointing to the same struct file, so there are multiple fc->source pointers to the same struct file. If we close one of them, we have a use-after-free. And we can close another one to get a double free, because that fs_context still points to the freed struct file.

So how do we exploit this bug? We need to build our exploit strategy. In a use-after-free scenario, we usually overwrite the object with another type of object. But there are two problems. The first problem is that struct file is allocated in its own slab cache, so we cannot overwrite it with another type of object in the common way. The second problem is: what type of object can we use to extend the primitive of this bug?

Let me present the solution to the first problem. To overwrite an object with another type of object that resides in a different slab cache, we use a technique called cross-cache. This is a known technique for exploiting use-after-free bugs in the Linux kernel. There are two types of cross-cache technique. The first is cross-cache use-after-free, and the second is cross-cache heap overflow.

In cross-cache use-after-free, we first make the slab page get freed back to the buddy allocator. To do that, we fill the whole slab page with objects. For example, take a slab of object A. We allocate enough object A to fill the whole slab page, and then we free them all, so the slab page goes back to the buddy (page) allocator. Then we spray another type of object, whose slab page comes from the page allocator. Now the old freed slab page of object A is owned by the slab cache of object X. Using this technique, we can solve the first problem. But I will also present the other cross-cache technique, called cross-cache heap overflow.
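The cross-cache use-after-free steps above can be sketched with a toy allocator. This is a deliberately simplified model, not how SLUB really behaves: it assumes a slab page is returned to the page allocator as soon as its last object is freed, and that the page allocator hands back the most recently freed page first. All class and variable names are hypothetical.

```python
OBJS_PER_SLAB = 4  # toy value; real slabs hold many more objects


class PageAllocator:
    def __init__(self, n_pages):
        self.free_pages = list(range(n_pages))

    def alloc_page(self):
        return self.free_pages.pop()          # last freed page is reused first

    def free_page(self, page):
        self.free_pages.append(page)


class SlabCache:
    def __init__(self, name, pages):
        self.name = name
        self.pages = pages
        self.slabs = {}                       # page -> set of used slots

    def alloc(self):
        for page, used in self.slabs.items():
            if len(used) < OBJS_PER_SLAB:
                slot = next(s for s in range(OBJS_PER_SLAB) if s not in used)
                used.add(slot)
                return (page, slot)
        page = self.pages.alloc_page()        # no room left: take a fresh page
        self.slabs[page] = {0}
        return (page, 0)

    def free(self, obj):
        page, slot = obj
        self.slabs[page].discard(slot)
        if not self.slabs[page]:              # slab fully empty: give the page back
            del self.slabs[page]
            self.pages.free_page(page)


pages = PageAllocator(8)
cache_a = SlabCache("A", pages)               # dedicated cache of the victim object
cache_x = SlabCache("X", pages)               # cache of the object we spray

victims = [cache_a.alloc() for _ in range(OBJS_PER_SLAB)]  # 1. fill the slab page
dangling = victims[0]                         # pointer kept by the bug
for obj in victims:                           # 2. free them all
    cache_a.free(obj)
sprayed = [cache_x.alloc() for _ in range(OBJS_PER_SLAB)]  # 3. spray object X
overlap = dangling in sprayed
print(overlap)
```

In the model, the dangling reference to object A ends up pointing at memory now owned by cache X, which is the overlap the technique relies on.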
In cross-cache heap overflow, we make the target slab cache's page adjacent to a page of the slab cache of the object that we want. For example, in the image on the right, we can overwrite object A from object X even though they are in different slab caches. But it requires sprayable objects for both object X and object A. Don't expect it to be very reliable, because we may need to trigger the heap overflow bug multiple times. We can make it more reliable if we have a way to tell whether we have already overwritten the object or not.

So that was the first problem: we can actually overwrite a struct file, which lives in its own slab cache, with another object using cross-cache. Now the second problem: what type of object can we use to extend a limited primitive? I assume some of you already know the answer: the msg_msg object and the msg_msgseg object. This is the well-known object for exploiting Linux kernel bugs. From now on, I will refer to msg_msg as the msg object and to msg_msgseg as the msgseg object.

Here are the properties of the msg object. It is well known for exploiting Linux kernel bugs, especially heap bugs, because of the flexibility of its size, and it is a sprayable object. The second object, msgseg, is used to store the extra message data. The message text follows a header of size 0x30. If the total size of the message object, header included, is more than 4K, a msgseg object is used, and it is pointed to by the next pointer of the msg object.

The msg object is a powerful object that can be used for many things, for example arbitrary write, arbitrary free and arbitrary read. We will go deeper into some of these. It is accessible via msgget to allocate a message queue, msgsnd to insert a message into the queue, and msgrcv to retrieve a message from the queue.
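The size rule above can be worked through numerically. The following sketch assumes x86_64 values (a 4096-byte page, a 0x30-byte msg_msg header and an 8-byte msg_msgseg header holding only the next pointer) and a generic list of kmalloc size classes; the function names are hypothetical, and real cache names differ between kernel versions.

```python
PAGE_SIZE = 4096
MSG_HDR = 0x30   # assumed sizeof(struct msg_msg) on x86_64
SEG_HDR = 8      # assumed sizeof(struct msg_msgseg): only the next pointer
KMALLOC_SIZES = [8, 16, 32, 64, 96, 128, 192, 256, 512, 1024, 2048, 4096]


def kmalloc_cache(size):
    """Return the smallest general-purpose size class that fits `size`."""
    for cls in KMALLOC_SIZES:
        if size <= cls:
            return f"kmalloc-{cls}"
    raise ValueError("larger than one page")


def split_message(text_len):
    """List the allocations made for a message carrying text_len bytes."""
    first = min(text_len, PAGE_SIZE - MSG_HDR)
    chunks = [("msg_msg", MSG_HDR + first)]
    remaining = text_len - first
    while remaining > 0:                      # spill the rest into segments
        part = min(remaining, PAGE_SIZE - SEG_HDR)
        chunks.append(("msg_msgseg", SEG_HDR + part))
        remaining -= part
    return [(kind, size, kmalloc_cache(size)) for kind, size in chunks]


print(split_message(0x200))   # a single msg_msg in a 1K-sized class
print(split_message(4096))    # a full-page msg_msg plus a small msg_msgseg
```

This is why the message size is such a convenient knob: choosing `text_len` selects both the cache of the msg object and the cache of the trailing segment.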
To allocate a message queue, we use msgget. The image on the right shows the definitions of the function interfaces that we are going to use. As you can see, there is the msgp parameter in the function interface; it is actually defined as struct msgbuf. The msgtyp parameter is used for the selection process when we retrieve a message from the queue. For msgflg, there are some useful flags. For example, MSG_COPY can be used if we want to retrieve a message from the queue without deleting it from the queue.

This image shows how messages are linked to the queue. For example, we allocate a message queue, and then we send a message with mtype 1 and 4K bytes of text. The first message is inserted into the message queue. A msgseg is also allocated, because the total size of the msg object, header included, is more than 4K. If we add another msg object with mtype 2 and a size of 0x200, it is linked to the queue as well. If we want to retrieve the second message, we can use MSG_EXCEPT with msgtyp 1. It returns the first message whose mtype is not 1, so the second message is retrieved and printed to the terminal.

Now I am going to explain the tricks that we can do with the msg object. Suppose that, using the double free bug from our first submission, we convert it into a use-after-free on a msg object. Once we have the use-after-free on the msg object, we overwrite it using a msgseg object.

The first trick with the msg object is an out-of-bounds read. We can overwrite m_ts, the message text size. If we overwrite it with a large value, we can leak addresses from beyond the buffer via msgrcv. The second trick is an arbitrary read. We can do this by pointing the next pointer at the address that we want to read.
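The selection rules for msgrcv mentioned earlier (msgtyp, MSG_EXCEPT and MSG_COPY) can be summarised in a small pure-Python model. It is a sketch of the documented semantics only: it does not call the real system calls, the `receive` helper is hypothetical, and details such as MSG_COPY requiring IPC_NOWAIT or blocking when nothing matches are left out.

```python
MSG_EXCEPT = 0o20000   # flag values as defined in <linux/msg.h>
MSG_COPY = 0o40000


def receive(queue, msgtyp, flags=0):
    """Model of msgrcv selection. queue is a list of (mtype, payload)."""
    if flags & MSG_COPY:
        # msgtyp is an index here, and the message stays in the queue.
        return queue[msgtyp] if 0 <= msgtyp < len(queue) else None
    index = None
    if msgtyp == 0:
        index = 0 if queue else None          # first message of any type
    elif msgtyp > 0:
        want_other = bool(flags & MSG_EXCEPT)
        for i, (mtype, _) in enumerate(queue):
            if (mtype != msgtyp) == want_other:
                index = i
                break
    else:
        # Lowest type that is <= |msgtyp|, earliest such message first.
        candidates = [(mtype, i) for i, (mtype, _) in enumerate(queue)
                      if mtype <= -msgtyp]
        index = min(candidates)[1] if candidates else None
    return None if index is None else queue.pop(index)


queue = [(1, "first message, 4K of text"), (2, "second message, 0x200 bytes")]
peek = receive(queue, 0, MSG_COPY)        # look at index 0 without removing it
second = receive(queue, 1, MSG_EXCEPT)    # first message whose mtype is not 1
print(peek[0], second[0], len(queue))
```

The non-destructive MSG_COPY path matters for the tricks discussed in the talk, because it lets a corrupted message be read repeatedly without the kernel unlinking and freeing it.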
If we set the message text size to more than 4K, then when we retrieve the message the kernel follows the next pointer, because it assumes that this message is larger than 4K. It copies from the next pointer to user space when we call msgrcv.

The third trick we can do with the msg object is an arbitrary free. We can do this if the kernel has a specific configuration. When we retrieve a message, the kernel calls the do_msgrcv function, and at the end it calls free_msg, which reaches security_msg_msg_free. As you can see, that calls kfree on msg->security. So if we can control the security field, we have an arbitrary free.

But before we reach free_msg, we hit list_del, and we need to take care of the next and prev values. We can mostly control them using the msgseg object, but the msgseg object cannot control the first 8 bytes. So if we reach list_del with a msgseg, the kernel will crash, because we cannot control the next pointer in m_list. But we can bypass this if the kernel has CONFIG_DEBUG_LIST enabled. With this config enabled, list_del calls __list_del_entry_valid, and we can easily make it return false and skip the unlink by setting the prev value to LIST_POISON2, a constant defined in the kernel source. Fortunately, the kCTF kernel config has that option enabled. So with the security field and CONFIG_DEBUG_LIST enabled, we have an arbitrary free using the msg object.

To wrap up, we can combine our three tricks into kernel RIP control. With the arbitrary free, we can free a pipe_buffer object in the kmalloc-1k cache. We can allocate pipe_buffer objects using pipes. So we use our first two tricks, the out-of-bounds read and the arbitrary read, to leak the address of a pipe_buffer in the heap.
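The list_del bypass described above can be modelled in a few lines. This is a sketch under assumptions: memory is a dictionary of list nodes, the poison constants use the x86_64 values I believe recent kernels define (older kernels use a different LIST_POISON2), and the function names are hypothetical simplifications of the kernel helpers.

```python
POISON_DELTA = 0xDEAD000000000000       # assumed x86_64 illegal-pointer base
LIST_POISON1 = POISON_DELTA + 0x100
LIST_POISON2 = POISON_DELTA + 0x122     # assumption: value differs on older kernels


class Fault(Exception):
    """Stands in for a kernel crash on an unmapped pointer dereference."""


def deref(memory, addr):
    if addr not in memory:
        raise Fault(hex(addr))
    return memory[addr]                  # (next, prev) of the node at addr


def entry_valid(memory, entry):
    nxt, prev = deref(memory, entry)
    if nxt == LIST_POISON1 or prev == LIST_POISON2:
        return False                     # rejected before any neighbour is touched
    if deref(memory, prev)[0] != entry:  # prev->next must point back to entry
        return False
    if deref(memory, nxt)[1] != entry:   # next->prev must point back to entry
        return False
    return True


def list_del(memory, entry, debug_list):
    if debug_list and not entry_valid(memory, entry):
        return "skipped"                 # the unlink is skipped, execution continues
    nxt, prev = deref(memory, entry)
    memory[nxt] = (deref(memory, nxt)[0], prev)    # next->prev = prev
    memory[prev] = (nxt, deref(memory, prev)[1])   # prev->next = next
    return "unlinked"


HEAD, NODE, FAKE = 0x1000, 0x2000, 0x3000
memory = {HEAD: (NODE, NODE), NODE: (HEAD, HEAD)}
memory[FAKE] = (0, LIST_POISON2)         # next is uncontrolled, prev is chosen

with_debug = list_del(memory, FAKE, debug_list=True)
try:
    without_debug = list_del(memory, FAKE, debug_list=False)
except Fault:
    without_debug = "crash"
print(with_debug, without_debug)
```

The model shows the asymmetry the talk relies on: the hardening check turns a crash into a skipped unlink, so the code after list_del, including the free of the security pointer, is still reached.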
Then we read the ops pointer in the pipe_buffer to get a kernel text address. And by closing the pipes while controlling the ops field, we can control the kernel RIP.

So let's go back to our first submission, where we have a use-after-free on a struct file, and develop our exploit strategy. For the first problem, the solution is that we can overwrite the struct file with another type of object: we overwrite it with a msg object using the cross-cache technique. Then we free one of the fs_context objects that still points to the msg object, so we have a use-after-free on the msg object. We reallocate it using a msgseg object, so we can fake the msg object using the msgseg object. In this condition, we can perform the tricks from before, the out-of-bounds read and the arbitrary read.

First, we need to leak the address of a pipe_buffer in kmalloc-1k. For that, we need another msg object that is adjacent to or near the msgseg object. We also allocate another msg object in kmalloc-1k, so the adjacent msg has a next pointer that points into kmalloc-1k. Then we do the out-of-bounds read and leak the kmalloc-1k address. After we have the kmalloc-1k address, we free the msg that was allocated in kmalloc-1k and spray pipe_buffer objects, so the old msg in kmalloc-1k is replaced by a pipe_buffer. From the address leaked by the out-of-bounds read, we know the location of the pipe_buffer.

Then we perform the arbitrary read on the pipe_buffer to get the value of the ops pointer, which gives us a kernel text address. Then we do the arbitrary free using our arbitrary free trick, with the security pointer set to the pipe_buffer. Now we have a use-after-free on the pipe_buffer, so we just reallocate it with a msgseg object. We have a fake pipe_buffer and control over its content. When we close the pipe, it reaches pipe_buf_release with our fake pipe_buffer.
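The out-of-bounds read and arbitrary read used in this walkthrough both come from how the kernel copies a message out. The sketch below models that copy over a flat `bytearray` standing in for kernel memory. Offsets act as addresses, the header sizes are the x86_64 values assumed earlier, and `store_msg` here is a hypothetical simplification of the kernel routine of the same name; the stored "pointers" are made-up values.

```python
import struct

PAGE = 4096
MSG_HDR, SEG_HDR = 0x30, 8
DATALEN_MSG, DATALEN_SEG = PAGE - MSG_HDR, PAGE - SEG_HDR


def store_msg(mem, msg_addr, m_ts, next_ptr):
    """Copy m_ts bytes: first from the msg body, then from each segment."""
    alen = min(m_ts, DATALEN_MSG)
    out = bytearray(mem[msg_addr + MSG_HDR: msg_addr + MSG_HDR + alen])
    remaining, seg = m_ts - alen, next_ptr
    while seg and remaining > 0:
        alen = min(remaining, DATALEN_SEG)
        out += mem[seg + SEG_HDR: seg + SEG_HDR + alen]
        remaining -= alen
        seg = struct.unpack_from("<Q", mem, seg)[0]   # follow seg->next
    return bytes(out)


mem = bytearray(0x4000)
MSG = 0x1000                                  # a small msg with 0x10 bytes of text
mem[MSG + MSG_HDR: MSG + MSG_HDR + 0x10] = b"A" * 0x10
NEIGHBOUR = MSG + 0x40                        # the next object in the same slab
struct.pack_into("<Q", mem, NEIGHBOUR, 0xFFFF888000C0FFEE)

# Trick 1: a corrupted, larger m_ts makes the copy run into the neighbour.
oob = store_msg(mem, MSG, 0x40, 0)
leaked_neighbour = struct.unpack_from("<Q", oob, 0x10)[0]

# Trick 2: m_ts beyond one page plus a forged next pointer reads any address.
TARGET = 0x3000
struct.pack_into("<Q", mem, TARGET, 0xFFFFFFFF81234567)
data = store_msg(mem, MSG, DATALEN_MSG + 8, TARGET - SEG_HDR)
leaked_target = struct.unpack_from("<Q", data, DATALEN_MSG)[0]
print(hex(leaked_neighbour), hex(leaked_target))
```

Note that the forged next pointer is set 8 bytes before the target, because the copy skips the segment header; in this model the 8 bytes at that location also act as the following `next` pointer.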
And so we control the kernel RIP. After we have the kernel RIP, we need to convert this into a ROP chain. We can easily find a stack pivot gadget to convert this into ROP chain execution. Using commit_creds and prepare_kernel_cred, we can install root credentials on the current process. Before our ROP chain is executed, we have prepared a child process. The child process spins in a while loop and breaks out once we have got root. After we install the root credentials, we copy the credentials to the child process, and we switch the namespaces of PID 1 to those of the init process using the switch_task_namespaces function, which we call from our ROP chain. The child process that we prepared before breaks out of the while loop and sets its namespaces to the namespaces of PID 1. Then we cat the flag and spawn a root shell.

In this demo, we connect to the kCTF instance, download our exploit from the internet and run it. The kCTF flag is printed and a root shell is spawned.

Now I will talk about our second submission. For the second submission, we use CVE-2021-22600 to exploit kCTF. This is a double free bug in the packet_set_ring function in the AF_PACKET kernel subsystem. This is the code snippet that triggers the double free. I will explain this code step by step.

The root cause of this bug shows up when we switch the version from version 3 to version 2. Because of the bug, some fields are not cleared when we change from version 3 to 2, so some old state is left behind. First, we create the AF_PACKET socket. We set the version to version 3 by calling setsockopt, which assigns the value to tp_version. Then we call setsockopt with the PACKET_RX_RING option. It allocates a variable called pg_vec. As you can see, we can control the size of the allocation by setting tp_block_nr. The allocated pg_vec is stored in pkbdq.
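The stale-state problem in this setsockopt sequence can be illustrated with a toy state machine. This is a heavily simplified model and not the kernel's packet_set_ring: it assumes the version 3 block pointer (pkbdq) and the version 2 rx_owner_map share one union slot, that setting a ring with tp_block_nr of 0 releases the current ring, and it tracks frees in a toy heap. The `clear_stale_slot` option is a hypothetical mitigation for comparison, not necessarily the upstream fix.

```python
TPACKET_V2, TPACKET_V3 = 1, 2


class ToyHeap:
    def __init__(self):
        self.next_addr, self.live, self.double_frees = 0x1000, set(), []

    def alloc(self):
        addr = self.next_addr
        self.next_addr += 0x100
        self.live.add(addr)
        return addr

    def free(self, addr):
        if addr is None:
            return
        if addr in self.live:
            self.live.remove(addr)
        else:
            self.double_frees.append(addr)   # freeing something already freed


class ToyPacketSocket:
    def __init__(self, heap, clear_stale_slot=False):
        self.heap, self.clear_stale_slot = heap, clear_stale_slot
        self.tp_version = TPACKET_V2
        self.pg_vec = None
        self.union_slot = None               # rx_owner_map (V2) / pkbdq (V3)

    def set_version(self, version):
        if self.pg_vec is not None:
            raise RuntimeError("ring is active")
        self.tp_version = version

    def set_rx_ring(self, tp_block_nr):
        new_vec = new_map = None
        if tp_block_nr:
            new_vec = self.heap.alloc()
            if self.tp_version == TPACKET_V3:
                self.union_slot = new_vec    # V3 remembers pg_vec in the union
            else:
                new_map = self.heap.alloc()
        old_vec, self.pg_vec = self.pg_vec, new_vec
        old_map = None
        if self.tp_version <= TPACKET_V2:
            old_map, self.union_slot = self.union_slot, new_map
        elif not tp_block_nr and self.clear_stale_slot:
            self.union_slot = None
        self.heap.free(old_map)
        self.heap.free(old_vec)


def trigger(clear_stale_slot):
    heap = ToyHeap()
    sock = ToyPacketSocket(heap, clear_stale_slot)
    sock.set_version(TPACKET_V3)
    sock.set_rx_ring(1)      # allocate pg_vec, remembered in the union slot
    sock.set_rx_ring(0)      # free pg_vec, the slot keeps the stale pointer
    sock.set_version(TPACKET_V2)
    sock.set_rx_ring(0)      # V2 path frees the slot as if it were rx_owner_map
    return heap.double_frees


print(trigger(False), trigger(True))
```

In the model, the attacker chooses both the size of the object (via tp_block_nr in the real interface) and the moment of the second free, which matches the two properties the talk goes on to use.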
But it overlaps with rx_owner_map because of the union data type. Then, with the third setsockopt, we free the pg_vec, but pkbdq is not cleared. Now we change the version to version 2 by calling setsockopt again. When packet_set_ring is called, the rx_owner_map, which actually comes from the old state, is freed. So the pg_vec in pkbdq is freed twice. The rx_owner_map overlaps with pkbdq, it was not cleared in version 3, and in version 2 it is freed. So we have a double free.

With this bug, we can control the target chunk size, and we can control when the double free happens. We choose the timerfd object as the sprayable object to fill the whole slab page for the cross-cache technique. We free our timerfd objects and the pg_vec so that the slab page goes back to the page allocator. Then we spray msg objects. Using the double free, we free a msg object and overwrite it with a msgseg object, so we have control over the msg object.

In this condition, we have the same case as in the previous submission: a use-after-free on a msg object that we reallocate using a msgseg. We can just apply the same technique. First, we do the out-of-bounds read to leak the address of a pipe_buffer in kmalloc-1k, and we do the arbitrary read to leak a kernel text address. Then we do the arbitrary free on the pipe_buffer and overwrite the pipe_buffer using a msgseg. Then we close the pipes and control the kernel RIP, and we can escape the container.

This is the demo of our second submission. We connect to kCTF and run our exploit. The kCTF flag is printed and a root shell is spawned. My friend Billy is going to explain the third and fourth submissions. Thank you.

So let me talk about the next two submissions. One is CVE-2022-0185. It was the first time we used a 0-day there.
But unfortunately, we did not properly report it to Linux kernel security after the kCTF submission. Here is the description of the CVE. We highlight the keywords we care about: it is a heap-based buffer overflow, the vulnerability is in the file system context subsystem, and if unprivileged user namespaces are enabled, an unprivileged user can trigger it.

Briefly speaking, there is an integer underflow in the bounds checking. If we make the size larger than the page size minus 2, it causes an integer underflow, and the code that follows causes a heap overflow on the legacy_data buffer. We carefully append data to the legacy_data buffer and cause a heap overflow of a single zero byte. We treat this vulnerability as an off-by-one. From the code, legacy_data is allocated in kmalloc-4k. So, in summary, this vulnerability is an off-by-one on the kmalloc-4k cache.

We use cross-cache heap overflow in order to overwrite the msg_msg structure. We spray legacy_data buffers together with msg_msg structures and trigger the off-by-one on every legacy_data buffer. There is a good chance of overwriting the least significant byte of a msg_msg's m_list.next with zero. If we are lucky enough to do that, we can make two msg_msg objects point to the same next message. We free it from one message queue, and the other message queue still points to the freed chunk.

At this point, we have the same situation as in the previous submissions. We do the out-of-bounds read to leak the pipe_buffer address and the arbitrary read to leak a kernel text address. Next, we do the arbitrary free on the pipe_buffer and overwrite it with a msgseg. We close the pipe to control the kernel RIP and escape the container.

Here is the demo. We connect to the challenge, get the exploit from the internet and launch it. We successfully get the two flags and take over the node.

Here is the second 0-day we used in kCTF. This time we properly reported it to Linux kernel security and got the credit. Here is the description of the CVE.
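The integer underflow in the bounds check for the third submission can be reproduced with plain arithmetic. This sketch assumes a 4096-byte page and 64-bit unsigned arithmetic, and compares a check of the form `len > PAGE_SIZE - 2 - size` with the rearranged form `size + len + 2 > PAGE_SIZE`; the function names are hypothetical.

```python
PAGE_SIZE = 4096
U64_MASK = (1 << 64) - 1     # unsigned 64-bit arithmetic, as on x86_64


def unpatched_rejects(data_size, length):
    # The subtraction wraps around once data_size exceeds PAGE_SIZE - 2.
    limit = (PAGE_SIZE - 2 - data_size) & U64_MASK
    return length > limit


def patched_rejects(data_size, length):
    # Rearranged so that nothing is subtracted and nothing can wrap.
    return data_size + length + 2 > PAGE_SIZE


for data_size, length in [(100, 10), (4000, 200), (4095, 10)]:
    print(data_size, length,
          unpatched_rejects(data_size, length),
          patched_rejects(data_size, length))
```

The last case is the interesting one: with 4095 bytes already stored, the wrapped limit is close to 2^64, so the unpatched check accepts the append and the write lands past the end of the 4K buffer.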
We again highlight the keywords we care about: it is a privilege escalation in io_uring, and it is reachable from a container. The vulnerability is in io_uring when requesting a sendmsg or recvmsg operation. It grabs a reference on the fs_struct before doing the operation. But if the io_uring context has IORING_SETUP_IOPOLL, it forgets to release the fs_struct when the operation finishes. So if we keep increasing the users count, it causes an integer overflow and the users count reaches zero again. When the fs_struct users count has reached zero again, we use another io_uring without IORING_SETUP_IOPOLL. It frees the fs_struct after finishing the request. So we have a freed fs_struct that is still in use as current->fs.

The key point about cross-cache is to allocate a lot and free everything at the same time. So we allocate a lot of fs_struct objects and free them all. We spray kmalloc-128 msgseg objects to reallocate current->fs as a msgseg. Then we free current->fs again through io_uring and do cross-cache again with kmalloc-64 msgseg objects. Since it is still a msgseg, we use the msgrcv system call to leak the next msg and get a kernel heap address in kmalloc-1024.

Once we have a leaked heap address, we can modify the contents by freeing and allocating again. We use this to fake the fs_struct and do an arbitrary read in order to leak a kernel address. With our fake fs_struct, we are able to do the arbitrary read through the getcwd system call. After we get the kernel address, we prepare our payload and fake the fs_struct again. With the fchdir system call, we can control the kernel RIP: the path goes from fchdir to set_fs_pwd, to path_put, to dput, and in the end to retain_dentry. In retain_dentry, we control the kernel RIP when it calls the d_delete function pointer.

Here is the demo. The original demo video needs about 15 minutes to finish, so I sped it up for the presentation. Again, we connect to the challenge, get the exploit from the internet and launch it. We successfully get the two flags and take over the GKE node.
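The reference-count wraparound described here is simple modular arithmetic. The sketch assumes the users counter is a signed 32-bit integer that starts at 1 for a process with an unshared fs_struct; both assumptions and the helper names are illustrative.

```python
def to_int32(x):
    """Interpret x as a signed 32-bit integer with wraparound."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def users_after_leaks(start, leaked):
    # Every leaked reference increments the counter and is never dropped.
    return to_int32(start + leaked)


def leaks_needed_to_reach(start, target=0):
    # Smallest number of leaked references that wraps the counter to target.
    return (target - start) % (1 << 32)


needed = leaks_needed_to_reach(1)
print(needed, users_after_leaks(1, needed))

# A later request that is released correctly raises the count to 1 and then
# drops it back to 0, which frees the structure although it is still in use.
after_grab = users_after_leaks(1, needed) + 1
after_put = after_grab - 1
print(after_grab, after_put)
```

Under these assumptions roughly 4.29 billion leaked references are needed, which is consistent with the exploit needing many minutes to run.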
All the vulnerabilities we used have already been fixed in GKE. You can refer to the Google security bulletins GCP-2022-002 and GCP-2022-016.

We believe that with the cross-cache attack and msg_msg, we can turn a limited primitive into an arbitrary free and escape the GKE container. But there is a limitation: we can only allocate sizes under 4,096. In our last submission, we were not able to convert the use-after-free into the common situation of the previous three CVEs. But with some changes, we could still escape the GKE container using cross-cache and msg_msg. As a result, we were awarded more than $100,000 in bug bounties from the Google kCTF VRP program. Thanks to Google for running such a great VRP program to secure the kCTF and GKE infrastructure. Here are our references. And thank you.