Okay, hi everyone, I hope you're having a good conference so far. My name's Golan, and I'll be talking to you about read-only file systems and the execution of arbitrary code within them. So, for starters, like I said, I'm a security consultant with Secure, based out of London. Mostly I deal with Azure and Kubernetes, and within the Kubernetes space I focus more on the low-level aspects of containerization, so low-level Linux. As you can see, I'm enthusiastic about anything containers; if you see me down the hall, feel free to grab me for a chat. So, a quick agenda for today. We'll start off with what read-only file systems are, why we use them, what advantages they give us in relation to security, and what Kubernetes, or containers in general, have to do with that. We'll then switch hats to the attacker's perspective and look at what an attacker can do with a read-only file system and what the methodology is. We'll then move on to three methods of bypassing read-only restrictions in order to execute arbitrary code, and finally we'll briefly touch on remediation, a bit of mitigation, and some final thoughts and conclusions. So, for starters, what is a read-only file system? Generically speaking, file systems are built around three sets of actions, if you will: read, write, and execute. Read-only file systems aren't really read-only; they still allow execution. What we take away is the ability to write. Now, that's not a substitute for your individual file and folder permissions, so Linux discretionary access control still applies. But at the file-system level, we can't write. The file system itself is immutable, meaning you might see some files and folders that still have the write permission set, but you won't be able to write to them; it doesn't really make a difference. So what can we do with read-only file systems?
Why should we use them? With read-only file systems, we're basically talking about immutability: they're more predictable, and they allow us to control and manage containerized applications better. Threat detection becomes easier, because when you know what to expect, what the construct should look like, it's easier to find an abnormality. And then the final benefit, which is what we're touching on today: it makes things harder for attackers. Anyone who has ever tried attacking a read-only file system knows that it's not impossible, but it's definitely a lot harder when you can't really write your own code there. On the bottom left here, we can see how we set a pod's container file system to be read-only: under securityContext, we set readOnlyRootFilesystem to true. So now that we've talked about what a read-only file system is and its advantages from the defensive or administrative side of things, what can an attacker do against a read-only file system? It definitely presents another layer of complexity. Like I said before: when you can't write your own code, how can you execute it? It still, however, represents a foothold in the environment. We can use living-off-the-land techniques to enumerate the environment from an internal perspective. In some cases, we can even use that pod as an intermediary to communicate with other components within the environment. Basically, we want to use it to do anything we'd do with a normal file system; we just have that extra restriction along the way. So, touching quickly on the attack scenario that we'll be looking at and demonstrating today: we'll be simulating an adversary that has a foothold in an application pod which has a read-only file system. This could be through RCE, for example; it doesn't really matter for our use case.
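As a minimal sketch of the defensive side, assuming the pod manifest has already been parsed into a Python dict (for example from `kubectl get pod -o json`), the hypothetical helper below lists containers whose `securityContext` does not explicitly set `readOnlyRootFilesystem: true`. Note that this field lives on each container's `securityContext`, not on the pod-level one. The function name and the example pod are illustrative, not taken from the talk.

```python
from typing import Any


def containers_missing_readonly_root(pod: dict[str, Any]) -> list[str]:
    """Return names of containers that do not run with a read-only root filesystem.

    Hypothetical audit helper: checks `securityContext.readOnlyRootFilesystem`
    on the regular and init containers of an already-parsed pod manifest.
    """
    spec = pod.get("spec", {})
    containers = spec.get("containers", []) + spec.get("initContainers", [])
    missing = []
    for container in containers:
        security_context = container.get("securityContext") or {}
        # Only an explicit `true` counts; if the field is absent, the root fs is writable.
        if security_context.get("readOnlyRootFilesystem") is not True:
            missing.append(container.get("name", "<unnamed>"))
    return missing


# Illustrative pod, loosely modelled on the demo (names are hypothetical).
example_pod = {
    "spec": {
        "containers": [
            {
                "name": "web",
                "image": "nginx",
                "securityContext": {"readOnlyRootFilesystem": True, "runAsNonRoot": True},
            },
            {"name": "sidecar", "image": "alpine"},
        ]
    }
}
```

Here `containers_missing_readonly_root(example_pod)` returns `["sidecar"]`, because only the `web` container opts in to a read-only root.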
The adversary will have low-privileged access, so for all three methods we'll be covering today, you do not need to be root within the container, nothing along those lines. For the sake of your mental health and mine, and for brevity, we'll just be using a reverse shell as a proof of concept. These also worked with the bigger tools, kubectl and amicontained. Some of them need a bit of tweaking, and some of them are a bit trial and error, but eventually they do work. And our goal is to execute arbitrary code so we can further our attack. So for our first method, we'll be looking at an nginx pod. As you can see at the top, the image is nginx, it's set to have a read-only file system, and we're running as a low-privileged user. Some key points here: we do have bash, as the standard nginx image does. We're working under the assumption that we have no standard network tooling, so no wget, no curl, no shenanigans. And obviously we also have the volumes and volume mounts here; we're going to ignore them for this part of the demonstration. In a real-life scenario, those are likely to be mounted as noexec, meaning we might be able to write to them but not execute from them. So for our first demo, the first thing we want to do is make sure that I haven't been lying to you this whole time and that it is actually a read-only file system. Okay, so the way we construct our attacks, we have two stages, if you will. The first is: how do we get our code onto the machine? The second is: how do we execute it? It's a bit of a spoiler alert, since you can see the code, but basically the first stage is finding some sort of network utility that lets us get our code onto the machine. Like we said, we don't have curl, we don't have wget, nothing standard. What we do have, as I highlighted before, is bash. And bash has this lovely little feature called /dev/tcp, whose standard usage is to open a remote TCP connection.
We're going to use exactly that, except we open an HTTP connection and request a file. The reason we can do this is that invoking a function from the terminal doesn't actually write anything to disk. It lives in the memory of the shell; it's ephemeral. As soon as the terminal dies, the function dies as well. So that covers the first stage: how do we get our code onto the machine in the first place? Now the second thing: how do we execute it? As we said, it's a read-only file system, so we have nowhere to store it. I'm sure that a lot of you, when you saw the title of the talk, immediately went to "oh, in-memory execution", and I'm a people pleaser, so that is what we're going to do. When I was going through and looking at ways to do this, I came across a tool by Yago Gutierrez called DDexec. Simply put, it allows us to hijack a running process and run our own binary on top of it. The second complexity we encounter when trying to use DDexec is that we're no longer looking at one file, we're looking at two: we need DDexec itself, and we need the binary that we want to run, which DDexec takes as its input. So we have two files that we need to retrieve and run, but no place to store them. This is where a bit of bash magic, bash redirection, comes in. So basically, there you go: we run our function to get our base64-encoded binary, which is what DDexec takes as input, and we feed that into bash, which is running DDexec, all in one go. On the right, at the top, is just a simple HTTP server, and at the bottom is a listener. What we're going to see is two incoming requests for those two files, and a connection back to our host. And once we check the connection, we can see a very imaginative hostname: method one pod.
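The fetch stage in the talk is a bash function built on `/dev/tcp`, and the speaker's exact function isn't reproduced here. As a rough Python analogue of the same idea, the sketch below opens a plain TCP connection, writes an HTTP/1.0 GET request by hand, and keeps the response body in memory rather than on disk. The host, port, and path are placeholders, and it assumes a plain-HTTP server: no TLS, no redirects.

```python
import socket


def build_get_request(host: str, path: str) -> bytes:
    """Hand-written HTTP/1.0 GET request (host and path are placeholders)."""
    # HTTP/1.0 plus "Connection: close" asks the server to close the socket
    # after the response, so reading until EOF yields the whole body.
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\nConnection: close\r\n\r\n".encode("ascii")


def split_http_response(raw: bytes) -> tuple[int, bytes]:
    """Split a raw HTTP response into (status code, body bytes)."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("response has no header/body separator")
    status_line = head.split(b"\r\n", 1)[0]  # e.g. b"HTTP/1.0 200 OK"
    return int(status_line.split()[1]), body


def fetch(host: str, port: int, path: str, timeout: float = 5.0) -> bytes:
    """Download one file over plain HTTP and return it as bytes, never touching disk."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_get_request(host, path))
        chunks = []
        # Keep reading until the server closes the connection, rather than
        # assuming a single read returns the whole (possibly large) file.
        while chunk := sock.recv(65536):
            chunks.append(chunk)
    status, body = split_http_response(b"".join(chunks))
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status {status}")
    return body
```

Reading until EOF instead of trusting one read is a simple way to avoid the partial-read problem the talk mentions for bigger binaries, though the bash version's issues may have other causes too.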
So, a quick recap of that. We used bash's /dev/tcp as a mini, very simplistic version of wget or curl in order to retrieve the files we want. We then took those files and, using DDexec and some bash redirection, executed them, and we got code execution. Again, in this case we used a reverse shell. This has also worked with amicontained and with kubectl; the issue with the bigger binaries is that, because of the way that function was built, it won't always read all the bytes correctly, so you might have to try a few times, but eventually it will run. When I was doing this research, what I wanted to do was start from the top and strip away dependencies as I went. For the eagle-eyed, one of the biggest dependencies of the last method was having bash on the machine. People who deal with containers a lot know that that's not always the case; in fact, it's rarely the case. So in the second method, we move away from that and use an Alpine image, which does not have bash, only sh. Again, no standard network tools. You may see that there's a little asterisk by wget; we'll get to that in just a second. And what we can see here is, again, the security context: read-only file system set to true, and we're running as a low-privileged user. Also, just keep that seccomp profile in mind for later on; we'll get to that eventually. So, for our second demo. Again, first making sure we do have a read-only file system. Going back to the split we made before, we first need a way to retrieve our files, our executables, and then a way to run them. I spent far longer than I care to admit trying to get my /dev/tcp function to work in sh, trying to move things around.
And it took me, again, far longer than I care to admit to realize that /dev/tcp is a bash feature, and no matter how much I changed that function, sh just doesn't know it. So after a lot of trial and error and a lot of banging my head against the wall, I did what any normal person would do and checked which binaries were on the host in the first place. What you're going to see here first is something else, but we'll get to that in a second. When looking at the binaries, what we can see, and it's nice and green as well, is BusyBox. BusyBox is basically a Swiss Army knife of binaries. It's a really efficient, really small utility that bundles a lot of different tools within it. It's used a lot in Android, and a lot in container images, precisely because of its efficiency and its size. And one of its lovely little utilities is BusyBox wget, hence the asterisk before. So that solves the first issue, how do we get files onto the box: we can just use BusyBox wget. Now, you can see the mount command just above that. Because we don't have bash and we don't have that redirection (sh is a bit trickier with those kinds of things), I needed to find a place to store these files. I used mount here just to keep it simple and easy to view, and grepped for shm. On the right, we can see that the first option inside the brackets is rw, read-write, so it's a writable mount. We'll also see that the fourth option marks it as noexec. noexec means we can't execute anything from within that mount, which is a bummer, but we can write to it. Why can we write to it? /dev/shm is basically the implementation of shared memory in Unix-like environments. It's used for inter-process communication, and it makes our systems run more smoothly. Realistically, you're never going to see it mounted read-only, because that would hinder performance quite a lot.
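The `mount | grep shm` check can also be done programmatically. The sketch below is an illustrative Python helper, not something shown in the talk: it parses `/proc/mounts`-style text (device, mount point, type, options, dump, pass) and reports whether a mount is writable and whether it is `noexec`. The same parser can list mounts that are writable and *not* `noexec`, which is what matters for the third method; in a hardened container that list would ideally be empty. The sample input is made up for illustration.

```python
def parse_mounts(text: str) -> dict[str, set[str]]:
    """Map each mount point to its set of mount options.

    Expects /proc/mounts format: device, mount point, fs type, options, dump, pass.
    If a mount point appears twice, the later (top-most) entry wins.
    """
    mounts: dict[str, set[str]] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1].replace("\\040", " ")  # /proc/mounts escapes spaces as \040
        mounts[mount_point] = set(fields[3].split(","))
    return mounts


def is_writable_noexec(options: set[str]) -> bool:
    """True for mounts like /dev/shm: we can write there, but execve() from them fails."""
    return "rw" in options and "noexec" in options


def writable_exec_mounts(mounts: dict[str, set[str]]) -> list[str]:
    """Mount points that are both writable and not marked noexec."""
    return sorted(mp for mp, opts in mounts.items() if "rw" in opts and "noexec" not in opts)


# Made-up sample in /proc/mounts format.
SAMPLE = """\
overlay / overlay ro,relatime 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
shm /dev/shm tmpfs rw,nosuid,nodev,noexec,relatime,size=65536k 0 0
/dev/sda1 /dev/termination-log ext4 rw,relatime 0 0
"""
```

On a live container you would pass it the contents of `/proc/mounts` instead of `SAMPLE`. For the sample, `/dev/shm` is writable but `noexec`, and `writable_exec_mounts` returns `["/dev/termination-log"]`.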
And even on read-only file systems, like this one, you'll still see it as writable. Just keep that noexec in your mind for another moment. So, the next thing we're going to do, and I know this is very messy (I just noticed it actually pasted twice, so forgive me for that one), is use an environment variable to create a temporary file in /dev/shm. We'll then use BusyBox wget to get DDsc, which, again, is a tool developed by Yago Gutierrez. It does essentially the same thing as DDexec, except it takes raw shellcode as an argument. So again, it allows us to replace an existing process in memory with our own code. The second thing you can see, in the last two lines, is another environment variable holding our code: we're echoing some shellcode into it, and that shellcode is, again, just a reverse shell. Once we've done those two things, we have one more issue to tackle: noexec. Like we said, we've written to /dev/shm and we've got our files in there, but we can't execute anything from within it. So what can we do about that? This was a bit of me banging my head against the wall for a while and then eventually realizing, oh, this is actually quite simple. DDsc is a shell script, so we can run it using sh. It's true that DDsc sits in a noexec mount, but sh sits in an executable mount, one that we can't write to. So if we run the script under the context of sh, which is in an executable mount, it will actually run, and we get our connection back just as before. So again, a quick recap: we used BusyBox wget to get our files, we stored them in /dev/shm, which is writable but not executable, and then, under the context of sh, which lives in an executable mount, we managed to run them and got code execution. Now, very imaginatively, for the last method we'll be looking at the exact same container as before.
So again, an Alpine image: it has sh, but not bash. And at this point, the wget with the asterisk isn't so mysterious anymore; we have BusyBox wget. Again, a read-only file system and a low-privileged user. And again, just keep that seccomp profile in mind for a minute. In this final method, I wanted to move away from the in-memory execution style of things and see whether I could find something a bit more Kubernetes-specific. So again, just to start off, we can see that we've got a read-only file system. What I was examining in this one was the ability to use the dynamic linker to execute my code, even though that code isn't executable at the file-permission level, that is, at the discretionary access control level. ldd gives us the output of what our linker is: as you can see there, ld-musl and all that lovely stuff. So what is a dynamic linker? It resolves symbols at runtime, it loads shared libraries, all those kinds of things. What we care about is that code compiled with a compiler that matches our dynamic linker can be run directly through the dynamic linker, regardless of the individual file permissions. So now that we know what our dynamic linker is, we can go ahead and compile whatever we want to run (in this case, again, a reverse shell) with a matching compiler, and then run it. The one issue is: where do we write our file? My initial thought here was, again, /dev/shm. But unfortunately, the dynamic linker lacks what sh gives us and doesn't get around the noexec restriction. So again, we just have to go through our mounts. What we'll do here is grep for any mount that is read-write, and then get rid of all of those that are marked noexec. The bottom three aren't relevant for us, and neither is the top one. The ones we care about are the two lines starting with /dev/sda. Let's start with /etc/hosts.
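To see where the linker path that `ldd` reports comes from: a dynamically linked ELF executable records the path of its dynamic linker in a `PT_INTERP` program header. The sketch below is an illustrative Python reader for that header only; it is not a replacement for `ldd`, which also resolves shared libraries, and it assumes a well-formed ELF32 or ELF64 file. On an Alpine image one would expect a path of the form `/lib/ld-musl-<arch>.so.1`.

```python
import struct
from typing import Optional

PT_INTERP = 3  # program-header type whose segment holds the interpreter path


def elf_interpreter(data: bytes) -> Optional[str]:
    """Return the dynamic linker path recorded in an ELF image, or None if static."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    if ei_data not in (1, 2):
        raise ValueError("unknown ELF byte order")
    endian = "<" if ei_data == 1 else ">"  # 1 = little-endian, 2 = big-endian
    if ei_class == 2:  # ELF64
        (phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x36)
        # p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
        ph_fmt, off_idx, size_idx = endian + "IIQQQQQQ", 2, 5
    elif ei_class == 1:  # ELF32
        (phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
        phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x2A)
        # p_type, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_flags, p_align
        ph_fmt, off_idx, size_idx = endian + "IIIIIIII", 1, 4
    else:
        raise ValueError("unknown ELF class")
    for i in range(phnum):
        fields = struct.unpack_from(ph_fmt, data, phoff + i * phentsize)
        if fields[0] == PT_INTERP:
            start, size = fields[off_idx], fields[size_idx]
            # The path is stored NUL-terminated inside the file.
            return data[start:start + size].rstrip(b"\x00").decode()
    return None
```

For example, `elf_interpreter(open("/bin/busybox", "rb").read())` would read the interpreter of BusyBox on such an image; statically linked binaries have no `PT_INTERP` header and give `None`.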
Everything that I show you now will also work using /etc/hosts. The two issues you might find there are, one, you need to be root, and two, it may screw up a lot of your networking. Definitely not something that I did. The second file we have is /dev/termination-log. Before we get into what that is, let's check the individual file permissions. As we can see, not only is it writable, it's world-writable. Why is that? /dev/termination-log is a Kubernetes-specific thing. Every container that gets deployed gets a field called terminationMessagePath. Essentially, it's a log: if our pod failed, why did it fail? All that information gets written into this termination log. So if you think about it, it makes sense that it's world-writable. Any process can cause a pod to crash, and there's no way to weed those out in advance, so everything needs to be able to write to it. That part, fair enough. And that mount is writable for the exact same reasons. However, it's not marked noexec, meaning that even though that specific file isn't executable, the mount it resides in is. So once we have all of that, what we're going to do is write a basic reverse shell in C, nothing too snazzy. We then compile it; I used some environment variables, nothing too fancy. Basically, what we're doing here is using musl-gcc, which is the compiler that matches our dynamic linker, to compile that program. Once we do that, we serve it using a simple HTTP server and start our listener. Then we use BusyBox wget again to get our reverse shell binary into /dev/termination-log and store it there. Once we've done that, we can use our dynamic linker to execute it, even though the individual file permissions don't say anything about execution. And there we have it. So again, just a quick recap.
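The permission check on `/dev/termination-log` can be expressed as a small stat-based helper. This is a hypothetical sketch using only the standard library: it reports whether a file is world-writable and whether any execute bit is set. In the talk's scenario one would expect world-writable with no execute bits, which is why the file-level permissions don't stop the dynamic-linker trick; the mount options are what matter.

```python
import os
import stat


def permission_summary(path: str) -> dict[str, bool]:
    """Report world-writability and any execute bit for a file (follows symlinks)."""
    mode = os.stat(path).st_mode
    return {
        # Writable by "others", i.e. by every user on the system.
        "world_writable": bool(mode & stat.S_IWOTH),
        # Any of user/group/other execute bits set.
        "any_exec": bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)),
    }
```

Inside a pod, `permission_summary("/dev/termination-log")` would be the call to make; the result depends on the runtime, so no output is assumed here.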
We used BusyBox wget to get our binary, in this case just a single one. We stored it in the termination log, which, again, is world-writable and sits in a mount that is both writable and executable. The file itself doesn't have any execute permissions, but the dynamic linker doesn't really care: it's a binary that it recognizes, and it's in an executable mount, so it will execute it and we get code execution. This one's a bit of an interesting one, because it's a Kubernetes default. Every pod that you've ever deployed is deployed with a termination log in this exact same configuration. Granted, this is a more limited attack, because you can only run binaries that you can compile to match the dynamic linker. Still, a lot of damage can be done. So now that we've talked about all the different methods, I'll ask you to ignore your ADHD here for a second and ignore the left side of the screen for just a moment. Let's start by talking about seccomp. You may be a bit confused, because I'm saying here that the default runtime profile will actually block the first two attacks, but if you recall, all of the pods I deployed had a seccomp profile, and it was the RuntimeDefault one. Why did I leave this in? Because I wasn't using Docker as my runtime; I was using containerd. And my point here is: whenever you think something's happening, don't just take the configuration's word for it; go out and check it properly. containerd doesn't always apply the default runtime profile, and there won't be any errors when you try to deploy something with the RuntimeDefault profile. That's basically the situation here: if you go into one of those pods and run amicontained, you won't see any seccomp profile. So again, if those pods had actually had the default runtime profile, the first two attacks would not have worked. That's mitigation number one, for the first two attacks. A second option, and I can already hear the confusion in the room, is SELinux.
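In the spirit of checking rather than trusting the configuration: on Linux, `/proc/<pid>/status` has a `Seccomp:` field (0 = disabled, 1 = strict, 2 = filter). The helper below is a minimal sketch that parses it; run inside the container against `/proc/1/status` or `/proc/self/status`. A `"filter"` result only says that some seccomp filter is attached, not that it is the `RuntimeDefault` one; a tool like amicontained, mentioned in the talk, reports more detail.

```python
from typing import Optional

# Values of the "Seccomp:" field in /proc/<pid>/status (see proc(5)).
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> Optional[str]:
    """Return 'disabled', 'strict', 'filter', or None if the field is absent."""
    for line in status_text.splitlines():
        # Newer kernels also have "Seccomp_filters:"; requiring the colon skips it.
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1])  # int() tolerates the surrounding whitespace
            return SECCOMP_MODES.get(value, "unknown")
    return None
```

A `None` result typically means the field is missing (for example, a non-Linux host or a kernel without seccomp support), which is itself worth investigating.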
Now, SELinux is an amazing tool. It's also amazingly complicated, and you're more likely to break a lot of your environment than to fix things. So if you know how to use it, if you know your systems inside out and you're very confident in your ability to use SELinux, it's a very, very good tool. But just saying "I'll try it and it'll be all right" is going to end in disaster, spoken from experience. To prevent this in SELinux, you could either deny access to /proc/PID/mem for the specific processes you're worried will be hijacked, or you can simply deny the execmem permission. Now, the third thing is a bit less of an individual-person kind of thing; it's something I've been meaning to bring up with the Kubernetes maintainers. I don't see why the termination log should sit on an executable mount. There's no reason for it to; it's a log file, and it's never supposed to be executed. Again, they're much smarter than me, and they may have a good answer for this, but it's definitely worth checking out. And then the last thing we need to look into here is detection. If an attacker has gotten to the point where they're executing code in a container, you're out; it's done. If you noticed, all three methods rely on an outbound network connection. Going back to the very beginning, with a read-only file system it's fairly easy to predict what should be there and what shouldn't. When we're talking about pods running specific workloads, we also have predictability about which network connections we're expecting and which we're not. If there's suddenly a weird outbound network connection saying "give me this file", that should be a massive red flag. Now, that applies to all the methods.
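Because all three methods need an outbound connection, one simple detection idea is to compare observed connections against an allowlist of what a workload is expected to talk to. The sketch below assumes you already have (destination IP, port) pairs from some source such as flow logs; the allowlist format, the function name, and the addresses are hypothetical.

```python
import ipaddress
from typing import Iterable, Optional


def unexpected_egress(
    connections: Iterable[tuple[str, int]],
    allowed: Iterable[tuple[str, Optional[int]]],
) -> list[tuple[str, int]]:
    """Return observed (dest IP, dest port) pairs that no allowlist entry covers.

    Each allowlist entry is (CIDR, port); a port of None means "any port".
    """
    rules = [(ipaddress.ip_network(cidr), port) for cidr, port in allowed]
    flagged = []
    for ip, port in connections:
        addr = ipaddress.ip_address(ip)
        # `addr in net` is simply False when IPv4 and IPv6 are mixed.
        covered = any(
            addr in net and (rule_port is None or rule_port == port) for net, rule_port in rules
        )
        if not covered:
            flagged.append((ip, port))
    return flagged


# Hypothetical workload: talks to a database range and cluster DNS, nothing else.
ALLOWED = [("10.0.0.0/8", 5432), ("10.96.0.10/32", 53)]
```

With `ALLOWED` above, a connection to `("203.0.113.7", 8000)` (a documentation address standing in for an attacker's HTTP server) would be flagged, while the database and DNS traffic would not.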
If we go back to the left side of the screen here, this is essentially a Falco rule, and the only reason I put it up here is to give you an idea of how complicated it can be to detect something as simple as executing a file from /dev/shm. Everything in blue there is basically what we're looking for. It can be a process started with the full path, /dev/shm/ followed by the file name. It can be a process whose current working directory is /dev/shm, started with a dot-slash. We can also use sh and pass the full path to our file as an argument. It can be done in a lot of different ways, so a lot of thought needs to go into how you do detection. But it definitely needs to be there. So, just a quick word wrapping up. Don't worry, I'm not gonna start beatboxing. I'm not saying read-only file systems are not a good thing; they definitely are. All I'm saying is that read-only file systems do not equal security. They can be another layer, sure. They should be another layer, especially when we're talking about containers, immutable things where we know what should and shouldn't be there. Anything that's meant to be writable can be mounted as an external volume; it doesn't need to be part of the main file system. Basically, container security, especially in Kubernetes, is a ridiculously complex problem, and it can't be solved by one step, whether that's read-only file systems, some sort of EDR, or whatever else. It needs to be layered: active restrictions, alerts, monitoring, all the fancy things you see here at the booths. There is no one size fits all. And with that, on the left is a link to a blog post, which is the same as the link below, and on the right is a link for feedback. I promise none of this will take you anywhere malicious. And yeah, thank you very much. I think we've got time for questions if anyone has any. So, the Falco rule.
The Falco rule is checking for any execution of files from /dev/shm. The reason it's so complex is that you can do that in many ways. You can execute the file with a dot-slash directly from within /dev/shm. You can use a full path from wherever. You can use sh or bash and pass in the file as an argument. It's basically saying there are a lot of different ways to do something quite simple, so detection gets a bit complicated. No, no, it's Kubernetes. Sorry, the question was: is /dev/termination-log specific to a particular runtime, or is it Kubernetes as a whole? It's Kubernetes. So I did. Realistically, this isn't really finalized research; it's finalized to this stage. It's still ongoing, both by me and by some of my colleagues, and there's definitely a lot more to look into. There are a few talks. I believe there was a talk, I think it was at DevCon, again by Yago Gutierrez, the guy who wrote DDexec, covering a lot of different ways to do this, some of them Kubernetes-specific as well. There's definitely a lot more to learn in this very specific field. But yeah, I hope that answers your question. Yeah, so in regards to the seccomp profile, there are two ways you can go about it. One would be going onto the node and checking the configuration for your runtime, where you can see whether a default profile is loaded at all. To see whether it actually applies to a specific container, exec into that container and run amicontained. amicontained is basically a tool that will tell you all the capabilities you have within that container, the seccomp profile, AppArmor, all those sorts of things. It's just a nice way to verify whether you actually have an AppArmor profile or a seccomp profile attached to that specific container.
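The Falco rule itself isn't reproduced here. As an illustrative Python approximation of the same logic (not Falco syntax), the function below flags an exec event if the executable resolves to a path under `/dev/shm`, whether given as a full path or as `./name` relative to the working directory, or if a shell is handed a script under `/dev/shm` as its argument. The event shape (executable path, working directory, argv) and the set of shell names are assumptions, and it is a heuristic that will miss cases, which is the speaker's point about detection being harder than it looks.

```python
import posixpath
from typing import Optional

SHELLS = {"sh", "ash", "bash", "dash"}  # assumed interpreter names; extend as needed


def _resolve(path: str, cwd: str) -> str:
    # join() discards cwd when path is absolute; normpath() collapses "./" and "..".
    return posixpath.normpath(posixpath.join(cwd, path))


def _is_under(path: str, root: str) -> bool:
    return path == root or path.startswith(root.rstrip("/") + "/")


def _script_operand(argv: list[str]) -> Optional[str]:
    """First non-option argument to a shell, or None for `sh -c '...'`."""
    for arg in argv[1:]:
        if arg == "-c":
            return None  # the next argument is a command string, not a file
        if not arg.startswith("-"):
            return arg
    return None


def executes_from(exe: str, cwd: str, argv: list[str], watched: str = "/dev/shm") -> bool:
    """Heuristic: does this exec event run code stored under `watched`?"""
    # Case 1: the binary itself lives there (full path or ./relative path).
    if _is_under(_resolve(exe, cwd), watched):
        return True
    # Case 2: a shell is handed a script file from there as its argument.
    if posixpath.basename(exe) in SHELLS:
        operand = _script_operand(argv)
        if operand is not None and _is_under(_resolve(operand, cwd), watched):
            return True
    return False
```

For instance, `executes_from("/bin/sh", "/", ["sh", "/dev/shm/script"])` is `True`, while `sh -c ls` run from inside `/dev/shm` is not flagged. Watching `/dev/termination-log` and treating the dynamic linker as another interpreter would be a natural but untested extension toward the third method.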
No, no, no, amicontained is an external tool; you'd need to get it onto the container. You can do that using the first two methods here if you ever encounter a read-only file system. Okay, thank you very much, guys, and feel free to come chat with me if you see me. Thank you.