Hello everybody, bonjour, welcome to my talk: why barricade the door if the window is open? Why indeed? For the next half an hour we're going to talk about various ways to get into a Kubernetes cluster, and we'll also try to make sense of those ways to come up with some kind of framework that will help us think about them. My name is Shay Berkovich. I am a threat researcher at Wiz. You might know me through one of these projects. Right now, as I said, I'm happy to be part of the amazing Wiz threat research team, where I'm responsible, among other things, for the domain of Kubernetes threat research. Okay, let's talk about clusters. So I was told that every good talk needs to have a hook, something to engage the audience, to explain the motivation. But I think in this case we don't really need that much motivation, because it's kind of obvious, right? The same way that you don't want burglars in your house, you don't want to give attackers access into your cluster. Be it your first cluster that you're standing up with a few clicks in the cloud UI by choosing the default options, or be it cluster number 5000 that you're staging through some kind of fancy Terraform. Your primary security concern, the very first one, is to make sure that it's not easy to get into your cluster, before, of course, you run some kind of fancy KSPM and audit security solutions. So we'll use this house as a metaphor for the cluster. By the way, does anybody recognize this house? Which movie is it from? That's right, the McCallister house from Home Alone. So we'll also borrow its cast: the Wet Bandits, Marv and Harry I think their names were, who were trying to get into the house, and Kevin, who was trying to fend them off. We'll use them as analogies for the potential attackers that try to get into your clusters. Just something to make the talk more engaging and perhaps more intuitive as well.
So the first reason is obvious, of course, but there are a couple more subtle reasons. For example, these are the numbers from our 2023 security report: it takes about 22 minutes for malicious scans to reach a newly staged EKS cluster. So if you have a cluster that is publicly reachable, assume attackers know about it. That's your working assumption. And then there's another, even more subtle reason, again from the same security report. I'll explain the report first. We took a bunch of data from consenting customers, processed it, and mapped that data onto the stages of the typical attack chain in a Kubernetes cluster: initial access, lateral movement, privilege escalation, and impact. And what we found is that lateral movement and privilege escalation were the weakest points in this attack chain, probably because of the multitude of options inside a Kubernetes cluster to move laterally or escalate privileges, but also because of the lack of security controls that customers employ, even the controls that already exist, like network policies, namespace separation, et cetera. Which puts pressure on initial access. So what we're saying here is: let's first secure initial access and make it harder just to get in in the first place. Okay. Now, the house as a metaphor is nice, but we still need to remember the original architecture, of course, right? So we'll talk about the control plane, the blue components, but also about the data plane and maybe a bit more. Now, to extract the maximal value from this talk, we're not only going to talk about the techniques, how the attackers can actually get into your cluster, but also about detections and protections, because ultimately we want to know what to do about this, how we can protect our clusters. So when you see these icons, the red lamp and the green bucket, you'll know what that means.
For every initial access vector, I'll share with you the detection methods and protection methods, all with the final goal, after half an hour, of having some kind of framework, a means to think about future misconfigurations and vulnerabilities in the context of initial access. So that when the next Leaky Vessels comes out, and it will, you'll be able to think about it, to put it into perspective and say: okay, I have to patch my worker nodes immediately, for example, or upgrade my cluster; or, I can ignore it for now and it won't affect my initial access. Okay? All right, before we jump into the control plane, very quickly, our toolbox. We typically have a sensor, an EDR solution with visibility at the kernel level. It knows about containers, it's probably Kubernetes-aware, and perhaps even cloud-aware. On the Kubernetes level, we have the admission controller and the audit log, of course. And on the cloud level, we also have various logs and cloud detections, various detection streams, but also VPC flow logs, et cetera. So we have a bunch of those; it's good to keep that in mind. Okay, let's dive into the control plane. Of course, we're going to start with Kubernetes API access. Why? Because it's like the front door to your cluster, basically, and we want to secure our front door. Now, the front door is only a front door if the cluster is publicly accessible. However, what we found is that about 70% of clusters are publicly accessible. Now, it's not immediately a problem, because, let's say an anonymous attacker reaches your cluster, they get something like this: a 403 or a 401. If anonymous authentication is enabled, as on EKS or GKE, for example, they'll get a 403 and they'll be mapped automatically to the system:anonymous user. Okay, so what can they reach? What can they get from this?
Well, they can probably fingerprint your cluster via the /version endpoint, fine; they learn the health of your cluster; maybe something from certificate details. Not very interesting. However, a Kubernetes RBAC misconfiguration can interfere. How? What happens if system:anonymous, or the system:unauthenticated group that anonymous is mapped to, is actually assigned a non-trivial role? There are three trivial roles always assigned by default, which are not very interesting, but non-trivial roles can perhaps be assigned too. In that case, the external unauthenticated user, instead of getting a 403 or 401, will get something like this: the actual list of pod specifications, JSON which might of course contain some sensitive data. So it's like leaving your front door open. Now, in truth, this is an extremely rare condition; that's what our data shows, because people are already aware of this misconfiguration, it's gotten enough publicity, I guess. So chances are, if you see something like this, think honeypot first and cluster second. But of course, if it's in your environment, then it's not a honeypot. Okay. How can we detect that type of access? Everything related to Kubernetes API access is in the kube-audit log. This is how it looks, for people who have never seen the audit log: it has the user, username, group, action, and the resource. So that's what we're looking for, and we have all the information to write some kind of regex rule to detect this access. Great. However, sometimes your cloud provider can butcher those audit logs. This is how the audit log looks in the GKE version, unfortunately. So yes, you need to maintain several versions of your regex rules in this case. All right. What are we doing with this? How can we detect the access? Let's say an attacker abused this misconfiguration and got in. How can we detect them?
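The audit-log fields above are enough to express such a rule. Here is a minimal sketch in Python, assuming the upstream audit-log schema (as noted, the GKE variant reshapes the fields, so in practice you would maintain a second version of the rule):

```python
import json

# Endpoints anonymous users can legitimately touch (illustrative allowlist;
# tune it to your distribution and exposure policy).
ALLOWED_PATHS = {"/version", "/healthz", "/livez", "/readyz"}

def is_suspicious_anonymous_access(event: dict) -> bool:
    """Flag audit events where system:anonymous (or the
    system:unauthenticated group) touches anything non-trivial."""
    user = event.get("user", {})
    anonymous = (
        user.get("username") == "system:anonymous"
        or "system:unauthenticated" in user.get("groups", [])
    )
    path = event.get("requestURI", "").split("?")[0]
    return anonymous and path not in ALLOWED_PATHS

# Example event, shaped like an upstream kube-audit entry:
event = json.loads("""{
  "user": {"username": "system:anonymous", "groups": ["system:unauthenticated"]},
  "verb": "list",
  "requestURI": "/api/v1/pods"
}""")
print(is_suspicious_anonymous_access(event))  # True: anonymous pod listing
```

Fingerprinting via /version stays below the rule's radar, which is exactly the distinction the talk draws between trivial and non-trivial anonymous access.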
Well, anomaly rules are your friend here, because that's how you know that system:anonymous is doing something weird. But also KSPM, because this misconfiguration is relatively easy to find, or an RBAC audit tool. Okay. So that was the first one. Now let's move on to kubeconfig. Kubeconfig is the file that tells kubectl how to authenticate into the remote cluster, and it's typically saved in ~/.kube/config. Okay, great. It's structured as follows. It has three sections: first, a list of clusters; second, a list of contexts; and third, a list of users. The third section is the most interesting one, because it actually tells kubectl how to authenticate into the cluster, what user material to use. So let's see how it's implemented in the big three solutions. For EKS, it looks like this: we have this exec section under the user, which basically tells kubectl to run the AWS CLI with the following parameters. It will get back some exec credentials and use those credentials to actually authenticate into the cluster. So what can we learn from this? First, don't run an untrusted kubeconfig, simple as that. If you download one from somewhere on GitHub, remember: there is execution context inside the kubeconfig. Second, you still need AWS credentials in this case. Same for GKE: it uses the gcloud credentials that are already on your laptop. So there are no new credentials in these cases; if the kubeconfig is leaked, nothing really happens. What I'm getting to is AKS. AKS has three authentication modes: local accounts, Entra ID, and Entra ID with Azure RBAC. In local mode, which is the default, this is what you'll find in your kubeconfig: the client certificate and the token, okay, short-term access. Now imagine this kubeconfig leaks. What are you going to do? The thing is, that certificate is valid for two years.
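For illustration, the user entry of such a local-mode kubeconfig looks roughly like this (cluster name and credential values here are abbreviated placeholders, not a real dump):

```yaml
# AKS local-accounts mode: long-lived credentials embedded
# directly in the kubeconfig (illustrative, truncated values)
users:
- name: clusterUser_myResourceGroup_myAKSCluster
  user:
    client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0t...  # valid ~2 years
    client-key-data: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQ...
    token: 9a8b7c6d...
```

Contrast this with the EKS and GKE exec sections: there the kubeconfig holds no secret at all, only instructions to fetch one, which is why only the AKS local-mode file is dangerous to leak.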
You leak your kubeconfig, go ahead and try to rotate all those certificates in your cluster. Basically, if you run AKS in local-accounts authentication mode, make sure you don't leak the kubeconfig. And that happens a lot, actually; these you can find on GitHub, various secrets checked into public repositories. So you need to be very careful about it. How do you detect and protect against this? To detect the access, anomaly rules again are your friend. But for protection, we're talking here about developer practices, coding practices, and secret scanning. You need to scan everything before you check it into public repositories, but also private repositories, of course, because they can later become public. So don't do this stuff, right, where somebody checked GKE credentials into a public repo. Okay. Let's move on to kubelet API access. And some people in the audience will say: what, the kubelet has an API? Yes, apparently it does. It's not documented, though, but it's there. To find the endpoints, you need to dig deep into the code and look at those endpoints, okay? But they are there, and it's running on AKS, GKE, and EKS as well. Now, it's not supposed to be accessible; there is a reason why they didn't document it. It's only for the internal controlling components. That's why I compare it to a pet door inside the kitchen door: again, it's not for humans. It shouldn't be exposed, and if it is, it's some kind of serious misconfiguration, or the developers or DevOps did something funny, and you'd better be aware of it. In terms of detection, the kube-audit log will be blind to this, because traffic goes straight to the kubelet; it doesn't go through the Kubernetes API server.
So the sensor is your friend in this case, because it will monitor local access to the port, typically 10250 or 10255; those are the ports the kubelet API is served on. To protect this, we're talking about network exposure management and VPC flow logs, and perhaps some secondary protections. All right, moving on: management interfaces. There's this case of the Tesla open dashboard that's been told again and again, since 2018, I think, and I'll use it again. It doesn't happen anymore, because the dashboard is not there by default; you need to install it, and you need to actively expose it. But there are other types of dashboards. I'm talking MLflow, Kubeflow, Airflow, various add-ons. They're just so handy. It's like the kitchen back door that you take the garbage out through, but you need to remember to keep it closed. To detect this, again, we're talking about anomaly rules on the kube-audit log, same as before. To protect, we're talking about network exposure management: if you see some kind of management port open, that's your clue. All right, so that's management interfaces. Now, the last access vector on the control plane is kubectl proxy. I don't know how many of you have heard about it, but it's a weird beast. It's like the zipline, if you guys remember, which Kevin used to get out after the burglars got in. But the same zipline can be used to get in, right? And it's kind of hard to notice; it's not big, because we're looking at the windows, doors, garage, but sometimes there are also ziplines into your house. That's kubectl proxy. The thing is, everybody can run it. Everybody with access can run it on their local laptop or some cloud VM jump host. Give it --port 9001, and that's it: that's a connection into your cluster.
And the thing is, this action of opening kubectl proxy, the Kubernetes API server doesn't see it. It could be any port, depending on the --port option. And this laptop or jump host can be outside the scope of your security solutions. That's why I'm saying it's like a zipline: it's going to be super hard to detect. So make sure to cut it if you have it. Now, the kube-audit log is blind to the proxy opening, but the sensor should detect the moment the proxy is opened, because it's an open port. So we're again talking about network exposure management, but in this case we're also talking about anti-phishing and anti-malware controls on the laptop of the developer that opened the proxy. All right. So to sum up the control plane: we talked about multiple initial access vectors, and we also associated the major risks with each of them. Let's see what we learned, and how we can apply it to the case of the system:authenticated misconfiguration that was discovered back in February, I think. The thing is, any user with a Google account or Gmail, and I think pretty much everybody here has one, can authenticate into any cluster on GKE. They will be automatically assigned to the system:authenticated group. Now, because of the name, system:authenticated sounds safe enough, way safer than system:unauthenticated. And because of this misconception, people assigned admin-level privileges to the system:authenticated group. So imagine what an attacker can do: they go to GCP, get a token, authenticate with that token to the Kubernetes API server, and do whatever system:authenticated is allowed to do. So let's check the demo to see how it looks in practice. All right. Here, on the right side, I have cluster access, kubectl get.
I'm just showing that if I provide any token, any random token, of course I won't be authenticated into the cluster, because it doesn't know what to do with this token. On the left side, I'm the external user, the attacker, and I have a Gmail account. Hello? Okay. I'm using this API to get my own token. I'm getting the token, and I'm passing it into kubectl. I still don't have access, but I am a user now, some kind of user with a weird number, which kind of shows that something happened there, right? So now let's simulate the bad misconfiguration. I'm creating this ClusterRoleBinding, literally named "danger", where I'm assigning the system resource-tracker role to the system:authenticated group. All right. And I'm trying the same get pods, and now I have access, because that resource-tracker role, which is a built-in role in these clusters, has some kind of privileged access; it can do a bunch of stuff. So we are one misconfiguration away from disaster on GKE clusters in this case. That made waves in the community. And let me not forget to quickly delete that misconfiguration, of course. Okay, and we're back. So where can we place this in our framework? What did we learn? Well, we are talking about Kubernetes API access, so it's got to be a misconfiguration in Kubernetes API access, and specifically an RBAC misconfiguration. Okay. For detection, once we're talking about the Kubernetes API server, we're talking about the Kubernetes audit log. And to protect this, we just need a good KSPM or RBAC audit solution that will detect that dangerous role binding. So a rule you might want to use is this tracking rule, which literally says: we're looking at the create verb on ClusterRoleBindings; we're looking at a group with the name system:authenticated; and the roleRef associated with it shouldn't be one of those three default roles that are already there, the trivial ones. Okay.
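That rule can be sketched in a few lines. The three trivial roles are the ones Kubernetes binds to system:authenticated by default (system:basic-user, system:discovery, system:public-info-viewer); the event shape follows the upstream audit schema, and the roleRef name is just the one from the demo:

```python
# Sketch of the detection rule described above: alert on ClusterRoleBinding
# creations that grant system:authenticated anything beyond the three
# roles Kubernetes binds to it by default.
TRIVIAL_ROLES = {"system:basic-user", "system:discovery", "system:public-info-viewer"}

def is_dangerous_binding(audit_event: dict) -> bool:
    """True if this audit event creates a ClusterRoleBinding giving
    a non-trivial role to the system:authenticated group."""
    if (audit_event.get("verb") != "create"
            or audit_event.get("objectRef", {}).get("resource") != "clusterrolebindings"):
        return False
    obj = audit_event.get("requestObject", {})
    subjects = {s.get("name") for s in obj.get("subjects", [])}
    role = obj.get("roleRef", {}).get("name")
    return "system:authenticated" in subjects and role not in TRIVIAL_ROLES

# Event resembling the "danger" binding from the demo:
event = {
    "verb": "create",
    "objectRef": {"resource": "clusterrolebindings"},
    "requestObject": {
        "subjects": [{"kind": "Group", "name": "system:authenticated"}],
        "roleRef": {"kind": "ClusterRole", "name": "system:resource-tracker"},
    },
}
print(is_dangerous_binding(event))  # True -- non-trivial role granted to everyone
```

The same predicate works equally well as a KSPM posture check against existing ClusterRoleBindings, not just as a streaming audit-log rule.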
And this is how the actual ClusterRoleBinding looks from the API perspective. Okay, so that's what the KSPM solution is looking for. So now we know how to solve this problem; when the next GKE misconfiguration comes along, we know how to address it. Okay. One second, breather. We're moving on to the data plane. The data plane is like the business that you run from your garage, right? Like Amazon or eBay, where you send packages, trucks are moving in, trucks are moving out. And some packages can slip through and stay in your house, perhaps even malicious packages. What I'm getting at is data plane access, which you expose through services. If there's an application RCE on your data plane, an attacker can of course abuse it and find themselves in the context of pod execution, typically associated with some service account, in a namespace, on a worker node. And from there, the name of the game is lateral movement, or privilege escalation, depending on what they can do with the associated privileges. So now we're talking about how to confine the attacker. Suddenly we're asking: how do we protect from application RCE? Well, that's not really the scope of this talk; we're not going to talk about application security, but there are ways: application logs, WAF, fuzzing, you name it. In Kubernetes terms, KSPM for pod isolation can work, plus protecting against the secondary activities past the RCE exploitation. Okay. Let's say we don't have an RCE in our data plane. Some vendors actually offer execution as a service. And our vulnerability research team has detected multiple cross-tenant vulnerabilities in services like Azure Cosmos DB, where they were running Jupyter Notebooks. That's a juicy service. At this point, we're talking about escape protections and protections against lateral movement and cross-tenant violations.
And in this context, I just want to quickly mention PEACH, which is a hardening framework for multi-tenant cloud applications and deployments. We recently updated it and extended it to Kubernetes as well. And there's NamespaceHound, a tool that we released and open-sourced a couple of weeks ago, which basically tests your multi-tenancy setup. For the red teamers in the crowd: it can help you find all kinds of lateral movement scenarios inside the cluster once the attacker is already in. Okay, good. Now, switching direction a bit: NodePort. NodePort is just another way to expose your data plane. Typically you expose the data plane through a load balancer or, I don't know, the Gateway API, but there is a way to expose it through a NodePort as well. It's a bit weird; typically it's a sign of some kind of diagnostic or debug access, so typically you don't want it in your production clusters. But we found that about 6% of clusters do have a NodePort configured and exposed. We're just hoping those are test clusters, not production. Now, because it shouldn't be there, I'd compare it to some kind of basement door under the garage, and let's just hope that the attackers won't find it, or if they find it, they'll break their neck trying to get in there. Detection here is similar to the kubelet: you need local detection, which means the sensor. For protection, though, KSPM sees the NodePort, right? So a good KSPM solution should detect this. And of course, classic AppSec protections if the application behind it is authenticated. Okay. And since we're talking about initial access, we have to talk about malicious images. Every cluster uses images, of course: a pod starts, the kubelet pulls the image. Okay, but how do you know if it's malicious or not?
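Before we get to the answer, one cheap guardrail, whatever the poisoning path turns out to be, is to admit only digest-pinned images from registries you trust. A minimal admission-style sketch (the registry allowlist is a made-up example, not a recommendation of any specific registry):

```python
# Minimal sketch of a registry/digest guardrail for pod images.
# The allowed registry is a made-up example -- substitute your own.
ALLOWED_REGISTRIES = ("registry.example.com/",)

def image_allowed(image: str) -> bool:
    """Admit only images from trusted registries, pinned by digest."""
    pinned = "@sha256:" in image          # digest-pinned, not a mutable tag
    trusted = image.startswith(ALLOWED_REGISTRIES)
    return pinned and trusted

print(image_allowed("registry.example.com/app@sha256:deadbeef"))  # True
print(image_allowed("nginx:latest"))                              # False: public registry, mutable tag
```

Digest pinning matters because it neutralizes the overwrite scenario: even an attacker with write access to the registry can't change what a digest resolves to, only what a tag points at.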
So, having a malicious image inside your cluster is like inviting the fake police officer into your house for recon, for example; you're basically giving them too much. Now, how could this happen? Well, it could be a public registry with maintainer credentials that leaked, or somebody taking over that maintainer's account; we saw this a lot with npm as well. It could be image name confusion with a super popular image, right? Or it could be an image pull secret, leaked knowingly or unknowingly, that has write permissions. Our vulnerability team saw this as well: multiple instances where an image pull secret had write permissions to the registry. In that case, an attacker can overwrite the image, and there you go, you have a malicious image. Now, the important point here is that private clusters are also affected. Don't think that if you stage a private cluster and don't expose it to the Internet, you're safe; it still pulls images, right? So keep that in mind. Now, in terms of detection, how can we detect this scenario? Well, think about what a malicious image would do when it starts, when it instantiates. It doesn't know where it is, so it will probably first connect to its C2 instance, okay? So you can detect that connection; the sensor or a network monitoring tool, that's your way. To protect, of course, you can use image integrity verification, but registry security is also part of the answer here. Okay. So, to sum up, we talked about apps, execution as a service, and images, and now I want to show you another use case: we're going to talk about Leaky Vessels. Leaky Vessels is a set of pretty serious vulnerabilities discovered back, I think, at the end of January by Snyk researchers, that allow an attacker build-time and container-start/runtime escapes from the container to the host. And that's what we're going to use in our attack chain, in our demo. Perfect. Okay.
So, on the bottom right side, I'm starting my C2 server, nc -lvp, everybody knows it, okay. Great. At the top, I'm pretending to be the cluster operator. All the cluster operator does is run some innocently named image. It's named cve-2024-21626, just a regular image, which I built, of course; that's our malicious image. At the bottom left you can see how the Dockerfile looks. All it does is run one command: upon start, it curls, it sends /etc/shadow to my C2 server. /etc/shadow, of course, is a sensitive file, but the key here is that this /etc/shadow is not from the pod; it's the /etc/shadow from the node file system. You see the path traversal dots: that means that from inside the container, it will go up to the node file system and send its /etc/shadow to my C2 instance. Pretty cool, if you ask me. Okay, where can we place this in our framework? Well, of course, we're talking about image poisoning, image name confusion, so it's going to be there. And what's the primary protection against this? Image integrity verification. Okay, now very quickly, other angles. Of course, you rarely run your clusters alone, right? You probably have multiple clusters, maybe thousands, and you don't run them in isolation. You have some kind of continuous delivery solution that stages them; you have users and cloud roles that interact with those clusters. So you don't run the clusters in isolation, and those two vectors can also be initial access vectors. Now, I don't want to overstate this; not everything here is initial access in the same sense as what we covered, but keep in mind that those vectors can also be abused by attackers: from the cloud into the containers, and from CI/CD into your Kubernetes. So keep that in mind. Okay, so to wrap it up, this is probably the most important slide that I want you to take from this lecture.
We talked about the control plane, we talked about the data plane, and we highlighted the most important risks for every initial access vector. We also touched a bit on cloud access and CI/CD. Okay, so I hope that will help you make sense of the various initial access vectors for your Kubernetes clusters. These are some of the links that I talked about. Thank you so much. If you have questions, I'm here. Thanks a lot.