I'm Yossi. First of all, it's very nice to be here. I'm Yossi Weizman, a Senior Security Researcher at Microsoft, and today I'm going to talk about lateral movement in Kubernetes. This is the agenda for today. We are going to start with a short overview of Kubernetes, then we'll talk about identity types in Kubernetes, then about lateral movement, both inside the cluster and from the cluster to outside resources, and then we'll have some takeaways. So let's start.

First, what is Kubernetes? But before that, what are containers? A container is a unit of software that packages your application's code with all of its dependencies, so you can run it anywhere without worrying about those dependencies. The packaged artifact itself is called an image, and at runtime images become containers, which run isolated from each other. Now, usually it's not enough to run one container on one computer. You want to run multiple containers on multiple VMs or computers, and you need to manage them somehow, and that's why you have Kubernetes. Kubernetes is a container orchestrator: it manages a cluster of computers, each of which runs multiple containers.

So let's see what it looks like. This is a Kubernetes cluster, and you can see that it has two main parts. It has the control plane, which is like the brain of the cluster, and it has the nodes, which run the actual containers. The control plane has several components. What's especially interesting for us is the API server, which is like the front end of the cluster: every request to the cluster goes through the API server. For example, if you want to create new containers, it goes through the API server. If you want to list all the resources in your cluster, it goes through the API server. So that's what's especially important for us in this session.
And then we have the nodes, which run the containers. Each node also has an agent called the kubelet, which allows Kubernetes to manage that node, that computer. In managed clusters in the cloud, such as AKS in Azure, EKS in Amazon, or GKE in Google, the control plane is fully managed by the cloud provider, so you, as a user, don't have direct access to that control plane.

In Kubernetes, we usually don't talk about containers, because containers are not Kubernetes objects. We talk about pods, which are the lowest-level components in Kubernetes. A pod can run one or more containers that share resources. Usually, users don't even deploy pods directly; they deploy higher-level components. But since pods are the lowest-level components in Kubernetes, in this session we'll concentrate on pods.

So now that we know what Kubernetes is, let's talk about identity types in Kubernetes. We can split this into three main areas. The first one is how users from outside the cluster, like the administrator, authenticate with the API server. The second is how applications in the cluster authenticate with the cluster, again with the API server. And the third is how applications in the cluster authenticate with resources outside the cluster, in the cloud. In this talk, we are going to focus on questions 2 and 3, because that's what's important for lateral movement.

So what is lateral movement in Kubernetes? This is the threat matrix for Kubernetes, which we released at Microsoft back in 2020. The threat matrix is a knowledge base of attack techniques against Kubernetes, and it was actually the first attempt to systematically map the attack landscape of Kubernetes. Last year, in 2021, we released a second version. And as you can see, we used the format of MITRE ATT&CK. Probably many of you are familiar with MITRE ATT&CK.
And last year, MITRE released their own matrix for containers as part of the MITRE ATT&CK framework. That was actually the result of a joint project of MITRE with other companies, including us at Microsoft. If you want to hear more about the threat matrix and the differences between the threat matrix and MITRE ATT&CK, you can watch our session at KubeCon from a few months ago. But for now, we can see that lateral movement has many techniques, and we are going to cover some of them during the session.

We are going to start with inner-cluster lateral movement. But first, we should know some basic terms of cluster authentication and authorization. The first term is service account, which represents the identity of an application in Kubernetes. Kubernetes uses RBAC, role-based access control. It's not unique to Kubernetes, of course. RBAC has roles, which are sets of permissions, and role bindings, which attach identities to the roles. So for example, here we have service accounts 1, 2, and 3, and we have roles 1, 2, and 3. You can see that service account 1 has roles 1 and 2, and role 3 has service accounts 2 and 3 attached to it. That's how RBAC works in Kubernetes.

Service accounts can be mounted into pods, allowing the pods to authenticate with the API server. So the full chain looks like this: we have a pod, the pod has a service account mounted into it, the service account has roles attached to it, and each role has permissions, which are the rules. So here's our cluster again, and now we know that we have service accounts in the pods. For example, pod A has a service account, and with that service account it can communicate with the API server.

So let's talk about lateral movement. What is lateral movement inside the cluster? Let's assume that we have a pod that is compromised. And how can pods become compromised?
For example, the pod runs a container with a web application, that web application is vulnerable, and somebody exploits it. So now we have a compromised container, a compromised pod, in our cluster. Let's say that pod A is now compromised. So what is lateral movement? It could be a movement from one pod to another pod in the cluster. It can also be a movement from a pod to a node in the cluster. And if the attacker gets access to a node, it usually means that the attacker can perform a full cluster takeover, because the node has strong permissions. It has an identity in the cluster, and an attacker with access to the node can read the credentials of that identity and take over the cluster.

The key point here is that attackers can use the service accounts of the pods in order to move laterally in the cluster. We have a pod, the pod has a service account, and the attacker can use it. So which permissions might lead to inner-cluster lateral movement? In other words, which permissions should the service account of the compromised pod have to allow the attacker to perform lateral movement? The answer is that there are many permissions that may lead to lateral movement, and we are going to see two real-world examples that we saw in real environments.

The first example is read permission on secrets. Kubernetes secrets are objects that store sensitive data in Kubernetes. For example, if you have an application that needs a connection string or a password, you can store it as a Kubernetes secret and then load it into your running app. Kubernetes itself uses secrets to store the tokens of service accounts. So if attackers have permission to read those secrets, they can steal the tokens of the service accounts. In this example, on the left, we can see a role definition that allows reading all the secrets in the cluster.
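The slide itself is not part of the transcript, so the following is only a minimal sketch of what such a role could look like, written as a Python dict that mirrors the shape of a Kubernetes RBAC `ClusterRole` manifest. The role name `secret-reader` and the helper `allows_secret_read` are hypothetical, and the check deliberately ignores `resourceNames` restrictions and namespace scoping.

```python
# Hypothetical role, shaped like a Kubernetes RBAC ClusterRole manifest.
# The name "secret-reader" is made up for illustration.
secret_reader_role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRole",
    "metadata": {"name": "secret-reader"},
    "rules": [
        {
            "apiGroups": [""],  # "" is the core API group, where secrets live
            "resources": ["secrets"],
            "verbs": ["get", "list", "watch"],
        }
    ],
}

READ_VERBS = {"get", "list", "watch"}


def _matches(values, wanted):
    """True if an RBAC rule field names `wanted` or uses the "*" wildcard."""
    return "*" in values or wanted in values


def allows_secret_read(role):
    """Return True if any rule in the role grants a read verb on secrets.

    Simplification: resourceNames and namespace scoping are not considered.
    """
    for rule in role.get("rules", []):
        verbs = rule.get("verbs", [])
        in_core_group = _matches(rule.get("apiGroups", []), "")
        on_secrets = _matches(rule.get("resources", []), "secrets")
        can_read = "*" in verbs or bool(READ_VERBS & set(verbs))
        if in_core_group and on_secrets and can_read:
            return True
    return False
```

A check like this could be run over the roles bound to a workload's service account to spot the kind of permission described here before an attacker does.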
So if the compromised pod has this permission, the attacker can use it to read the token of a privileged service account and in that way move laterally in the cluster. Let's see an example. Pod A has read permissions on secrets. So now the attacker, from pod A, can request the token of a privileged service account. With that token, the attacker can impersonate that service account and move laterally in the cluster. For example, the attacker can deploy a new container or change the configuration of existing containers to run malicious code.

So that was the first example. Now we're going to see another example of a permission that allows lateral movement, and this is self-update permission. This one is based on a vulnerability that we recently discovered, and the root cause of that vulnerability was a self-update permission. In this case, what happened is that applications had permission to update themselves. Now, that may sound harmless, because you can update only yourself, but it means that applications could change their own configuration, and specifically change their own configuration to privileged. And if you change your own configuration to privileged, you now have a privileged container in the cluster, which allows access to the underlying node. And again, if you have access to the underlying node, in most cases it means that you can perform a cluster takeover.

So let's see what it looks like. Pod A has permission to change its own configuration, so it turns itself into privileged. Now a new pod is deployed, and this time the pod runs a privileged container. So we have the new container here, the new pod, and from that pod the attacker can access the underlying node and perform a cluster takeover.

So those were two examples of permissions in Kubernetes RBAC that allow lateral movement in the cluster. Now, for mitigation and detection: how can we as defenders prevent it?
First, adhere to the least privilege principle, which means don't give service accounts permissions they don't need. Second, in most cases your pods don't even need any access to the API server. They don't need any service account. By default, Kubernetes mounts a service account into each pod, but you can disable that: in the pod configuration, you can specify that you don't want a service account. So if your application doesn't need access to the API server, just don't mount a service account.

As for detection, what we basically need to do is monitor the Kubernetes API server. And luckily, we have a very powerful tool to do that, and it's native in Kubernetes. That is the Kubernetes audit log, which allows you to see basically every operation in the API server. In that way, you can find deployments of containers with suspicious images. You can find deployments of pods with suspicious configurations, such as privileged containers. And you can also monitor suspicious or sensitive API operations, such as reading secrets from Kubernetes.

All right. So we talked about inner-cluster lateral movement. Now let's move to cluster-to-cloud lateral movement, which is, in my opinion, even more interesting. Workloads in Kubernetes may need access to cloud resources. For example, let's say that we have an application that needs to store data. Many applications need to store data, so we can use cloud storage for that. Now, if we use cloud storage, we need to authenticate with that cloud storage, so we need access to a cloud resource. Managed clusters, such as AKS, EKS, and GKE, must access cloud resources because they rely on cloud services. For example, the Kubernetes nodes are virtual machines, or the cluster uses a cloud load balancer. So they must access cloud resources. And the question is, how do workloads in Kubernetes authenticate with the cloud provider API? There are several methods to achieve that.
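Returning briefly to the audit-log monitoring described above, here is a minimal sketch of what such a check could look like. It assumes the audit log is available as JSON lines (one audit event per line) and that pod bodies are recorded, which only happens at the `Request` or `RequestResponse` audit levels. The function names and the two rules are illustrative, not a complete detection.

```python
import json

READ_VERBS = {"get", "list", "watch"}
WRITE_VERBS = {"create", "update", "patch"}


def is_privileged_pod(pod):
    """True if any container in a pod object asks for privileged mode."""
    spec = (pod or {}).get("spec") or {}
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    return any(
        (c.get("securityContext") or {}).get("privileged") is True
        for c in containers
    )


def flag_event(event):
    """Return a short reason string for a suspicious audit event, else None."""
    ref = event.get("objectRef") or {}
    verb = event.get("verb", "")
    username = (event.get("user") or {}).get("username", "")
    resource = ref.get("resource")

    # Service account usernames look like system:serviceaccount:<ns>:<name>.
    if (resource == "secrets" and verb in READ_VERBS
            and username.startswith("system:serviceaccount:")):
        return "service account read secrets"
    # requestObject is only present at the Request/RequestResponse levels.
    if (resource == "pods" and verb in WRITE_VERBS
            and is_privileged_pod(event.get("requestObject"))):
        return "privileged pod requested"
    return None


def scan(lines):
    """Scan JSON-lines audit events and return (reason, username) findings."""
    findings = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        reason = flag_event(event)
        if reason:
            username = (event.get("user") or {}).get("username", "")
            findings.append((reason, username))
    return findings
```

In practice some service accounts read secrets legitimately, so a rule like the first one would need an allow-list built from the cluster's normal behavior.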
We are going to go over the main authentication methods. The first one is specific to Azure, and it used to be, until quite recently, the default method that Azure used to authenticate the AKS cluster to the cloud. In that method, we use service principals. Service principals are application identities in Azure. In Kubernetes, we have service accounts; in Azure, we have service principals, and they are very similar. With this method, each Kubernetes node stores a file with the credentials of a service principal. By default, this service principal has the Contributor role on the node resource group. Contributor is a strong role in Azure, so basically it means that the service principal can modify, or do whatever it wants to, the resources of that particular cluster. But users can bring their own service principal or give more permissions to this service principal according to their needs. So again, if we have an application that needs access to a specific, say, blob storage, a cloud storage, the user can grant this service principal permissions on that storage account.

So here's our cluster again. But this time, we have one more type of identity: besides the Kubernetes service account, we also have the Azure service principal on each node. Let's say once again that pod A is compromised. And again, I mean that the container running in pod A is compromised, not the pod itself, because pods are just Kubernetes objects. What's actually compromised is the running container. So let's say that the container of pod A is compromised, and what the attacker wants is to access the service principal. Now, the problem for the attacker is that they cannot access the service principal, because containers are isolated: containers cannot access the files on the underlying node. So what the attacker can do is try to use the method that we saw before.
And that's leveraging the Kubernetes service account to create a new pod in the cluster, but this time mounting the service principal file into the new pod. Because in Kubernetes, and not only in Kubernetes but also in Docker containers, you can mount files from the node into containers, but you need to specify it in the container's configuration. So if the service account of pod A has permission to create new pods, the attacker can use it to create a new pod, this time with the service principal mounted. Now the attacker has a backdoor pod with the service principal in it, and with this service principal the attacker can access the cloud environment.

So we saw one method. The limitation for the attacker here is that they need to somehow create a new pod and mount the service principal into it, which in many cases is not possible. I mean, nobody says that the service account of pod A has permission to create new pods. And second, this method works only in Azure. So now we are going to see another method that doesn't require any operations against the Kubernetes API server. It doesn't require any service account, and it also works in other cloud providers. It's also related to the last talk: this is using the metadata service.

The metadata service is a special endpoint that allows VMs to query information about themselves. This endpoint is implemented by all major cloud providers; you can see here Azure, AWS, and GCP. Among other information, the metadata service allows VMs to get tokens for the cloud identities that are attached to them. In all cloud providers, you can attach identities to VMs, allowing them to authenticate with the cloud API. Every cloud provider has a different name for that concept. In Azure, it's called managed identities. In AWS, it's EC2 roles. In Google, it's called IAM service accounts. But all cloud providers have this concept.
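To make the concept concrete, here is a small sketch of what a token request to the metadata service looks like in each provider, based on the publicly documented endpoints. It only builds the URLs and headers; nothing is sent unless `fetch` is called from inside a cloud VM or pod. The role name `example-node-role` is a placeholder, and the AWS path assumes the older IMDSv1 flow (IMDSv2 additionally requires a session token obtained with a PUT request).

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Link-local address used by the Azure and AWS instance metadata services.
LINK_LOCAL = "http://169.254.169.254"


def token_requests(aws_role_name="example-node-role"):
    """Return {provider: (url, headers)} for the identity-token endpoints.

    aws_role_name is a placeholder; on a real instance the role name can be
    listed by querying the security-credentials/ path without a name.
    """
    azure_query = urlencode({
        "api-version": "2018-02-01",
        "resource": "https://management.azure.com/",
    })
    return {
        "azure": (
            f"{LINK_LOCAL}/metadata/identity/oauth2/token?{azure_query}",
            {"Metadata": "true"},
        ),
        "aws": (  # IMDSv1 style, assumed here for brevity
            f"{LINK_LOCAL}/latest/meta-data/iam/security-credentials/{aws_role_name}",
            {},
        ),
        "gcp": (
            "http://metadata.google.internal/computeMetadata/v1/"
            "instance/service-accounts/default/token",
            {"Metadata-Flavor": "Google"},
        ),
    }


def fetch(url, headers, timeout=2):
    """Send the request; only meaningful from inside a cloud VM or pod."""
    with urlopen(Request(url, headers=headers), timeout=timeout) as resp:
        return resp.read().decode()
```

Note that the headers are fixed strings rather than credentials. Calling `fetch` from a test pod is one simple way to check whether the workloads in a cluster can reach the node's metadata endpoint.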
With the metadata service, VMs can query the token of the attached identities, and this query doesn't require any authentication. So every application on the VM can send a request to that endpoint and get a token for the attached identities. Now, in the cloud, Kubernetes nodes are actually VMs, which also have a metadata service. And by default, pods can freely access the metadata service of their nodes. It means that pods can acquire tokens for the identities that are attached to the underlying nodes. The permissions of those identities depend on the cloud provider, because each one has different default settings, but also on the specific environment, because users grant different permissions according to their needs.

So now we are going to go over the various cloud providers and see the default settings. But we'll see that in each one of them, users can also modify them. And users very often modify the default permissions to grant more permissions according to what their application actually needs. Or in many cases, not only what their application needs, but also excessive permissions.

In AKS, we have managed identities. Managed identities are identities that we can attach to VMs, and not only to VMs, in Azure. AKS uses several managed identities to operate the AKS cluster. But users can also add more managed identities if they need more permissions, or alternatively they can give more permissions to the existing managed identities. This is the list of the default managed identities that AKS uses. There are quite a lot, and it really depends on the AKS configuration. And we see many cases in which users give very strong permissions to those managed identities, such as subscription owner, which is super powerful.

So that was AKS. Now let's talk about EKS, Amazon. EKS has EC2 roles.
By default, the EC2 role has two policies in it. The first one allows it to pull images from the container registry. The second one gives it read permissions on the compute environment: read permissions for EC2 instances, VPCs, et cetera. But again, users can add more policies if the containers need to access additional cloud services.

In GKE, we have IAM service accounts, not to be confused with Kubernetes service accounts. IAM service account is Google's term for the identities that you attach to virtual machines. By default, all the VMs in a project share the same default service account, and by default this service account has the Editor role, which is quite permissive. Users can also add more permissions to that service account or replace it with another service account.

So this is a summary of what we just talked about. By default, pods can access the metadata service of their underlying nodes, so pods can acquire the tokens of the nodes. The permissions vary based on the cloud provider and on how the user configures the specific environment. And here you can see that you can actually get the tokens from a running container in each one of the cloud providers.

So let's see what lateral movement using the metadata service looks like. Here, again, we have our cluster with pod A, which is compromised. Now we are going to remove nodes 2 and 3 just to make some space; you can imagine that they are still there. So we have node 1 with pod A, and node 1 has its metadata service. Pod A can now request a token, and the metadata service returns a token for the cloud identity. With that cloud identity, depending on its permissions, pod A can perform all kinds of operations. For example, it can list storage account keys or get blobs. It can list secrets from a secret store, like Azure Key Vault or KMS.
And the attacker might also get credentials for other Kubernetes clusters that are deployed in the cloud. I'm out of time, so... oh, five minutes. All right, so I have five minutes. Great, I need less.

So the problem that we just saw is that pods can freely access the metadata service and acquire tokens for the node's cloud identities. All the pods share the same cloud identities, and those identities are actually the node's identities. What we want is to allocate a specific identity to each pod that actually needs access to the cloud API, and to make sure that pods can access only their own identities, so they don't share identities. Luckily, all cloud providers allow us to do that. We don't have time to go over each one of them in detail, but we'll just mention that these are the solutions that allow you to give specific identities to specific pods. They work differently. Some of them are based on intercepting the traffic to the metadata service. Some of them work by using Kubernetes as an identity provider. You can read about all of them, but we'll just say that they are not enabled by default. By default, what we just saw works, and if you want to restrict it and use this option, you need to enable it.

This is from MITRE ATT&CK. This is a technique that covers using the metadata service to get cloud identity tokens. And if you go to that technique, you can also see details about attack groups that use it in containers.

So what is the mitigation? First, again, adhere to the least privilege principle: don't give the cloud identities permissions they don't need. Second, as we just said, allocate specific cloud identities to the pods that actually need access. And third, restrict the pods' access to the metadata service. As for detection, what we should do is track the activity of the cloud identities that are related to our Kubernetes workloads or nodes.
In many cases that's quite easy to do, because the normal behavior of those identities is relatively constant, so suspicious operations stand out and we can look for them. For example, in Azure, we have the Azure Activity Log. You can see in this example that a managed identity used by the Kubernetes nodes lists storage keys, which is not one of its normal operations. We have the same thing in AWS and in GCP: in AWS it's CloudTrail, and in GCP it's Cloud Audit Logs.

So, key takeaways. First, when you secure your cluster, consider the various identity types: the identities in the cluster and the identities outside the cluster. Second, always adhere to the least privilege principle, again both for identities in the Kubernetes cluster and for your cloud identities. And third, monitor the activity of the identities and look for suspicious operations. That's it. Thank you very much. I hope you found it interesting.